Kitz Forum

Computers & Hardware => Other Technologies & Hardware => Topic started by: JGO on October 22, 2015, 03:36:23 PM

Title: Voice to Text software
Post by: JGO on October 22, 2015, 03:36:23 PM: Has anyone got any practical advice on this subject, please?
Title: Re: Voice to Text software
Post by: loonylion on October 22, 2015, 05:14:32 PM: Dragon NaturallySpeaking by Nuance is widely regarded as the best, and I have used it a fair bit. The catches are that it needs to be used in a relatively quiet environment and it does require training, but once it's trained up and you're using a decent noise-cancelling microphone, it's pretty accurate.
Title: Re: Voice to Text software
Post by: AArdvark on October 22, 2015, 08:13:18 PM: Ditto.

FYI, the training is not too painful; it's just reading some set texts, etc.
The most important things are a good Noise-Cancelling Microphone and learning to speak clearly and consistently, with the same diction.
If you train it with your 'Telephone voice' it will not work very well when you use it after a long day or a few pints of 'Best'. ;D ;D

I used Dragon software a few years ago and it generally worked well, but it really showed how your voice changes during the day for many reasons you do not notice.
(Colds/Illness, Tiredness, Emotional, In a hurry, who you are speaking to, etc etc)
All these changes impact the quality and accuracy of the Voice to Text conversion, at the end of the day.
I could train the software to get very good results, but the next day the accuracy would be highly variable.
If you talk in a consistent way at all times with little accent or variation of speech tonality/speed it will work quite well.
I discovered I had to adopt a 'Talk to Computer' voice to try to get more consistent results. ;D :D
I had to make my vocabulary more regular and my language constructs simpler to allow the software to use context better.
I had to stop myself from talking faster/slower and try to adopt a regular cadence to my speech rhythm.
You end up getting very aware of how your voice actually sounds, rather than what you hear in your head.

(In your head a 'Great Orator commanding the room' ...... in reality 'Donald Duck's cousin with head in Plastic bucket' ;D :D :D )

After a while this was too much of a faff, so I stopped using it.
I would expect it to have improved since, but that is only conjecture on my part. ;D
Title: Re: Voice to Text software
Post by: JGO on October 23, 2015, 06:47:26 AM: Thank you both - useful !
Title: Re: Voice to Text software
Post by: loonylion on October 23, 2015, 01:41:22 PM: I seem to remember being told during the training to speak like a TV/radio newsreader
Title: Re: Voice to Text software
Post by: guest on October 23, 2015, 02:41:42 PM: I find it somewhat amazing that 21 years ago I could use OS/2 v4 (Warp) to both control the desktop and "type" into PM apps without things slowing to a crawl. This was on a 100MHz Pentium IIRC.

Dragon Dictate (via Lernout & Hauspie) is an offshoot of that. I remember having high hopes for it on Windows 95, only to find it slowed the UI to an unmanageable extent on the same hardware that worked with OS/2 v4.

I really don't see an awful lot of progress in the last 20+ years in terms of making it more usable. Same issues occur now as then....
Title: Re: Voice to Text software
Post by: AArdvark on October 23, 2015, 06:21:32 PM: Quote from: JGO on October 23, 2015, 06:47:26 AM
Thank you both - useful !
If possible get a trial copy to play with.
It really does depend on your voice and your patience. ;D

I acknowledge that I have an accent and my diction can be variable due to health factors etc.
If I spoke like a BBC Newsreader of old, it would work excellently all the time. ;D ;D

Try it and see how it goes.
Also, if you only use a fixed set of commands and a limited vocabulary, it will work better, as you can keep training the system to recognise your voice and the language you use.
Title: Re: Voice to Text software
Post by: loonylion on October 24, 2015, 02:26:27 PM: Quote from: rizla on October 23, 2015, 02:41:42 PM
I find it somewhat amazing that 21 years ago I could use OS/2 v4 (Warp) to both control the desktop and "type" into PM apps without things slowing to a crawl. This was on a 100MHz Pentium IIRC.

Dragon Dictate (via Lernout & Hauspie) is an offshoot of that. I remember having high hopes for it on Windows 95, only to find it slowed the UI to an unmanageable extent on the same hardware that worked with OS/2 v4.

I really don't see an awful lot of progress in the last 20+ years in terms of making it more usable. Same issues occur now as then....

I ran NaturallySpeaking on a 400MHz PII-equivalent with no issues (Win98). Of course, I had to overclock the CPU to install it (the minimum spec was 500MHz and it actually checked), but once it was installed I dropped the clocks back to 400MHz and it ran fine.
Title: Re: Voice to Text software
Post by: NewtronStar on October 31, 2015, 10:10:49 PM: I have an app on my Android tablet called 'talk'; it converts voice to text and can also translate the voice and text into other languages.

It's a bit hit and miss: with the tablet close to the TV on the BBC NEWS 24 channel, it gets around 72% of the voice-to-text correct.
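A figure like "72% correct" is an informal estimate. A common way to put a number on it is the word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognised text into what was actually said, divided by the number of words actually said. The sketch below is a plain edit-distance implementation; the example phrases are made up, and the app itself may measure nothing of the sort.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between reference and hypothesis,
    divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


# One wrong word out of four gives a WER of 0.25, i.e. roughly "75% correct".
print(word_error_rate("rewind twenty five seconds", "rewind twenty nine seconds"))
```

Note that WER can exceed 1.0 when the recogniser inserts lots of extra words, so "1 - WER" is only a rough "percent correct".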
Title: Re: Voice to Text software
Post by: sevenlayermuddle on October 31, 2015, 11:19:40 PM: I have recently become the owner of one of the 4th-generation Apple TVs, with the voice-activated 'Siri' remote. It is the first device I have encountered where the voice recognition is more usually useful than useless; in fact, it is nearly always spot on. And pretty much instantaneous too. :)

I understand one reason it is so good is there is a second microphone, facing away from the speaker, thus enabling noise-cancelling to get rid of ambient sounds. This really could be a turning point IMHO for a technology that has gradually evolved from little more than a party trick a decade ago.
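A second, rear-facing microphone that hears mostly background noise is the classic setup for adaptive noise cancellation: a filter learns how the noise reaches the front microphone and subtracts its estimate. Below is a minimal sketch using the textbook LMS (least mean squares) filter. It assumes the rear mic picks up only noise, which real devices cannot guarantee, and Apple has not said what it actually does (beamforming across several mics is also common). The "speech" sine wave, the acoustic path, and the `taps`/`mu` values are all made up for illustration.

```python
import math
import random


def lms_noise_canceller(primary, reference, taps=4, mu=0.01):
    """Adaptive noise cancellation with an LMS filter.

    primary:   front mic samples (speech + noise that reached it)
    reference: rear mic samples (assumed here to be noise only)
    Returns the error signal, which converges towards the speech.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate  # what remains after removing estimated noise
        # Nudge the weights in the direction that reduces error**2
        weights = [w + mu * error * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


rng = random.Random(42)
n = 5000
# Stand-in for speech: a quiet low-frequency tone (hypothetical)
speech = [0.5 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]
# Made-up acoustic path from the noise source to the front mic
front = [speech[i] + 0.8 * noise[i] + 0.3 * (noise[i - 1] if i > 0 else 0.0)
         for i in range(n)]
cleaned = lms_noise_canceller(front, noise)
```

After a few hundred samples the filter weights settle near the made-up path (0.8, 0.3, 0, 0) and `cleaned` tracks `speech` closely. The step size `mu` trades convergence speed against how much the filter jitters once it has converged.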

One thing to be aware of: most of the (good) systems rely on crowd-sourced samples, so everything you say to the microphone may be uploaded to a server somewhere and stored there. Apple make that very clear; other vendors may be less up-front. If it worries you, don't enable it.

My own compromise with Apple devices is to disable 'Hey Siri' so that recordings will only be uploaded when I knowingly press the microphone icon.
Title: Re: Voice to Text software
Post by: loonylion on October 31, 2015, 11:22:50 PM: Quote from: sevenlayermuddle on October 31, 2015, 11:19:40 PM
I understand one reason it is so good is there is a second microphone, facing away from the speaker, thus enabling noise-cancelling to get rid of ambient sounds. This really could be a turning point IMHO for a technology that has gradually evolved from little more than a party trick a decade ago.

My phone has 4 microphones, probably for a similar reason, though I don't use voice commands. It in theory improves the quality of phone calls too.
Title: Re: Voice to Text software
Post by: sevenlayermuddle on November 01, 2015, 12:10:24 AM: Quote from: loonylion on October 31, 2015, 11:22:50 PM
My phone has 4 microphones, probably for a similar reason, though I don't use voice commands. It in theory improves the quality of phone calls too.

That is interesting, I wasn't aware that multiple microphones were 'old hat'. :-[

It does however beg the question... What is it that enables Apple's Siri to work so well, with no training or separate mics required, when the rest of the industry remains so useless and feeble?
Title: Re: Voice to Text software
Post by: Weaver on November 01, 2015, 01:17:10 AM: @sevenlayermuddle - Has Apple got Nuance on the case, including bits of the ruins of Lernout & Hauspie, who produced some powerful software? Working on speech recognition solidly for 25+ years (less downtime due to the L&H collapse) should get you somewhere. Their software I suspect may be good because of sheer hard graft put into it, and it now finally has the hardware it needs with speeds that were hardly dreamed of in 1995.
Title: Re: Voice to Text software
Post by: sevenlayermuddle on November 01, 2015, 09:13:54 AM: Quote from: Weaver on November 01, 2015, 01:17:10 AM
@sevenlayermuddle - Has Apple got Nuance on the case, including bits of the ruins of Lernout & Hauspie, who produced some powerful software? Working on speech recognition solidly for 25+ years (less downtime due to the L&H collapse) should get you somewhere. Their software I suspect may be good because of sheer hard graft put into it, and it now finally has the hardware it needs with speeds that were hardly dreamed of in 1995.

Not sure if they've ever publicly stated whose technology they use, but most people, including a credible-looking Wikipedia page I've just looked at, do seem to suggest Nuance.

But having slept on these thoughts I suspect I was actually missing a glaring advantage with the ATV, which is that its 'successful' recognitions will be based on a much smaller target dictionary, e.g. film names, actor/actress names, 'skip forward 4 minutes' and the likes. It does work, it works incredibly well, but I should imagine it is also much easier. :-[
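The point about a small target dictionary can be illustrated simply: if the recogniser only has to choose between a handful of phrases, even a garbled transcription can be snapped to the nearest valid one. The sketch below uses Python's standard `difflib.get_close_matches` for that snapping step. The command list is hypothetical (Apple's actual grammar is not public), and a real system constrains the recogniser itself rather than fixing up text afterwards.

```python
import difflib
from typing import Optional

# Hypothetical command vocabulary, far smaller than general dictation
COMMANDS = [
    "play",
    "pause",
    "rewind 25 seconds",
    "skip forward 4 minutes",
    "turn on subtitles",
]


def match_command(transcript: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known command, or None if nothing is close enough."""
    matches = difflib.get_close_matches(
        transcript.lower().strip(), COMMANDS, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


print(match_command("skip for word 4 minutes"))  # snapped to "skip forward 4 minutes"
```

With open dictation there is no such list to fall back on, which is one reason texting and odd place names are harder than TV commands.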

That said, I find Siri on the phone and watch useful too. Especially for texting, where you can generally assume the recipient will be more forgiving of errors and bad punctuation, the watch does pretty well. For place names it is a different story, especially where spelling is vastly different from pronunciation. ::)
Title: Re: Voice to Text software
Post by: loonylion on November 09, 2015, 03:47:35 AM: Quote from: sevenlayermuddle on November 01, 2015, 12:10:24 AM
Quote from: loonylion on October 31, 2015, 11:22:50 PM
My phone has 4 microphones, probably for a similar reason, though I don't use voice commands. It in theory improves the quality of phone calls too.

That is interesting, I wasn't aware that multiple microphones were 'old hat'. :-[

I wouldn't necessarily say old hat; my phone is a OnePlus One, which has only been out a couple of years IIRC.

Accuracy is down to the speech recognition engine (probably Nuance), the dictionary and the training it's been given.
Title: Re: Voice to Text software
Post by: AArdvark on November 09, 2015, 12:32:02 PM: Quote from: sevenlayermuddle on November 01, 2015, 12:10:24 AM
Quote from: loonylion on October 31, 2015, 11:22:50 PM
My phone has 4 microphones, probably for a similar reason, though I don't use voice commands. It in theory improves the quality of phone calls too.

That is interesting, I wasn't aware that multiple microphones were 'old hat'. :-[

It does however beg the question... What is it that enables Apple's Siri to work so well, with no training or separate mics required, when the rest of the industry remains so useless and feeble?

It is not just better hardware, although a cleaner input helps.
The reason it works so well is that it uses huge and complex model(s) that have been trained on millions of people's voices/inputs and are constantly being improved as they are used.
It is the same idea that MS is using with their Voice Recognition via Cortana.

It is also why Banking Voice Recognition works well: the model is big and the dictionary is small and specific to the needs of the bank.
(Also likely to be running on the mainframe and/or be dedicated hardware.)

I would bet that there is a bit of custom hardware in there somewhere to speed up the searching and matching at Apple also.
(I am sure Apple could 'throw' something together costing 'loose change' ;D )
Don't forget this is done in multiple languages as well, so it will be some big backend.

Maybe on second thoughts, it is better hardware ...... but at both ends. ;D ;D
Title: Re: Voice to Text software
Post by: sevenlayermuddle on November 09, 2015, 02:02:16 PM: With hindsight I regret being so critical of other (than Apple) systems as in all honesty my experiences are out of date. :-[

Sticking with Apple, and despite the fact I'm beginning to sound like a one-man advertising campaign, I wonder whether one reason it is becoming really quite good is that they are turning out devices that strongly encourage real-world use of the technology?

People worry, of course, over the privacy issues of voice data being uploaded and stored on servers. But the upside of that, I'd imagine, is that the more the system is used, the better it will get? If so, then providing a system where voice recognition actually fills a need, with strong incentives to use it, ought to make for a better system. :-\

The Apple Watch wouldn't be a great deal of use without Siri IMHO, and so it does get used. And the new TV, quite apart from searching for films etc. by name, justifies itself through the simple ability to say things like 'rewind 25 seconds' during film playback. Even over the *background sound from the film's soundtrack (and I have a very beefy sound system), it works very reliably indeed and quickly becomes second nature for playback control, so it actually gets used too.

edit... * fair to admit, it does seem to reduce the playback volume somewhat during speech input. Still impressive though.
Title: Re: Voice to Text software
Post by: AArdvark on December 09, 2015, 03:31:31 AM: From yet more 'random reading' on the InterWebs!! ;D

Here is some 'Technical' info covering the latest sort of software/methods used for Voice recognition etc. (probably very similar at MS, Google & Apple)

http://googleresearch.blogspot.co.uk/2015/08/the-neural-networks-behind-google-voice.html

Deep Learning in Neural Networks: An Overview by Jürgen Schmidhuber
[From http://people.idsia.ch/~juergen/deep-learning-overview.html ]
(Paper here --> http://www.idsia.ch/~juergen/DeepLearning8Oct2014.pdf)

Quote
Abstract. In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

Enough references to keep anyone occupied for the next 6 months at least. :D :D