Voice recognition

venablc · Sep 26, 2006

I am developing a voice recognition application which identifies certain key words from a WAV file and updates a database.

I am currently using the default SR engine, which is OK: although the words from the WAV files are not interpreted correctly, the misinterpretations are consistent.

The WAV files I am using were recorded from an automated system, so the voice and phrases used are consistent.

Does anyone know how to use a custom SR engine which can be trained to recognise the limited vocabulary from these WAV files correctly?

cjard · Sep 27, 2006

I wonder if Dragon NaturallySpeaking has an API you can use. It's rated one of the best speech recognition engines, but I don't think it would be cheap!

venablc · Sep 27, 2006

I have seen a similar app which uses the Dragon NaturallySpeaking API but, as you say, it is not cheap, and this app is being developed on a budget!

SDK v5.1 provides a UI which can be used to train the recognition engine by prompting the user to read sentences through a microphone.

I was hoping there was a way to bypass this UI and train the engine as part of the application install, by inputting snippets from the WAV files and defining what they should be recognised as.

Megalith · Sep 28, 2006

I can remember working on a speech recognition program way back in the '80s (for the BBC Micro).

You have several advantages here, notably the consistency of the input and the limited vocabulary.

The simplest way to do this would be to create an array of all the words, each stored as, say, 50 samples taken at a very low sample rate, say one sample every 50 ms of the WAV data. This should be adequate to differentiate between the words of the presumably limited and consistent vocabulary. It is then simply a case of sampling the input stream at the same rate (one sample every 50 ms), comparing the samples against the contents of this array and returning any match (use a factor of 95% to determine a match), then moving to the next element in the data stream, resampling, and repeating.

You might be able to reduce it to as few as 10 samples per word at one every 200 ms, or lower. The more you reduce it, the faster the code will run, so some playing with these settings would be in order.
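
A minimal Python sketch of the template-matching idea described above, for illustration only. It assumes 16-bit mono PCM WAV input. The function names and the similarity formula are assumptions made for this sketch: the post only says to use a factor of 95%, not how to compute it. Here similarity is 1 minus the summed absolute difference divided by the summed magnitudes of both sequences, which gives 1.0 for identical sequences and 0.0 for a word compared against silence.

```python
import struct
import wave


def read_mono_16bit(path):
    """Return (sample_rate, samples) for a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch assumes 16-bit mono PCM")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = list(struct.unpack("<%dh" % (len(raw) // 2), raw))
    return rate, samples


def decimate(samples, rate, interval_ms=50):
    """Keep one sample per interval_ms of audio (the very low sample rate)."""
    step = max(1, rate * interval_ms // 1000)
    return samples[::step]


def similarity(a, b):
    """Score in [0, 1]; 1.0 means identical (assumed metric, see lead-in)."""
    if len(a) != len(b) or not a:
        return 0.0
    total = sum(abs(x) for x in a) + sum(abs(y) for y in b)
    if total == 0:
        return 1.0  # both sequences are pure silence
    diff = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - diff / total


def find_words(stream, templates, threshold=0.95):
    """Slide the templates along a decimated stream.

    templates maps each word to its decimated sample list.
    Returns a list of (position, word) for every match found.
    """
    hits = []
    i = 0
    while i < len(stream):
        best_word, best_score = None, threshold
        for word, tmpl in templates.items():
            score = similarity(stream[i:i + len(tmpl)], tmpl)
            if score >= best_score:
                best_word, best_score = word, score
        if best_word is None:
            i += 1  # no match here, move to the next element
        else:
            hits.append((i, best_word))
            i += len(templates[best_word])  # skip past the matched word
    return hits
```

To use it, you would build the templates by running `read_mono_16bit` and `decimate` on one clean recording of each word, then pass the decimated input stream to `find_words`. One caveat with this sketch: a word in the stream may not start on a multiple of the decimation step, so the stream samples can fall between the template samples. Trying several start offsets when decimating the stream, or loosening the threshold, are possible workarounds to experiment with.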
