
Browser Based Voice Recognition With Pocketsphinx.js

My latest GreenZeta Original, HeyHilri! The World’s First Politically Inspired Digital Assistant, explores JavaScript-based voice recognition in the browser. When I started this project, my first inclination was to use a web service. Finding one that was both reliable and affordable was harder than I expected. Some searching led me to the Pocketsphinx.js project.

Pocketsphinx is a mobile port of the full-featured CMU Sphinx voice recognition engine, and Pocketsphinx.js is a JavaScript port of its namesake. Sphinx is keyword based: it won’t magically convert all audio to words. You have to program it with the specific words or phrases you’re looking for. Even though its feature set is scaled down for mobile, it comes with a wide range of options. The JavaScript comes pre-compiled for English recognition, with instructions on adding other languages.
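To make "program it with a set of specific words" a little more concrete, here is a minimal sketch of a keyword grammar. It assumes the recognizer accepts a small finite-state description with `numStates`, `start`, `end`, and a list of `transitions`, which is modeled on the format used in the Pocketsphinx.js demo; treat those field names as assumptions and check them against the project's README. The helper `buildKeywordLoopGrammar` and the keywords are hypothetical.

```javascript
// Sketch only: builds a one-state grammar that loops on any of the keywords.
// The object shape (numStates/start/end/transitions) is an assumption based
// on the Pocketsphinx.js demo, not a verified API contract.
function buildKeywordLoopGrammar(keywords) {
  return {
    numStates: 1, // a single state that is both the start and the end
    start: 0,
    end: 0,
    // One self-loop per keyword: any keyword may follow any other.
    transitions: keywords.map((word) => ({ from: 0, to: 0, word })),
  };
}

// Hypothetical keywords; each one also needs a dictionary entry.
const grammar = buildKeywordLoopGrammar(["LEFT", "RIGHT", "STOP"]);
console.log(JSON.stringify(grammar, null, 2));
```

A single looping state is the loosest possible grammar. If the phrases you expect have a fixed order, adding more states should cut down on false matches.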

For basic use, Pocketsphinx.js is easy to set up. The included demo has everything you need to capture audio from the browser and parse it for keywords. Pocketsphinx comes with an overwhelming set of options. I found myself glossing over most of them and copying the defaults from the demo.

The most important configuration step is setting up your dictionary. The dictionary tells Pocketsphinx which sound combinations represent each keyword you’re looking for. Sphinx uses the CMU Pronouncing Dictionary, which uses alphabetical codes to represent the 39 sounds (phonemes) your mouth makes to produce words. For example, “EY” represents a long A sound and “T” represents a T sound, so “EY T” represents the sounds that make up the word “ate”. The dictionary web site has a search engine for looking up representations of words, or you can write your own. There is lots of room for conflicts between homophones: the previous example, EY T, could mean “eight” as well as “ate”. You need to be wary of this in your programming. Recognition is not perfect either. At one point I had to remove the word “right” from my dictionary because 90% of the time it was recognized as “flight”.
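As a rough illustration of the homophone problem, the sketch below keeps a dictionary as a list of word and pronunciation pairs and flags any entries that share the same phoneme sequence. The pair layout is an assumption modeled on the Pocketsphinx.js demo's word list, the `findHomophones` helper is hypothetical, and the pronunciations are the ones the CMU Pronouncing Dictionary lists for these words.

```javascript
// Hypothetical dictionary: [keyword, space-separated CMU phoneme codes].
const wordList = [
  ["ATE", "EY T"],
  ["EIGHT", "EY T"],
  ["LEFT", "L EH F T"],
  ["RIGHT", "R AY T"],
];

// Group words by pronunciation and return every group with more than one
// word, since the recognizer cannot tell those apart by sound alone.
function findHomophones(entries) {
  const byPhones = new Map();
  for (const [word, phones] of entries) {
    // Normalize whitespace so "EY  T" and "EY T" count as the same key.
    const key = phones.trim().split(/\s+/).join(" ");
    if (!byPhones.has(key)) byPhones.set(key, []);
    byPhones.get(key).push(word);
  }
  return [...byPhones.entries()]
    .filter(([, group]) => group.length > 1)
    .map(([phones, group]) => ({ phones, words: group }));
}

const conflicts = findHomophones(wordList);
// Logs one conflict: "EY T" is shared by ATE and EIGHT.
console.log(conflicts);
```

A check like this only catches exact matches. Near misses such as “right” (R AY T) and “flight” (F L AY T) would still need to be found by testing with real audio.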

As a pure JavaScript, in-browser voice recognition solution, Pocketsphinx.js works really well. It’s far from Google Now or Siri in terms of natural language interaction, but for simple keyword recognition it’s a great solution. As I said, it comes with a lot of configuration options. With a little work, you can tweak the accuracy far beyond what I got from the default settings. Check out the Pocketsphinx.js website for more information on getting started. Don’t forget to try out HeyHilri! for some voice interactive political humor.