Implementing Linux Voice Recognition
Implementing voice recognition on Linux may sound like quite a daunting task. There are many difficulties, and a 100% accurate transcription is almost impossible to achieve. However, even a partially successful transcription of an audio file to text can be a very useful tool for any business.
Business Uses for Linux Voice Recognition
I came across this whole idea a week or so ago. My goal was to transcribe the audio from recordings so that I could then search a database of the transcribed text for specific keywords.
Business uses could include:
– Searching for mentions of required text
– Analysis of call centre staff ensuring compliance with their call scripts
– Raising alarm bells about customers who may be abusive to staff
The list is actually quite long; those are just a few that come to mind right now.
As I said already, the obstacles to 100% success include the clarity of the audio, people's different accents, and words that the transcription tool cannot find in its dictionary.
Anyway, accepting all of the above potential problems, it's still fun to try! So here is my method for installing pocketsphinx on CentOS 6 so you can experiment.
Install voice recognition on Linux using pocketsphinx on CentOS 6
Just so you know, I installed this on an almost ‘clean’, fresh minimal install of CentOS 6 with just the base CentOS repositories configured, as you can see below:
[root@localhost tmp]# yum repolist
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: centos.mirror.srv.co.ge
* extras: centos.mirror.srv.co.ge
* updates: centos.mirror.srv.co.ge
base | 3.7 kB 00:00
extras | 3.4 kB 00:00
extras/primary_db | 33 kB 00:00
updates | 3.4 kB 00:00
repo id repo name status
base CentOS-6 - Base 6,575
extras CentOS-6 - Extras 45
updates CentOS-6 - Updates 652
repolist: 7,272
[root@localhost tmp]#
The program I am installing is called pocketsphinx. The project has a website, which I will list later in this post, but for now let's just get it installed and working!
We need the sphinxbase package as well as pocketsphinx itself. I generally put source packages into /usr/local/src, where they can be safely compiled. Working in that directory, download and extract the packages as follows:
cd /usr/local/src
wget -O pocketsphinx-5prealpha.tar.gz "http://downloads.sourceforge.net/project/cmusphinx/pocketsphinx/5prealpha/pocketsphinx-5prealpha.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fcmusphinx%2Ffiles%2Fpocketsphinx%2F5prealpha%2F&ts=1447353369&use_mirror=netix"
tar -zxvf pocketsphinx-5prealpha.tar.gz
wget -O sphinxbase-5prealpha.tar.gz "http://downloads.sourceforge.net/project/cmusphinx/sphinxbase/5prealpha/sphinxbase-5prealpha.tar.gz?r=http%3A%2F%2Fsourceforge.net%2Fprojects%2Fcmusphinx%2Ffiles%2Fsphinxbase%2F5prealpha%2F&ts=1447353479&use_mirror=netassist"
tar -zxvf sphinxbase-5prealpha.tar.gz
mv sphinxbase-5prealpha sphinxbase
If the above download links don't work, simply get the latest files from the CMUSphinx download page.
I then found out that sphinxbase has a couple of build dependencies, including the Python development package. To install them, run:
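A download through a redirecting mirror URL can occasionally save an error page instead of the archive. The sketch below is one way to check the two files before extracting them; the helper name `is_valid_tarball` is made up for this example, and the file names are the ones used in the wget commands above.

```shell
#!/bin/sh
# Return success only if FILE is a non-empty, readable gzip-compressed tar archive.
# (is_valid_tarball is a hypothetical helper name, not part of any package.)
is_valid_tarball() {
    [ -s "$1" ] && tar -tzf "$1" >/dev/null 2>&1
}

# Check both downloads if they are present in the current directory.
for archive in pocketsphinx-5prealpha.tar.gz sphinxbase-5prealpha.tar.gz; do
    if [ -e "$archive" ] && ! is_valid_tarball "$archive"; then
        echo "$archive is not a valid tarball; download it again" >&2
    fi
done
```

If a file is reported as invalid, opening it in a text viewer usually shows whether it is an HTML page rather than an archive.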
yum install bison python-devel.x86_64 pcre-devel.x86_64
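On a minimal install it may be worth confirming that the basic build tools are present before compiling anything. This is only a sketch: the list of tools (gcc, make, bison, python) is an assumption about what a typical autotools build needs, not something taken from the sphinxbase documentation, and `missing_tools` is a made-up helper name.

```shell
#!/bin/sh
# Print the name of every command in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper name.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The tool list is an assumption about a typical autotools build.
missing=$(missing_tools gcc make bison python)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing >&2
fi
```

Anything it reports can normally be installed with yum in the same way as the packages above.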
The other issue I had was that it also requires a version of SWIG newer than the one in the standard CentOS repositories. To get that, I downloaded and compiled the latest version of SWIG (3.0.7) from its SourceForge page:
wget "http://prdownloads.sourceforge.net/swig/swig-3.0.7.tar.gz"
tar -zxvf swig-3.0.7.tar.gz
cd swig-3.0.7
./configure
make
make install
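To confirm that the freshly built SWIG is the one the shell now finds, a version comparison along these lines can help. It is a sketch under assumptions: `MIN_SWIG` is a placeholder value rather than a documented requirement of sphinxbase, `version_ge` is a made-up helper, and the comparison relies on the `-V` option of GNU sort.

```shell
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
# (version_ge is a hypothetical helper name.)
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# MIN_SWIG is a placeholder, not a documented requirement.
MIN_SWIG=3.0.0
if command -v swig >/dev/null 2>&1; then
    # "swig -version" prints a line of the form "SWIG Version 3.0.7".
    found=$(swig -version | awk '/SWIG Version/ {print $3}')
    if version_ge "$found" "$MIN_SWIG"; then
        echo "swig $found is new enough"
    else
        echo "swig $found is older than $MIN_SWIG" >&2
    fi
fi
```

If an older version is still reported, an earlier swig binary may be ahead of /usr/local/bin on the PATH.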
Now that all the dependencies for sphinxbase were installed, I compiled it like this:
cd /usr/local/src/sphinxbase
./configure --enable-fixed --without-lapack
make
make install
The README for pocketsphinx says that, in order to compile it, the sphinxbase code must be present within the pocketsphinx source directory. So I copied it like this:
cd /usr/local/src/pocketsphinx-5prealpha
cp -r ../sphinxbase .
And then compiled it like this:
./configure
make clean all
make check
make install
During ‘make check’ it runs a number of tests, one of which failed for me, as you can see below:
PASS: test_ps_init
PASS: test_ps_reinit
PASS: test_ps_fwdtree
PASS: test_ps_fwdtree_fwdflat
PASS: test_ps_fwdflat
PASS: test_ps_fwdflat_bestpath
PASS: test_ps_fwdtree_bestpath
FAIL: test_ps_simple
PASS: test_ps_nbest
PASS: test_ps_lattice
PASS: test_ps_set_search
PASS: test_acmod
PASS: test_acmod_grow
PASS: test_fwdtree
PASS: test_fwdflat
PASS: test_fwdtree_fwdflat
PASS: test_fwdtree_bestpath
PASS: test_fwdtree_nbest
PASS: test_pl_fwdtree
PASS: test_ptm_mgau
PASS: test_posterior
PASS: test_fsg
PASS: test_fsg2
PASS: test_fsg3
PASS: test_jsgf
PASS: test_lm_read
PASS: test_dict
PASS: test_dict2pid
PASS: test_senfh
PASS: test_alignment
PASS: test_state_align
PASS: test_mllr
make[5]: Entering directory `/usr/local/src/pocketsphinx-5prealpha/test/unit'
make[5]: Nothing to be done for `all'.
make[5]: Leaving directory `/usr/local/src/pocketsphinx-5prealpha/test/unit'
============================================================================
Testsuite summary for pocketsphinx 5prealpha
============================================================================
# TOTAL: 32
# PASS: 31
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
============================================================================
See test/unit/test-suite.log
I spoke to one of the developers at CMUSphinx on their IRC channel, and he said it was not a problem and to continue with the ‘make install’, which all worked fine.
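With this many result lines it is easy to miss a failure. Assuming the output of ‘make check’ has been saved to a file (check.log is a made-up name), a small filter like this sketch lists only the failed tests; `failed_tests` is a hypothetical helper.

```shell
#!/bin/sh
# Print the names of failed tests from "make check" output read on
# standard input, matching lines of the form "FAIL: test_name".
# (failed_tests is a hypothetical helper name.)
failed_tests() {
    sed -n 's/^FAIL: //p'
}

# Example usage (check.log is an assumed file name):
#   make check 2>&1 | tee check.log
#   failed_tests < check.log
```

The details of each failure are in test/unit/test-suite.log, as the summary says.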
You should now be good to do a first test!
Testing Linux Voice Recognition using pocketsphinx
First, you need an audio file. The audio should be clear speech as far as possible (it does not like too much background noise or music), so a recording from a TV news channel is a good place to start.
You need to convert the audio into a format pocketsphinx can read (16 kHz, 16-bit, mono WAV), which you can do using the media manipulation program ffmpeg (to install ffmpeg, read my other tutorial). Here is the command to convert your file:
ffmpeg -i "BBC One BBC News at Six (16 Avril 2015)-9CvYHM_V8Xg.mp4" -acodec pcm_s16le -ac 1 -ar 16000 out.wav
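To double-check that the converted file really is 16 kHz, 16-bit mono, the WAV header can be read directly. This is a rough sketch that assumes the common layout in which the fmt chunk directly follows the RIFF header (channels at byte 22, sample rate at byte 24, bits per sample at byte 34) and a little-endian machine such as x86; `read_uint` and `wav_is_sphinx_ready` are made-up helper names.

```shell
#!/bin/sh
# Read an unsigned integer of SIZE bytes at byte OFFSET of FILE.
# Usage: read_uint FILE OFFSET SIZE
# od interprets the bytes in host order, so this assumes a little-endian host.
read_uint() {
    od -An -t "u$3" -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# Succeed only if FILE's header says mono, 16000 Hz, 16 bits per sample.
# Assumes the fmt chunk directly follows the 12-byte RIFF header.
wav_is_sphinx_ready() {
    channels=$(read_uint "$1" 22 2)
    rate=$(read_uint "$1" 24 4)
    bits=$(read_uint "$1" 34 2)
    [ "$channels" = 1 ] && [ "$rate" = 16000 ] && [ "$bits" = 16 ]
}

# Check the file produced by the ffmpeg command above, if it exists.
if [ -f out.wav ]; then
    wav_is_sphinx_ready out.wav || echo "out.wav is not 16 kHz 16-bit mono" >&2
fi
```

Only the three header fields are checked; the audio data itself is not inspected.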
pocketsphinx_continuous is one of the programs installed into /usr/local/bin. It does have a man page, but the page is incomplete and doesn't mention that the program can read from an input file. It can, like this (but it produces a HUGE amount of output on the screen):
pocketsphinx_continuous -infile "/tmp/out.wav"
What you can see from that output is that it reads the input file in chunks, which it then transcribes. That may be of use to you, but I just wanted the text. So, to minimise the screen output, I ran it like this:
pocketsphinx_continuous -infile "/tmp/out.wav" -logfn /dev/null
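Since the original goal was keyword searching, the recognised text can be redirected into a file and searched with grep. A minimal sketch: /tmp/out.txt and the keyword "election" are made-up examples, and `find_keyword` is a hypothetical helper.

```shell
#!/bin/sh
# Print the lines of TRANSCRIPT that contain KEYWORD, ignoring case,
# each prefixed with its line number.
# Usage: find_keyword KEYWORD TRANSCRIPT
# (find_keyword is a hypothetical helper name.)
find_keyword() {
    grep -n -i -F -- "$1" "$2"
}

# Transcribe only when the tool and the sample file from above are present.
# /tmp/out.txt and the keyword are assumptions for illustration.
if command -v pocketsphinx_continuous >/dev/null 2>&1 && [ -f /tmp/out.wav ]; then
    pocketsphinx_continuous -infile /tmp/out.wav -logfn /dev/null > /tmp/out.txt
    find_keyword "election" /tmp/out.txt
fi
```

Because the transcription is rarely perfect, searching for a few spelling variants of each keyword is likely to catch more matches.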
That's it, you have a transcribed blob of text from an audio file! As I said earlier, the project's home is the CMUSphinx page on SourceForge.
The site has a lot of documentation and examples of other use cases, so check it out!