How To Build A Speech Recognition Bot With Python

Even if you know nothing about speech recognition

---

You may have realized something by now.

The overwhelming success of speech-enabled products like Amazon Alexa has proven that some degree of speech support will be an essential aspect of household technology for the foreseeable future.

You can already do almost anything with voice controls. Just say, "I need you to write my speech," and Alexa will find you an effective tool to create a well-structured and persuasive speech without the stress or anxiety often associated with writing one.

In other words, speech-enabled products are a game changer because they offer a level of interactivity and accessibility that few technologies can match.

No GUI needed.

No texting needed.

No Emojis needed.

It’s all about *speed*.

Speed is a big reason voice is poised to become the next major user interface. Each decade, we’ve embraced a new way to interact with technology. We’ve evolved from character mode to a graphical user interface, to the web, to mobile.

Voice now offers a faster and easier way to communicate and accomplish tasks than mobile apps.

We can either tell Alexa what we need (turn off the lights, adjust the thermostat, and set an alarm, or all of the above using a single utterance like “Alexa, good night”), or we can pull out a phone, unlock it, open the right app, and perform the task or tasks.

When you consider habitual use cases (those that keep customers coming back), the efficiency gained through voice adds up over time.

“Texting will decline in the future because of Alexa”
— Gary Vaynerchuk

Gary Vaynerchuk: Voice Lets Us Say More Faster

That made me very interested in starting a new project to build a simple speech recognition bot with Python.

And of course, I won’t build the model from scratch, as that would require massive amounts of training data and computing resources to make the speech recognition reasonably accurate.

Instead, I used the Google Speech Recognition API to perform the speech-to-text tasks with Python (check out the demo below, where I show how the speech recognition works, live!).

By the end of this article, I hope you’ll have a better understanding of how speech recognition works in general and most importantly, how to implement that using Google Speech Recognition API with Python.

Trust me. It’s that simple.

Feel free to check out the source code here if you’re interested.

Let’s get started!

---

Why Google Speech Recognition API?

You may be wondering, “Is this the only API available given the growing demand and popularity of speech recognition?”

The answer is no. The SpeechRecognition library also supports other APIs, some free and some paid, as listed below:

`recognize_bing()`: Microsoft Bing Speech
`recognize_google()`: Google Web Speech API
`recognize_google_cloud()`: Google Cloud Speech - requires installation of the google-cloud-speech package
`recognize_houndify()`: Houndify by SoundHound
`recognize_ibm()`: IBM Speech to Text
`recognize_sphinx()`: CMU Sphinx - requires installing PocketSphinx
`recognize_wit()`: Wit.ai

In the end, I chose the Google Web Speech API from the SpeechRecognition library, as it has a default API key that is hard-coded into the library.

That means you can get started right away, without having to obtain an API key or a username/password combination as the other APIs require.

However, the convenience of Google Web Speech API also comes with certain limitations: The API quota for your own keys is 50 requests per day, and there is currently no way to raise this limit.

This fits our use case if we just want to use this API for experimentation purposes. Note that if you’re running an app or a website that’s calling the API consistently, then you may need to consider getting a paid service from one of the APIs above.

Building speech recognition with Python using Google Speech Recognition API

I won’t bore you with the technical details of how speech recognition works. Instead, you can read this great article, which covers the mechanism in general and how to implement the API.

In what follows, I’ll show you step by step how I implemented this API by following that article.

But first, you need to install the SpeechRecognition library using `pip install SpeechRecognition`.

And we can use the Google Web Speech API that comes from this library itself.

In this implementation, I recorded my voice with my own microphone. The SpeechRecognition library accesses the microphone through the PyAudio package, so install that as well (for example with `pip install PyAudio`), and then recognizes the recorded voice.

Check out the code snippet below to understand the full implementation; it is relatively self-explanatory.

Function to recognize speech from microphone

To handle ambient noise, you’ll need to use the `adjust_for_ambient_noise()` method of the `Recognizer` class so that the library can pick your voice out of the background.

When `adjust_for_ambient_noise()` runs, stay quiet for a second: by default it listens to the audio source for one second to calibrate against the background noise, so that the speech that follows is captured correctly.

Lastly, we need to implement a `try`/`except` block to handle errors, such as when the API is unreachable or unresponsive after a request is sent, or when our speech is unrecognizable.
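The function itself is not reproduced in this text, so here is a minimal sketch of what it could look like, not necessarily the exact code used in the demo. It assumes the `SpeechRecognition` package and relies on its `Recognizer.adjust_for_ambient_noise()`, `Recognizer.listen()` and `Recognizer.recognize_google()` methods. The function name `recognize_speech_from_mic` and the shape of the returned dictionary are illustrative choices.

```python
def recognize_speech_from_mic(recognizer, microphone):
    """Record one phrase from `microphone` and try to transcribe it.

    `recognizer` is expected to be an sr.Recognizer and `microphone` an
    sr.Microphone. Returns a dictionary (an illustrative format) with:
      "success":       False only when the API could not be reached
      "error":         None, or a short error message
      "transcription": None, or the recognized text
    """
    # Imported inside the function so that merely defining it does not
    # require the package; move it to the top of the file in a real script.
    import speech_recognition as sr

    with microphone as source:
        # Calibrate against background noise (samples one second by default),
        # then record until the speaker pauses.
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)

    response = {"success": True, "error": None, "transcription": None}
    try:
        # Uses the Google Web Speech API with the library's default key.
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        # The API was unreachable or unresponsive.
        response["success"] = False
        response["error"] = "API unavailable"
    except sr.UnknownValueError:
        # The request went through, but the speech was unintelligible.
        response["error"] = "Unable to recognize speech"
    return response
```

Returning a dictionary instead of letting the exceptions propagate keeps the calling code simple: it can check `success` and `error` and decide whether to ask the speaker to try again.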

To use the function above, you can just implement the block below and…Voilà! You did it!
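The calling block is not reproduced in this text either. As a rough sketch, assuming the recognition function takes a recognizer and a microphone and returns a dictionary with `success`, `error` and `transcription` keys (an assumption, as are the helper names `format_response`, `run_once` and `fake_recognize`), the result could be printed like this. A stand-in function replaces the real microphone call so that the snippet runs without any audio hardware.

```python
def format_response(response):
    """Build a short printable report from a response dictionary."""
    return "\n".join([
        f"Success : {response['success']}",
        f"Error   : {response['error']}",
        "",
        "Text from speech:",
        response["transcription"] or "(nothing recognized)",
    ])


def run_once(recognize, recognizer, microphone):
    """Call the recognition function once, print the report, return the result."""
    response = recognize(recognizer, microphone)
    print(format_response(response))
    return response


def fake_recognize(recognizer, microphone):
    """Hypothetical stand-in that pretends the API heard 'hello world'."""
    return {"success": True, "error": None, "transcription": "hello world"}


# With the SpeechRecognition package installed, the real call would pass your
# own recognition function plus sr.Recognizer() and sr.Microphone() instead.
result = run_once(fake_recognize, recognizer=None, microphone=None)
```

Passing the recognition function in as an argument is only a convenience for this sketch; in a real script you would call your function directly.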

A simple demo on using Google Speech Recognition API

Now that we have the full implementation code ready, it’s time to see how this thing works.

I recorded a short video to show you exactly how the API works from recording my voice to returning that in text format.

Although the result may not be as accurate as we might have expected, it is definitely worth taking the time to play around with the code and the API!

---

Final Thoughts

Thank you for reading.

I hope you now have a better understanding of how speech recognition works in general and most importantly, how to implement that using Google Speech Recognition API with Python.

Feel free to check out the source code here if you’re interested.

I’d also recommend trying out the other APIs to compare their speech-to-text accuracy.

Despite the fact that speech-enabled products are not being widely used in businesses and our day-to-day life at this stage, I truly believe that this technology will disrupt a lot of businesses and how consumers will use products with voice recognition functionalities, sooner or later.

As always, if you have any questions or comments, feel free to leave your feedback below, or you can always reach me on LinkedIn. Till then, see you in the next post!

---

About the Author

Admond Lee is known as one of the highly sought-after data scientists and consultants in helping start-up founders and various companies tackle their problems using data with strong expertise in data science consulting and industry knowledge.

You can connect with him on LinkedIn, Medium, Twitter, and Facebook or book a call appointment with him here if you are looking for data science consulting for your company.