The secret to powering web apps with full speech recognition
What’s the secret to powering web apps with speech recognition?
The secret is the Web Speech API as implemented in Chrome and other Chromium-based browsers. This API is fantastic and does all the heavy lifting for us, leaving us free to focus on building better voice interfaces.
However incredible this API is, as of November 2020 browser support is limited, which can be an issue depending on your requirements, so check the current support status before relying on it. Additionally, it only works online, so you will need a different setup if you need offline recognition.
Naturally, this API is available through plain JavaScript, and it is not restricted to React. Nonetheless, there's a great React library, react-speech-recognition, that simplifies the API even further, and it's what we are going to use today.
Feel free to read the documentation of the Speech Recognition API on MDN Web Docs if you want to implement it in vanilla JS or another framework.
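For the vanilla route, here is a minimal sketch of the browser API itself. It assumes a browser that exposes the SpeechRecognition constructor or Chrome's prefixed webkitSpeechRecognition. The helper name createRecognizer is hypothetical, and it takes the global object as a parameter (normally window) so it can also be tried outside a browser with a fake constructor.

```javascript
// Minimal sketch of the Web Speech API (createRecognizer is a hypothetical helper).
// `globalObj` is normally `window`; passing it in keeps the helper testable.
function createRecognizer(globalObj, onText, lang = "en-US") {
  // Chrome exposes the constructor with a webkit prefix.
  const Recognition =
    globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition;
  if (!Recognition) {
    return null; // speech recognition is not available here
  }

  const recognition = new Recognition();
  recognition.lang = lang;           // BCP 47 language tag
  recognition.interimResults = true; // report partial text while the user speaks
  recognition.continuous = false;    // end the session when the user pauses

  recognition.onresult = (event) => {
    // event.results holds one entry per recognized segment; each entry
    // lists alternatives, and [0] is the most likely one.
    let text = "";
    for (let i = 0; i < event.results.length; i++) {
      text += event.results[i][0].transcript;
    }
    onText(text);
  };

  return recognition;
}
```

In a browser you would call createRecognizer(window, console.log) and then start() on the returned object. Because interimResults is true, the callback fires with partial text before the user finishes a sentence.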
Hello world, I’m transcribing
We will start with the basics and build a Hello World app that transcribes what the user is saying in real time. Before getting to the good stuff, we need a solid working base, so let's set up our project. For simplicity, we will use create-react-app to create the project, then add the react-speech-recognition package to it.
Next, we will work on the file App.js. CRA (create-react-app) creates a good starting point for us. Just kidding, we won't need any of it, so start with a blank App.js file and code with me.
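Before we can do anything, we need the imports: the default SpeechRecognition object and the useSpeechRecognition hook, both from react-speech-recognition.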
Pretty easy, right? Let's see in detail what we are doing, starting with the useSpeechRecognition hook.
This hook is responsible for capturing the results of the speech recognition process; it's our gateway to the desired results. In its simplest form, we extract the transcript of what the user is saying while the microphone is enabled, which is what our app does with const { transcript } = useSpeechRecognition().
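The hook hides the raw browser events from us. As a rough illustration of the kind of work involved, the sketch below uses a hypothetical helper, splitTranscript (not the library's code), to turn a list shaped like the Web Speech API's SpeechRecognitionResultList into a final part, an interim part, and a combined transcript. The output names mirror the hook's transcript, finalTranscript and interimTranscript values.

```javascript
// Hypothetical helper: split recognition results into the text the engine
// has committed to (final) and the text it may still revise (interim).
function splitTranscript(results) {
  const finalParts = [];
  const interimParts = [];
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    // [0] is the most likely alternative; trim stray leading spaces.
    const best = result[0].transcript.trim();
    (result.isFinal ? finalParts : interimParts).push(best);
  }
  const finalTranscript = finalParts.join(" ");
  const interimTranscript = interimParts.join(" ");
  // The combined text is what the user sees on screen.
  const transcript = [finalTranscript, interimTranscript]
    .filter(Boolean)
    .join(" ");
  return { transcript, finalTranscript, interimTranscript };
}
```

Trimming and joining with single spaces is a design choice of this sketch that keeps the combined text tidy; real engines may format segment boundaries differently.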
Activating the hook does not start listening right away; for that, we need to use the SpeechRecognition object we imported at the beginning. This object exposes a series of methods for controlling the speech recognition API: starting to listen on the microphone, stopping, changing languages, and so on.
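The library's object exposes startListening (which accepts options such as language), stopListening and abortListening. The sketch below groups them behind a hypothetical createMicControls helper so button handlers stay one-liners. In the app you would pass it the imported SpeechRecognition object; the fake used to exercise it outside the browser is purely an assumption of this sketch.

```javascript
// Hypothetical helper bundling the microphone controls used by the UI.
// `speech` is assumed to be react-speech-recognition's default export
// (SpeechRecognition); it is a parameter so a fake can stand in for tests.
function createMicControls(speech, language = "en-US") {
  return {
    // Start listening in the given language (a BCP 47 tag such as "es-AR").
    start: () => speech.startListening({ language }),
    // Turn the microphone off but finish processing speech already captured.
    stop: () => speech.stopListening(),
    // Turn the microphone off and discard speech still being processed.
    abort: () => speech.abortListening(),
  };
}
```

For example, const mic = createMicControls(SpeechRecognition, "es-AR") would let the Start button call mic.start and recognize Argentine Spanish.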
Our interface simply exposes two buttons for controlling the microphone: one calls SpeechRecognition.startListening and the other SpeechRecognition.stopListening, while the transcript updates on screen as you speak.
If you tried the demo application, you might have noticed that words went missing when you paused: by default, the library stops listening as soon as the user stops speaking. You can change this behavior by passing the continuous option to the startListening method, like this: SpeechRecognition.startListening({ continuous: true }).
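For comparison, in the raw Web Speech API this behavior is the recognizer's continuous property. A common pattern, sketched here under the assumption that the browser may still end a session on its own, is to restart the recognizer from its onend handler until the user explicitly stops it. The keepListening helper is hypothetical and is not how react-speech-recognition is implemented.

```javascript
// Hypothetical helper: keep a Web Speech API recognizer listening until stopped.
// `recognition` is a SpeechRecognition instance (anything with start/stop works).
function keepListening(recognition) {
  let wantListening = true;
  recognition.continuous = true; // do not end the session at the first pause

  // Assumption: the browser may still end the session (for example after
  // silence), so restart it unless the user asked us to stop.
  recognition.onend = () => {
    if (wantListening) recognition.start();
  };

  recognition.start();
  return {
    stop() {
      wantListening = false; // keep onend from restarting the session
      recognition.stop();
    },
  };
}
```

Restarting inside onend is safe because the previous session has already ended; calling start() on a recognizer that is still running would throw an error in the browser.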
Compatibility detection
Our app is nice! But what happens if your browser is not supported? Can we have a fallback behavior for those scenarios? Yes, we can. If you need to change your app's behavior based on whether the speech recognition API is supported, react-speech-recognition has a method for exactly this purpose: SpeechRecognition.browserSupportsSpeechRecognition(), which returns false when the browser lacks the API so you can render a fallback instead.
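Here is a small sketch of that decision. The method name browserSupportsSpeechRecognition comes from react-speech-recognition; the chooseInputMode helper, its return shape and its messages are hypothetical. The library object is passed in so the logic can run with a fake outside the browser.

```javascript
// Hypothetical helper: choose between voice input and a typed fallback.
// `speech` is assumed to be the SpeechRecognition object imported from
// react-speech-recognition.
function chooseInputMode(speech) {
  if (!speech.browserSupportsSpeechRecognition()) {
    // Fallback: hide the microphone buttons and offer a text field instead.
    return {
      mode: "text",
      message: "Speech recognition is not supported in this browser. Please type instead.",
    };
  }
  return { mode: "voice", message: "Press Start and speak." };
}
```

In the component, a "text" result would mean rendering an input field and the message in place of the Start and Stop buttons.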
Detecting commands
So far, we have covered how to convert voice into text; now we will take it one step further by recognizing predefined commands in our app. This functionality makes it possible to build apps that can be operated entirely by voice.
Building a command parser ourselves could be a lot of work, but thankfully, react-speech-recognition already has built-in command recognition.
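To see why a hand-rolled parser grows quickly, here is a deliberately tiny sketch. It is not the library's implementation: the hypothetical commandToRegExp and matchCommand functions handle only literal phrases and * wildcards (borrowing the library's * notation for "any words here") and ignore punctuation, other placeholder syntaxes, and fuzzy matching.

```javascript
// Hypothetical sketch of a minimal command parser (not the library's code).
// A "*" in a command captures any run of words; matching ignores case.
function commandToRegExp(command) {
  // Escape regex metacharacters in the literal parts; "*" becomes a group.
  const pattern = command
    .trim()
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join("(.+)");
  return new RegExp("^" + pattern + "$", "i");
}

function matchCommand(commands, transcript) {
  const spoken = transcript.trim();
  for (const { command, callback } of commands) {
    const match = spoken.match(commandToRegExp(command));
    if (match) {
      // Pass each captured "*" value to the callback as an argument.
      return callback(...match.slice(1).map((s) => s.trim()));
    }
  }
  return undefined; // no command matched
}
```

Even this toy version has to deal with escaping, case, and whitespace; supporting misrecognized words or punctuation from the engine is where most of the real effort would go.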
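To respond when the user says a particular phrase, you can pass a list of commands to the useSpeechRecognition hook. Each command is an object with, at minimum, a command property (the phrase to listen for, as a string or RegExp) and a callback property (the function to run when that phrase is recognized), plus optional settings such as isFuzzyMatch for approximate matching.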
Here is an example of how to pre-define commands for your application:
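In the app, the commands array is handed to the hook as useSpeechRecognition({ commands }), and the hook calls the matching callback with any text captured by a *. The sketch below builds such an array with a hypothetical buildCommands function that receives a state setter (for example, the setter returned by useState) so it can be exercised without React; the phrases and messages are illustrative. The clear command relies on the library passing an object with resetTranscript as the callback's last argument.

```javascript
// Command definitions in the { command, callback } shape that
// useSpeechRecognition({ commands }) expects. buildCommands is hypothetical;
// `setMessage` stands in for a React state setter.
function buildCommands(setMessage) {
  return [
    {
      // "*" captures whatever the user says after the phrase.
      command: "I would like to order *",
      callback: (food) => setMessage(`Your order is for: ${food}`),
    },
    {
      // Each "*" becomes one callback argument, in order.
      command: "My top sports are * and *",
      callback: (sport1, sport2) => setMessage(`#1: ${sport1}, #2: ${sport2}`),
    },
    {
      // With no "*", the only argument is the library's options object;
      // resetTranscript empties the transcript.
      command: "clear",
      callback: ({ resetTranscript }) => resetTranscript(),
    },
  ];
}
```

Inside the component, the call would look like const { transcript } = useSpeechRecognition({ commands: buildCommands(setMessage) }).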
Conclusion
Thanks to Chrome's speech recognition APIs, building voice-activated apps couldn't be easier or more fun. Hopefully, in the near future, we'll see this API supported by more browsers and with offline capabilities. Then it will become a very powerful API that may change the way we build the web.
This article was originally published on Live Code Stream by Juan Cruz Martinez (twitter: @bajcmartinez), founder and publisher of Live Code Stream, entrepreneur, developer, author, speaker, and doer of things.