Use a spare Android device to provide an offline speech recognition interface.
Proof of Concept. Right now this software works great for my own personal use, but in order to be more broadly useful it requires testing on a wider range of Android devices.
Please open a Pull Request with the results of your own tests, or, better, patches that add support for additional devices.
- A Google Android device ("phone").
- A computer on your local network to which the phone may be connected by USB ("server").
Older versions of any of the following may work, but have not been tested.

Phone:

- Termux >= 0.118.0.
- Node.js >= 17.7.2 (via the Termux package manager).
- npm package manager >= 8.5.2 (included with Node.js).

Server:

- GNU Bash >= 5.1.4.
- Android Debug Bridge (`adb`) >= 1.0.41.
- Node.js >= 18.12.1. Versions older than 4.0.0 will definitely not work.
I'd like to have general purpose speech recognition software of decent quality. I like Google's speech recognition software, but don't want my voice leaked to their servers (or anywhere on the internet). I'd also like to not have to pay any money for this.
My Google Pixel 1 phone comes with a feature on its default virtual keyboard for performing text transcription using speech recognition. When an internet connection is detected, the phone will prefer to send my voice to Google's servers for text transcription. However, when the phone is offline, its built-in software is used for entirely local, offline speech recognition and text transcription. It would be very useful to have some means of accessing this for my own purposes. This would let me give an old device a new use, too!
None of this functionality is exposed for programmatic control. The problems are these:
- Recognition can only be initialized by manually tapping the microphone icon on the built-in virtual keyboard.
- The text transcription will be written out to whatever text field happens to be focused.
- These must both be done without giving the phone internet access.
The first problem is solved by simulating the required user input. The phone is connected to a computer and controlled with `adb`, which has a sub-command for simulating a tap event at a particular X/Y coordinate: `adb shell input tap X Y`.
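As an illustration, a script on the computer could wrap this `adb` call. Here is a minimal Node.js sketch; the tap coordinates are placeholders that you would need to replace with the microphone icon's actual position on your device:

```js
// Minimal sketch: trigger the keyboard's microphone icon via adb from Node.js.
// The 540/1750 coordinates are placeholders, not the icon's real position on
// any particular device.
const { execFile } = require('child_process');

function tapMicrophone(x, y) {
  return new Promise((resolve, reject) => {
    execFile('adb', ['shell', 'input', 'tap', String(x), String(y)], (err) => {
      if (err) reject(err);
      else resolve();
    });
  });
}

tapMicrophone(540, 1750).catch((err) => console.error('adb tap failed:', err));
```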
The second problem requires capturing the text written by the speech recognition virtual keyboard. This is done by giving focus to a `<textarea>` HTML element for the keyboard to write into, whose text input can be detected via its `input` JavaScript event. This HTML element is part of a text capture web page served by a Node.js web server running on the Android phone itself, using the free software Termux. This local web server also runs a WebSocket server. The computer can then run a small control server that connects to the WebSocket server, which both receives the captured text transcriptions and instructs the phone to listen for voice input.
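The client-side logic of such a capture page boils down to something like the following sketch. This is not the actual page served by `server.js`; the WebSocket endpoint and the message shape are assumptions.

```js
// Sketch of the capture page's client-side logic (not the actual page in this
// repository). Assumes an HTML page containing <textarea id="capture" autofocus>
// and that the WebSocket server shares the web server's host and port.
const ws = new WebSocket('ws://' + location.host);
const box = document.getElementById('capture');

ws.addEventListener('open', () => { document.body.style.background = 'blue'; });
ws.addEventListener('error', () => { document.body.style.background = 'red'; });

// Forward whatever the speech recognition keyboard writes into the textarea.
box.addEventListener('input', () => {
  // Hypothetical message shape; the real page may send something different.
  ws.send(JSON.stringify({ type: 'transcript', text: box.value }));
});
```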
Any computer on the network may then issue a request to the control server over TCP. This triggers the necessary simulated screen taps to make the phone start listening for speech; the captured text transcription is then transmitted back to the control server, which passes it on to the requester.
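In rough outline, the bridge between that TCP interface and the phone's WebSocket server could look like the sketch below. This is not the actual `control.js`; the phone's WebSocket address, the message handling, and the tap coordinates are all assumptions.

```js
// Rough sketch of a TCP-to-WebSocket bridge in the spirit of control.js (the
// real implementation may differ).
const net = require('net');
const { execFile } = require('child_process');
const WebSocket = require('ws'); // npm install ws

// Hypothetical address of the phone's WebSocket server; adjust for your setup.
const PHONE_WS = 'ws://192.168.1.50:9001';
const ws = new WebSocket(PHONE_WS);

let pending = null; // TCP client currently waiting on a transcript

ws.on('message', (data) => {
  // Relay the transcript from the phone back to the waiting TCP client.
  if (pending) {
    pending.end(data.toString() + '\n');
    pending = null;
  }
});

net.createServer((socket) => {
  socket.on('data', (chunk) => {
    if (chunk.toString().trim() === 'listen') {
      pending = socket;
      // Simulate a tap on the keyboard's microphone icon (placeholder coordinates).
      execFile('adb', ['shell', 'input', 'tap', '540', '1750']);
    }
  });
}).listen(9003);
```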
- Install Termux via F-Droid (it is not available on the Google Play Store).
- Within a Termux terminal, install Node.js:

  ```
  pkg install nodejs
  ```

- Clone this repository within Termux:

  ```
  git clone https://github.com/hackergrrl/offline-android-speech-recognition
  ```

- Put the phone into Airplane Mode or otherwise disable Internet access.
- Run the web server from Termux:

  ```
  cd offline-android-speech-recognition
  npm install
  node server.js
  ```

- Open a web browser and navigate to http://localhost:9001. The background will turn blue to indicate a successful connection to the web server; red indicates something is amiss.
- Tap on the text field to focus it and bring up the virtual keyboard.
- This will change the keyboard to a UI component with a large microphone icon.
- The phone setup is now complete.
- Install the `adb` command as part of the Android Platform SDK Tools, and ensure it is in your shell's `$PATH`.
- Install Node.js.
- Install npm.
- Clone this repository:

  ```
  git clone https://github.com/hackergrrl/offline-android-speech-recognition
  ```

- Install dependencies:

  ```
  cd offline-android-speech-recognition
  npm install
  ```

- Plug the phone into the computer using a USB cable.
- Start the control server, which will connect to the phone's WebSocket server:

  ```
  node control.js
  ```

- Test the interface by running a command on your own computer like `netcat 192.168.1.103 9003` to connect to the control server. Type `listen` and press enter. The phone's background will change from blue to green, indicating it is listening for voice input. The transcribed result (or an error) is written back.
The control server exposes API access via a TCP server on port 9003. It accepts one command, terminated by a newline: `listen`. This cues the phone to listen for speech in the room or area. After the speech ends, the API will write back the text transcript in lower case. If no speech is heard before recognition times out (~5 seconds), the text `nevermind` is returned. If there is an error, the text returned will be `ERROR: <text>`.
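For example, a small Node.js client could exercise this API as follows (the host address is an assumption; use the address of the machine running `control.js`):

```js
// Example client for the TCP API: ask the control server to listen, then
// print whatever text comes back.
const net = require('net');

const client = net.connect({ host: '192.168.1.103', port: 9003 }, () => {
  client.write('listen\n');
});

client.on('data', (data) => {
  const text = data.toString().trim();
  if (text.startsWith('ERROR:')) console.error('recognition failed:', text);
  else if (text === 'nevermind') console.log('no speech heard before the timeout');
  else console.log('heard:', text);
  client.end();
});
```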
| Device | Android Version | Works | Notes |
|---|---|---|---|
| Google Pixel 1 | 10 | ✅ | |
This has been working very well for me. I use the programmatic interface for voice dictation on my laptop (piped into `xdotool type` for input simulation), as well as for command recognition for various home automation tasks, such as timers and media playback control. I have also combined this with vosk, which offers offline continuous speech recognition of lesser quality: vosk performs wake-word detection (e.g. "computer!") and then activates the Android phone for command recognition.
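A sketch of the dictation half of that setup, assuming the control server address from the earlier example and `xdotool` installed on the laptop:

```js
// Request a transcript from the control server, then type it into the
// currently focused window with xdotool.
const net = require('net');
const { execFile } = require('child_process');

const client = net.connect({ host: '192.168.1.103', port: 9003 }, () => {
  client.write('listen\n');
});

client.on('data', (data) => {
  const text = data.toString().trim();
  if (text && text !== 'nevermind' && !text.startsWith('ERROR:')) {
    execFile('xdotool', ['type', text]); // simulate keystrokes
  }
  client.end();
});
```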