A best practice for streaming audio from a browser microphone to Dialogflow or Google Cloud Speech-to-Text over WebSockets.
An Airport Self-Service Kiosk demo, demonstrating how microphone streaming to GCP works from a web application.
It makes use of the following GCP resources:
- Dialogflow & Knowledge Bases
- Speech to Text
- Text to Speech
- Translate API
- (optionally) App Engine Flex
In this demo you can record your voice; the app displays answers on a screen and synthesizes them to speech.
A working demo can be found here: http://selfservicedesk.appspot.com/
I wrote very extensive blog articles on how to set up your streaming project. Want to learn exactly how this code works? Start here:
Blog 1: Introduction to the GCP conversational AI components, and integrating your own voice AI in a web app.
Blog 2: Building a client-side web application which streams audio from a browser microphone to a server.
Blog 3: Building a web server which receives a browser microphone stream and uses Dialogflow or the Speech to Text API for retrieving text results.
Blog 4: Getting Audio Data from Text (Text to Speech) and play it in your browser.
There's a presentation and a video that accompanies the tutorial.
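A core client-side step the blogs walk through is converting the Web Audio API's Float32 samples to 16-bit linear PCM (LINEAR16) before streaming them to the server. A minimal sketch of that conversion (the function name is mine, not from the repo):

```javascript
// Convert Web Audio API Float32 samples (range [-1, 1]) to 16-bit
// linear PCM (LINEAR16), the raw-audio encoding Dialogflow and the
// Speech APIs expect.
function floatTo16BitPCM(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}
```

The resulting Int16Array's underlying buffer is what gets sent over the WebSocket.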
-
apt-get install nodejs -y
-
apt-get install npm -y
sudo npm install -g @angular/cli
-
git clone https://github.com/dialogflow/selfservicekiosk-audio-streaming.git selfservicekiosk
-
Set the PROJECT_ID variable: export PROJECT_ID=[gcp-project-id]
-
Set the project:
gcloud config set project $PROJECT_ID
-
Download the service account key.
-
Assign the key to environment var: GOOGLE_APPLICATION_CREDENTIALS
LINUX/MAC
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service_account.json
WIN
set GOOGLE_APPLICATION_CREDENTIALS=c:\path\to\service_account.json
-
Login:
gcloud auth login
-
Open server/env.txt, change the environment variables and rename the file to server/.env
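The authoritative list of variables is in server/env.txt; as an illustration only, server/.env might end up containing something like this (the value is a placeholder for your own project):

```
PROJECT_ID=your-gcp-project-id
```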
-
Enable APIs:
gcloud services enable \
appengineflex.googleapis.com \
containerregistry.googleapis.com \
cloudbuild.googleapis.com \
cloudtrace.googleapis.com \
dialogflow.googleapis.com \
logging.googleapis.com \
monitoring.googleapis.com \
sourcerepo.googleapis.com \
speech.googleapis.com \
mediatranslation.googleapis.com \
texttospeech.googleapis.com \
translate.googleapis.com
-
Build the client-side Angular app:
cd client && sudo npm install && npm run-script build
-
Start the server TypeScript app, which is exposed on port 8080:
cd ../server && sudo npm install && npm run-script watch
-
Browse to http://localhost:8080
-
Create a Dialogflow agent at: http://console.dialogflow.com
-
Zip the contents of the dialogflow folder from this repo.
-
Click Settings > Import and upload the Dialogflow agent zip you just created.
-
Caution: Knowledge connector settings are not currently included when exporting, importing, or restoring agents.
Make sure you have enabled Beta features in settings.
- Select Knowledge from the left menu.
- Create a Knowledge Base: Airports
- Add the following Knowledge Base FAQs, as text/html documents:
- https://www.panynj.gov/port-authority/en/help-center/faq/airports-faq-help-center.html
- https://www.schiphol.nl/en/before-you-take-off/
- https://www.flysfo.com/faqs
- As a response, it requires the following custom payload:
{ "knowledgebase": true, "QUESTION": "$Knowledge.Question[1]", "ANSWER": "$Knowledge.Answer[1]" }
- And to make the Text to Speech version of the answer work, add the following Text SSML response:
$Knowledge.Answer[1]
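On the server side, the custom payload above comes back inside the detectIntent result's fulfillment messages. A sketch of pulling it out, assuming the protobuf Struct payload has already been decoded to a plain object (the helper name is mine, not from the repo):

```javascript
// Find the fulfillment message carrying the knowledge-base payload
// and return its question/answer pair, or null when there is none.
function getKnowledgeAnswer(queryResult) {
  const messages = queryResult.fulfillmentMessages || [];
  const match = messages.find(m => m.payload && m.payload.knowledgebase);
  if (!match) return null;
  return {
    question: match.payload.QUESTION,
    answer: match.payload.ANSWER
  };
}
```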
This demo makes heavy use of WebSockets, and the HTML5 getUserMedia() microphone API requires the page to run over HTTPS. Therefore, I deploy this demo with a custom runtime, so I can include my own Dockerfile.
-
Edit the app.yaml to tweak the environment variables. Set the correct Project ID.
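For a custom runtime on App Engine Flex, app.yaml needs at least the two runtime lines below; the env_variables entry is an illustration, so use the variable names from the repo's own app.yaml:

```yaml
runtime: custom
env: flex
env_variables:
  PROJECT_ID: your-gcp-project-id
```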
-
Deploy with:
gcloud app deploy
-
Browse:
gcloud app browse
The self-service kiosk is a full end-to-end application. To showcase smaller pieces, I've created 6 small demos. Here's how you can get these running:
-
Install the required libraries by running the following command from the examples folder:
npm install
-
Start the simpleserver node app:
npm --EXAMPLE=1 --PORT=8080 --PROJECT_ID=[your-gcp-project-id] run start
To switch between the various examples, set the EXAMPLE variable to one of these:
- Example 1: Dialogflow Speech Intent Detection
- Example 2: Dialogflow Speech Detection through streaming
- Example 3: Dialogflow Speech Intent Detection with Text to Speech output
- Example 4: Speech to Text Transcribe Recognize Call
- Example 5: Speech to Text Transcribe Streaming Recognize
- Example 6: Text to Speech in a browser
- Browse to http://localhost:8080. Open the browser inspector to preview the Dialogflow results object.
The code required for these examples can be found in simpleserver.js for the different Dialogflow & STT calls; example1.html - example5.html show the client-side implementations.
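For the streaming STT examples, the request config is the part most worth getting right. A sketch of a typical streamingRecognize request for browser-mic audio (the values are assumptions; adjust them to what your client actually sends):

```javascript
// Typical request for a Speech-to-Text streamingRecognize call fed by
// a browser microphone that was downsampled client-side.
const request = {
  config: {
    encoding: 'LINEAR16',     // raw 16-bit PCM samples
    sampleRateHertz: 16000,   // must match the rate the client sends
    languageCode: 'en-US'
  },
  interimResults: true        // emit partial transcripts while speaking
};
```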
Apache 2.0
This is not an official Google product.