AVS Release v1.0b1r7
avsheth committed Jan 10, 2019
1 parent ae7b660 commit 2059099
Showing 145 changed files with 1,724 additions and 1,381 deletions.
25 changes: 25 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,30 @@
## ChangeLog

### v1.0b1r7 (Beta) - 2019-01-08

**Enhancements**

* Support for Google Voice Assistant (GVA) and Google Dialogflow (Alpha release). Please check the README of each voice assistant to get started.
* Support for playing custom tones from the application.
* The nvs-set CLI now supports setting and getting the int8_t variable type.

**API Changes**

* Example directories are renamed.
* Many structures, functions, and header files with the "alexa" prefix have been renamed to use the "va" prefix. Please follow `README-FIRST.md` if you want to compile and run older examples with the new SDK.

**Bug Fixes**

* Fixed background noise in audio at lower sampling rates.
* In case of incorrect provisioning parameters, the device previously needed to be reset via the command line (via esptool.py on the host or nvs-erase on the device). Now the reset-to-factory push button (Mode) can also be used.
* Capabilities are now announced automatically when updated firmware with new capabilities is flashed onto the device.
* More SNTP servers have been added for faster time synchronization on device boot-up.

**Known Issues/Improvements**

* Only a limited set of TuneIn radio stations is supported.
* The SDK is largely tested with internet and Wi-Fi connectivity intact throughout operation; some issues are seen when the device loses connectivity.

### v1.0b1r6 (Beta) - 2018-12-24

**Enhancements**
36 changes: 36 additions & 0 deletions README-Alexa.md
@@ -0,0 +1,36 @@
# Introduction
Alexa is Amazon's personal virtual assistant, which listens to users' voice commands and responds with appropriate answers. Apart from conversing with the user, Alexa lets you play music from a variety of music streaming services, helps you manage to-do lists, and allows for voice-assisted shopping from Amazon.

# Device Configuration
* Before proceeding with device configuration, make sure you have read and followed `README-Getting-Started.md`.
* By default, the firmware comes up un-provisioned. The device console should display the following lines in this mode:
```
I (xxx) conn_mgr_prov: Provisioning started with :
service name = ESP-Alexa-xxxx
service key =
```
You can use either the Android or iOS app (links below) to provision your device to the desired Wi-Fi Access Point and associate the user's Amazon account with the device.
* [Android](https://github.com/espressif/esp-avs-sdk/releases)
* [iOS](https://github.com/espressif/esp-idf-provisioning-ios/tree/versions/avs)

# Demo
* Once the board boots up and successfully connects to the Wi-Fi network after provisioning, you will see the print "Alexa is ready", after which you can either use the "Rec" button on the board or say "Alexa" to start a conversation. For Tap-to-Talk, press and release the button and speak. The green LED glows when the microphone is active.
* You can connect any external speaker/headphone with a 3.5 mm connector to the PHONE JACK to listen to responses.
* You can now try commands like:
* tell me a joke
* how is the weather?
* will it rain today?
* Sing a song
* Play TuneIn radio
* Set volume level to 7
* Press and hold the "Mode" button for 3 seconds to reset the board to factory settings.

# Supported music streaming services
* Amazon Music
* Saavn (India)
* Pandora
* TuneIn Radio
* iHeart Radio

# Production notes
* In order to create Alexa-enabled commercial products, an Amazon-certified acoustic front-end has to be used. Please reach out to Espressif if you are looking to go to production.
65 changes: 65 additions & 0 deletions README-Dialogflow.md
@@ -0,0 +1,65 @@
# Introduction
Dialogflow (previously known as API.AI) is a voice-enabled conversational interface from Google.
It enables IoT users to include a natural-language user interface in their applications, services, and devices.

The advantages of Dialogflow over general-purpose voice assistants are lower complexity, pay-as-you-go pricing, support for a custom wakeword, and no certification hassles.

Unlike voice assistants, Dialogflow lets you configure every step of the conversation, and it won't answer unrelated trivia the way voice assistants typically do. For example, a Dialogflow agent for a laundry project will provide information only about the configurable parameters of the laundry (such as state, temperature, and wash cycle).

This release facilitates audio communication between the ESP32 and a Google Dialogflow agent using its v2beta1 gRPC APIs.

# Dialogflow Agent Setup
You will have to create a Dialogflow account and set up a Dialogflow agent in the cloud. The agent configuration is where you specify which conversations you will support.
* Follow this [link](https://dialogflow.com/docs/getting-started) to get started.
* Create your own agent
* You can add intents, entities, actions and parameters as per your agent's requirements.
* Build, test your agent and validate the responses using the console on Dialogflow.
* Optionally, you can add an existing sample agent from [here](https://dialogflow.com/docs/samples) to your Dialogflow account and use the same.
Note:
> Make sure that "Set this intent as end of conversation" is enabled under the "Responses" tab in each intent of your Dialogflow agent, so that the device can use this information to close the interaction with the user.

# Device Configuration
* Your device needs to be configured with the correct credentials to talk to the above project.
* Navigate to this [link](https://console.cloud.google.com/apis/dashboard)
* Select the agent created above as the project
* Go to Credentials section (on the left)
* Credentials -> Create credentials -> OAuth client ID -> Other
* OAuth consent screen -> Fill in the Support email (and other details as required) -> Save
* Credentials -> OAuth 2.0 client IDs -> Download the created OAuth client ID file (`client_secret_<client-id>.json`)
* Follow the steps specified in this [link](https://developers.google.com/assistant/sdk/guides/library/python/embed/install-sample#generate_credentials)
* In that step, use the following command instead of the one specified:
```
google-oauthlib-tool --scope https://www.googleapis.com/auth/cloud-platform \
--save --headless --client-secrets /path/to/client_secret_<client-id>.json
```

* Modify the example application provided in this SDK to add the project ID (from the `client_secret_<client-id>.json` file) to the `project_name` member of `device_config` before calling `dialogflow_init()`.
* Once you download `credentials.json`, you can use the following commands on the device console to set the client ID, client secret, and refresh token on the device:
```
[Enter]
>> nvs-set avs refreshToken string <refresh_token_from_credentials.json>
>> nvs-set avs clientId string <client_id_from_credentials.json>
>> nvs-set avs clientSecret string <client_secret_from_credentials.json>
```
* Use the following CLI command to configure the device's station interface:
```
[Enter]
>> wifi-set <ssid> <passphrase>
```

# Demo
* Once the board successfully connects to the Wi-Fi network, you can either use the "Rec" button on the board or say "Alexa" to start a conversation. (The current example only supports the _Alexa_ wakeword; support for other wakewords will be available soon.) For Tap-to-Talk, press and release the button and speak.
* For the example laundry project, one can set the following configurable parameters while creating the agent as described above:
* State: On/Start or Off/Stop
* Temperature: Valid temperature values
* Wash Cycle: Heavy, Medium or Light
* Now you can wake the device using either its wakeword ("Alexa") or the "Rec" button, and say a command like:
* Start the laundry with temperature 68 and heavy wash cycle
* You can also initiate multi-turn conversations like:
* Turn on the laundry
* (Dialogflow: At what temperature?) 75
* (Dialogflow: What is the wash cycle?) Light wash cycle
* The assistant's language can be changed by setting an appropriate code string in `va_cfg->device_config.device_language` in `app_main.c`. The list of valid code strings can be found [here](https://dialogflow.com/docs/reference/language).

NOTE:
> Once a multi-turn conversation is in progress, you do not need to press the "Rec" button or say the wakeword for every turn.
9 changes: 9 additions & 0 deletions README-FIRST.md
@@ -1,3 +1,12 @@
## Upgrading to v1.0b1r7 (Beta)

* Most API names have changed from the alexa\_ prefix to the va\_ prefix. Please follow these steps to upgrade:
* Download the script `v1_0r6Tov1_0r7_update_script.sh`, available under the "releases" tab, and copy it into the SDK's root directory (esp-voice-assistants).
* Copy your example into the examples directory.
* Run the script using the following commands:
* `cd /path/to/esp-voice-assistants`
* `./v1_0r6Tov1_0r7_update_script.sh`

## Upgrading to v1.0b1r6 (Beta)

* Release v1.0b1r6 changes the way Wi-Fi credentials are stored in NVS. Hence, when you upgrade your device to the latest SDK, it will go into provisioning mode again and will need to be re-provisioned via the provisioning app.
37 changes: 37 additions & 0 deletions README-GVA.md
@@ -0,0 +1,37 @@
# Introduction
Google Voice Assistant (GVA) is Google's version of a personal voice assistant. GVA is multilingual and allows users to converse in their preferred language. Apart from general queries, it allows users to check traffic conditions, emails, weather conditions, and much more.

# Project Setup
* Before proceeding with this section, make sure you have read and followed `README-Getting-Started.md`.
* Follow steps specified in this [link](https://developers.google.com/assistant/sdk/guides/library/python/embed/config-dev-project-and-account) and execute the following sections:
* Configure an Actions Console project
* Set activity controls for your account
* Register the Device Model using the registration UI
* Generate credentials

# Device Configuration
* In the project setup steps above, you will also have generated credentials to be configured on the device.
* Once you download `credentials.json`, you can use the following commands on the device console to set the client ID, client secret, and refresh token on the device:
```
[Enter]
>> nvs-set avs refreshToken string <refresh_token_from_credentials.json>
>> nvs-set avs clientId string <client_id_from_credentials.json>
>> nvs-set avs clientSecret string <client_secret_from_credentials.json>
```
* Use the following CLI command to configure the device's station interface:
```
[Enter]
>> wifi-set <ssid> <passphrase>
```

# Demo
* Once the board successfully connects to the Wi-Fi network, you can either use the "Rec" button on the board or say "Alexa" to start a conversation. (The current example only supports the _Alexa_ wakeword; support for other wakewords will be available soon.) For Tap-to-Talk, press and release the button and speak.
* You can connect any external speaker/headphone with 3.5mm connector to PHONE JACK to listen to responses.
* You can now try commands like:
* tell me a joke
* how is the weather?
* will it rain today?
* Sing a song
* Set volume level to 7
* Press and hold the "Mode" button for 3 seconds to reset the board to factory settings.
* The assistant's language can be changed by setting an appropriate code string in `va_cfg->device_config.device_language` in `app_main.c`. The list of valid code strings can be found [here](https://developers.google.com/actions/localization/languages-locales).
39 changes: 39 additions & 0 deletions README-Getting-Started.md
@@ -0,0 +1,39 @@
# Prerequisites
Please prepare your host system with the toolchain. See http://esp-idf.readthedocs.io/en/latest/get-started/index.html for the host setup instructions.

# Supported Hardware
The lyrat and lyrat_sr applications support the ESP32-based LyraT v4.1, LyraT v4.2, and LyraT v4.3.
The lyratd_msc_sr application supports the ESP32-based LyraTD_MSC v2.0 and LyraTD_MSC v2.1.

# Prepare Images

## Clone all the repositories
```
$ git clone --recursive https://github.com/espressif/esp-idf.git
$ cd esp-idf; git checkout release/v3.1; cd ..
$ git clone https://github.com/espressif/esp-avs-sdk.git
```

## Apply patches on esp-idf
```
$ cd esp-idf
$ git apply ../esp-avs-sdk/esp-idf-patches/memset-i2s-dma-buffers-zero.patch
$ git apply ../esp-avs-sdk/esp-idf-patches/esp-tls-Add-support-for-global-CA-store.-All-mbedtls.patch
```

## Build and flash the project
```
$ cd esp-avs-sdk/examples/<example_board_directory>
$ export IDF_PATH=/path/to/esp-idf
$ export ESPPORT=/dev/cu.SLAB_USBtoUART (or /dev/ttyUSB0 or /dev/ttyUSB1 on Linux or COMxx on MinGW)
$ make -j 8 flash VOICE_ASSISTANT=<alexa/gva/dialogflow> monitor
```
NOTE:
> The lyrat app supports only Tap-to-Talk, whereas the lyrat_sr and lyratd_msc_sr apps support both the "Alexa" wakeword and Tap-to-Talk.
47 changes: 32 additions & 15 deletions README.md
@@ -1,29 +1,46 @@
## Overview

The ESP-Alexa SDK provides an implementation of Amazon's Alexa Voice Service endpoint for ESP32 microcontroller. This facilitates the developers to evaluate ESP32 based Alexa integrated devices like speakers and IoT devices. Please refer to [Changelog](CHANGELOG.md) to track release changes and known-issues.
The ESP-Voice-Assistant SDK provides implementations of Amazon's Alexa Voice Service, Google Voice Assistant, and Google's conversational interface (Dialogflow) for the ESP32 microcontroller. It enables developers to evaluate ESP32-based voice-assistant-integrated devices such as speakers and IoT devices. Please refer to the [Changelog](CHANGELOG.md) to track release changes and known issues.

### About SDK

The SDK contains pre-built library of Alexa SDK along with sources of some of the utility components such as audio pipeline and connection manager. The SDK supports all major features of Alexa such as:
* Basic Alexa conversation
* Alexa dialogues and multi-turn
* Audio Streaming and Playback: Saavn, Amazon music, TuneIn (Only limited stations are supported as of now)
* Audio Book Support: Kindle, Audible
* Volume control via Alexa command
* Seek support for Audible
* Alerts/Timers, Reminders, Notifications

For now, Tap-To-Talk is the only interaction mode supported on LyraT.
The SDK contains pre-built libraries for Alexa, GVA, and Dialogflow, along with the sources of some utility components such as the audio pipeline and the connection manager. Below is the list of features supported for each voice assistant:
* **Alexa**:
* Basic Alexa conversation
* Alexa dialogues and multi-turn
* Audio Streaming and Playback: Saavn, Amazon music, TuneIn (Only limited stations are supported as of now)
* Audio Book Support: Kindle, Audible
* Volume control via Alexa command
* Seek support for Audible
* Alerts/Timers, Reminders, Notifications

* **Google Voice Assistant**:
* Basic conversation
* Multi-turn conversations
* Getting weather reports
* Multiple language support

* **Google Dialogflow**:
* Basic conversation
* Multi-turn conversations
* Configure and control connected devices via voice, e.g., "Turn the light on"
* Multiple language support

## Supported Hardware

Release supports following hardware platforms:
The SDK supports the following hardware platforms:
* [ESP32-LyraT](https://www.espressif.com/en/products/hardware/esp32-lyrat)
* [ESP32-LyraTD-MSC](https://www.espressif.com/en/products/hardware/esp32-lyratd-msc)

The SDK can easily be extended to other ESP32-based audio platforms that have SPIRAM available.
The following acoustic front-ends are also supported. Please contact Espressif to enable access to these solutions.
* DSPG DBMD5
* Intel s1000
* Synaptics CX20921

## Getting started

* When flashing the SDK for the first time, it is recommended to do `make erase_flash` to wipe out entire flash and start out fresh.
* Please refer to example READMEs to get started with flashing, provisioning and Alexa interactions.
* Follow the `README-Getting-Started.md` to clone the required repositories and to compile and flash the firmware.
* When flashing the SDK for the first time, it is recommended to do `make erase_flash` to wipe out entire flash and start out fresh.
* Go through `README-<voice_assistant>.md` to learn how to provision the device, obtain authentication tokens from the respective authorization server, and flash them onto the device.
* Check the example application's README for any board- or example-specific changes that might be required.
* If you are updating from a previous release, please check `README-FIRST.md` for any specific actions that need to be taken while upgrading.
21 changes: 20 additions & 1 deletion components/codecs/include/audio_codec.h
@@ -41,9 +41,22 @@ typedef enum {
CODEC_TYPE_MP3 = 1,
CODEC_TYPE_AAC,
CODEC_TYPE_MP4,
CODEC_TYPE_OPUS
CODEC_TYPE_OPUS,
CODEC_TYPE_FLAC,
CODEC_TYPE_AMR
} audio_codec_identifier_t;

/* Audio type */
typedef enum {
AUDIO_TYPE_UNKNOWN,
AUDIO_TYPE_WAV,
AUDIO_TYPE_AMRNB,
AUDIO_TYPE_AMRWB,
AUDIO_TYPE_M4A,
AUDIO_TYPE_AAC,
AUDIO_TYPE_TSAAC
} audio_type_t;

typedef enum {
CODEC_STATE_INIT = 1,
CODEC_STATE_RUNNING,
@@ -52,6 +65,12 @@ typedef enum {
CODEC_STATE_DESTROYED,
} audio_codec_state_t;

enum {
CODEC_FAIL = 0,
CODEC_DONE = 1,
CODEC_OK
};

typedef struct audio_codec_audio_info {
int sampling_freq;
int channels;
3 changes: 2 additions & 1 deletion components/codecs/include/resampling.h
@@ -38,7 +38,8 @@ typedef struct {
short inpcm[INPCM_DELAY_SIZE * 2]; ///the pcm value of last time calling. maximum should be 6: 48000/8000;
int innum; /// the total input pcm number
int outnum; /// the total outnum pcm number
float hp_mem[4]; ///for filter, the first two is for first channel, the last two is for second channel
float hp_mem[4]; /// for filter, the first two is for first channel, the last two is for second channel
int resample_ratio; /// (out_freq * 512) / in_freq; if the previous resample ratio differs from the current one, reset the whole structure.
} audio_resample_config_t;

/**
Binary file modified components/codecs/lib/libcodecs.a
16 changes: 0 additions & 16 deletions components/esp-alexa/component.mk

This file was deleted.

22 changes: 0 additions & 22 deletions components/esp-alexa/include/tone.h

This file was deleted.

16 changes: 16 additions & 0 deletions components/esp-voice-assistant/component.mk
@@ -0,0 +1,16 @@
#
# Component Makefile
#

COMPONENT_ADD_INCLUDEDIRS := include

ifeq ("$(VOICE_ASSISTANT)", "gva")
VA_LIB_PATH := $(COMPONENT_PATH)/lib/libgva.a
else ifeq ("$(VOICE_ASSISTANT)", "dialogflow")
VA_LIB_PATH := $(COMPONENT_PATH)/lib/libdialogflow.a
else
VA_LIB_PATH := $(COMPONENT_PATH)/lib/libalexa.a
endif

COMPONENT_ADD_LDFLAGS += $(VA_LIB_PATH)
COMPONENT_ADD_LINKER_DEPS += $(VA_LIB_PATH)
File renamed without changes.
