- 中文版本
- Basic Example:
This example demonstrates the process of reading environmental sound data from the microphone, analyzing and processing the data with VAD to determine whether it is noise or human speech, and finally outputting the result.
The data stream in the pipeline is as follows:
mic ---> codec_chip ---> i2s_stream ---> filter ---> raw ---> vad_process ---> output
This example runs on the boards that are marked with a green checkbox in the table. Please remember to select the board in menuconfig as discussed in Section Configuration below.
This example supports IDF release/v3.3 and later branches. By default, it runs on ADF's built-in branch $ADF_PATH/esp-idf
.
The default board for this example is ESP32-Lyrat V4.3
, if you need to run this example on other development boards, select the board in menuconfig, such as ESP32-Lyrat-Mini V1.1
.
menuconfig > Audio HAL > ESP32-Lyrat-Mini V1.1
Build the project and flash it to the board, then run monitor tool to view serial output (replace PORT
with your board's serial port name):
idf.py -p PORT flash monitor
To exit the serial monitor, type Ctrl-]
.
See ESP-IDF Programming Guide for full steps to configure and build an ESP-IDF project.
- The example starts running and automatically detecting the surrounding background environment sound. The log is as follows:
I (0) cpu_start: App cpu up.
I (367) heap_init: Initializing. RAM available for dynamic allocation:
I (374) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (380) heap_init: At 3FFB3B50 len 0002C4B0 (177 KiB): DRAM
I (386) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (393) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (399) heap_init: At 4008C240 len 00013DC0 (79 KiB): IRAM
I (405) cpu_start: Pro cpu start user code
I (424) spi_flash: detected chip: gd
I (424) spi_flash: flash io: dio
W (425) spi_flash: Detected size(8192k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (434) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (445) EXAMPLE-VAD: [ 1 ] Start codec chip
I (475) EXAMPLE-VAD: [ 2 ] Create audio pipeline for recording
I (475) EXAMPLE-VAD: [2.1] Create i2s stream to read audio data from codec chip
I (505) EXAMPLE-VAD: [2.2] Create filter to resample audio data
I (505) EXAMPLE-VAD: [2.3] Create raw to receive data
I (505) EXAMPLE-VAD: [ 3 ] Register all elements to audio pipeline
I (505) EXAMPLE-VAD: [ 4 ] Link elements together [codec_chip]-->i2s_stream-->filter-->raw-->[VAD]
I (515) EXAMPLE-VAD: [ 5 ] Start audio_pipeline
I (555) EXAMPLE-VAD: [ 6 ] Initialize VAD handle
- If someone is talking around at this time and the program detects the speech signal, it will prompt
Speech detected
. The log is as follows:
I (445) EXAMPLE-VAD: [ 1 ] Start codec chip
I (475) EXAMPLE-VAD: [ 2 ] Create audio pipeline for recording
I (475) EXAMPLE-VAD: [2.1] Create i2s stream to read audio data from codec chip
I (505) EXAMPLE-VAD: [2.2] Create filter to resample audio data
I (505) EXAMPLE-VAD: [2.3] Create raw to receive data
I (505) EXAMPLE-VAD: [ 3 ] Register all elements to audio pipeline
I (505) EXAMPLE-VAD: [ 4 ] Link elements together [codec_chip]-->i2s_stream-->filter-->raw-->[VAD]
I (515) EXAMPLE-VAD: [ 5 ] Start audio_pipeline
I (555) EXAMPLE-VAD: [ 6 ] Initialize VAD handle
I (7835) EXAMPLE-VAD: Speech detected
I (7865) EXAMPLE-VAD: Speech detected
I (7895) EXAMPLE-VAD: Speech detected
I (7925) EXAMPLE-VAD: Speech detected
I (7955) EXAMPLE-VAD: Speech detected
I (7985) EXAMPLE-VAD: Speech detected
I (8015) EXAMPLE-VAD: Speech detected
I (8045) EXAMPLE-VAD: Speech detected
I (8065) EXAMPLE-VAD: Speech detected
I (8105) EXAMPLE-VAD: Speech detected
I (8135) EXAMPLE-VAD: Speech detected
I (8165) EXAMPLE-VAD: Speech detected
I (8195) EXAMPLE-VAD: Speech detected
I (8215) EXAMPLE-VAD: Speech detected
I (8245) EXAMPLE-VAD: Speech detected
I (8285) EXAMPLE-VAD: Speech detected
I (8315) EXAMPLE-VAD: Speech detected
I (8345) EXAMPLE-VAD: Speech detected
I (8375) EXAMPLE-VAD: Speech detected
I (8395) EXAMPLE-VAD: Speech detected
I (8425) EXAMPLE-VAD: Speech detected
I (8465) EXAMPLE-VAD: Speech detected
I (8495) EXAMPLE-VAD: Speech detected
I (8525) EXAMPLE-VAD: Speech detected
I (8545) EXAMPLE-VAD: Speech detected
A complete log is as follows:
rst:0x1 (POWERON_RESET),boot:0x1f (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:7152
load:0x40078000,len:13212
load:0x40080400,len:4568
0x40080400: _init at ??:?
entry 0x400806f4
I (27) boot: ESP-IDF v4.2.2 2nd stage bootloader
I (27) boot: compile time 10:29:24
I (27) boot: chip revision: 3
I (30) boot_comm: chip revision: 3, min. bootloader chip revision: 0
I (37) boot.esp32: SPI Speed : 40MHz
I (42) boot.esp32: SPI Mode : DIO
I (46) boot.esp32: SPI Flash Size : 2MB
I (51) boot: Enabling RNG early entropy source...
I (56) boot: Partition Table:
I (60) boot: ## Label Usage Type ST Offset Length
I (67) boot: 0 nvs WiFi data 01 02 00009000 00006000
I (75) boot: 1 phy_init RF data 01 01 0000f000 00001000
I (82) boot: 2 factory factory app 00 00 00010000 00100000
I (90) boot: End of partition table
I (94) boot_comm: chip revision: 3, min. application chip revision: 0
I (101) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x1c268 (115304) map
I (154) esp_image: segment 1: paddr=0x0002c290 vaddr=0x3ffb0000 size=0x021b8 ( 8632) load
I (158) esp_image: segment 2: paddr=0x0002e450 vaddr=0x40080000 size=0x01bc8 ( 7112) load
0x40080000: _WindowOverflow4 at /hengyongchao/esp-idfs/esp-idf-v4.2.2/components/freertos/xtensa/xtensa_vectors.S:1730
I (164) esp_image: segment 3: paddr=0x00030020 vaddr=0x400d0020 size=0x4ff28 (327464) map
0x400d0020: _stext at ??:?
I (294) esp_image: segment 4: paddr=0x0007ff50 vaddr=0x40081bc8 size=0x0a678 ( 42616) load
0x40081bc8: esp_crosscore_int_send at /hengyongchao/esp-idfs/esp-idf-v4.2.2/components/esp32/crosscore_int.c:109
I (320) boot: Loaded app from partition at offset 0x10000
I (320) boot: Disabling RNG early entropy source...
I (320) cpu_start: Pro cpu up.
I (324) cpu_start: Application information:
I (329) cpu_start: Project name: example_vad
I (334) cpu_start: App version: v2.2-195-gf0079c11
I (340) cpu_start: Compile time: Sep 18 2021 10:29:18
I (346) cpu_start: ELF file SHA256: 1c0b620a29a11e10...
I (352) cpu_start: ESP-IDF: v4.2.2
I (357) cpu_start: Starting app cpu, entry point is 0x40081960
0x40081960: call_start_cpu1 at /hengyongchao/esp-idfs/esp-idf-v4.2.2/components/esp32/cpu_start.c:287
I (0) cpu_start: App cpu up.
I (367) heap_init: Initializing. RAM available for dynamic allocation:
I (374) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (380) heap_init: At 3FFB3B50 len 0002C4B0 (177 KiB): DRAM
I (386) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (393) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (399) heap_init: At 4008C240 len 00013DC0 (79 KiB): IRAM
I (405) cpu_start: Pro cpu start user code
I (424) spi_flash: detected chip: gd
I (424) spi_flash: flash io: dio
W (425) spi_flash: Detected size(8192k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (434) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (445) EXAMPLE-VAD: [ 1 ] Start codec chip
I (475) EXAMPLE-VAD: [ 2 ] Create audio pipeline for recording
I (475) EXAMPLE-VAD: [2.1] Create i2s stream to read audio data from codec chip
I (505) EXAMPLE-VAD: [2.2] Create filter to resample audio data
I (505) EXAMPLE-VAD: [2.3] Create raw to receive data
I (505) EXAMPLE-VAD: [ 3 ] Register all elements to audio pipeline
I (505) EXAMPLE-VAD: [ 4 ] Link elements together [codec_chip]-->i2s_stream-->filter-->raw-->[VAD]
I (515) EXAMPLE-VAD: [ 5 ] Start audio_pipeline
I (555) EXAMPLE-VAD: [ 6 ] Initialize VAD handle
I (7835) EXAMPLE-VAD: Speech detected
I (7865) EXAMPLE-VAD: Speech detected
I (7895) EXAMPLE-VAD: Speech detected
I (7925) EXAMPLE-VAD: Speech detected
I (7955) EXAMPLE-VAD: Speech detected
I (7985) EXAMPLE-VAD: Speech detected
I (8015) EXAMPLE-VAD: Speech detected
I (8045) EXAMPLE-VAD: Speech detected
I (8065) EXAMPLE-VAD: Speech detected
I (8105) EXAMPLE-VAD: Speech detected
I (8135) EXAMPLE-VAD: Speech detected
I (8165) EXAMPLE-VAD: Speech detected
I (8195) EXAMPLE-VAD: Speech detected
I (8215) EXAMPLE-VAD: Speech detected
I (8245) EXAMPLE-VAD: Speech detected
I (8285) EXAMPLE-VAD: Speech detected
I (8315) EXAMPLE-VAD: Speech detected
I (8345) EXAMPLE-VAD: Speech detected
I (8375) EXAMPLE-VAD: Speech detected
I (8395) EXAMPLE-VAD: Speech detected
I (8425) EXAMPLE-VAD: Speech detected
I (8465) EXAMPLE-VAD: Speech detected
I (8495) EXAMPLE-VAD: Speech detected
I (8525) EXAMPLE-VAD: Speech detected
I (8545) EXAMPLE-VAD: Speech detected
Some random sounds that have similar frequency as human speech may be reported as speech, such as the sound of hitting a chessboard and music with certain tones.
Please use the following feedback channels:
- For technical queries, go to the esp32.com forum
- For a feature request or bug report, create a GitHub issue
We will get back to you as soon as possible.