4.3.0 (#187)
NeonDaniel authored Dec 18, 2023
2 parents df95146 + b9c3c64 commit ac55f6a
Showing 7 changed files with 291 additions and 104 deletions.
44 changes: 22 additions & 22 deletions CHANGELOG.md
@@ -1,60 +1,60 @@
 # Changelog

-## [4.2.0](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.0) (2023-10-27)
+## [4.2.1a7](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a7) (2023-12-13)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a6...4.2.0)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a6...4.2.1a7)

-**Fixed bugs:**
+**Merged pull requests:**

-- \[BUG\] Docker `start_listening` resource missing [\#170](https://github.com/NeonGeckoCom/neon_speech/issues/170)
+- Update neon-utils dependency to stable release [\#186](https://github.com/NeonGeckoCom/neon_speech/pull/186) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a6](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a6) (2023-10-26)
+## [4.2.1a6](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a6) (2023-11-29)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a5...4.1.1a6)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a5...4.2.1a6)

 **Merged pull requests:**

-- OVOS Dinkum Listener Backwards Compat [\#178](https://github.com/NeonGeckoCom/neon_speech/pull/178) ([NeonDaniel](https://github.com/NeonDaniel))
+- Override ovos.language.stt handler for server/API usage [\#185](https://github.com/NeonGeckoCom/neon_speech/pull/185) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a5](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a5) (2023-10-26)
+## [4.2.1a5](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a5) (2023-11-22)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a4...4.1.1a5)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a4...4.2.1a5)

 **Merged pull requests:**

-- Stable dependencies for release [\#177](https://github.com/NeonGeckoCom/neon_speech/pull/177) ([NeonDaniel](https://github.com/NeonDaniel))
+- Update global config on local user STT language change [\#184](https://github.com/NeonGeckoCom/neon_speech/pull/184) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a4](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a4) (2023-10-13)
+## [4.2.1a4](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a4) (2023-11-22)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a3...4.1.1a4)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a3...4.2.1a4)

 **Merged pull requests:**

-- Update Dinkum Listener dependency [\#176](https://github.com/NeonGeckoCom/neon_speech/pull/176) ([NeonDaniel](https://github.com/NeonDaniel))
+- Add timing metrics [\#183](https://github.com/NeonGeckoCom/neon_speech/pull/183) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a3](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a3) (2023-10-03)
+## [4.2.1a3](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a3) (2023-11-14)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a2...4.1.1a3)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a2...4.2.1a3)

 **Merged pull requests:**

-- Add timing metrics for minerva testing [\#175](https://github.com/NeonGeckoCom/neon_speech/pull/175) ([NeonDaniel](https://github.com/NeonDaniel))
+- Improved timing context handling with unit tests [\#182](https://github.com/NeonGeckoCom/neon_speech/pull/182) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a2](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a2) (2023-07-28)
+## [4.2.1a2](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a2) (2023-11-10)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a1...4.1.1a2)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a1...4.2.1a2)

 **Merged pull requests:**

-- Kubernetes/No-audio server compat. [\#174](https://github.com/NeonGeckoCom/neon_speech/pull/174) ([NeonDaniel](https://github.com/NeonDaniel))
+- Add timing metrics for audio input to handler in speech service [\#181](https://github.com/NeonGeckoCom/neon_speech/pull/181) ([NeonDaniel](https://github.com/NeonDaniel))

-## [4.1.1a1](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a1) (2023-07-27)
+## [4.2.1a1](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a1) (2023-11-09)

-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.0...4.1.1a1)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.0...4.2.1a1)

 **Merged pull requests:**

-- Update container config handling and resolve logged warnings [\#173](https://github.com/NeonGeckoCom/neon_speech/pull/173) ([NeonDaniel](https://github.com/NeonDaniel))
+- Resample API input wav audio to ensure format matches listener config [\#180](https://github.com/NeonGeckoCom/neon_speech/pull/180) ([NeonDaniel](https://github.com/NeonDaniel))



Expand Down
3 changes: 3 additions & 0 deletions neon_speech/__init__.py
@@ -25,3 +25,6 @@
 # LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Import to ensure patched class is applied
+from neon_speech.transformers import NeonAudioTransformerService
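
Reviewer note: the added import exists only for its side effect. This diff does not show how `neon_speech.transformers` applies its patch, so the following is only a sketch of one common import-time patching pattern, under that assumption. The `upstream` module, `AudioTransformerService`, and `apply_patch` are hypothetical stand-ins, not names from neon_speech or OVOS.

```python
import types

# Hypothetical stand-in for an upstream module that exposes a service class
upstream = types.ModuleType("upstream")


class AudioTransformerService:
    def transform(self, chunk: bytes) -> tuple:
        return chunk, {}


upstream.AudioTransformerService = AudioTransformerService


def apply_patch(module: types.ModuleType) -> None:
    """Swap the upstream class for a subclass.

    In a real package this would run as module-level code, so simply
    importing the patching module is enough to apply it.
    """

    class PatchedAudioTransformerService(module.AudioTransformerService):
        def transform(self, chunk: bytes) -> tuple:
            audio, context = super().transform(chunk)
            context["patched"] = True  # marker showing the override ran
            return audio, context

    module.AudioTransformerService = PatchedAudioTransformerService


apply_patch(upstream)

# Code that looks the class up on the module after the patch gets the subclass
service = upstream.AudioTransformerService()
audio, context = service.transform(b"\x00\x01")
```

With this pattern, dropping the seemingly unused import silently reverts the behavior, which is presumably why the commit adds an explanatory comment above it.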
163 changes: 127 additions & 36 deletions neon_speech/service.py
@@ -27,6 +27,8 @@
 # SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

 import os
+from typing import Dict
+
 import ovos_dinkum_listener.plugins

 from tempfile import mkstemp
@@ -80,8 +82,6 @@ def on_started():


 class NeonSpeechClient(OVOSDinkumVoiceService):
-    _stopwatch = Stopwatch("get_stt")
-
     def __init__(self, ready_hook=on_ready, error_hook=on_error,
                  stopping_hook=on_stopping, alive_hook=on_alive,
                  started_hook=on_started, watchdog=lambda: None,
@@ -112,6 +112,8 @@ def __init__(self, ready_hook=on_ready, error_hook=on_error,
                                         watchdog=watchdog)
         self.daemon = daemonic
         self.config.bus = self.bus
+        self._stt_stopwatch = Stopwatch("get_stt", allow_reporting=True,
+                                        bus=self.bus)
         from neon_utils.signal_utils import init_signal_handlers, \
             init_signal_bus
         init_signal_bus(self.bus)
Expand All @@ -133,6 +135,37 @@ def __init__(self, ready_hook=on_ready, error_hook=on_error,
LOG.info("Skipping api_stt init")
self.api_stt = None

def _record_begin(self):
self._stt_stopwatch.start()
OVOSDinkumVoiceService._record_begin(self)

def _stt_text(self, text: str, stt_context: dict):
self._stt_stopwatch.stop()
stt_context.setdefault("timing", dict())
stt_context["timing"]["get_stt"] = self._stt_stopwatch.time

# This is where the first Message of the interaction is created
OVOSDinkumVoiceService._stt_text(self, text, stt_context)
self._stt_stopwatch.report()

def _save_stt(self, audio_bytes, stt_meta, save_path=None):
stopwatch = Stopwatch("save_audio", True, self.bus)
with stopwatch:
path = OVOSDinkumVoiceService._save_stt(self, audio_bytes, stt_meta,
save_path)
stt_meta.setdefault('timing', dict())
stt_meta['timing']['save_audio'] = stopwatch.time
return path

def _save_ww(self, audio_bytes, ww_meta, save_path=None):
stopwatch = Stopwatch("save_ww", True, self.bus)
with stopwatch:
path = OVOSDinkumVoiceService._save_ww(self, audio_bytes, ww_meta,
save_path)
ww_meta.setdefault('timing', dict())
ww_meta['timing']['save_ww'] = stopwatch.time
return path

def _validate_message_context(self, message: Message, native_sources=None):
if message.context.get('destination') and \
"audio" not in message.context['destination']:
@@ -188,6 +221,16 @@ def register_event_handlers(self):
         self.bus.on("neon.enable_wake_word", self.handle_enable_wake_word)
         self.bus.on("neon.disable_wake_word", self.handle_disable_wake_word)

+    def _handle_get_languages_stt(self, message):
+        if self.config.get('listener', {}).get('enable_voice_loop', True):
+            return OVOSDinkumVoiceService._handle_get_languages_stt(self,
+                                                                    message)
+        # For server use, get the API STT langs
+        stt_langs = self.api_stt.available_languages or \
+            [self.config.get('lang') or 'en-us']
+        LOG.debug(f"Got stt_langs: {stt_langs}")
+        self.bus.emit(message.response({'langs': list(stt_langs)}))
+
     def handle_disable_wake_word(self, message: Message):
         """
         Disable a wake word. If the requested wake word is the only one enabled,
@@ -295,10 +338,18 @@ def handle_profile_update(self, message):
         :param message: Message associated with profile update
         """
         updated_profile = message.data.get("profile")
-        if updated_profile["user"]["username"] == \
+        if updated_profile["user"]["username"] != \
                 self._default_user["user"]["username"]:
-            apply_local_user_profile_updates(updated_profile,
-                                             self._default_user)
+            LOG.info(f"Ignoring profile update for "
+                     f"{updated_profile['user']['username']}")
+            return
+        apply_local_user_profile_updates(updated_profile,
+                                         self._default_user)
+        if updated_profile.get("speech", {}).get("stt_language"):
+            new_stt_lang = updated_profile["speech"]["stt_language"]
+            if new_stt_lang != self.config['lang']:
+                from neon_speech.utils import patch_config
+                patch_config({"lang": new_stt_lang})

     def handle_wake_words_state(self, message):
         """
@@ -327,31 +378,46 @@ def handle_get_stt(self, message: Message):
         Emits a response to the sender with stt data or error data
         :param message: Message associated with request
         """
+        received_time = time()
         if message.data.get("audio_data"):
             wav_file_path = self._write_encoded_file(
                 message.data.pop("audio_data"))
         else:
             wav_file_path = message.data.get("audio_file")
         lang = message.data.get("lang")
         ident = message.context.get("ident") or "neon.get_stt.response"
+
+        message.context.setdefault("timing", dict())
         LOG.info(f"Handling STT request: {ident}")
         if not wav_file_path:
+            message.context['timing']['response_sent'] = time()
             self.bus.emit(message.reply(
                 ident, data={"error": f"audio_file not specified!"}))
             return

         if not os.path.isfile(wav_file_path):
+            message.context['timing']['response_sent'] = time()
             self.bus.emit(message.reply(
                 ident, data={"error": f"{wav_file_path} Not found!"}))

         try:
+
             _, parser_data, transcriptions = \
                 self._get_stt_from_file(wav_file_path, lang)
+            timing = parser_data.pop('timing')
+            message.context["timing"] = {**message.context["timing"], **timing}
+            sent_time = message.context["timing"].get("client_sent",
+                                                      received_time)
+            if received_time != sent_time:
+                message.context['timing']['client_to_core'] = \
+                    received_time - sent_time
+            message.context['timing']['response_sent'] = time()
             self.bus.emit(message.reply(ident,
                                         data={"parser_data": parser_data,
                                               "transcripts": transcriptions}))
         except Exception as e:
             LOG.error(e)
+            message.context['timing']['response_sent'] = time()
             self.bus.emit(message.reply(ident, data={"error": repr(e)}))

     def handle_audio_input(self, message):
@@ -370,11 +436,18 @@ def build_context(msg: Message):
                         'username': self._default_user["user"]["username"] or
                         "local",
                         'user_profiles': [self._default_user.content]}
-            ctx = {**defaults, **ctx, 'destination': ['skills'],
-                   'timing': {'start': msg.data.get('time'),
-                              'transcribed': time()}}
+            ctx = {**defaults, **ctx, 'destination': ['skills']}
+            ctx['timing'] = {**ctx.get('timing', {}),
+                             **{'start': msg.data.get('time'),
+                                'transcribed': time()}}
             return ctx

+        received_time = time()
+        sent_time = message.context.get("timing", {}).get("client_sent",
+                                                          received_time)
+        if received_time != sent_time:
+            message.context['timing']['client_to_core'] = \
+                received_time - sent_time
         ident = message.context.get("ident") or "neon.audio_input.response"
         LOG.info(f"Handling audio input: {ident}")
         if message.data.get("audio_data"):
@@ -384,18 +457,23 @@ def build_context(msg: Message):
             wav_file_path = message.data.get("audio_file")
         lang = message.data.get("lang")
         try:
-            with self._stopwatch:
-                _, parser_data, transcriptions = \
-                    self._get_stt_from_file(wav_file_path, lang)
+            # _=transformed audio_data
+            _, parser_data, transcriptions = \
+                self._get_stt_from_file(wav_file_path, lang)
+            timing = parser_data.pop('timing')
             message.context["audio_parser_data"] = parser_data
+            message.context.setdefault('timing', dict())
+            message.context['timing'] = {**timing, **message.context['timing']}
             context = build_context(message)
-            context['timing']['get_stt'] = self._stopwatch.time
             data = {
                 "utterances": transcriptions,
                 "lang": message.data.get("lang", "en-us")
             }
+            # Send a new message to the skills module with proper routing ctx
             handled = self._emit_utterance_to_skills(Message(
                 'recognizer_loop:utterance', data, context))
+
+            # Reply to original message with transcription/audio parser data
             self.bus.emit(message.reply(ident,
                                         data={"parser_data": parser_data,
                                               "transcripts": transcriptions,
@@ -423,7 +501,7 @@ def handle_offline(self, _):
         Handle notification to operate in offline mode
         """
         LOG.info("Offline mode selected, Reloading STT Plugin")
-        config = dict(self.config)
+        config: Dict[str, dict] = dict(self.config)
         if config['stt'].get('offline_module'):
             config['stt']['module'] = config['stt'].get('offline_module')
         self.voice_loop.stt = STTFactory.create(config)
@@ -456,35 +534,48 @@ def _get_stt_from_file(self, wav_file: str,
         :return: (AudioData of object, extracted context, transcriptions)
         """
         from neon_utils.file_utils import get_audio_file_stream
-        lang = lang or 'en-us' # TODO: read default from config
-        segment = AudioSegment.from_file(wav_file)
+        _stopwatch = Stopwatch()
+        lang = lang or self.config.get('lang')
+        desired_sample_rate = self.config['listener'].get('sample_rate', 16000)
+        desired_sample_width = self.config['listener'].get('sample_width', 2)
+        segment = (AudioSegment.from_file(wav_file).set_channels(1)
+                   .set_frame_rate(desired_sample_rate)
+                   .set_sample_width(desired_sample_width))
+        LOG.debug(f"Audio fr={segment.frame_rate},sw={segment.sample_width},"
+                  f"fw={segment.frame_width},ch={segment.channels}")
         audio_data = AudioData(segment.raw_data, segment.frame_rate,
                                segment.sample_width)
-        audio_stream = get_audio_file_stream(wav_file)
         if not self.api_stt:
             raise RuntimeError("api_stt not initialized."
                                " is `listener['enable_stt_api'] set to False?")
-        if hasattr(self.api_stt, 'stream_start'):
-            if self.lock.acquire(True, 30):
-                LOG.info(f"Starting STT processing (lang={lang}): {wav_file}")
-                self.api_stt.stream_start(lang)
-                while True:
-                    try:
-                        data = audio_stream.read(1024)
-                        self.api_stt.stream_data(data)
-                    except EOFError:
-                        break
-                transcriptions = self.api_stt.stream_stop()
-                self.lock.release()
+        with _stopwatch:
+            if hasattr(self.api_stt, 'stream_start'):
+                audio_stream = get_audio_file_stream(wav_file, desired_sample_rate)
+                if self.lock.acquire(True, 30):
+                    LOG.info(f"Starting STT processing (lang={lang}): {wav_file}")
+                    self.api_stt.stream_start(lang)
+                    while True:
+                        try:
+                            data = audio_stream.read(1024)
+                            self.api_stt.stream_data(data)
+                        except EOFError:
+                            break
+                    transcriptions = self.api_stt.stream_stop()
+                    self.lock.release()
+                else:
+                    LOG.error(f"Timed out acquiring lock, not processing: {wav_file}")
+                    transcriptions = []
             else:
-                LOG.error(f"Timed out acquiring lock, not processing: {wav_file}")
-                transcriptions = []
-        else:
-            transcriptions = self.api_stt.execute(audio_data, lang)
-            if isinstance(transcriptions, str):
-                LOG.warning("Transcriptions is a str, no alternatives provided")
-                transcriptions = [transcriptions]
-        audio, audio_context = self.transformers.transform(audio_data)
+                transcriptions = self.api_stt.execute(audio_data, lang)
+                if isinstance(transcriptions, str):
+                    LOG.warning("Transcriptions is a str, no alternatives provided")
+                    transcriptions = [transcriptions]
+
+        get_stt = float(_stopwatch.time)
+        with _stopwatch:
+            audio, audio_context = self.transformers.transform(audio_data)
+        audio_context["timing"] = {"get_stt": get_stt,
+                                   "transform_audio": _stopwatch.time}
         LOG.info(f"Transcribed: {transcriptions}")
         return audio, audio_context, transcriptions
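
Reviewer note: the timing changes in this file rely on two small patterns. One stopwatch is reused for two sequential measurements, with the first reading copied before reuse. The measured timings are then merged into the message context, with the opposite precedence in `handle_get_stt` (measured values win) and `handle_audio_input` (existing context values win). A minimal sketch of both follows. `SketchStopwatch`, `timed_pipeline`, and `merge_timing` are hypothetical names for illustration only, and the stopwatch is a stand-in, not the neon_utils `Stopwatch` implementation.

```python
from time import perf_counter


class SketchStopwatch:
    """Hypothetical stand-in for a context-manager stopwatch."""

    def __init__(self):
        self.time = None
        self._start = None

    def __enter__(self):
        self._start = perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Every exit overwrites the previous reading
        self.time = perf_counter() - self._start
        return False


def timed_pipeline(transcribe, transform, audio):
    """Time two sequential steps with a single stopwatch."""
    stopwatch = SketchStopwatch()
    with stopwatch:
        transcripts = transcribe(audio)
    get_stt = float(stopwatch.time)  # copy before the stopwatch is reused
    with stopwatch:
        transformed, context = transform(audio)
    context["timing"] = {"get_stt": get_stt,
                         "transform_audio": stopwatch.time}
    return transformed, context, transcripts


def merge_timing(message_context: dict, measured: dict,
                 measured_wins: bool) -> dict:
    """Merge measured timings into a context dict.

    Later entries in a dict display override earlier ones, so the operand
    order decides which side wins on a key collision.
    """
    existing = message_context.setdefault("timing", {})
    if measured_wins:
        merged = {**existing, **measured}
    else:
        merged = {**measured, **existing}
    message_context["timing"] = merged
    return merged


# Usage with trivial stand-in callables
_, ctx, transcripts = timed_pipeline(lambda a: ["hello"],
                                     lambda a: (a, {}), b"\x00")

request_ctx = {"timing": {"client_sent": 100.0, "get_stt": 9.9}}
merge_timing(request_ctx, {"get_stt": 0.5}, measured_wins=True)

input_ctx = {"timing": {"client_sent": 100.0, "get_stt": 9.9}}
merge_timing(input_ctx, {"get_stt": 0.5}, measured_wins=False)
```

The precedence difference only matters when a client has already populated a key such as `get_stt` in the incoming context. Whether that asymmetry between the two handlers is intentional is not stated in the diff.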
