4.3.0 (#187)

NeonGeckoCom · Dec 18, 2023 · ac55f6a
2 parents df95146 + b9c3c64

Showing 7 changed files with 291 additions and 104 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,60 +1,60 @@
 # Changelog
 
-## [4.2.0](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.0) (2023-10-27)
+## [4.2.1a7](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a7) (2023-12-13)
 
-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a6...4.2.0)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a6...4.2.1a7)
 
-**Fixed bugs:**
+**Merged pull requests:**
 
-- \[BUG\] Docker `start_listening` resource missing [\#170](https://github.com/NeonGeckoCom/neon_speech/issues/170)
+- Update neon-utils dependency to stable release [\#186](https://github.com/NeonGeckoCom/neon_speech/pull/186) ([NeonDaniel](https://github.com/NeonDaniel))
 
-## [4.1.1a6](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a6) (2023-10-26)
+## [4.2.1a6](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a6) (2023-11-29)
 
-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a5...4.1.1a6)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a5...4.2.1a6)
 
 **Merged pull requests:**
 
-- OVOS Dinkum Listener Backwards Compat [\#178](https://github.com/NeonGeckoCom/neon_speech/pull/178) ([NeonDaniel](https://github.com/NeonDaniel))
+- Override ovos.language.stt handler for server/API usage [\#185](https://github.com/NeonGeckoCom/neon_speech/pull/185) ([NeonDaniel](https://github.com/NeonDaniel))
 
-## [4.1.1a5](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a5) (2023-10-26)
+## [4.2.1a5](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a5) (2023-11-22)
 
-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a4...4.1.1a5)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a4...4.2.1a5)
 
 **Merged pull requests:**
 
-- Stable dependencies for release [\#177](https://github.com/NeonGeckoCom/neon_speech/pull/177) ([NeonDaniel](https://github.com/NeonDaniel))
+- Update global config on local user STT language change [\#184](https://github.com/NeonGeckoCom/neon_speech/pull/184) ([NeonDaniel](https://github.com/NeonDaniel))
 
-## [4.1.1a4](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a4) (2023-10-13)
+## [4.2.1a4](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a4) (2023-11-22)
 
-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a3...4.1.1a4)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a3...4.2.1a4)
 
 **Merged pull requests:**
 
-- Update Dinkum Listener dependency [\#176](https://github.com/NeonGeckoCom/neon_speech/pull/176) ([NeonDaniel](https://github.com/NeonDaniel))
+- Add timing metrics [\#183](https://github.com/NeonGeckoCom/neon_speech/pull/183) ([NeonDaniel](https://github.com/NeonDaniel))
 
-## [4.1.1a3](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a3) (2023-10-03)
+## [4.2.1a3](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a3) (2023-11-14)
 
-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a2...4.1.1a3)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a2...4.2.1a3)
 
 **Merged pull requests:**
 
-- Add timing metrics for minerva testing [\#175](https://github.com/NeonGeckoCom/neon_speech/pull/175) ([NeonDaniel](https://github.com/NeonDaniel))
+- Improved timing context handling with unit tests [\#182](https://github.com/NeonGeckoCom/neon_speech/pull/182) ([NeonDaniel](https://github.com/NeonDaniel))
 
-## [4.1.1a2](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a2) (2023-07-28)
+## [4.2.1a2](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a2) (2023-11-10)
 
-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.1a1...4.1.1a2)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.1a1...4.2.1a2)
 
 **Merged pull requests:**
 
-- Kubernetes/No-audio server compat. [\#174](https://github.com/NeonGeckoCom/neon_speech/pull/174) ([NeonDaniel](https://github.com/NeonDaniel))
+- Add timing metrics for audio input to handler in speech service [\#181](https://github.com/NeonGeckoCom/neon_speech/pull/181) ([NeonDaniel](https://github.com/NeonDaniel))
 
-## [4.1.1a1](https://github.com/NeonGeckoCom/neon_speech/tree/4.1.1a1) (2023-07-27)
+## [4.2.1a1](https://github.com/NeonGeckoCom/neon_speech/tree/4.2.1a1) (2023-11-09)
 
-[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.1.0...4.1.1a1)
+[Full Changelog](https://github.com/NeonGeckoCom/neon_speech/compare/4.2.0...4.2.1a1)
 
 **Merged pull requests:**
 
-- Update container config handling and resolve logged warnings [\#173](https://github.com/NeonGeckoCom/neon_speech/pull/173) ([NeonDaniel](https://github.com/NeonDaniel))
+- Resample API input wav audio to ensure format matches listener config [\#180](https://github.com/NeonGeckoCom/neon_speech/pull/180) ([NeonDaniel](https://github.com/NeonDaniel))
 
 
 

diff --git a/neon_speech/__init__.py b/neon_speech/__init__.py
@@ -25,3 +25,6 @@
 # LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
 # NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 # SOFTWARE,  EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+# Import to ensure patched class is applied
+from neon_speech.transformers import NeonAudioTransformerService
diff --git a/neon_speech/service.py b/neon_speech/service.py
@@ -27,6 +27,8 @@
 # SOFTWARE,  EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
 import os
+from typing import Dict
+
 import ovos_dinkum_listener.plugins
 
 from tempfile import mkstemp
@@ -80,8 +82,6 @@ def on_started():
 
 
 class NeonSpeechClient(OVOSDinkumVoiceService):
-    _stopwatch = Stopwatch("get_stt")
-
     def __init__(self, ready_hook=on_ready, error_hook=on_error,
                  stopping_hook=on_stopping, alive_hook=on_alive,
                  started_hook=on_started, watchdog=lambda: None,
@@ -112,6 +112,8 @@ def __init__(self, ready_hook=on_ready, error_hook=on_error,
                                         watchdog=watchdog)
         self.daemon = daemonic
         self.config.bus = self.bus
+        self._stt_stopwatch = Stopwatch("get_stt", allow_reporting=True,
+                                        bus=self.bus)
         from neon_utils.signal_utils import init_signal_handlers, \
             init_signal_bus
         init_signal_bus(self.bus)
@@ -133,6 +135,37 @@ def __init__(self, ready_hook=on_ready, error_hook=on_error,
             LOG.info("Skipping api_stt init")
             self.api_stt = None
 
+    def _record_begin(self):
+        self._stt_stopwatch.start()
+        OVOSDinkumVoiceService._record_begin(self)
+
+    def _stt_text(self, text: str, stt_context: dict):
+        self._stt_stopwatch.stop()
+        stt_context.setdefault("timing", dict())
+        stt_context["timing"]["get_stt"] = self._stt_stopwatch.time
+
+        # This is where the first Message of the interaction is created
+        OVOSDinkumVoiceService._stt_text(self, text, stt_context)
+        self._stt_stopwatch.report()
+
+    def _save_stt(self, audio_bytes, stt_meta, save_path=None):
+        stopwatch = Stopwatch("save_audio", True, self.bus)
+        with stopwatch:
+            path = OVOSDinkumVoiceService._save_stt(self, audio_bytes, stt_meta,
+                                                    save_path)
+        stt_meta.setdefault('timing', dict())
+        stt_meta['timing']['save_audio'] = stopwatch.time
+        return path
+
+    def _save_ww(self, audio_bytes, ww_meta, save_path=None):
+        stopwatch = Stopwatch("save_ww", True, self.bus)
+        with stopwatch:
+            path = OVOSDinkumVoiceService._save_ww(self, audio_bytes, ww_meta,
+                                                   save_path)
+        ww_meta.setdefault('timing', dict())
+        ww_meta['timing']['save_ww'] = stopwatch.time
+        return path
+
     def _validate_message_context(self, message: Message, native_sources=None):
         if message.context.get('destination') and \
                 "audio" not in message.context['destination']:
@@ -188,6 +221,16 @@ def register_event_handlers(self):
         self.bus.on("neon.enable_wake_word", self.handle_enable_wake_word)
         self.bus.on("neon.disable_wake_word", self.handle_disable_wake_word)
 
+    def _handle_get_languages_stt(self, message):
+        if self.config.get('listener', {}).get('enable_voice_loop', True):
+            return OVOSDinkumVoiceService._handle_get_languages_stt(self,
+                                                                    message)
+        # For server use, get the API STT langs
+        stt_langs = self.api_stt.available_languages or \
+            [self.config.get('lang') or 'en-us']
+        LOG.debug(f"Got stt_langs: {stt_langs}")
+        self.bus.emit(message.response({'langs': list(stt_langs)}))
+
     def handle_disable_wake_word(self, message: Message):
         """
         Disable a wake word. If the requested wake word is the only one enabled,
@@ -295,10 +338,18 @@ def handle_profile_update(self, message):
         :param message: Message associated with profile update
         """
         updated_profile = message.data.get("profile")
-        if updated_profile["user"]["username"] == \
+        if updated_profile["user"]["username"] != \
                 self._default_user["user"]["username"]:
-            apply_local_user_profile_updates(updated_profile,
-                                             self._default_user)
+            LOG.info(f"Ignoring profile update for "
+                     f"{updated_profile['user']['username']}")
+            return
+        apply_local_user_profile_updates(updated_profile,
+                                         self._default_user)
+        if updated_profile.get("speech", {}).get("stt_language"):
+            new_stt_lang = updated_profile["speech"]["stt_language"]
+            if new_stt_lang != self.config['lang']:
+                from neon_speech.utils import patch_config
+                patch_config({"lang": new_stt_lang})
 
     def handle_wake_words_state(self, message):
         """
@@ -327,31 +378,46 @@ def handle_get_stt(self, message: Message):
         Emits a response to the sender with stt data or error data
         :param message: Message associated with request
         """
+        received_time = time()
         if message.data.get("audio_data"):
             wav_file_path = self._write_encoded_file(
                 message.data.pop("audio_data"))
         else:
             wav_file_path = message.data.get("audio_file")
         lang = message.data.get("lang")
         ident = message.context.get("ident") or "neon.get_stt.response"
+
+        message.context.setdefault("timing", dict())
         LOG.info(f"Handling STT request: {ident}")
         if not wav_file_path:
+            message.context['timing']['response_sent'] = time()
             self.bus.emit(message.reply(
                 ident, data={"error": f"audio_file not specified!"}))
             return
 
         if not os.path.isfile(wav_file_path):
+            message.context['timing']['response_sent'] = time()
             self.bus.emit(message.reply(
                 ident, data={"error": f"{wav_file_path} Not found!"}))
 
         try:
+
             _, parser_data, transcriptions = \
                 self._get_stt_from_file(wav_file_path, lang)
+            timing = parser_data.pop('timing')
+            message.context["timing"] = {**message.context["timing"], **timing}
+            sent_time = message.context["timing"].get("client_sent",
+                                                      received_time)
+            if received_time != sent_time:
+                message.context['timing']['client_to_core'] = \
+                    received_time - sent_time
+            message.context['timing']['response_sent'] = time()
             self.bus.emit(message.reply(ident,
                                         data={"parser_data": parser_data,
                                               "transcripts": transcriptions}))
         except Exception as e:
             LOG.error(e)
+            message.context['timing']['response_sent'] = time()
             self.bus.emit(message.reply(ident, data={"error": repr(e)}))
 
     def handle_audio_input(self, message):
@@ -370,11 +436,18 @@ def build_context(msg: Message):
                         'username': self._default_user["user"]["username"] or
                         "local",
                         'user_profiles': [self._default_user.content]}
-            ctx = {**defaults, **ctx, 'destination': ['skills'],
-                   'timing': {'start': msg.data.get('time'),
-                              'transcribed': time()}}
+            ctx = {**defaults, **ctx, 'destination': ['skills']}
+            ctx['timing'] = {**ctx.get('timing', {}),
+                             **{'start': msg.data.get('time'),
+                                'transcribed': time()}}
             return ctx
 
+        received_time = time()
+        sent_time = message.context.get("timing", {}).get("client_sent",
+                                                          received_time)
+        if received_time != sent_time:
+            message.context['timing']['client_to_core'] = \
+                received_time - sent_time
         ident = message.context.get("ident") or "neon.audio_input.response"
         LOG.info(f"Handling audio input: {ident}")
         if message.data.get("audio_data"):
@@ -384,18 +457,23 @@ def build_context(msg: Message):
             wav_file_path = message.data.get("audio_file")
         lang = message.data.get("lang")
         try:
-            with self._stopwatch:
-                _, parser_data, transcriptions = \
-                    self._get_stt_from_file(wav_file_path, lang)
+            # _=transformed audio_data
+            _, parser_data, transcriptions = \
+                self._get_stt_from_file(wav_file_path, lang)
+            timing = parser_data.pop('timing')
             message.context["audio_parser_data"] = parser_data
+            message.context.setdefault('timing', dict())
+            message.context['timing'] = {**timing, **message.context['timing']}
             context = build_context(message)
-            context['timing']['get_stt'] = self._stopwatch.time
             data = {
                 "utterances": transcriptions,
                 "lang": message.data.get("lang", "en-us")
             }
+            # Send a new message to the skills module with proper routing ctx
             handled = self._emit_utterance_to_skills(Message(
                 'recognizer_loop:utterance', data, context))
+
+            # Reply to original message with transcription/audio parser data
             self.bus.emit(message.reply(ident,
                                         data={"parser_data": parser_data,
                                               "transcripts": transcriptions,
@@ -423,7 +501,7 @@ def handle_offline(self, _):
         Handle notification to operate in offline mode
         """
         LOG.info("Offline mode selected, Reloading STT Plugin")
-        config = dict(self.config)
+        config: Dict[str, dict] = dict(self.config)
         if config['stt'].get('offline_module'):
             config['stt']['module'] = config['stt'].get('offline_module')
             self.voice_loop.stt = STTFactory.create(config)
@@ -456,35 +534,48 @@ def _get_stt_from_file(self, wav_file: str,
         :return: (AudioData of object, extracted context, transcriptions)
         """
         from neon_utils.file_utils import get_audio_file_stream
-        lang = lang or 'en-us'  # TODO: read default from config
-        segment = AudioSegment.from_file(wav_file)
+        _stopwatch = Stopwatch()
+        lang = lang or self.config.get('lang')
+        desired_sample_rate = self.config['listener'].get('sample_rate', 16000)
+        desired_sample_width = self.config['listener'].get('sample_width', 2)
+        segment = (AudioSegment.from_file(wav_file).set_channels(1)
+                   .set_frame_rate(desired_sample_rate)
+                   .set_sample_width(desired_sample_width))
+        LOG.debug(f"Audio fr={segment.frame_rate},sw={segment.sample_width},"
+                  f"fw={segment.frame_width},ch={segment.channels}")
         audio_data = AudioData(segment.raw_data, segment.frame_rate,
                                segment.sample_width)
-        audio_stream = get_audio_file_stream(wav_file)
         if not self.api_stt:
             raise RuntimeError("api_stt not initialized."
                                " is `listener['enable_stt_api'] set to False?")
-        if hasattr(self.api_stt, 'stream_start'):
-            if self.lock.acquire(True, 30):
-                LOG.info(f"Starting STT processing (lang={lang}): {wav_file}")
-                self.api_stt.stream_start(lang)
-                while True:
-                    try:
-                        data = audio_stream.read(1024)
-                        self.api_stt.stream_data(data)
-                    except EOFError:
-                        break
-                transcriptions = self.api_stt.stream_stop()
-                self.lock.release()
+        with _stopwatch:
+            if hasattr(self.api_stt, 'stream_start'):
+                audio_stream = get_audio_file_stream(wav_file, desired_sample_rate)
+                if self.lock.acquire(True, 30):
+                    LOG.info(f"Starting STT processing (lang={lang}): {wav_file}")
+                    self.api_stt.stream_start(lang)
+                    while True:
+                        try:
+                            data = audio_stream.read(1024)
+                            self.api_stt.stream_data(data)
+                        except EOFError:
+                            break
+                    transcriptions = self.api_stt.stream_stop()
+                    self.lock.release()
+                else:
+                    LOG.error(f"Timed out acquiring lock, not processing: {wav_file}")
+                    transcriptions = []
             else:
-                LOG.error(f"Timed out acquiring lock, not processing: {wav_file}")
-                transcriptions = []
-        else:
-            transcriptions = self.api_stt.execute(audio_data, lang)
-        if isinstance(transcriptions, str):
-            LOG.warning("Transcriptions is a str, no alternatives provided")
-            transcriptions = [transcriptions]
-        audio, audio_context = self.transformers.transform(audio_data)
+                transcriptions = self.api_stt.execute(audio_data, lang)
+            if isinstance(transcriptions, str):
+                LOG.warning("Transcriptions is a str, no alternatives provided")
+                transcriptions = [transcriptions]
+
+        get_stt = float(_stopwatch.time)
+        with _stopwatch:
+            audio, audio_context = self.transformers.transform(audio_data)
+        audio_context["timing"] = {"get_stt": get_stt,
+                                   "transform_audio": _stopwatch.time}
         LOG.info(f"Transcribed: {transcriptions}")
         return audio, audio_context, transcriptions
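
Note on the timing changes in `neon_speech/service.py`: the hunks above reuse one stopwatch for two consecutive measurements (copying `get_stt` before the second `with` block) and then merge the measured durations into `context['timing']`, deriving `client_to_core` when the client supplied `client_sent`. The following is a minimal sketch of that pattern, not the project's implementation. `SketchStopwatch` and `merge_timing` are hypothetical names introduced here for illustration; the real `Stopwatch` comes from `neon_utils` and its internals are not shown in this diff. The merge order follows `handle_audio_input` (values already in the context win); `handle_get_stt` merges in the opposite order.

```python
import time
from typing import Optional


class SketchStopwatch:
    """Hypothetical stand-in for a reusable stopwatch (not the neon_utils class)."""

    def __init__(self):
        self.time: Optional[float] = None
        self._start: Optional[float] = None

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Overwrites the previous reading, so callers must copy it first
        self.time = time.perf_counter() - self._start
        return False


def merge_timing(context: dict, measured: dict, received_time: float) -> dict:
    """Merge measured durations into context['timing'], keeping any values
    the client already supplied, and derive client_to_core if possible."""
    timing = {**measured, **context.get("timing", {})}
    sent_time = timing.get("client_sent", received_time)
    if received_time != sent_time:
        timing["client_to_core"] = received_time - sent_time
    context["timing"] = timing
    return context


watch = SketchStopwatch()
with watch:
    transcripts = ["hello world"]  # placeholder for the STT call
get_stt = float(watch.time)  # copy before the stopwatch is reused
with watch:
    pass  # placeholder for the audio transformer call
measured = {"get_stt": get_stt, "transform_audio": watch.time}

ctx = merge_timing({"timing": {"client_sent": 100.0}}, measured,
                   received_time=100.25)
print(ctx["timing"]["client_to_core"])  # 0.25
```

Copying the first reading into a local variable matters because the second `with` block replaces the stopwatch's stored time. That appears to be why the diff assigns `get_stt = float(_stopwatch.time)` before timing the transformers.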