feat/lang_detection_plugin #19

Open: wants to merge 3 commits into base: dev

@JarbasAl (Member) commented Apr 20, 2024

  • remove unfinished streaming stt implementation
  • put the multi model loader behind a --multi flag, assume plugins are multilingual unless told otherwise
  • add lang detection plugin
  • lang detection endpoint
  • if lang == "auto" in the stt request, detection is performed before STT

needs OpenVoiceOS/ovos-plugin-manager#220

Summary by CodeRabbit

  • New Features

    • Introduced new command-line options for ovos-stt-server, including --lang-engine for audio language detection and --multi for multi-language plugin instances.
    • Added options in Gradio web UI for custom title, description, info, and badge.
    • New /lang_detect endpoint for language detection.
  • Improvements

    • Enhanced clarity of existing command-line options.
    • Simplified Gradio web UI related flags.
  • Bug Fixes

    • Corrected file references to ensure workflows and setup scripts operate correctly.

@JarbasAl
Member Author

@coderabbitai review

coderabbitai bot commented Jul 16, 2024

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot commented Jul 16, 2024

Walkthrough

The recent updates enhance the ovos-stt-server tool by refining command line options, improving language detection features, and ensuring consistency in file references. Key additions include options for specifying language detection plugins and multi-language support, along with a /lang_detect endpoint for language detection. Dependencies have also been updated to support these new features.

Changes

| File(s) | Change Summary |
|---|---|
| .github/workflows/publish_alpha.yml, setup.py | Updated file references to ensure consistency with the README.md file. |
| README.md | Reorganized and updated command line options for ovos-stt-server, introducing new options for language detection and multi-language support. |
| ovos_stt_http_server/__init__.py, ovos_stt_http_server/... | Added language detection features, updated model container logic to handle language plugins, and enhanced the create_app function to support new command line options. |
| requirements/requirements.txt | Updated version specifications for ovos-plugin-manager and ovos-utils, and added flask to the requirements. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant STTServer
    participant LangPlugin
    participant MultiModelContainer

    User->>CLI: Run ovos-stt-server with lang detection
    CLI->>STTServer: Start server with --lang-engine and --multi options
    STTServer->>LangPlugin: Load language detection plugin
    STTServer->>MultiModelContainer: Initialize with multi-language support
    User->>STTServer: Send audio for processing
    STTServer->>LangPlugin: Detect language
    LangPlugin->>STTServer: Return detected language
    STTServer->>MultiModelContainer: Process audio with detected language model
    MultiModelContainer->>User: Return transcription result
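
The last steps of the diagram, routing audio to a model for the detected language, could look roughly like the sketch below. It assumes `--multi` means one lazily created engine instance per language; `StubEngine`, `PerLanguageEngines`, and their parameters are hypothetical names used for illustration and are not taken from the PR.

```python
from typing import Callable, Dict


class StubEngine:
    """Hypothetical stand-in for a single-language STT plugin instance."""

    def __init__(self, lang: str):
        self.lang = lang

    def execute(self, audio: bytes, language: str) -> str:
        return f"[{self.lang}] transcript"


class PerLanguageEngines:
    """Hypothetical container that keeps one engine per language code."""

    def __init__(self, factory: Callable[[str], StubEngine]):
        self.factory = factory
        self.engines: Dict[str, StubEngine] = {}

    def get_engine(self, lang: str) -> StubEngine:
        # Create the engine on first use, then reuse the cached instance.
        if lang not in self.engines:
            self.engines[lang] = self.factory(lang)
        return self.engines[lang]

    def process_audio(self, audio: bytes, lang: str) -> str:
        return self.get_engine(lang).execute(audio, language=lang) or ""


container = PerLanguageEngines(StubEngine)
first = container.process_audio(b"", "en")
second = container.process_audio(b"", "pt")
```

Creating engines lazily keeps startup cheap when only a few languages are ever requested, at the cost of a slower first request per language.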

Poem

In the realm of code, a server blooms, 🌸
Speaking languages, banishing gloom. 🌍
Options refined, plugins anew,
Multi-language magic, just for you. ✨
With Flask in tow, dependencies align,
OVOS STT shines, a marvel divine. 🚀

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Outside diff range, codebase verification and nitpick comments (2)
requirements/requirements.txt (1)

5-6: Addition of flask to requirements.

flask has been added without a version specification. It's crucial to specify version constraints to avoid potential future compatibility issues.

- flask
+ flask~=2.0
README.md (1)

52-52: Grammatical correction needed in example usage.

The abbreviation "e.g." should be used with periods as indicated by LanguageTool.

- eg `ovos-stt-server --engine ovos-stt-plugin-fasterwhisper --lang-engine ovos-audio-transformer-plugin-fasterwhisper`
+ e.g., `ovos-stt-server --engine ovos-stt-plugin-fasterwhisper --lang-engine ovos-audio-transformer-plugin-fasterwhisper`
Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between ad5743a and 84505cb.

Files selected for processing (6)
  • .github/workflows/publish_alpha.yml (1 hunks)
  • README.md (1 hunks)
  • ovos_stt_http_server/__init__.py (3 hunks)
  • ovos_stt_http_server/__main__.py (2 hunks)
  • requirements/requirements.txt (1 hunks)
  • setup.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • .github/workflows/publish_alpha.yml
  • setup.py
Additional context used
LanguageTool
README.md

[uncategorized] ~51-~51: The abbreviation “e.g.” (= for example) requires two periods.
Context: ... Note: ffmpeg is required for Gradio eg `ovos-stt-server --engine ovos-stt-plug...

(E_G)

Additional comments not posted (5)
requirements/requirements.txt (1)

1-1: Version specification for ovos-plugin-manager updated.

The version change from ~=0.0.18 to ~=0.0, >=0.0.26a15 broadens the acceptable version range and specifies a minimum version. This should be verified to ensure compatibility with the new features introduced in the PR.

README.md (1)

33-48: Updated command line options for ovos-stt-server.

The changes to the command line options, including the addition of --lang-engine and --multi, align with the PR's objectives to enhance language detection capabilities. Ensure that these changes are reflected in the user documentation and help outputs.

Verification successful

Updated command line options for ovos-stt-server have been successfully documented.

The new command line options --lang-engine and --multi are present in the README.md file, ensuring that the documentation is consistent with the changes.

  • README.md: Lines mentioning --lang-engine and --multi.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the new command line options are documented.

# Test: Search for mentions of the new options in the documentation.
rg --type markdown $'--lang-engine'

Length of output: 450


Script:

#!/bin/bash
# Description: Verify that the new command line options are documented.

# Corrected Test: Search for mentions of the new options in the documentation.
rg --type markdown '--lang-engine'
rg --type markdown '--multi'

Length of output: 916


Script:

#!/bin/bash
# Description: Verify that the new command line options are documented.

# Corrected Test: Search for mentions of the new options in the documentation.
rg --type markdown -- '--lang-engine'
rg --type markdown -- '--multi'

Length of output: 317

ovos_stt_http_server/__main__.py (2)

24-30: Addition of --lang-engine and --multi arguments.

The new arguments are added correctly and are marked as required for --lang-engine and optional for --multi. This aligns with the PR's objectives to support multilingual plugins and language detection.
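
A minimal sketch of how these arguments might be declared with `argparse`. The flag names come from this PR; the help strings, the `store_true` action for `--multi` and `--gradio`, and treating `--lang-engine` as optional (to match the `lang_engine: str = None` default of `start_stt_server`) are assumptions, not the PR's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in this PR."""
    parser = argparse.ArgumentParser(prog="ovos-stt-server")
    parser.add_argument("--engine", required=True,
                        help="STT plugin to load")
    # Assumption: optional, defaulting to None (no language detection).
    parser.add_argument("--lang-engine", default=None,
                        help="audio language detection plugin")
    # Assumption: boolean switch; plugins are treated as multilingual
    # unless --multi asks for one plugin instance per language.
    parser.add_argument("--multi", action="store_true",
                        help="load a plugin instance per language")
    parser.add_argument("--gradio", action="store_true",
                        help="enable the Gradio web UI")
    return parser


args = build_parser().parse_args([
    "--engine", "ovos-stt-plugin-fasterwhisper",
    "--lang-engine", "ovos-audio-transformer-plugin-fasterwhisper",
])
```

`argparse` maps `--lang-engine` to `args.lang_engine`, which is the attribute name the `start_stt_server` call in `__main__.py` reads.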


44-46: Modification in start_stt_server function call.

The function now includes lang_engine and multi arguments, which is necessary for the new features. Verify that these changes are compatible with the rest of the application, especially in how the start_stt_server function is defined and used elsewhere.

Verification successful

Modification in start_stt_server function call verified.

The start_stt_server function definition in ovos_stt_http_server/__init__.py includes the lang_engine and multi arguments, and the only other call to this function in ovos_stt_http_server/__main__.py correctly handles these arguments.

  • ovos_stt_http_server/__init__.py:

    def start_stt_server(engine: str,
                         lang_engine: str = None,
                         multi: bool = False,
                         has_gradio: bool = False) -> (FastAPI, ModelContainer):
        app, engine = create_app(engine, lang_engine, multi, has_gradio)
        return app, engine
  • ovos_stt_http_server/__main__.py:

    server, engine = start_stt_server(args.engine, lang_engine=args.lang_engine,
                                      multi=bool(args.multi),
                                      has_gradio=bool(args.gradio))
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `start_stt_server` with new arguments.

# Test: Search for the function definition and other calls to ensure they handle the new arguments.
rg --type python $'start_stt_server'

Length of output: 66


Script:

#!/bin/bash
# Description: Verify the usage of `start_stt_server` with new arguments.

# Search for the function definition of `start_stt_server`.
ast-grep --lang python --pattern $'def start_stt_server($_, $_, $_, $_):\n    $$$'

# Search for other calls to `start_stt_server` in the codebase.
rg 'start_stt_server'

Length of output: 368


Script:

#!/bin/bash
# Description: Verify the usage of `start_stt_server` with new arguments.

# Search for the full function definition of `start_stt_server` in `ovos_stt_http_server/__init__.py`.
rg 'def start_stt_server' -A 10 ovos_stt_http_server/__init__.py

# Search for other calls to `start_stt_server` in the codebase.
rg 'start_stt_server' --context 5

Length of output: 2875

ovos_stt_http_server/__init__.py (1)

113-117: New /lang_detect endpoint added.

The endpoint for language detection has been implemented correctly. Ensure that this endpoint is included in the API documentation and that its functionality is covered by integration tests.

requirements/requirements.txt (resolved)
Comment on lines +26 to +43
    def __init__(self, plugin: str, lang_plugin: str = None, config: dict = None):
        plugin = load_stt_plugin(plugin)
        self.lang_plugin = None
        if not plugin:
            raise ValueError(f"Failed to load STT: {plugin}")
        if lang_plugin:
            lang_plugin = load_audio_transformer_plugin(lang_plugin)
            if not lang_plugin:
                raise ValueError(f"Failed to load lang detection plugin: {plugin}")
            assert issubclass(lang_plugin, AudioLanguageDetector)
            self.lang_plugin = lang_plugin()
        self.engine = plugin(config)

    def process_audio(self, audio: AudioData, lang: str):
        if lang == "auto":
            lang, prob = self.lang_plugin.detect(audio)
        return self.engine.execute(audio, language=lang) or ""

@coderabbitai coderabbitai bot Jul 16, 2024

Modifications in ModelContainer and MultiModelContainer.

The constructors have been modified to load language plugins and handle configurations. The process_audio methods have been updated to use language detection when lang is set to "auto". These changes are crucial for the new language detection feature and should be thoroughly tested.
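
The `"auto"` path can be sketched in isolation. In the constructor quoted in this thread, `self.lang_plugin` stays `None` when no detection plugin is configured, so a request with `lang == "auto"` would then fail on `self.lang_plugin.detect`. The sketch below adds an explicit guard for that case. `StubDetector`, `StubEngine`, `Container`, and the error message are hypothetical stand-ins, not the PR's classes.

```python
from typing import Optional, Tuple


class StubDetector:
    """Hypothetical stand-in for a language detection plugin."""

    def detect(self, audio: bytes) -> Tuple[str, float]:
        return "en", 0.99


class StubEngine:
    """Hypothetical stand-in for an STT plugin."""

    def execute(self, audio: bytes, language: str) -> str:
        return f"[{language}] transcript"


class Container:
    """Simplified container: detect the language first when lang is 'auto'."""

    def __init__(self, engine: StubEngine,
                 lang_plugin: Optional[StubDetector] = None):
        self.engine = engine
        self.lang_plugin = lang_plugin

    def process_audio(self, audio: bytes, lang: str) -> str:
        if lang == "auto":
            # Assumed guard: fail with a clear message when no detector is loaded.
            if self.lang_plugin is None:
                raise ValueError("lang='auto' requires a language detection plugin")
            lang, _prob = self.lang_plugin.detect(audio)
        return self.engine.execute(audio, language=lang) or ""


with_detector = Container(StubEngine(), StubDetector())
without_detector = Container(StubEngine())
```

An explicit language code bypasses detection entirely, so `without_detector` still works for requests that name their language.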

Would you like me to help with writing unit tests for these changes?

Member Author

yes

@JarbasAl, here are the unit tests for the changes in ModelContainer and MultiModelContainer:

import unittest
from unittest.mock import patch, MagicMock
from speech_recognition import AudioData
from ovos_plugin_manager.stt import load_stt_plugin
from ovos_plugin_manager.audio_transformers import load_audio_transformer_plugin, AudioLanguageDetector
from ovos_stt_http_server import ModelContainer, MultiModelContainer

class TestModelContainer(unittest.TestCase):
    def setUp(self):
        self.audio_data = AudioData(b'\x00\x01', 16000, 2)

    @patch('ovos_stt_http_server.load_stt_plugin')
    @patch('ovos_stt_http_server.load_audio_transformer_plugin')
    def test_init(self, mock_load_audio, mock_load_stt):
        mock_stt_plugin = MagicMock()
        mock_lang_plugin = MagicMock()
        mock_load_stt.return_value = mock_stt_plugin
        mock_load_audio.return_value = mock_lang_plugin

        # Test successful initialization
        container = ModelContainer('dummy_stt', 'dummy_lang')
        self.assertIsNotNone(container.engine)
        self.assertIsNotNone(container.lang_plugin)

        # Test initialization failure
        mock_load_stt.return_value = None
        with self.assertRaises(ValueError):
            ModelContainer('dummy_stt', 'dummy_lang')

    @patch('ovos_stt_http_server.load_audio_transformer_plugin')
    def test_process_audio_auto_lang(self, mock_load_audio):
        mock_lang_plugin = MagicMock()
        mock_lang_plugin.detect.return_value = ('en', 0.99)
        mock_load_audio.return_value = mock_lang_plugin
        container = ModelContainer('dummy_stt', 'dummy_lang')
        container.lang_plugin = mock_lang_plugin
        container.engine = MagicMock()
        container.engine.execute.return_value = "processed audio"

        result = container.process_audio(self.audio_data, 'auto')
        self.assertEqual(result, "processed audio")
        mock_lang_plugin.detect.assert_called_once_with(self.audio_data)

class TestMultiModelContainer(unittest.TestCase):
    def setUp(self):
        self.audio_data = AudioData(b'\x00\x01', 16000, 2)

    @patch('ovos_stt_http_server.load_stt_plugin')
    @patch('ovos_stt_http_server.load_audio_transformer_plugin')
    def test_init(self, mock_load_audio, mock_load_stt):
        mock_stt_plugin = MagicMock()
        mock_lang_plugin = MagicMock()
        mock_load_stt.return_value = mock_stt_plugin
        mock_load_audio.return_value = mock_lang_plugin

        # Test successful initialization
        container = MultiModelContainer('dummy_stt', 'dummy_lang')
        self.assertIsNotNone(container.plugin_class)
        self.assertIsNotNone(container.lang_plugin)

        # Test initialization failure
        mock_load_stt.return_value = None
        with self.assertRaises(ValueError):
            MultiModelContainer('dummy_stt', 'dummy_lang')

    @patch('ovos_stt_http_server.load_audio_transformer_plugin')
    def test_process_audio_auto_lang(self, mock_load_audio):
        mock_lang_plugin = MagicMock()
        mock_lang_plugin.detect.return_value = ('en', 0.99)
        mock_load_audio.return_value = mock_lang_plugin
        container = MultiModelContainer('dummy_stt', 'dummy_lang')
        container.lang_plugin = mock_lang_plugin
        container.engines = {'en': MagicMock()}
        container.engines['en'].execute.return_value = "processed audio"

        result = container.process_audio(self.audio_data, 'auto')
        self.assertEqual(result, "processed audio")
        mock_lang_plugin.detect.assert_called_once_with(self.audio_data.get_wav_data())

if __name__ == '__main__':
    unittest.main()

These tests cover the initialization and processing logic in both ModelContainer and MultiModelContainer, ensuring that the language detection and STT processing work as expected when the language is set to "auto".
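
One caveat when adapting these tests: the constructor quoted earlier in this thread asserts `issubclass(lang_plugin, AudioLanguageDetector)`, and `issubclass` raises `TypeError` when its first argument is a `MagicMock` instance rather than a class. A small fake subclass sidesteps that. The sketch below uses a stand-in base class so it runs on its own; `FakeDetector` and `instantiate_lang_plugin` are hypothetical names, and in a real test the fake would subclass the actual `AudioLanguageDetector`.

```python
from unittest.mock import MagicMock


class AudioLanguageDetector:
    """Stand-in for the real base class, so the sketch runs on its own."""

    def detect(self, audio):
        raise NotImplementedError


class FakeDetector(AudioLanguageDetector):
    """Real subclass that returns a fixed (language, probability) pair."""

    def detect(self, audio):
        return "en", 0.99


def instantiate_lang_plugin(plugin_class):
    # Mirrors the subclass check in the constructor quoted in this thread.
    assert issubclass(plugin_class, AudioLanguageDetector)
    return plugin_class()


# A MagicMock instance is not a class, so issubclass() rejects it.
try:
    instantiate_lang_plugin(MagicMock())
    mock_accepted = True
except TypeError:
    mock_accepted = False

detector = instantiate_lang_plugin(FakeDetector)
```

Returning `FakeDetector` from the patched loader lets the constructor's check pass while keeping the detection result deterministic.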

Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant