Added Speech Recognition with Hugging Face from Node.js to Python #321

Waterberry71 · 2024-12-15T20:46:49Z

Main Contributor: Waterberry71 (Jacob)
Participant: Edumanu82 (Eduardo)

What does this PR do?

This PR adds a Python version of the "Speech Recognition with Hugging Face" template, so the template is now available for both Python and Node.js.

The Main Package includes:

My Setup file

* Run "python setup.py" to automatically create a database with a collection (and their IDs),
   plus a storage bucket where you can upload .mp3 or .wav audio files for testing
* The script also checks whether the database, the bucket, and the items within them
   already exist
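
The "create only if it does not exist yet" check described above can be sketched roughly as follows. This is a minimal illustration, not the code in setup.py: `ensure`, `NotFound`, and the `get`/`create` callables are hypothetical names, and in the real template those callables would presumably wrap Appwrite SDK calls for the database, collection, and bucket.

```python
class NotFound(Exception):
    """Hypothetical stand-in for the 'resource not found' error of a backend."""


def ensure(get, create, not_found=NotFound):
    """Return (resource, created): fetch the resource, creating it only if missing."""
    try:
        return get(), False
    except not_found:
        # The lookup failed, so the resource does not exist yet.
        return create(), True
```

A real version would also need to tell a "not found" response apart from other failures (for example a wrong API key), so that setup does not try to create resources when the server is simply unreachable.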

My AppwriteService file

- Loads environment variables from a .env file
- Defines the AppwriteService class, which the Setup file imports
     - Provides the Databases and Storage services used to interact
       with the local Appwrite server
- Initializes the AppwriteService class with an API key
- Sets up the Appwrite client with the endpoint, project ID, and API key
- Defines a method create_Recognition_Entry
     - Creates a new entry in the database where a result is stored
     - In this case, the entry holds the recognized speech text after a successful attempt
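
As a rough sketch of what a method like create_Recognition_Entry could look like, the class below writes one entry per recognized text. It assumes that `databases` is an object exposing `create_document(database_id, collection_id, document_id, data)`, which is how the Appwrite Python SDK's Databases service is called to the best of my knowledge. The class name `RecognitionStore` and the `speech` field name are hypothetical and not taken from this PR.

```python
import uuid


class RecognitionStore:
    """Hypothetical wrapper that stores recognized speech text as documents."""

    def __init__(self, databases, database_id, collection_id):
        # `databases` is injected so the class can be exercised with a fake.
        self.databases = databases
        self.database_id = database_id
        self.collection_id = collection_id

    def create_recognition_entry(self, text):
        """Create one document holding the recognized text and return the result."""
        document_id = uuid.uuid4().hex  # 32 hex characters, generated client-side
        return self.databases.create_document(
            self.database_id,
            self.collection_id,
            document_id,
            {"speech": text},  # assumed attribute name
        )
```

Injecting the `databases` object keeps the storage step testable without a running Appwrite server.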

My Main File

- The entry point for testing the integration between Hugging Face's automatic speech
  recognition and my local Appwrite server
     - Requires the mock test provided below to actually execute
- Initializes the necessary services and starts the audio processing workflow
- Ensures all necessary configuration and services are set up before audio is processed
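
The workflow described above (fetch the audio, transcribe it, store the text, respond) might be sketched like this. It is an illustration under assumptions, not the process_audio in main.py: the `download`, `transcribe`, and `store` callables are hypothetical and would presumably wrap Appwrite Storage, the Hugging Face speech recognition call, and the database write respectively. The `fileId` and `bucketId` body fields follow the mock request in the Test Plan.

```python
import asyncio


async def process_audio_sketch(req, res, log, error, *, download, transcribe, store):
    """Download the audio, transcribe it, store the text, and send a response."""
    body = req.body_json or {}
    file_id = body.get("fileId")
    bucket_id = body.get("bucketId")
    if not file_id or not bucket_id:
        return res.json({"ok": False, "error": "fileId and bucketId are required"}, 400)
    try:
        audio = download(bucket_id, file_id)  # raw bytes of the uploaded file
        text = transcribe(audio)              # recognized speech text
        store(file_id, text)                  # persist the result
    except Exception as exc:
        # Report the failure through the response instead of crashing.
        error(f"Processing failed: {exc}")
        return res.json({"ok": False, "error": str(exc)}, 500)
    log(f"Transcribed file {file_id}")
    return res.json({"ok": True, "text": text}, 200)
```

Passing the three steps in as callables makes it possible to swap each one for a stub, which helps narrow down whether a failure comes from file retrieval or from the transcription step.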

Added Utility File

 - Ensures that all required keys are present in a given dictionary-like object
 - Prevents runtime errors by validating the presence of necessary fields before proceeding
 - Checks whether each key is present in the object
 - Collects any missing keys in a list
 - Provides a simple, reusable way to validate input data
       - The main file uses the utils throw_if_missing function to validate required fields
         before processing the audio file
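
A minimal sketch of a helper matching that description is shown below. The function name comes from this PR, but the body is an assumption: in particular, raising ValueError and treating None values as missing are choices made for this illustration.

```python
def throw_if_missing(obj, keys):
    """Raise ValueError naming every required key that is absent from obj."""
    # Collect all missing keys first so the error reports them in one go.
    missing = [key for key in keys if key not in obj or obj[key] is None]
    if missing:
        raise ValueError(f"Missing required fields: {', '.join(missing)}")
```

For example, `throw_if_missing(req.body_json, ["fileId", "bucketId"])` would pass silently for the mock request in the Test Plan and raise if either field were left out.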

Outside of the Main Process

Added a .gitignore file to ignore unnecessary files.
Added a README.md to document the usage, configuration, and environment variables needed.
Added a requirements file.
Added docker-compose.yml (generated when the local Appwrite server runs without issues).

Test Plan

Install Dependencies
pip install appwrite
pip install huggingface_hub
pip install python-dotenv
Run Docker for Appwrite from a terminal, inside the template's directory:
cd python/speech_recognition_with_huggingface

Source: https://appwrite.io/docs/advanced/self-hosting

docker run -it --rm \
    --volume /var/run/docker.sock:/var/run/docker.sock \
    --volume "$(pwd)"/appwrite:/usr/src/code/appwrite:rw \
    --entrypoint="install" \
    appwrite/appwrite:1.6.0

For replication purposes, accept the default recommendations when prompted (port 80, port 443, localhost, etc.).

After installation, open the local server (for example on port 80) to sign up and create an account.
Retrieve only your project ID and API key.

Environment Setup
Objective: Ensure all environment variables are correctly set.

Verify .env file contains:
    APPWRITE_ENDPOINT= (Found under Settings in your project on the local Appwrite server)
    APPWRITE_API_KEY= (Found under Settings in your project on the local Appwrite server)
    APPWRITE_PROJECT_ID= (Create a project after you sign in to your local server)
    HUGGINGFACE_ACCESS_TOKEN= [ Create your token:
                                Go to https://huggingface.co/docs/hub/en/security-tokens ]
    APPWRITE_DATABASE_ID= (Created when running setup.py)
    APPWRITE_COLLECTION_ID= (Created when running setup.py)
    APPWRITE_BUCKET_ID= (Created when running setup.py)
    APPWRITE_FILE_ID= [ After running setup.py, open the local server -> Storage -> bucket
                        -> add file -> upload -> copy the file ID ]
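
One quick way to confirm that every variable listed above is set is a small check like the following. It is a sketch only: `missing_env_vars` is a hypothetical helper that is not part of this PR, and it assumes the .env file has already been loaded into the process environment (for example with python-dotenv).

```python
import os

REQUIRED_ENV_VARS = (
    "APPWRITE_ENDPOINT",
    "APPWRITE_API_KEY",
    "APPWRITE_PROJECT_ID",
    "HUGGINGFACE_ACCESS_TOKEN",
    "APPWRITE_DATABASE_ID",
    "APPWRITE_COLLECTION_ID",
    "APPWRITE_BUCKET_ID",
    "APPWRITE_FILE_ID",
)


def missing_env_vars(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not str(env.get(name, "")).strip()]
```

Calling `missing_env_vars()` before running main.py returns an empty list when the environment is complete, and otherwise lists the names that still need a value.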

Use the following mock request to execute main.py:

Mock Test for this template (WIP)

import asyncio
import json

# Assumes process_audio from main.py is already in scope, for example
# because this snippet is appended to main.py.

class MockRequest:
    """Minimal stand-in for the function's request object."""
    def __init__(self, method, body_json, headers):
        self.method = method
        self.body_json = body_json
        self.headers = headers

class MockResponse:
    """Prints the JSON response instead of sending it."""
    def json(self, data, status=200):
        data = json.dumps(data, indent=4)
        print(f"Response: {status}, Data: {data}")
        return data

req = MockRequest(
    "POST",
    {"fileId": "Enter your APPWRITE_FILE_ID", "bucketId": "speech_recognition"},
    {"x-appwrite-key": "Put your appwrite secret key here"},
)

res = MockResponse()
log = print
error = print

asyncio.run(process_audio(req, res, log, error))

Test Result (WIP):

My Hypothesis:
The content of my .wav file may be corrupted, so make sure the test file you use is in a supported format.
There could also be an issue in file retrieval.

I will continue debugging.
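
To help rule out the corrupted-file hypothesis, a small check with Python's standard `wave` module can confirm whether a downloaded file parses as WAV at all. This is a suggested debugging aid, not part of the PR, and `inspect_wav` is a hypothetical helper name.

```python
import io
import wave


def inspect_wav(data: bytes):
    """Return basic header info for WAV bytes, or None if they do not parse."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            return {
                "channels": wav.getnchannels(),
                "sample_rate": wav.getframerate(),
                "frames": wav.getnframes(),
            }
    except (wave.Error, EOFError):
        # wave.Error: not a valid or supported WAV; EOFError: truncated data.
        return None
```

Note that the `wave` module mainly handles uncompressed PCM audio, so a None result means "this parser could not read it" rather than proof of corruption. Running the same check on the bytes returned by the storage download would also show whether file retrieval is returning the file intact.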

PR related

The structure of the main operation files here was used as a reference in Waterberry's object detection with Hugging Face template pull request.

This reuse was efficient, and it was possible because both templates use the same template API and differ only in the specific task.

Have you read the Contributing Guidelines on issues?

Yes, thoroughly.

Resources

I wrote a guide, with citations, on how to run Appwrite locally, and received feedback from the team afterwards:
https://docs.google.com/document/d/1uPj4TdY5sdGFFG8uy-g47OhRXsYx1DoBchHDZu6cM2E/edit?usp=sharing

Waterberry71 added 2 commits December 15, 2024 18:20

Speech recognition with huggingface implemented for Appwrite in python

1af8159

Added gitignore file

878e531
