feat(deepgram): missing STT options #295

Merged

Conversation

sachin4840
Contributor

Deepgram has many features that are helpful for speech-to-text:

https://developers.deepgram.com/docs/stt-streaming-feature-overview

The following features are missing from the Deepgram STT plugin:

Dictation: https://developers.deepgram.com/docs/dictation
Numerals: https://developers.deepgram.com/docs/numerals
Diarization: https://developers.deepgram.com/docs/diarization

#294
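
For readers unfamiliar with these options, here is a minimal sketch of how the three features map onto query parameters of Deepgram's streaming endpoint. The `DeepgramSttOptions` interface and the `buildListenUrl` helper are hypothetical names made up for illustration; they are not the plugin's actual code. The sketch also assumes, per Deepgram's dictation docs, that dictation is used together with `punctuate=true`.

```typescript
// Hypothetical option bag for illustration; the real plugin's option names may differ.
interface DeepgramSttOptions {
  dictation?: boolean;
  numerals?: boolean;
  diarize?: boolean;
}

// Deepgram's live transcription endpoint.
const LISTEN_ENDPOINT = 'wss://api.deepgram.com/v1/listen';

// Hypothetical helper: turns the enabled options into a query string.
function buildListenUrl(opts: DeepgramSttOptions): string {
  const pairs: string[] = [];
  if (opts.dictation) {
    // Assumption: dictation needs punctuation enabled as well.
    pairs.push('punctuate=true');
    pairs.push('dictation=true');
  }
  if (opts.numerals) {
    pairs.push('numerals=true');
  }
  if (opts.diarize) {
    pairs.push('diarize=true');
  }
  return pairs.length > 0 ? LISTEN_ENDPOINT + '?' + pairs.join('&') : LISTEN_ENDPOINT;
}

// Example: enable all three features discussed in this PR.
const exampleUrl: string = buildListenUrl({ dictation: true, numerals: true, diarize: true });
```

Options left unset are simply omitted from the URL, so Deepgram's defaults apply.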


changeset-bot bot commented Feb 6, 2025

🦋 Changeset detected

Latest commit: b67f573

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@livekit/agents-plugin-deepgram Patch


@CLAassistant

CLAassistant commented Feb 6, 2025

CLA assistant check
All committers have signed the CLA.

@nbsp changed the title from "feat(Deepgram STT): Deepgram more STT options" to "feat(deepgram): missing STT options" on Feb 6, 2025
@nbsp merged commit c33f797 into livekit:next on Feb 6, 2025
5 checks passed
@github-actions bot mentioned this pull request on Feb 25, 2025
@arthberman

arthberman commented Mar 14, 2025

@sachin4840 where can we access the speaker when diarize is set to true? My payload looks like this:

{
  type: 2,
  alternatives: [
    {
      language: 'en',
      startTime: 17.37,
      endTime: 20.25,
      confidence: 0.9760742,
      text: 'Making stuff more people can participate in design process'
    }
  ]
}

Thanks!

@sachin4840
Contributor Author

@arthberman I think Deepgram will recognize speaker changes. Each word in the transcript will be assigned a speaker number, starting at 0.

Docs: https://developers.deepgram.com/reference/speech-to-text-api/listen-streaming#request.query.diarize

@arthberman

arthberman commented Mar 15, 2025

@sachin4840 Yeah, using the Deepgram SDK directly I get:

{
  transcript: 'How are you today?',
  confidence: 1,
  words: [
    {
      word: 'how',
      start: 0,
      end: 0.32,
      confidence: 1,
      speaker: 0,
      punctuated_word: 'How'
    },
    {
      word: 'are',
      start: 0.32,
      end: 0.48,
      confidence: 1,
      speaker: 0,
      punctuated_word: 'are'
    },
    {
      word: 'you',
      start: 0.48,
      end: 0.56,
      confidence: 1,
      speaker: 0,
      punctuated_word: 'you'
    },
    {
      word: 'today',
      start: 0.56,
      end: 0.79999995,
      confidence: 0.99853516,
      speaker: 0,
      punctuated_word: 'today?'
    }
  ]
}

I'm not able to get the speaker through the plugin. I will open a PR for this.
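
To illustrate what per-word speaker labels make possible once they are exposed, here is a minimal sketch that groups consecutive words by speaker. The `DeepgramWord` shape mirrors the fields in the SDK payload above; the `SpeakerSegment` type and the `groupBySpeaker` function are hypothetical names for illustration and are not part of the plugin or the Deepgram SDK.

```typescript
// Field names taken from the SDK payload shown above.
interface DeepgramWord {
  word: string;
  start: number;
  end: number;
  confidence: number;
  speaker?: number; // only present when diarize is enabled
  punctuated_word?: string;
}

// Hypothetical output type: one entry per uninterrupted run of a single speaker.
interface SpeakerSegment {
  speaker: number | undefined;
  start: number;
  end: number;
  text: string;
}

// Hypothetical helper: merges consecutive words that share a speaker label.
function groupBySpeaker(words: DeepgramWord[]): SpeakerSegment[] {
  const segments: SpeakerSegment[] = [];
  for (const w of words) {
    // Prefer the punctuated form when Deepgram provides it.
    const token = w.punctuated_word ?? w.word;
    const last = segments[segments.length - 1];
    if (last !== undefined && last.speaker === w.speaker) {
      // Same speaker as the previous word: extend the current segment.
      last.text += ' ' + token;
      last.end = w.end;
    } else {
      // Speaker changed (or first word): start a new segment.
      segments.push({ speaker: w.speaker, start: w.start, end: w.end, text: token });
    }
  }
  return segments;
}

// The words from the payload above.
const sampleWords: DeepgramWord[] = [
  { word: 'how', start: 0, end: 0.32, confidence: 1, speaker: 0, punctuated_word: 'How' },
  { word: 'are', start: 0.32, end: 0.48, confidence: 1, speaker: 0, punctuated_word: 'are' },
  { word: 'you', start: 0.48, end: 0.56, confidence: 1, speaker: 0, punctuated_word: 'you' },
  { word: 'today', start: 0.56, end: 0.79999995, confidence: 0.99853516, speaker: 0, punctuated_word: 'today?' },
];

const sampleSegments: SpeakerSegment[] = groupBySpeaker(sampleWords);
```

For the sample above, all four words carry speaker 0, so the result is a single segment with the text "How are you today?".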

@sachin4840
Contributor Author

Oh, okay. Got the issue. I will try this and update here.

@arthberman

Check #331.

4 participants