A Python client library for Bhashini services: Automatic Speech Recognition (ASR), Neural Machine Translation (NMT), and Text-to-Speech (TTS) for 13 Indian languages.
- Test the API online, and create a free API key and user ID from your profile.
- For details, see the API docs.
- ASR (Automatic Speech Recognition): Convert speech audio to text.
  - Supported formats: .wav, .mp3, .flac, .ogg
  - Sampling rates: 8000 Hz, 16000 Hz, 48000 Hz
- Translation: Translate text between supported language pairs.
- TTS (Text-to-Speech): Convert text to speech audio with gender options.
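The supported formats and sampling rates listed above can be checked locally before a request is sent. The following is a minimal sketch, not part of BhashiniClient: the helper name `validate_audio_params` and the two constants are hypothetical, and the allowed values are copied from the feature list.

```python
# Values copied from the feature list; the names are illustrative only.
SUPPORTED_AUDIO_FORMATS = {"wav", "mp3", "flac", "ogg"}
SUPPORTED_SAMPLING_RATES = {8000, 16000, 48000}


def validate_audio_params(audio_format, sampling_rate):
    """Return a normalized (format, rate) pair or raise ValueError."""
    # Accept '.WAV', 'wav', 'Wav', etc.
    fmt = audio_format.lower().lstrip(".")
    if fmt not in SUPPORTED_AUDIO_FORMATS:
        raise ValueError(f"Unsupported audio format: {audio_format!r}")
    if sampling_rate not in SUPPORTED_SAMPLING_RATES:
        raise ValueError(f"Unsupported sampling rate: {sampling_rate!r}")
    return fmt, sampling_rate
```

Calling this before `client.asr(...)` or `client.tts(...)` surfaces a bad argument without spending an API call.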
- Python 3.x
- requests library
Install the required Python package:
pip install requests
Copy the BhashiniClient class into your project or save it as a module (e.g., bhashini_client.py).
Initialize the BhashiniClient with your user credentials and, optionally, a pipeline ID.
from bhashini_client import BhashiniClient
USER_ID = 'your_user_id'
API_KEY = 'your_ulca_api_key'
PIPELINE_ID = 'your_pipeline_id' # Optional, defaults to predefined PIPELINE_ID
client = BhashiniClient(USER_ID, API_KEY, PIPELINE_ID)
You can get a free API key and user ID from your profile.
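To keep credentials out of source files, they can be read from environment variables instead of being hard-coded. This is a sketch under assumptions: the variable names `BHASHINI_USER_ID`, `BHASHINI_API_KEY`, and `BHASHINI_PIPELINE_ID` and the helper `load_credentials` are hypothetical and are not defined by the library.

```python
import os


def load_credentials(env=os.environ):
    """Read credentials from a mapping (by default, the process environment).

    The variable names are illustrative; pick whatever fits your deployment.
    Returns (user_id, api_key, pipeline_id), where pipeline_id may be None.
    """
    user_id = env.get("BHASHINI_USER_ID")
    api_key = env.get("BHASHINI_API_KEY")
    missing = [
        name
        for name, value in (("BHASHINI_USER_ID", user_id), ("BHASHINI_API_KEY", api_key))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return user_id, api_key, env.get("BHASHINI_PIPELINE_ID")
```

If the returned pipeline ID is None, construct the client with only the user ID and API key so that the default pipeline is used.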
# Initialize the client
client = BhashiniClient(USER_ID, API_KEY)
# Perform ASR
with open('speech.wav', 'rb') as f:
    audio_content = f.read()
asr_result = client.asr(
    audio_content,
    source_language='hi',
    audio_format='wav',  # Supported formats: 'wav', 'mp3', 'flac', 'ogg'
    sampling_rate=16000  # Supported rates: 8000, 16000, 48000
)
print("Transcribed Text:", asr_result['pipelineResponse'][0]['output'][0]['source'])
# Initialize the client
client = BhashiniClient(USER_ID, API_KEY)
# Translate text
text = 'मेरा नाम विहिर है।'
translation_result = client.translate(text, source_language='hi', target_language='en')
print("Translated Text:", translation_result['pipelineResponse'][0]['output'][0]['target'])
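The ASR and translation examples index directly into the response, which raises a bare KeyError or IndexError if the API returns an error payload. A small helper can make that failure easier to diagnose. This is a sketch: `extract_text` is a hypothetical name, and the response path is the one used in the examples above, not a documented schema.

```python
def extract_text(response, field):
    """Pull 'source' (ASR) or 'target' (translation) out of a response dict.

    Follows the path used in the examples:
    response['pipelineResponse'][0]['output'][0][field]
    """
    try:
        return response["pipelineResponse"][0]["output"][0][field]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape; could not read {field!r}") from exc
```

With this in place, `extract_text(asr_result, 'source')` and `extract_text(translation_result, 'target')` replace the long index chains.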
import base64

# Initialize the client
client = BhashiniClient(USER_ID, API_KEY)
# Convert text to speech
text = 'नमस्ते दुनिया'
tts_result = client.tts(
    text,
    source_language='hi',
    gender='female',
    sampling_rate=8000  # Supported rates: 8000, 16000, 48000
)
# Save the audio file
audio_base64 = tts_result['pipelineResponse'][0]['audio'][0]['audioContent']
audio_data = base64.b64decode(audio_base64)
with open('output.wav', 'wb') as f:
    f.write(audio_data)
client.list_available_languages(task_type) lists the available languages for a specified task.
- Parameters:
  - task_type (str): The task type ('asr', 'translation', or 'tts').
- Returns:
  - list or dict: A list of language codes for ASR and TTS, or a dictionary mapping each source language to its target languages for translation.
Usage Example:
# For TTS
tts_languages = client.list_available_languages('tts')
print("TTS Languages:", tts_languages)
# For ASR
asr_languages = client.list_available_languages('asr')
print("ASR Languages:", asr_languages)
# For Translation
translation_languages = client.list_available_languages('translation')
print("Translation Languages:", translation_languages)
Example output:
TTS Languages: ['en', 'as', 'brx', 'gu', 'hi', 'kn', 'ml', 'mni', 'mr', 'or', 'pa', 'ta', 'te', 'bn']
ASR Languages: ['bn', 'en', 'gu', 'hi', 'kn', 'ml', 'mr', 'or', 'pa', 'sa', 'ta', 'te', 'ur']
Translation Languages: {'bn': ['en', 'as', 'brx', 'gu', 'hi', 'kn', 'ml', 'mni', 'mr', 'or', 'pa', 'ta', 'te'], 'en': ['as', 'bn', 'brx', 'gu', 'hi', 'kn', 'ml', 'mni', 'mr', 'or', 'pa', 'ta', 'te'], 'gu': ['en', 'as', 'bn', 'brx', 'hi', 'kn', 'ml', 'mni', 'mr', 'or', 'pa', 'ta', 'te'], 'hi': ['en', 'as', 'bn', 'brx', 'gu', 'kn', 'ml', 'mni', 'mr', 'or', 'pa', 'ta', 'te'], 'kn': ['en', 'as', 'bn', 'brx', 'gu', 'hi', 'ml', 'mni', 'mr', 'or', 'pa', 'ta', 'te'], 'ml': ['en', 'as', 'bn', 'brx', 'gu', 'hi', 'kn', 'mni', 'mr', 'or', 'pa', 'ta', 'te'], 'mr': ['en', 'as', 'bn', 'brx', 'gu', 'hi', 'kn', 'ml', 'mni', 'or', 'pa', 'ta', 'te'], 'or': ['en', 'as', 'bn', 'brx', 'gu', 'hi', 'kn', 'ml', 'mni', 'mr', 'pa', 'ta', 'te'], 'pa': ['en', 'as', 'bn', 'brx', 'gu', 'hi', 'kn', 'ml', 'mni', 'mr', 'or', 'ta', 'te'], 'sa': ['en', 'as', 'bn', 'brx', 'gu', 'hi', 'kn', 'ml', 'mni', 'mr', 'or', 'pa', 'ta', 'te'], 'ta': ['en', 'as', 'bn', 'brx', 'gu', 'hi', 'kn', 'ml', 'mni', 'mr', 'or', 'pa', 'te'], 'te': ['en', 'as', 'bn', 'brx', 'gu', 'hi', 'kn', 'ml', 'mni', 'mr', 'or', 'pa', 'ta'], 'ur': ['en', 'as', 'bn', 'brx', 'gu', 'hi', 'kn', 'ml', 'mni', 'mr', 'or', 'pa', 'ta', 'te']}
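Because the translation result is a dictionary keyed by source language, checking whether a given pair is available is a single lookup. The sketch below uses a hypothetical helper, `is_pair_supported`, and a trimmed sample of the dictionary printed above rather than a live API call.

```python
def is_pair_supported(translation_languages, source_language, target_language):
    """Return True if target_language is listed under source_language."""
    return target_language in translation_languages.get(source_language, [])


# Trimmed sample in the same shape as list_available_languages('translation').
sample_pairs = {
    "hi": ["en", "gu", "ta"],
    "en": ["hi", "ta"],
}

print(is_pair_supported(sample_pairs, "hi", "en"))
print(is_pair_supported(sample_pairs, "en", "gu"))
```

In real use, pass the dictionary returned by `client.list_available_languages('translation')` in place of `sample_pairs`.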
client.get_supported_voices(source_language) retrieves the supported voices for TTS in the specified language.
- Parameters:
  - source_language (str): Language code (e.g., 'hi' for Hindi).
- Returns:
  - list: Supported voices (['male', 'female']).
Usage Example:
voices = client.get_supported_voices('hi')
print("Supported Voices for Hindi TTS:", voices)
client.asr(audio_content, source_language, audio_format, sampling_rate) performs ASR on the provided audio content.
- Parameters:
  - audio_content (bytes): Audio content in bytes.
  - source_language (str): Language code of the audio.
  - audio_format (str): Audio format ('wav', 'mp3', 'flac', 'ogg').
  - sampling_rate (int): Sampling rate in Hz (8000, 16000, 48000).
- Returns:
  - dict: ASR response from the API.
Usage Example:
with open('audio.wav', 'rb') as f:
    audio_content = f.read()
asr_result = client.asr(
    audio_content,
    source_language='hi',
    audio_format='wav',  # Supported formats: 'wav', 'mp3', 'flac', 'ogg'
    sampling_rate=16000  # Supported rates: 8000, 16000, 48000
)
print("ASR Result:", asr_result)
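For .wav input, the sampling rate does not have to be guessed: it can be read from the file header with Python's standard `wave` module. The helper below, `read_wav_for_asr`, is a hypothetical convenience function and not part of BhashiniClient; it only handles .wav files.

```python
import wave


def read_wav_for_asr(path):
    """Return (audio_content, sampling_rate) for a .wav file.

    The sampling rate comes from the WAV header; the audio content is the
    raw file bytes, as in the usage example above.
    """
    with wave.open(path, "rb") as wav_file:
        sampling_rate = wav_file.getframerate()
    with open(path, "rb") as f:
        audio_content = f.read()
    return audio_content, sampling_rate
```

The two return values can be passed to `client.asr(...)` as `audio_content` and `sampling_rate` with `audio_format='wav'`. A file recorded at a rate outside 8000, 16000, or 48000 Hz (for example 44100 Hz) would need resampling first.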
client.translate(text, source_language, target_language) translates text from the source language to the target language.
- Parameters:
  - text (str): Text to translate.
  - source_language (str): Source language code.
  - target_language (str): Target language code.
- Returns:
  - dict: Translation response from the API.
Usage Example:
translation_result = client.translate(
    'मेरा नाम विहिर है।',
    source_language='hi',
    target_language='en'
)
print("Translation Result:", translation_result)
client.tts(text, source_language, gender, sampling_rate) converts text to speech in the specified language and gender.
- Parameters:
  - text (str): Text to convert.
  - source_language (str): Language code.
  - gender (str): 'male' or 'female'.
  - sampling_rate (int): Sampling rate in Hz (8000, 16000, 48000).
- Returns:
  - dict: TTS response from the API.
Usage Example:
import base64

tts_result = client.tts(
    'હેલો વર્લ્ડ',
    source_language='gu',
    gender='female',
    sampling_rate=16000  # Supported rates: 8000, 16000, 48000
)
# Save the audio output
audio_base64 = tts_result['pipelineResponse'][0]['audio'][0]['audioContent']
audio_data = base64.b64decode(audio_base64)
with open('output_audio.wav', 'wb') as f:
    f.write(audio_data)
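The decode-and-save steps can be wrapped in one function so that a malformed response produces a clear error instead of a bare KeyError. This is a sketch only: `save_tts_audio` is a hypothetical helper, and the response path is the one used in the usage example, not a documented schema.

```python
import base64


def save_tts_audio(tts_result, path):
    """Decode the base64 audio in a TTS response and write it to `path`.

    Returns the number of bytes written.
    """
    try:
        audio_base64 = tts_result["pipelineResponse"][0]["audio"][0]["audioContent"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError("TTS response does not contain audio content") from exc
    audio_data = base64.b64decode(audio_base64)
    with open(path, "wb") as f:
        f.write(audio_data)
    return len(audio_data)
```

For example, `save_tts_audio(tts_result, 'output_audio.wav')` writes the same file as the snippet above.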
This project is licensed under the MIT License.
Feel free to contribute to this project by submitting issues or pull requests.