- Overview
- Live Server
- Prerequisites
- Running with Docker Compose
- Setting Up the Development Environment
- Downloading Translation Models
- Running with FastAPI Server
- Evaluating Results
- Build Docker Image
- References
- Contributing
- License
- FAQ
This project sets up a translation server for Indic languages using Docker Compose. It translates between English and Indian languages such as Kannada and Hindi, using the IndicTrans2 models from AI4Bharat.
Here is the list of languages supported by the IndicTrans2 models:
Assamese (asm_Beng) | Kashmiri (Arabic) (kas_Arab) | Punjabi (pan_Guru) |
Bengali (ben_Beng) | Kashmiri (Devanagari) (kas_Deva) | Sanskrit (san_Deva) |
Bodo (brx_Deva) | Maithili (mai_Deva) | Santali (sat_Olck) |
Dogri (doi_Deva) | Malayalam (mal_Mlym) | Sindhi (Arabic) (snd_Arab) |
English (eng_Latn) | Marathi (mar_Deva) | Sindhi (Devanagari) (snd_Deva) |
Konkani (gom_Deva) | Manipuri (Bengali) (mni_Beng) | Tamil (tam_Taml) |
Gujarati (guj_Gujr) | Manipuri (Meitei) (mni_Mtei) | Telugu (tel_Telu) |
Hindi (hin_Deva) | Nepali (npi_Deva) | Urdu (urd_Arab) |
Kannada (kan_Knda) | Odia (ory_Orya) |
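Each code in the list pairs a three-letter language tag with a four-letter script tag, joined by an underscore. As a minimal sketch (the helper name `split_lang_code` is hypothetical and not part of this project), a client could validate a code before sending a request:

```python
# Language codes copied from the list above: <language>_<Script>.
SUPPORTED_LANGS = {
    "asm_Beng", "ben_Beng", "brx_Deva", "doi_Deva", "eng_Latn", "gom_Deva",
    "guj_Gujr", "hin_Deva", "kan_Knda", "kas_Arab", "kas_Deva", "mai_Deva",
    "mal_Mlym", "mar_Deva", "mni_Beng", "mni_Mtei", "npi_Deva", "ory_Orya",
    "pan_Guru", "san_Deva", "sat_Olck", "snd_Arab", "snd_Deva", "tam_Taml",
    "tel_Telu", "urd_Arab",
}


def split_lang_code(code: str) -> tuple[str, str]:
    """Split a code such as 'kan_Knda' into ('kan', 'Knda').

    Hypothetical helper: raises ValueError for codes not in the list above.
    """
    if code not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {code!r}")
    language, script = code.split("_")
    return language, script
```

Rejecting unknown codes on the client side gives a clearer error than a failed request would.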
We host a translation service for Indian languages. The service is available in two modes, CPU and GPU.
- With curl
You can test the service using curl commands. Below are examples for both service modes, CPU first and then GPU:
curl -X 'POST' \
'https://gaganyatri-translate-indic-server-cpu.hf.space/translate?src_lang=kan_Knda&tgt_lang=eng_Latn&device_type=cpu' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"sentences": [
"ನಮಸ್ಕಾರ, ಹೇಗಿದ್ದೀರಾ?", "ಶುಭೋದಯ!"
],
"src_lang": "kan_Knda",
"tgt_lang": "eng_Latn"
}'
curl -X 'POST' \
'https://gaganyatri-translate-indic-server.hf.space/translate?src_lang=kan_Knda&tgt_lang=eng_Latn&device_type=gpu' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"sentences": [
"ನಮಸ್ಕಾರ, ಹೇಗಿದ್ದೀರಾ?", "ಶುಭೋದಯ!"
],
"src_lang": "kan_Knda",
"tgt_lang": "eng_Latn"
}'
- Via Swagger UI
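The curl requests above can also be issued from Python. The following is a minimal sketch using only the standard library; the function names `build_translate_request` and `translate` are illustrative, and the URL, query parameters, and JSON body are copied from the CPU curl example. The 60-second timeout is an assumption.

```python
import json
import urllib.parse
import urllib.request

# Hosted CPU endpoint taken from the curl example above.
BASE_URL = "https://gaganyatri-translate-indic-server-cpu.hf.space"


def build_translate_request(sentences, src_lang, tgt_lang,
                            device_type="cpu", base_url=BASE_URL):
    """Build (but do not send) a POST request equivalent to the curl command."""
    query = urllib.parse.urlencode(
        {"src_lang": src_lang, "tgt_lang": tgt_lang, "device_type": device_type}
    )
    body = json.dumps(
        {"sentences": sentences, "src_lang": src_lang, "tgt_lang": tgt_lang},
        ensure_ascii=False,
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/translate?{query}",
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def translate(sentences, src_lang, tgt_lang, **kwargs):
    """Send the request and return the parsed JSON response (needs network access)."""
    request = build_translate_request(sentences, src_lang, tgt_lang, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `translate(["ಶುಭೋದಯ!"], "kan_Knda", "eng_Latn")` would contact the hosted service, so it only works while that service is reachable.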
- Docker and Docker Compose installed on your machine.
- Python 3.x installed for the development environment.
- Internet access to download translation models.
- Start the server:
docker compose -f compose.yaml up -d
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Collection Models on HuggingFace - IndicTrans2
Below is a table summarizing the available models for different translation tasks:
Task | Variant | Model Name | VRAM Size | Download Command |
---|---|---|---|---|
Indic to English | 200M (distilled) | indictrans2-indic-en-dist-200M | 950 MB | huggingface-cli download ai4bharat/indictrans2-indic-en-dist-200M |
Indic to English | 1B (base) | indictrans2-indic-en-1B | 4.5 GB | huggingface-cli download ai4bharat/indictrans2-indic-en-1B |
English to Indic | 200M (distilled) | indictrans2-en-indic-dist-200M | 950 MB | huggingface-cli download ai4bharat/indictrans2-en-indic-dist-200M |
English to Indic | 1B (base) | indictrans2-en-indic-1B | 4.5 GB | huggingface-cli download ai4bharat/indictrans2-en-indic-1B |
Indic to Indic | 320M (distilled) | indictrans2-indic-indic-dist-320M | 950 MB | huggingface-cli download ai4bharat/indictrans2-indic-indic-dist-320M |
Indic to Indic | 1B (base) | indictrans2-indic-indic-1B | 4.5 GB | huggingface-cli download ai4bharat/indictrans2-indic-indic-1B |
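The table maps each translation direction to a model. As a rough sketch of how a client or server might choose among them, the hypothetical function `pick_model` below derives the repository id from the language pair. It assumes that the direction is decided solely by whether either side is `eng_Latn`; the server's actual selection logic may differ.

```python
def pick_model(src_lang: str, tgt_lang: str, use_distilled: bool = True) -> str:
    """Return the Hugging Face repo id from the table for a translation direction.

    Hypothetical helper: the direction is inferred only from whether the
    source or target code is 'eng_Latn'.
    """
    src_is_english = src_lang == "eng_Latn"
    tgt_is_english = tgt_lang == "eng_Latn"
    if src_is_english and tgt_is_english:
        raise ValueError("source and target are both English")
    if src_is_english:
        direction, distilled_size = "en-indic", "200M"
    elif tgt_is_english:
        direction, distilled_size = "indic-en", "200M"
    else:
        # The distilled Indic-to-Indic model is the 320M variant.
        direction, distilled_size = "indic-indic", "320M"
    suffix = f"dist-{distilled_size}" if use_distilled else "1B"
    return f"ai4bharat/indictrans2-{direction}-{suffix}"
```

The returned id can be passed to the `huggingface-cli download` command shown in the table.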
You can run the server using FastAPI:
- with GPU
python src/translate_api.py --port 7860 --host 0.0.0.0 --device cuda --use_distilled
- with CPU only
python src/translate_api.py --port 7860 --host 0.0.0.0 --device cpu --use_distilled
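To make the flags in these commands concrete, here is a sketch of how such a command line could be parsed with `argparse`. It is a hypothetical re-creation and not the contents of `src/translate_api.py`; the default values and help strings are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags used in the commands above (hypothetical re-creation)."""
    parser = argparse.ArgumentParser(description="Indic translation server")
    # Defaults below are assumptions, chosen to match the example commands.
    parser.add_argument("--host", default="0.0.0.0", help="interface to bind")
    parser.add_argument("--port", type=int, default=7860, help="port to listen on")
    parser.add_argument("--device", choices=["cuda", "cpu"], default="cpu",
                        help="run inference on GPU (cuda) or CPU")
    parser.add_argument("--use_distilled", action="store_true",
                        help="load the smaller distilled models")
    return parser.parse_args(argv)
```

With this sketch, omitting `--use_distilled` leaves the flag false, which would correspond to loading the 1B base models.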
You can evaluate the translation results using curl commands. Here are some examples:
curl -X 'POST' \
'http://localhost:7860/translate?tgt_lang=kan_Knda&src_lang=eng_Latn&device_type=cuda' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"sentences": [
"Hello, how are you?", "Good morning!"
],
"src_lang": "eng_Latn",
"tgt_lang": "kan_Knda"
}'
Response:
{
"translations": [
"ಹಲೋ, ಹೇಗಿದ್ದೀರಿ? ",
"ಶುಭೋದಯ! "
]
}
curl -X 'POST' \
'http://localhost:7860/translate?src_lang=kan_Knda&tgt_lang=eng_Latn&device_type=cuda' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"sentences": [
"ನಮಸ್ಕಾರ, ಹೇಗಿದ್ದೀರಾ?", "ಶುಭೋದಯ!"
],
"src_lang": "kan_Knda",
"tgt_lang": "eng_Latn"
}'
Response:
{
"translations": ["Hello, how are you?", "Good morning!"]
}
curl -X 'POST' \
'http://localhost:7860/translate?src_lang=kan_Knda&tgt_lang=eng_Latn&device_type=cpu' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"sentences": [
"ನಮಸ್ಕಾರ, ಹೇಗಿದ್ದೀರಾ?", "ಶುಭೋದಯ!"
],
"src_lang": "kan_Knda",
"tgt_lang": "eng_Latn"
}'
Response:
{
"translations": [
"Hello, how are you?",
"Good morning!"
]
}
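When evaluating results in a script, it can help to check the response shape before comparing translations. The sketch below assumes that the server returns one translation per input sentence, in the same order, which is what the sample responses above suggest. The function name `clean_translations` is hypothetical. It also trims the trailing spaces visible in the first sample response.

```python
def clean_translations(sentences, response):
    """Pair each input sentence with its translation, trimming stray whitespace.

    Assumes one translation per input sentence, in the same order.
    """
    translations = response.get("translations")
    if not isinstance(translations, list):
        raise ValueError("response has no 'translations' list")
    if len(translations) != len(sentences):
        raise ValueError(
            f"expected {len(sentences)} translations, got {len(translations)}"
        )
    return [(src, tgt.strip()) for src, tgt in zip(sentences, translations)]
```

For example, passing the English input sentences and the first sample response yields pairs whose Kannada side has no trailing space.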
- GPU
docker build -t slabstech/indic_translate_server -f Dockerfile .
- CPU only
docker build -t slabstech/indic_translate_server_ -f Dockerfile.cpu .
- IndicTrans2 Paper
- AI4Bharat IndicTrans2 Model
- AI4Bharat IndicTrans2 GitHub Repository
- IndicTransToolkit
- Extra: install IndicTransToolkit with pip install git+https://github.com/VarunGumma/IndicTransToolkit.git
We welcome contributions! Please read the CONTRIBUTING.md file for guidelines on how to contribute to this project.
You can also join the Discord group to collaborate.
This project is licensed under the MIT License - see the LICENSE file for details.
Q: How do I change the source and target languages?
A: Modify the compose.yaml file to set the SRC_LANG and TGT_LANG variables as needed.
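Variables set in compose.yaml reach the container as environment variables. How the server consumes them is not shown here, so the following is only a sketch of one way to read `SRC_LANG` and `TGT_LANG`; the function name and the fallback values are assumptions.

```python
import os


def language_pair_from_env(env=os.environ,
                           default_src="eng_Latn", default_tgt="kan_Knda"):
    """Read SRC_LANG and TGT_LANG from the environment.

    Hypothetical helper: the fallback values are assumptions, not project defaults.
    """
    src = env.get("SRC_LANG", default_src)
    tgt = env.get("TGT_LANG", default_tgt)
    if src == tgt:
        raise ValueError("SRC_LANG and TGT_LANG must differ")
    return src, tgt
```

Accepting the environment as a parameter keeps the function easy to exercise with a plain dictionary.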
Q: How do I download the translation models?
A: Use the huggingface-cli commands provided in the Downloading Translation Models section.
Q: How do I run the server locally?
A: Follow the instructions in the Running with FastAPI Server section.
This README provides a comprehensive guide to setting up and running the Indic Translate Server. For more details, refer to the linked resources.
@article{gala2023indictrans,
title={IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages},
author={Jay Gala and Pranjal A Chitale and A K Raghavan and Varun Gumma and Sumanth Doddapaneni and Aswanth Kumar M and Janki Atul Nawale and Anupama Sujatha and Ratish Puduppully and Vivek Raghavan and Pratyush Kumar and Mitesh M Khapra and Raj Dabre and Anoop Kunchukuttan},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2023},
url={https://openreview.net/forum?id=vfT4YuzAYA},
note={}
}