IndicTrans is a Transformer-4X model trained on the Samanantar dataset. Two models are available: one translates from Indic languages to English, the other from English to Indic languages. The models support 11 languages: Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu.
The models, including the newly released Indic-Indic model, are now available for download:
- Indic-English model can be downloaded from here
- English-Indic model can be downloaded from here
- Indic-Indic model can be downloaded from here
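Since each checkpoint covers a different translation direction, it can help to see how a language pair maps to a model. The sketch below is a hypothetical helper, not part of the IndicTrans codebase: it assumes ISO 639-1 codes for the 11 languages, and the return values (`"indic-en"`, `"en-indic"`, `"indic-indic"`) are illustrative labels rather than real file or model names.

```python
# Hypothetical helper: ISO 639-1 codes are assumed, and the returned model
# labels are illustrative, not identifiers from the IndicTrans repository.
INDIC_LANGS = {
    "Assamese": "as", "Bengali": "bn", "Gujarati": "gu", "Hindi": "hi",
    "Kannada": "kn", "Malayalam": "ml", "Marathi": "mr", "Oriya": "or",
    "Punjabi": "pa", "Tamil": "ta", "Telugu": "te",
}
INDIC_CODES = set(INDIC_LANGS.values())
ENGLISH = "en"


def pick_model(src: str, tgt: str) -> str:
    """Return the label of the checkpoint that covers the src -> tgt direction."""
    if src == tgt:
        raise ValueError("source and target languages must differ")
    if src in INDIC_CODES and tgt == ENGLISH:
        return "indic-en"      # Indic-English model
    if src == ENGLISH and tgt in INDIC_CODES:
        return "en-indic"      # English-Indic model
    if src in INDIC_CODES and tgt in INDIC_CODES:
        return "indic-indic"   # Indic-Indic model
    raise ValueError(f"unsupported direction: {src} -> {tgt}")


if __name__ == "__main__":
    print(pick_model("hi", "en"))  # indic-en
    print(pick_model("en", "ta"))  # en-indic
    print(pick_model("bn", "te"))  # indic-indic
```

Any pair involving a language outside the 11 listed above raises an error, mirroring the coverage stated for the models.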
Instructions for running inference can be found in the IndicTrans GitHub repository.
IndicTrans is trained on the Samanantar dataset, which covers 11 language pairs. The amount of training data for each language pair is listed below:
| Language Pair | # Sentence Pairs |
|---|---|
In total, the training data has 46.9M sentence pairs.
We evaluate the IndicTrans models on the WAT2021, WAT2020, WMT, UFAL, and PMI benchmarks. The results we obtain are as follows: