IndicXlit is a Transformer-based multilingual transliteration model (~11M parameters) for romanised-to-Indic-script conversion, supporting 21 languages from the Indian subcontinent.

Download Model

The English-to-Indic transliteration model can be downloaded here.


All information and instructions pertaining to using the IndicXlit model can be found on the IndicXlit GitHub repository.

Model Details

IndicXlit is trained on the Aksharantar dataset, which covers 21 language pairs. The sizes of the training, validation, and test splits for each language pair are listed below:

Language pair   Training   Validation   Test
as-en             179K        4K        5531
bn-en            1231K       11K       5009
brx-en             36K        3K        4136
gu-en            1143K       12K       7768
hi-en            1299K        6K       5693
kn-en            2907K        7K       6396
ks-en              47K        4K       7707
kok-en            613K        4K       5093
mai-en            283K        4K       5512
ml-en            4101K        8K       6911
mni-en             10K        3K       4925
mr-en            1453K        8K       6573
ne-en            2397K        3K       4133
or-en             346K        3K       4256
pa-en             515K        9K       4316
san-en           1813K        3K       5334
sd-en              60K        8K       -
ta-en            3231K        9K       4682
te-en            2430K        8K       4567
ur-en             699K       12K       4463

In total, the Aksharantar dataset contains ~26M word pairs.
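Each Aksharantar language pair is a collection of romanised/native word pairs. As an illustration only (this is not the official loader, and the actual release format may differ), here is a minimal sketch of reading such a lexicon from a hypothetical tab-separated "roman<TAB>native" layout:

```python
import csv
import io

# Hypothetical sample data in a tab-separated "roman<TAB>native" layout;
# the real Aksharantar distribution format may differ.
SAMPLE = "namaste\tनमस्ते\nbharat\tभारत\nhindi\tहिंदी\n"

def load_pairs(text):
    """Parse romanised/native word pairs from TSV-formatted text."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(roman, native) for roman, native in reader]

pairs = load_pairs(SAMPLE)
print(len(pairs))   # 3
print(pairs[0])     # ('namaste', 'नमस्ते')
```

A format like this keeps one word pair per line, which makes it easy to split, shuffle, and count pairs per language with standard tools.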


We evaluate the IndicXlit model on the Dakshina test set and compare our results with those reported in the Dakshina paper. The results are as follows:

Language    bn     gu     hi     kn     ml     mr     pa     sd     si     ta     te     ur     Avg
Dakshina   49.40  49.50  50.00  66.20  58.30  49.70  40.90  33.20  54.70  65.70  67.60  36.70  51.83
IndicXlit  55.49  62.02  60.56  77.18  63.56  64.85  47.24  48.56  63.91  68.10  73.38  42.12  60.58

The paper also reports strong baseline results on the Aksharantar test set.
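The scores above are word-level accuracies (higher is better). As an illustration only (this is not the official evaluation script), here is a minimal sketch of computing top-1 accuracy for transliteration, allowing multiple acceptable reference spellings per romanised word; the words and outputs below are hypothetical examples, not real model predictions:

```python
def top1_accuracy(predictions, references):
    """Fraction of words whose top-ranked candidate matches a reference.

    predictions: {roman_word: [candidate native spellings, best first]}
    references:  {roman_word: set of acceptable native spellings}
    """
    correct = sum(
        1 for word, cands in predictions.items()
        if cands and cands[0] in references.get(word, set())
    )
    return correct / len(predictions)

# Toy Hindi example (hypothetical outputs, not real model predictions):
preds = {
    "namaste": ["नमस्ते", "नमसते"],
    "dilli":   ["दिल्ली"],
    "kitab":   ["किताब"],
    "galat":   ["गलत़"],   # top candidate deliberately wrong
}
refs = {
    "namaste": {"नमस्ते"},
    "dilli":   {"दिल्ली"},
    "kitab":   {"किताब"},
    "galat":   {"गलत"},
}
print(top1_accuracy(preds, refs))  # 0.75
```

Allowing a set of references per word matters for transliteration, since a romanised word can have several equally valid native-script spellings.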



If you are using IndicXlit, please cite the following paper:

      @article{madhani2022aksharantar,
      title={Aksharantar: Towards Building Open Transliteration Tools for the Next Billion Users},
      author={Yash Madhani and Sushane Parthan and Priyanka Bedekar and Ruchi Khapra and Anoop Kunchukuttan and Pratyush Kumar and Mitesh M. Khapra},
      year={2022}
      }


The IndicXlit code and model are released under the MIT License.