ABSTRACT
This project presents the development and fine-tuning of a multilingual neural machine translation system that translates English text into Kannada, a low-resource language. The approach builds on Meta AI’s pre-trained NLLB-200-distilled-600M model, a distilled 600M-parameter checkpoint from the NLLB-200 family, which covers 200 languages. Using the Hugging Face Transformers and Datasets libraries, a custom parallel corpus was prepared, tokenized, and padded. The model was fine-tuned with the Seq2SeqTrainer, configured through its training arguments, and translation quality was measured with three metrics: BLEU, chrF, and TER. The best-performing model checkpoint was saved and reloaded for final evaluation. This project demonstrates that fine-tuning a large pre-trained multilingual model is an effective route to low-resource language translation, and it contributes toward the development of inclusive AI-driven language technologies.
Keywords: Neural Machine Translation (NMT), Seq2SeqTrainer, BLEU Score, chrF Metric, TER (Translation Edit Rate)
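The tokenization and padding step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the field names "en" and "kn" and the length cap of 128 are assumptions, and the tokenizer is passed in so the function stays independent of how it was loaded. The language codes eng_Latn and kan_Knda are the identifiers NLLB uses for English and Kannada.

```python
# Sketch only: "en" and "kn" are hypothetical field names for the sentence pairs.
SRC_LANG = "eng_Latn"  # NLLB language code for English
TGT_LANG = "kan_Knda"  # NLLB language code for Kannada
MAX_LENGTH = 128       # assumed cap; the project does not state one


def preprocess(batch, tokenizer, max_length=MAX_LENGTH):
    """Tokenize a batch of sentence pairs and pad both sides to max_length.

    `tokenizer` is expected to behave like a Hugging Face tokenizer loaded
    with src_lang=SRC_LANG and tgt_lang=TGT_LANG. Passing `text_target`
    makes such a tokenizer return the tokenized Kannada side as "labels".
    """
    return tokenizer(
        batch["en"],
        text_target=batch["kn"],
        max_length=max_length,
        truncation=True,
        padding="max_length",
    )
```

With the Datasets library, a function like this would typically be applied through `dataset.map(preprocess, batched=True, fn_kwargs={"tokenizer": tokenizer})`. Padding positions in the labels are usually masked (commonly set to -100) so that they do not contribute to the training loss; that step is left out here for brevity.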
OBJECTIVE
The primary objective of this project is to develop and fine-tune a multilingual neural machine translation model capable of accurately translating text from English to Kannada, a low-resource South Indian language. This is achieved through the adaptation of Meta AI’s NLLB-200-distilled-600M model using a custom parallel corpus and modern training strategies.
The specific objectives of the study are as follows:
1. To utilize a pre-trained multilingual language model (NLLB-200) that supports translation between English and Kannada.
2. To prepare and preprocess a parallel dataset consisting of English-Kannada sentence pairs in JSON format.
3. To fine-tune the NLLB model on the custom dataset using the Hugging Face Transformers and Datasets libraries.
4. To evaluate the performance of the fine-tuned model using standardized translation metrics such as BLEU, chrF, and TER.
5. To save and export the best-performing model for future deployment and real-world translation tasks.
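As an illustration of objective 2, the sketch below reads English-Kannada sentence pairs from a JSON file and holds out part of them for evaluation. The file layout (a list of records with "en" and "kn" keys), the 10% evaluation share, and the function names are assumptions made for this example; the project's actual schema is not specified.

```python
import json
import random


def load_pairs(path):
    """Read a JSON file holding a list of {"en": ..., "kn": ...} records."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    pairs = []
    for rec in records:
        en = str(rec.get("en") or "").strip()
        kn = str(rec.get("kn") or "").strip()
        if en and kn:  # drop pairs where either side is missing or empty
            pairs.append({"en": en, "kn": kn})
    return pairs


def train_eval_split(pairs, eval_fraction=0.1, seed=42):
    """Shuffle deterministically and hold out a fraction for evaluation."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]
```

A fixed seed keeps the split reproducible, so metric scores from different training runs are computed on the same held-out sentences.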
Through this work, the project aims to bridge the digital language divide by contributing a reliable and scalable solution for Kannada translation, supporting the broader goals of linguistic inclusivity and AI democratization.
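To give a feel for what a character-level metric such as chrF measures (objective 4), here is a deliberately simplified, sentence-level sketch: it averages character n-gram precision and recall over orders 1 to 6 and combines them with an F-score that weights recall more heavily (beta = 2). The function names are hypothetical, and the scores will not match the reference implementation exactly; for reported results, a packaged metric such as those available through the Evaluate library would normally be used.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Simplified chrF-style score in the range 0 to 100."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

Note that Python strings are indexed by Unicode code point, so a Kannada consonant with a vowel sign counts as more than one "character" here. Character-level matching of this kind is one reason chrF is often reported alongside BLEU for morphologically rich languages.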
REQUIREMENTS SPECIFICATION
HARDWARE REQUIREMENTS
PC
SOFTWARE REQUIREMENTS
Python 3.11
Google Colab
LIBRARIES USED
transformers
torch
os
google.colab
evaluate
streamlit