Abstract
In an era of rapid information growth, efficiently processing news content in multiple languages has become increasingly important. This project presents a hybrid News Classification and Summarization System that handles both English and Kannada news articles. The system combines natural language processing (NLP) techniques, including text cleaning, translation, TF-IDF vectorization, and machine learning classification, with a transformer-based model (Pegasus) for abstractive summarization. Kannada text is first translated to English for improved classification accuracy, and the resulting categories and summaries are translated back to Kannada for user comprehension. Additionally, a keyword-based flood detection module highlights urgent news content. The system provides a unified platform for news categorization, summarization, and bilingual support, enabling users to quickly grasp critical information while ensuring accessibility across languages. Experimental evaluation demonstrates satisfactory performance in terms of accuracy and summarization quality, making the system suitable for real-time news processing applications.
Keywords:
News Classification, Text Summarization, Natural Language Processing (NLP), Pegasus Transformer, Kannada-English Translation, Machine Learning, TF-IDF Vectorization, Bilingual News Processing.
Introduction
With the exponential growth of digital content, news articles are being generated in multiple languages at an unprecedented rate. The ability to efficiently process and extract meaningful information from these articles is crucial for users, organizations, and government agencies alike. Traditional manual methods of news analysis are time-consuming, error-prone, and unable to scale with the volume of data generated daily.
This project addresses these challenges by developing a bilingual News Classification and Summarization System that supports both English and Kannada languages. The system combines machine learning techniques for news categorization with state-of-the-art transformer models for abstractive summarization, providing a comprehensive solution for information extraction. Kannada text is translated to English to leverage the higher accuracy of existing English NLP models, after which summaries and categories are translated back to Kannada to ensure accessibility for regional language users.
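The translate, process, translate-back flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function `process_article`, the `NewsResult` type, the language codes "en" and "kn", and the stand-in components are all hypothetical names introduced here. The real classifier, summarizer, and translators would be passed in as callables.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NewsResult:
    category: str
    summary: str


def process_article(
    text: str,
    language: str,  # assumed codes: "en" for English, "kn" for Kannada
    classify: Callable[[str], str],
    summarize: Callable[[str], str],
    to_english: Callable[[str], str],
    to_kannada: Callable[[str], str],
) -> NewsResult:
    """Classify and summarize an article, routing Kannada through English."""
    is_kannada = language == "kn"
    # Kannada input is translated first so the English models can be reused.
    english_text = to_english(text) if is_kannada else text

    category = classify(english_text)
    summary = summarize(english_text)

    # Results are translated back so Kannada readers see Kannada output.
    if is_kannada:
        category = to_kannada(category)
        summary = to_kannada(summary)
    return NewsResult(category=category, summary=summary)


# Stand-in components so the sketch runs without models or network access.
def fake_to_english(text: str) -> str:
    return f"en({text})"


def fake_to_kannada(text: str) -> str:
    return f"kn({text})"


def fake_classify(text: str) -> str:
    return "sports"


def fake_summarize(text: str) -> str:
    return text


result = process_article(
    "kannada article", "kn",
    fake_classify, fake_summarize, fake_to_english, fake_to_kannada,
)
print(result)
```

Passing the components in as arguments keeps the routing logic separate from any particular translation service or model, which also makes the flow easy to test with stubs like the ones shown.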
Additionally, the system integrates a keyword-based alert mechanism for critical events such as floods, allowing urgent news to be identified and highlighted promptly. By automating news classification, summarization, and bilingual support, this project enables users to quickly comprehend large volumes of news content while maintaining linguistic inclusivity. The implementation demonstrates the synergy of translation, machine learning, and deep learning models in creating an efficient and scalable news processing platform.
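A keyword-based alert of this kind could be sketched as below. The keyword list and the function names `find_flood_keywords` and `is_flood_alert` are assumptions made for illustration; the project's actual keyword set is not given in this document.

```python
import re

# Hypothetical keyword list; the system's real keywords are not specified here.
FLOOD_KEYWORDS = ("flood", "floods", "flooding", "flooded", "inundation", "deluge")

# Word boundaries stop "flood" from matching inside words like "floodlights".
_FLOOD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in FLOOD_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def find_flood_keywords(text: str) -> list[str]:
    """Return the distinct flood keywords found in the text, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in _FLOOD_PATTERN.finditer(text)})


def is_flood_alert(text: str) -> bool:
    """Flag the article as urgent if any flood keyword appears."""
    return bool(find_flood_keywords(text))


print(find_flood_keywords("Heavy rain triggers flooding; floods reported in Kodagu"))
print(is_flood_alert("New floodlights installed at the stadium"))
```

Keyword matching is cheap and transparent, but it cannot tell literal from figurative use (for example, "the market was flooded with offers" would still raise an alert), so it is best treated as a first-pass filter.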
Objectives of the Project
The primary objective of this project is to develop a bilingual News Classification and Summarization System that efficiently processes news content in English and Kannada. The specific objectives are as follows:
1. Automated News Classification: Classify news articles into predefined categories using machine learning techniques for both English and Kannada texts.
2. Text Summarization: Generate concise, meaningful summaries of news articles using transformer-based models, enabling users to grasp key information quickly.
3. Bilingual Support: Implement accurate Kannada-to-English and English-to-Kannada translation, allowing seamless processing and accessibility across languages.
4. Critical Event Detection: Identify urgent or disaster-related news (e.g., floods) using keyword-based and machine learning approaches for timely alerts.
5. Integration of NLP Techniques: Combine TF-IDF vectorization, machine learning classifiers, text cleaning, translation, and transformer-based summarization in a single workflow for efficient news processing.
6. User-Friendly Platform: Develop a web interface that allows users to input news text, select language, and receive categorized summaries with minimal effort.
7. Scalability and Real-Time Processing: Ensure that the system can handle large volumes of news articles efficiently, providing real-time categorization and summarization.
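The text cleaning step named in objective 5 might look something like the sketch below. The exact cleaning rules used by the project are not stated, so the function `clean_text` and its rules (lowercasing, dropping HTML tags, URLs, and punctuation) are assumptions. Note that the character filter keeps only ASCII letters and digits, so it is meant for English text, or for Kannada text after translation to English.

```python
import re


def clean_text(text: str) -> str:
    """Normalize English news text before vectorization (assumed rules)."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)                # drop HTML tags
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)            # drop punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace


print(clean_text("Breaking: <b>Floods</b> hit Mysuru! Read more at https://example.com/news"))
```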
System Requirements
Hardware Requirements
The system requires a modern computing environment to ensure smooth execution and accurate processing of multilingual news. The hardware specifications include a processor of at least Intel i5 (8th Generation) or AMD Ryzen 5, with a minimum of 16 GB RAM to efficiently handle large text datasets and transformer-based summarization models. An NVIDIA GPU with CUDA support, such as GTX 1650 or higher, is recommended to accelerate deep learning inference. Adequate storage of 500 GB HDD or 256 GB SSD is necessary to store datasets, trained models, and vectorizer files. A display with a resolution of 1366 × 768 or higher is required for proper visualization of the web interface, and a stable internet connection is essential for accessing online translation services and downloading pre-trained models.
Software Requirements
The system is implemented in Python 3.8 or higher, using libraries and frameworks for machine learning, natural language processing, and web development. Flask serves the web interface and handles user interactions, while scikit-learn provides TF-IDF vectorization and Logistic Regression-based news classification. PyTorch and the Transformers library run the Pegasus-based summarization model. Pandas and NumPy handle data manipulation, and deep-translator performs Kannada-English translation. Joblib saves and loads the trained models, and SQLite (through Python's sqlite3 module) stores user registration and login data. Python's built-in re module handles regular-expression text preprocessing, while werkzeug.security provides password hashing and verification. Development and testing can be done in IDEs such as VS Code or PyCharm, with datasets in CSV format (news.csv for English news and kannada_news.csv for Kannada news).
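As a rough illustration of how the scikit-learn and Joblib pieces fit together, the sketch below trains a TF-IDF plus Logistic Regression pipeline and round-trips it through Joblib. The six training sentences, the two category labels, and the file name `news_classifier.joblib` are made up for this example; the real system would train on news.csv, whose columns and categories are not described here.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus, for illustration only (not taken from news.csv).
texts = [
    "team wins the cricket match by six wickets",
    "striker scores twice in the football final",
    "captain praised after the tournament victory",
    "stock market rises as banks report profits",
    "central bank raises interest rates again",
    "company shares fall after weak quarterly earnings",
]
labels = ["sports", "sports", "sports", "business", "business", "business"]

# The pipeline keeps the vectorizer and classifier together, so new text
# is transformed with the same vocabulary that was learned during training.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression())
model.fit(texts, labels)

prediction = model.predict(["the football team wins the final match"])[0]
print(prediction)

# Save and reload the fitted pipeline, as the web app would at startup.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "news_classifier.joblib")  # hypothetical name
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

print(restored.predict(["central bank raises interest rates"])[0])
```

Saving the whole pipeline as one file is one option; the vectorizer and classifier can also be saved separately, as long as both are loaded together at prediction time.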