This document details the application of NLP techniques for text mining, model training, evaluation, and error analysis on a dataset.

karnagetm/Text-Mining-Project

Text Mining Work Project

Overview:

This project applies Natural Language Processing (NLP) techniques to process and analyze newsgroup text. The main objective is to extract meaningful patterns from the documents: text is tokenized into n-grams, classified with Logistic Regression, and the predictions are then evaluated and examined for errors.

Features:

  • N-gram token analysis
  • Logistic Regression classification
  • Model performance evaluation with confusion matrices and classification reports
  • Error analysis and discussion for model predictions

Data:

The datasets used in this project are derived from the 20 Newsgroups dataset, which is a collection of approximately 20,000 newsgroup documents, partitioned across 20 different newsgroups.
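
One possible way to obtain the data is scikit-learn's built-in loader, sketched below. The helper name `load_newsgroups`, the two-category subset, and the decision to strip headers, footers, and quotes are assumptions for illustration; this README does not state which categories or preprocessing the notebook uses.

```python
from sklearn.datasets import fetch_20newsgroups

# Hypothetical subset: the notebook's actual categories are not stated here.
CATEGORIES = ["sci.space", "rec.sport.hockey"]


def load_newsgroups(subset="train", categories=CATEGORIES):
    """Return (texts, integer labels, label names) for one split.

    The first call downloads the dataset and caches it locally.
    """
    bunch = fetch_20newsgroups(
        subset=subset,            # "train", "test", or "all"
        categories=categories,    # None would load all 20 newsgroups
        remove=("headers", "footers", "quotes"),  # drop metadata that leaks the label
    )
    return bunch.data, bunch.target, bunch.target_names
```

Calling `load_newsgroups("train")` and `load_newsgroups("test")` would give the predefined training and test splits of the chosen categories.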

Prerequisites:

Before running this project, ensure you have the following dependencies installed:

  • Python 3.x
  • NumPy
  • scikit-learn
  • NLTK
  • Matplotlib
  • Seaborn

Usage:

To run the analysis, open the Jupyter Notebook V2.3_Karan_patel_Task.ipynb and run all cells in order, from top to bottom. The notebook walks through data preprocessing, model training, prediction, and evaluation.

Results:

The results of the project include various metrics such as accuracy, precision, recall, and F1 score, as well as visualizations like confusion matrices and graphs illustrating model performance.
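
The metrics listed above can be computed with scikit-learn as in the following minimal sketch. The true and predicted labels here are invented for illustration and are not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
)

# Invented labels for illustration only; not output from the notebook.
y_true = ["space", "space", "space", "hockey", "hockey", "hockey"]
y_pred = ["space", "space", "hockey", "hockey", "hockey", "hockey"]
label_order = ["hockey", "space"]

# Rows are true classes, columns are predicted classes, both in label_order.
cm = confusion_matrix(y_true, y_pred, labels=label_order)
accuracy = accuracy_score(y_true, y_pred)

print(cm)
print("accuracy:", round(accuracy, 3))
# Per-class precision, recall, and F1 score in one table.
print(classification_report(y_true, y_pred, labels=label_order))

# Simple error analysis: collect the positions the model got wrong.
misclassified = [i for i, (t, p) in enumerate(zip(y_true, y_pred)) if t != p]
print("misclassified indices:", misclassified)
```

For a visual version, the `cm` array can be passed to `seaborn.heatmap(cm, annot=True)` with the label names as tick labels. The misclassified indices are a starting point for error analysis, since they point back to the documents worth reading by hand.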

Final Version:

Acknowledgments:

  • Dataset Source
  • Contributors to the scikit-learn, NLTK, and related libraries.
