An NLP-based project that processes Wikipedia text data to generate word clouds and perform text mining. Built with Flask, the application visualizes text data from a CSV file through interactive web-based features.

giraydorukyurt7/WIKIMEDIA-WORDCLOUD-GENERATOR

WIKIMEDIA WORDCLOUD GENERATOR

The dataset consists of 10,859 entries gathered from Wikimedia.

Project Overview

The Wikimedia Wordcloud Generator is a Python-based web application that generates word clouds and bar plots from a dataset collected from Wikimedia. It uses Natural Language Processing (NLP) techniques to clean, process, and visualize text from Wikipedia articles.

The dataset consists of 10,859 entries gathered from Wikimedia. Users can explore it with several text mining techniques: title generation, word cloud creation, and bar plot visualization based on term frequency.

Deployment Status

The project is deploy-ready, but processing the dataset uses a lot of memory, especially when generating word clouds from such a large volume of text. Free-tier cloud services generally impose strict memory and compute limits that prevent processing the full dataset, so deploying to a free-tier server is not feasible at this time.


Features

  • Generate Title: Generate a title based on the provided parameters.
  • Gather Page: Retrieve the text of a selected page from the dataset.
  • Data Cleaning: Show how the text of a page is cleaned.
  • Bar Plot Generation: Create a bar plot for term frequency data.
  • Word Cloud Generation: Generate a word cloud of terms from the dataset.

Installation

Requirements

  • Python 3.7 or higher
  • Flask
  • Other Python dependencies listed in requirements.txt

Steps to Run the Project Locally

  1. Clone the repository:
    git clone https://github.com/giraydorukyurt7/WikiMedia-WordCloud-Generator.git
    cd WikiMedia-WordCloud-Generator
  2. Set up a virtual environment:
    python -m venv venv
  3. Activate the virtual environment.
    On Windows:
    venv\Scripts\activate
    On Mac/Linux:
    source venv/bin/activate
  4. Install the dependencies:
    pip install -r requirements.txt
  5. Run the Flask application:
    python WikiMedia_Text_Mining_NLP/wikimedia_text_mining_nlp.py
  6. Open a browser and navigate to http://127.0.0.1:5000/ to view the app.

Usage

Once the app is running, you can interact with the following sections:

Generate Title

  • Page No: Select the page number to use for title generation.
  • Use All: Check this option to use the entire dataset.
  • Title Size: Specify the size of the generated title.
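This README does not describe how titles are produced. As a purely illustrative sketch, one simple approach is to join the most frequent terms of a text, with the number of terms controlled by the title size. The function name `generate_title` and the frequency-based method are assumptions, not the project's actual implementation.

```python
from collections import Counter

def generate_title(text: str, title_size: int = 3) -> str:
    """Build a title from the `title_size` most frequent terms (assumed method)."""
    counts = Counter(text.lower().split())
    # most_common() sorts by count; ties keep first-seen order.
    top_terms = [term for term, _ in counts.most_common(title_size)]
    return " ".join(term.capitalize() for term in top_terms)

print(generate_title("python language python code language python", title_size=2))
# → Python Language
```

In practice the text would likely be cleaned first, so that stopwords such as "the" do not dominate the title.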

Get Page

  • Page No: Specify the page number you want to extract text from.
  • Clean Page: Clean the extracted page text.
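To give a rough idea of what cleaning a page can involve, here is a minimal sketch that lowercases the text, drops non-letter characters, and removes stopwords. The function `clean_text` and the tiny `STOPWORDS` set are hypothetical; the project's own cleaning steps may differ.

```python
import re
from typing import List

# Hypothetical stopword list; a real one would be much larger.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}

def clean_text(text: str) -> List[str]:
    """Lowercase, replace non-letter characters with spaces, remove stopwords."""
    lowered = text.lower()
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    return [tok for tok in letters_only.split() if tok not in STOPWORDS]

print(clean_text("The History of Python (1991) is short."))
# → ['history', 'python', 'short']
```

Note that the `[^a-z\s]` pattern also discards accented and non-Latin letters, which may not be desirable for multilingual Wikimedia text.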

Get Bar Plot

  • Page No: Specify the page number for which you want to generate a bar plot.
  • Use All: Use the entire dataset.
  • Minimum Term Frequency: Set the minimum frequency of terms to be displayed.
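The effect of the minimum term frequency setting can be sketched as a count-then-filter step: count every term, then keep only those that occur at least the given number of times. The function `term_frequencies` below is a hypothetical illustration, not the project's code.

```python
from collections import Counter
from typing import Dict

def term_frequencies(text: str, min_freq: int = 1) -> Dict[str, int]:
    """Count terms and keep those occurring at least `min_freq` times."""
    counts = Counter(text.lower().split())
    # most_common() yields terms from most to least frequent.
    return {term: n for term, n in counts.most_common() if n >= min_freq}

sample = "wiki text wiki data text wiki"
print(term_frequencies(sample, min_freq=2))
# → {'wiki': 3, 'text': 2}
```

The resulting mapping is the kind of data a bar plot would display, with terms on one axis and counts on the other.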

Get Word Cloud

  • Page No: Specify the page number for word cloud generation.
  • Use All: Generate a word cloud using the entire dataset.
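Conceptually, a word cloud draws frequent terms larger than rare ones. The sketch below shows one simple way to map term counts to font sizes by linear scaling. The function `scale_font_sizes` and its size limits are assumptions for illustration; word cloud libraries use their own sizing and layout rules.

```python
from typing import Dict

def scale_font_sizes(freqs: Dict[str, int],
                     min_size: int = 10,
                     max_size: int = 60) -> Dict[str, int]:
    """Linearly map each term's count to a font size in [min_size, max_size]."""
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    if hi == lo:
        # All terms are equally frequent, so give them the same size.
        return {term: max_size for term in freqs}
    span = hi - lo
    return {
        term: round(min_size + (n - lo) / span * (max_size - min_size))
        for term, n in freqs.items()
    }

print(scale_font_sizes({"wiki": 5, "text": 3, "data": 1}))
# → {'wiki': 60, 'text': 35, 'data': 10}
```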
