The Wikimedia Wordcloud Generator is a Python-based web application, built with Flask, that generates word clouds and bar plots from a dataset collected from Wikimedia. It uses Natural Language Processing (NLP) techniques to clean, process, and visualize text from Wikipedia articles.
The dataset contains 10,859 entries gathered from Wikimedia, and the application lets users explore text mining techniques such as title generation, word cloud creation, and bar plot visualization based on term frequency.
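Term frequency is the common thread behind the bar plots and word clouds. As a minimal sketch, assuming the text has already been cleaned into a list of tokens, counting terms with Python's standard `collections.Counter` could look like this. The function names `term_frequencies` and `top_terms` are hypothetical and are not taken from the project.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def term_frequencies(tokens: Iterable[str]) -> Counter:
    """Count how often each token occurs (hypothetical helper)."""
    return Counter(tokens)


def top_terms(freqs: Counter, n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent (term, count) pairs, highest count first."""
    return freqs.most_common(n)


if __name__ == "__main__":
    tokens = ["wiki", "data", "wiki", "text", "wiki", "data"]
    freqs = term_frequencies(tokens)
    print(top_terms(freqs, 2))  # [('wiki', 3), ('data', 2)]
```

A list of (term, count) pairs like this maps directly onto bar heights, and the `wordcloud` package's `WordCloud.generate_from_frequencies` accepts a term-to-count mapping; whether this project uses that method is not stated here.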
The project is deploy-ready, but processing the dataset requires a lot of memory, especially when generating word clouds from this volume of text. Free-tier cloud services impose strict memory and compute limits that prevent the full dataset from being processed, so deploying the app to a free-tier server is not currently feasible.
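One way to limit memory during the counting step, offered here as an assumption rather than a description of how the project works, is to stream the dataset page by page and update a running count, so the full text never has to be held in memory at once. The sketch assumes a hypothetical plain-text file with one page per line; the real dataset format is not described here. Rendering the word cloud image has its own memory cost, which this does not address.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator, List


def read_pages(path: str) -> Iterator[str]:
    """Lazily yield pages from a hypothetical one-page-per-line text file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            yield line.rstrip("\n")


def corpus_frequencies(
    pages: Iterable[str], tokenize: Callable[[str], List[str]]
) -> Counter:
    """Accumulate term counts one page at a time.

    If `pages` is lazy (such as the generator above), only the running
    Counter and the current page's tokens are held in memory at once.
    """
    totals: Counter = Counter()
    for page in pages:
        totals.update(tokenize(page))
    return totals


if __name__ == "__main__":
    sample_pages = iter(["wiki data wiki", "data text"])
    print(corpus_frequencies(sample_pages, str.split))
```

With this design, memory for the counting step grows with the vocabulary size rather than with the total length of the text.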
Features:
- Generate Title: Generate a title based on the provided parameters.
- Gather Page: Retrieve a single page from the dataset.
- Data Cleaning: Demonstrate the cleaning process on a page's text.
- Bar Plot Generation: Create a bar plot of term frequencies.
- Word Cloud Generation: Generate a word cloud of terms from the dataset.
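A typical text cleaning pass, sketched here under assumptions rather than taken from the project's code, lowercases the text, strips punctuation and digits, splits it into tokens, and drops stopwords. The small `STOPWORDS` set and the `clean_text` name are hypothetical; a real pipeline would usually load a fuller stopword list (for example from NLTK).

```python
import re
from typing import List

# Hypothetical, deliberately small stopword set for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "on", "for"}


def clean_text(text: str) -> List[str]:
    """Lowercase, replace non-letters with spaces, tokenize, drop stopwords."""
    text = text.lower()
    # Anything that is not an ASCII letter or whitespace becomes a space.
    # Note: this also removes digits and accented letters.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Keep tokens that are not stopwords and are longer than one character.
    return [tok for tok in text.split() if tok not in STOPWORDS and len(tok) > 1]


if __name__ == "__main__":
    print(clean_text("The Eiffel Tower is in Paris, France!"))
    # ['eiffel', 'tower', 'paris', 'france']
```

Using `typing.List` rather than `list[str]` keeps the annotation valid on Python 3.7, the minimum version listed in the requirements.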
Requirements:
- Python 3.7 or higher
- Flask
- Other Python dependencies listed in requirements.txt
Installation:
- Clone the repository:
git clone https://github.com/giraydorukyurt7/WikiMedia-WordCloud-Generator.git
cd WikiMedia-WordCloud-Generator
- Set up a virtual environment:
python -m venv venv
- Activate the virtual environment:
On Windows:
venv\Scripts\activate
On Mac/Linux:
source venv/bin/activate
- Install the dependencies:
pip install -r requirements.txt
- Run the Flask application:
python WikiMedia_Text_Mining_NLP/wikimedia_text_mining_nlp.py
- Open a browser and navigate to http://127.0.0.1:5000/ to view the app.
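The activation command differs by platform because `python -m venv` places the environment's interpreter in a `Scripts` directory on Windows and in a `bin` directory on Mac/Linux. As an optional alternative to activating, you can call that interpreter directly. The helpers below, `venv_python` and `install_requirements`, are hypothetical convenience functions and are not part of the repository.

```python
import subprocess
import sys
from pathlib import Path


def venv_python(venv_dir: str = "venv", platform: str = sys.platform) -> Path:
    """Return the interpreter path inside a venv created by `python -m venv`."""
    if platform.startswith("win"):
        return Path(venv_dir) / "Scripts" / "python.exe"
    return Path(venv_dir) / "bin" / "python"


def install_requirements(venv_dir: str = "venv") -> None:
    """Install requirements.txt with the venv's own pip, no activation needed."""
    subprocess.run(
        [str(venv_python(venv_dir)), "-m", "pip", "install", "-r", "requirements.txt"],
        check=True,
    )


if __name__ == "__main__":
    print(venv_python())
```

Calling `install_requirements()` from the repository root has the same effect as activating the environment and running `pip install -r requirements.txt`.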
Usage:
Once the app is running, you can interact with the following sections:
- Generate Title:
  - Page No: Select the page number to use for title generation.
  - Use All: Check this option to use the entire dataset.
  - Title Size: Specify the size of the generated title.
- Gather Page:
  - Page No: Specify the page number you want to extract text from.
- Data Cleaning:
  - Clean Page: Clean the extracted page text.
- Bar Plot Generation:
  - Page No: Specify the page number for which to generate a bar plot.
  - Use All: Use the entire dataset.
  - Minimum Term Frequency: Only terms that occur at least this many times are displayed.
- Word Cloud Generation:
  - Page No: Specify the page number for word cloud generation.
  - Use All: Generate a word cloud using the entire dataset.
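To make the interplay of the Page No, Use All, Minimum Term Frequency, and Title Size fields concrete, the sketch below shows one plausible way such parameters could be handled. Everything in it is an assumption: the function names are hypothetical, pages are taken to be a list of strings indexed from 0, and `generate_title` simply joins the most frequent terms, which may differ from how the app actually builds titles.

```python
from collections import Counter
from typing import Dict, List, Optional


def select_text(pages: List[str], page_no: Optional[int], use_all: bool) -> str:
    """Return one page's text, or all pages joined when Use All is checked."""
    if use_all:
        return " ".join(pages)
    # Assumption: pages are indexed from 0.
    if page_no is None or not 0 <= page_no < len(pages):
        raise ValueError(f"page_no must be between 0 and {len(pages) - 1}")
    return pages[page_no]


def filter_min_frequency(freqs: Counter, min_freq: int) -> Dict[str, int]:
    """Keep only terms occurring at least `min_freq` times."""
    return {term: count for term, count in freqs.items() if count >= min_freq}


def generate_title(freqs: Counter, title_size: int) -> str:
    """Hypothetical title: the `title_size` most frequent terms, capitalized."""
    return " ".join(term.capitalize() for term, _ in freqs.most_common(title_size))


if __name__ == "__main__":
    pages = ["wiki data wiki", "data text wiki"]
    text = select_text(pages, page_no=None, use_all=True)
    freqs = Counter(text.split())
    print(filter_min_frequency(freqs, 2))  # {'wiki': 3, 'data': 2}
    print(generate_title(freqs, 2))  # Wiki Data
```

Keeping page selection separate from counting means the same frequency and filtering code serves both the single-page and the whole-dataset modes.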