Commit a353b4d

committed
added readme
close #8
1 parent 58210d3 commit a353b4d

File tree

2 files changed

+87
-2
lines changed

.gitignore

+3-2
@@ -1,2 +1,3 @@
-/docs
-__pycache__
+docs
+__pycache__
+output

Readme.md

+84
@@ -0,0 +1,84 @@
# Text Preprocessing Project

Welcome to the Text Preprocessing Project! This project is designed to help you preprocess text data using various techniques like tokenization, lemmatization, stemming, and more. It includes a simple Tkinter GUI that allows users to input text and apply different preprocessing functions to it.

## Project Structure

- **Functions/**: This directory includes Python scripts for various text preprocessing functions:
  - `voices_script.py`: For handling different voice processing tasks.
  - `clauses.py`: For clause-based text processing.
  - `tenses.py`: For identifying and modifying tenses in text.
  - `pos_tagging.py`: For part-of-speech tagging.
  - `lemmatize_script.py`: For lemmatization.
  - `stem_script.py`: For stemming.
  - `tokenize_script.py`: For tokenization.
- **nltk_data/**: Contains necessary NLTK data used for various text processing tasks:
  - **corpora/**: Includes corpora like stopwords and WordNet.
  - **taggers/**: Contains the averaged perceptron tagger.
  - **tokenizers/**: Includes the Punkt tokenizer.
- **.gitignore**: Specifies files and directories to be ignored by Git.
- **Readme.md**: This file.
- **requirements.txt**: Lists the Python packages required to run the project.
- **run.py**: The main script to run the Tkinter GUI.
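
Because the repository ships its own `nltk_data/` directory, NLTK needs to know where to look before any tagger or tokenizer is loaded. How `run.py` handles this is not shown in this commit. The sketch below is one possible approach, assuming the calling script sits next to `nltk_data/`. It relies on the `NLTK_DATA` environment variable, which NLTK reads when it is first imported. The helper name `use_bundled_nltk_data` is hypothetical.

```python
import os
from pathlib import Path


def use_bundled_nltk_data(project_root):
    """Put <project_root>/nltk_data first in the NLTK_DATA search path.

    Call this before `import nltk`: NLTK reads NLTK_DATA once, at import time.
    """
    data_dir = str(Path(project_root) / "nltk_data")
    existing = os.environ.get("NLTK_DATA", "")
    # Keep any paths the user already configured, dropping empty entries.
    entries = [p for p in existing.split(os.pathsep) if p]
    if data_dir not in entries:
        entries.insert(0, data_dir)
    os.environ["NLTK_DATA"] = os.pathsep.join(entries)
    return data_dir
```

A script at the repository root could call `use_bundled_nltk_data(Path(__file__).parent)` as its first step, so that the bundled corpora, tagger, and Punkt models are found without a separate download.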

## Features

- **Tokenization**: Breaks down text into words or sentences.
- **Lemmatization**: Converts words to their base or dictionary form.
- **Stemming**: Reduces words to their root form.
- **POS Tagging**: Tags parts of speech for each word in the text.
- **Tense Handling**: Identifies and modifies tenses in the text.
- **Clause Processing**: Analyzes and processes clauses within the text.
- **Voice Processing**: Handles different voice-related text transformations.
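
To make the difference between the first three features concrete, here is a deliberately naive, standard-library-only sketch. It is not the project's code and not NLTK's Punkt, Porter, or WordNet implementations: the suffix list, the tiny lemma dictionary, and the function names are all assumptions chosen for illustration.

```python
import re

# Toy suffix list for illustration only; a real stemmer has many more rules.
SUFFIXES = ("ing", "ed", "es", "s")

# Toy lookup standing in for a dictionary such as WordNet.
LEMMAS = {"ran": "run", "mice": "mouse", "better": "good", "studies": "study"}


def tokenize(text):
    """Split text into lowercase word tokens and single punctuation marks."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())


def stem(word):
    """Chop the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)


tokens = tokenize("The mice ran; she studies parsing.")
print(tokens)
print([stem(t) for t in tokens])
print([lemmatize(t) for t in tokens])
```

The contrast to notice is that stemming works by rule and can produce strings that are not words (`studies` becomes `studi`), while lemmatization maps to a dictionary form (`studies` becomes `study`, `mice` becomes `mouse`) but only for words it knows.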

## Installation

To get started with this project, follow these steps:

1. **Clone the Repository**
```bash
git clone https://github.com/SumitRajam/Text_Preprocessor.git
cd Text_Preprocessor
```
2. **Install Dependencies**\
Create a virtual environment and install the required packages:
```bash
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
pip install -r requirements.txt
```

## Usage
1. **Run the Tkinter GUI**\
Start the application by running:
```bash
python run.py
```
The Tkinter GUI will open, allowing you to input text and apply various preprocessing functions.

2. **Using the GUI**
   - **Input Text:** Enter the text you want to preprocess.
   - **Select a Function:** Choose a preprocessing function from the available buttons.
   - **View Output:** Click the selected button to apply the function and view the results.
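
The three GUI steps above amount to mapping each button to a function and writing that function's result into an output widget. The sketch below shows one way to wire that up. It is not the contents of `run.py`: the widget layout, the placeholder functions, and the names `FUNCTIONS`, `apply_function`, and `build_gui` are hypothetical. `tkinter` is imported inside `build_gui` so the dispatch logic can be exercised without a display.

```python
def to_upper(text):
    """Placeholder for a real preprocessing function."""
    return text.upper()


def split_words(text):
    """Placeholder tokenizer: whitespace split, one token per line."""
    return "\n".join(text.split())


# Button label -> function taking the input string and returning the output string.
FUNCTIONS = {"Uppercase": to_upper, "Tokenize": split_words}


def apply_function(label, text):
    """Run the function behind a button; report errors as text instead of raising."""
    try:
        return FUNCTIONS[label](text)
    except Exception as exc:  # keep the GUI responsive on bad input
        return f"Error: {exc}"


def build_gui():
    """Create the window: an input box, one button per function, an output box."""
    import tkinter as tk  # imported here so the logic above works without a display

    root = tk.Tk()
    root.title("Text Preprocessor (sketch)")
    input_box = tk.Text(root, height=6, width=60)
    input_box.pack()
    output_box = tk.Text(root, height=10, width=60)

    def on_click(label):
        # Read everything the user typed, minus the trailing newline Tk adds.
        result = apply_function(label, input_box.get("1.0", "end-1c"))
        output_box.delete("1.0", "end")
        output_box.insert("1.0", result)

    for label in FUNCTIONS:
        # The default argument binds the current label to this button's callback.
        tk.Button(root, text=label, command=lambda l=label: on_click(l)).pack()
    output_box.pack()
    return root
```

Calling `build_gui().mainloop()` opens the window. Keeping the label-to-function table separate from the widgets means a new preprocessing step only needs one new entry in `FUNCTIONS`.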

## Contributing
We welcome contributions to enhance this project! If you'd like to contribute, please follow these steps:
1. **Fork the Repository:** Click the "Fork" button in the top-right corner of the repository page.
2. **Create a Branch:** Create a new branch for your feature or bug fix.
```bash
git checkout -b feature/your-feature-name
```
3. **Make Changes:** Implement your changes and commit them.
```bash
git add .
git commit -m "Add your commit message"
```
4. **Push Changes:** Push your changes to your forked repository.
```bash
git push origin feature/your-feature-name
```
5. **Create a Pull Request:** Go to the original repository and create a pull request from your forked repository.



# Happy preprocessing! ⚙️💬🧠🖥️
