Welcome to our Python desktop application for sentiment analysis, built on our XLM-RoBERTa-German-Sentiment model. The application provides sentiment analysis across 8 languages, with a focus on German. This tool is for anyone interested in uncovering insights from textual data.
You can find the details of the sentiment analysis model on Hugging Face.
Refer to the paper for more information about the training methodology, the results of the model used in the application, and the design and diagrams of the application.
- Sentiment analysis across 8 languages, specializing in German with an 87% F1 score.
- Uses the robust XLM-RoBERTa architecture, fine-tuned on a German dataset that covers many domains. The dataset is a subset of the German Bert dataset.
- The Model-View-Controller (MVC) design pattern separates the front-end and back-end code into different components. This separation makes changes and updates to each side more manageable and reduces the risk of interference between the two.
- The database is PostgreSQL, accessed through Object-Relational Mapping (ORM) implemented with SQLAlchemy in Python. The ORM adds a layer of abstraction over database operations, so data and tables are handled as objects rather than through SQL queries. A set of classes encapsulates all database-related interactions, which keeps the code more manageable, modular, and maintainable.
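The MVC split and the ORM layer described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the `Review` table, `ReviewRepository`, `ConsoleView`, `Controller`, and `placeholder_classify` are hypothetical names, the classifier is a stub standing in for the real model call, and an in-memory SQLite URL is used so the sketch runs without a PostgreSQL server.

```python
from sqlalchemy import Column, Integer, String, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Review(Base):
    """ORM class mapped to a hypothetical `reviews` table."""
    __tablename__ = "reviews"
    id = Column(Integer, primary_key=True)
    text = Column(Text, nullable=False)
    sentiment = Column(String(16), nullable=False)


class ReviewRepository:
    """Model side: the only place that talks to the database."""

    def __init__(self, database_url):
        self._engine = create_engine(database_url)
        Base.metadata.create_all(self._engine)  # create tables if missing
        self._Session = sessionmaker(bind=self._engine)

    def add(self, text, sentiment):
        with self._Session() as session:
            session.add(Review(text=text, sentiment=sentiment))
            session.commit()

    def texts_with(self, sentiment):
        with self._Session() as session:
            rows = session.query(Review).filter_by(sentiment=sentiment).all()
            return [row.text for row in rows]


class ConsoleView:
    """View side: only displays results (a stand-in for the desktop GUI)."""

    def show(self, text, sentiment):
        print(f"{sentiment}: {text}")


class Controller:
    """Controller: connects the view, the classifier, and the repository."""

    def __init__(self, classify, repository, view):
        self._classify = classify
        self._repository = repository
        self._view = view

    def submit(self, text):
        sentiment = self._classify(text)
        self._repository.add(text, sentiment)
        self._view.show(text, sentiment)
        return sentiment


def placeholder_classify(text):
    """Placeholder for the real model call; always returns 'neutral'."""
    return "neutral"


if __name__ == "__main__":
    repo = ReviewRepository("sqlite:///:memory:")  # assumption: SQLite for the demo
    controller = Controller(placeholder_classify, repo, ConsoleView())
    controller.submit("Das Essen war in Ordnung.")
    print(repo.texts_with("neutral"))
```

Because the controller only sees `add` and `texts_with`, swapping the SQLite URL for a PostgreSQL one, or the console view for a GUI, should not require changes on the other side.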
- Clone the repository:
git clone https://github.com/ssary/German-Sentiment-Analysis
- Change into the repository folder with:
cd '.\XLM-RoBERTa model\'
- Create a virtual environment named "myenv" with:
python -m venv myenv
- Activate the virtual environment on Windows (Git Bash) with:
source myenv/Scripts/activate
or on Linux/macOS with:
source myenv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Change the database URL in settings.env to your own database URL. The PostgreSQL format is:
DATABASE_URL="postgresql://USERNAME:YOUR_PASSWORD@HOST:PORT/DATABASE_NAME"
Replace USERNAME, YOUR_PASSWORD, HOST (usually localhost), PORT (usually 5432), and DATABASE_NAME with the corresponding values.
- Launch the application:
python controller.py
Enter review text in any of these 8 languages (German, Arabic, English, French, Hindi, Italian, Portuguese, Spanish) and the application returns positive, negative, or neutral for the review.
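If the application fails to connect at launch, the DATABASE_URL from the setup steps is a likely culprit. The sketch below shows one way to sanity-check it using only the standard library. The helper names `read_database_url` and `parse_database_url` are hypothetical, and how the application itself reads settings.env is not covered here.

```python
from urllib.parse import urlsplit


def read_database_url(env_text):
    """Return the DATABASE_URL value from settings.env-style text."""
    for line in env_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "DATABASE_URL":
            return value.strip().strip('"')
    raise KeyError("DATABASE_URL not found")


def parse_database_url(url):
    """Split a PostgreSQL URL into its parts, raising ValueError if malformed."""
    parts = urlsplit(url)
    if parts.scheme != "postgresql":
        raise ValueError(f"expected a postgresql:// URL, got scheme {parts.scheme!r}")
    port = parts.port  # raises ValueError for a non-numeric port such as "PORT"
    database = parts.path.lstrip("/")
    if not parts.hostname or not database:
        raise ValueError("host and database name are required")
    return {
        "username": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": port or 5432,  # fall back to the usual PostgreSQL port
        "database": database,
    }


if __name__ == "__main__":
    # Example values only; substitute your own credentials.
    env = 'DATABASE_URL="postgresql://alice:secret@localhost:5432/reviews"\n'
    print(parse_database_url(read_database_url(env)))
```

An unedited template URL (with the literal placeholder PORT) is rejected with a ValueError, which catches the common mistake of launching before editing settings.env.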
- Fine-tuning the model on HPC
- Testing our 200K model
- Testing before fine-tuning on the German Bert dataset
- Restructuring the German Bert dataset to fit the training
Here is a comparison between the model before and after fine-tuning: the F1 score increased by 10%, reaching 87% on the German Bert dataset.
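For readers unfamiliar with the metric, the sketch below shows how an F1 score for a three-class sentiment task can be computed. It assumes macro averaging (the unweighted mean of per-class F1), which may differ from the averaging used for the figures above, and the labels in the demo are toy data, not outputs of the model.

```python
def f1_per_class(y_true, y_pred, labels):
    """Return a dict mapping each label to its F1 score."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    labels = ["positive", "negative", "neutral"]
    # Toy data for illustration only.
    y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
    y_pred = ["positive", "negative", "neutral", "negative", "negative", "positive"]
    print(f1_per_class(y_true, y_pred, labels))
    print(f"macro F1: {macro_f1(y_true, y_pred, labels):.3f}")  # macro F1: 0.656
```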
We extend our heartfelt gratitude to Oliver Guhr for developing the German-language dataset used to train our model. This dataset, available on GitHub, has been instrumental in improving our model's performance. For more details on the dataset, see this GitHub repository.
- For more on the XLM-RoBERTa architecture and its advantages, see the RoBERTa paper.
- Our model's fine-tuning and training are based on the principles outlined in the XLM-T paper.
For any inquiries or further information, feel free to contact me at [email protected].