GitHub - Ivpe1975/XLM-R-sentiment-analysis

__   __ _     ___  ___      ______   _____            _   _                      _      ___              _           _     
\ \ / /| |    |  \/  |      | ___ \ /  ___|          | | (_)                    | |    / _ \            | |         (_)    
 \ V / | |    | .  . |______| |_/ / \ `--.  ___ _ __ | |_ _ _ __ ___   ___ _ __ | |_  / /_\ \_ __   __ _| |_   _ ___ _ ___ 
 /   \ | |    | |\/| |______|    /   `--. \/ _ \ '_ \| __| | '_ ` _ \ / _ \ '_ \| __| |  _  | '_ \ / _` | | | | / __| / __|
/ /^\ \| |____| |  | |      | |\ \  /\__/ /  __/ | | | |_| | | | | | |  __/ | | | |_  | | | | | | | (_| | | |_| \__ \ \__ \
\/   \/\_____/\_|  |_/      \_| \_| \____/ \___|_| |_|\__|_|_| |_| |_|\___|_| |_|\__| \_| |_/_| |_|\__,_|_|\__, |___/_|___/
                                                                                                            __/ |          
                                                                                                           |___/

Instructions to reproduce the results:

Start by downloading the Amazon reviews dataset from AWS (https://registry.opendata.aws/amazon-reviews-ml). It was too big to host in this GitHub repo. Place the json folder in the same folder as the scripts. We have created the /json/dev/ folder in this repo to show where the files should go.
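Before running any script, it can help to confirm that the dataset sits where the scripts expect it. The following is a minimal sketch, not part of the repo: `missing_splits` is a hypothetical helper, and the `train` and `test` subfolder names are an assumption (only `/json/dev/` is confirmed above).

```python
from pathlib import Path

# Assumed split names; only json/dev/ is confirmed by this repo.
EXPECTED_SPLITS = ("train", "dev", "test")


def missing_splits(script_dir, splits=EXPECTED_SPLITS):
    """Return the split folders under <script_dir>/json that are absent or empty."""
    json_dir = Path(script_dir) / "json"
    missing = []
    for split in splits:
        split_dir = json_dir / split
        # A split counts as present only if the folder exists and has at least one entry.
        if not split_dir.is_dir() or not any(split_dir.iterdir()):
            missing.append(split)
    return missing


if __name__ == "__main__":
    gaps = missing_splits(Path.cwd())
    if gaps:
        print("Missing or empty dataset folders:", ", ".join(gaps))
    else:
        print("Dataset folders look complete.")
```

Run it from the folder that holds the scripts. An empty result only means the folders exist and are not empty; it does not validate the files inside them.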
You can reproduce the baseline model by running the baseline.py script. This will yield a baseline_metrics.txt file which will hold the F1-scores and the accuracies.
Running any of the roberta_xx.py scripts will yield a model fine-tuned on language xx and a results_xx.txt file holding all the F1-scores and accuracies for that fine-tuned model. The roberta_xx.job files are provided in case you are running the scripts on the HPC cluster.
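If you are not using the .job files, the per-language scripts can be run one after another from a small driver. This is a sketch under two assumptions: the scripts take no command-line arguments (none are documented here), and you launch it from the folder that contains them. `training_commands` is a hypothetical helper, not part of the repo.

```python
import subprocess
import sys

# Language codes taken from the roberta_xx.py script names in this repo.
LANGUAGES = ("de", "en", "es", "fr", "ja", "zh")


def training_commands(languages=LANGUAGES):
    """Build one command per language, using the current Python interpreter."""
    return [[sys.executable, f"roberta_{lang}.py"] for lang in languages]


if __name__ == "__main__":
    for command in training_commands():
        print("Running:", " ".join(command))
        # check=True stops the loop as soon as one script exits with an error.
        subprocess.run(command, check=True)
```

The runs are sequential, so fine-tuning all six languages this way may take a long time; pass a shorter tuple to `training_commands` to run a subset.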
The lang2vec notebook contains all of the calculations for the mean R^2 values of the distance types. Note that the F1-scores are entered into the notebook manually from the results_xx.txt files by defining the results vector.
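For readers who want to see what a mean R^2 for one distance type could look like in code, here is a rough sketch. It assumes a simple linear regression of F1-score on language distance, with R^2 averaged over the fine-tuned models; the notebook's actual setup may differ. The function names are hypothetical and no lang2vec calls are made.

```python
def r_squared(distances, f1_scores):
    """R^2 of a least-squares line predicting F1 from distance.

    For a simple linear regression with an intercept this equals the
    squared Pearson correlation between the two variables.
    """
    n = len(distances)
    if n != len(f1_scores) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(distances) / n
    mean_y = sum(f1_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    syy = sum((y - mean_y) ** 2 for y in f1_scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, f1_scores))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


def mean_r_squared(distances_per_model, f1_per_model):
    """Average R^2 over fine-tuned models for a single distance type."""
    scores = [r_squared(d, f) for d, f in zip(distances_per_model, f1_per_model)]
    return sum(scores) / len(scores)
```

Each inner list holds one value per target language: the distance from the fine-tuning language to that target, and the F1-score the fine-tuned model reached on it.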
The analysis notebook contains the pipeline for our quantitative and qualitative analysis. Before running it, you will need to run roberta_de_test.py to get the wrong_ids, y_true and y_pred vectors. This Python script relies on a saved model, so you will need to have run roberta_de.py first. In this case, the analysis is of a sample model (fine-tuned on German, predicting English). To get confusion matrices for the other languages/models, roberta_de_test.py will have to be adjusted to whichever target and fine-tuned languages you want.
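To illustrate the confusion-matrix step outside the notebook, the sketch below builds one from the y_true and y_pred vectors. The file format is an assumption: `read_labels` (a hypothetical helper) simply pulls every integer out of the text file, which works for one label per line or a bracketed list but not for labels stored as floats or strings.

```python
import re
from collections import Counter
from pathlib import Path


def read_labels(path):
    """Read integer labels from a text file (assumed format, see the note above)."""
    return [int(token) for token in re.findall(r"-?\d+", Path(path).read_text())]


def confusion_matrix(y_true, y_pred):
    """Return (labels, matrix); matrix[i][j] counts true labels[i] predicted as labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


if __name__ == "__main__":
    labels, matrix = confusion_matrix(read_labels("y_true.txt"), read_labels("y_pred.txt"))
    print("labels:", labels)
    for label, row in zip(labels, matrix):
        print(label, row)
```

Rows are true labels and columns are predicted labels, so off-diagonal cells show which classes the model confuses.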

The results themselves are available in the PDF document in the root folder.

Repository contents:

json/
Analysis.ipynb
CITATION.cff
NLP_2nd_year_final.pdf
README.md
baseline.job
baseline.py
baseline_metrics.txt
distill.py
lang2vec.ipynb
r2values.txt
results.zip
results_de.txt
results_en.txt
results_es.txt
results_fr.txt
results_ja.txt
results_zh.txt
roberta_de.job
roberta_de.py
roberta_de_test.job
roberta_de_test.py
roberta_en.job
roberta_en.py
roberta_es.job
roberta_es.py
roberta_fr.job
roberta_fr.py
roberta_ja.job
roberta_ja.py
roberta_modules.job
roberta_modules.py
roberta_test.job
roberta_test.py
roberta_zh.job
roberta_zh.py
y_pred.txt
y_true.txt
