To avoid environment issues, you can create a new conda environment using the following command:
`conda create --name <env> --file requirements.txt`
This project is divided into two main segments:

- Hate Speech Detection - under the `detection` module
- Hate Networks - under the `hate_networks` module
The `detection` module is the main module of the thesis. It contains both the post-level models (PLMs) and the user-level models (ULMs) used to detect hate speech.
The execution of this module is configured in the file `config.detection_config.py`, where you can control which dataset is used for a specific execution.
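For illustration, the dataset selection might look like the sketch below; the key names and the dataset identifier are assumptions and may differ from the actual `config.detection_config.py`:

```python
# Illustrative sketch of config.detection_config.py; the keys and the dataset
# identifier shown here are hypothetical and may not match the actual file.
post_level_execution_config = {
    "data": {
        "dataset": "my_dataset",  # hypothetical: which dataset this execution runs on
    },
    # ... other execution parameters (model, paths, etc.) ...
}
```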
There are four entry points for this module. To execute each of the following experiments, run from the root dir of the project:
`python detection/experiments/{experiment_file_name}.py`
The entry points reside under the `experiments` directory in the following files:

- Post-level experiments
  - `post_level__experiment.py` - run this file to execute a specific experiment of a model on a specific dataset, as configured in `config.detection_config.py`. To run this file, set the parameter `multiple_experiments` to `False` under `post_level_execution_config` in the `config.detection_config.py` file (see the sketch after this list).
  - `post_level__multiple_experiments.py` - run this file to execute multiple experiments of several models and compare them. To run this file, set the parameter `multiple_experiments` to `True` under `post_level_execution_config` in the `config.detection_config.py` file.
  - Important note: to run the BertFineTuning model, make sure you are using a GPU.
- User-level experiments
  - `user_level__experiments.py` - run this file to execute the user-level experiment. It loads the PLM from the `post_level_execution_config` config and predicts, for every post by every user in the given data, the probability that it contains hate speech. It then runs the FFNN on the data streams of the user, their followees, and their followers, together with network features.
    - Important note: one must run the hate networks module with the desired dataset before running this code, as it uses some of its output.
  - `user_level__threshold_models.py` - run this file to execute the threshold-based ULMs.
    - Important note: one must run the experiment file `user_level__experiments.py` with the desired dataset before running this code, as it uses some of its output.
- The outputs of the models' executions will be saved under `detection/outputs/{data_name}_{model_name}`.
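As referenced in the post-level item above, both post-level entry points read the same flag. A minimal sketch of that flag follows, assuming `post_level_execution_config` is a plain dict; only `post_level_execution_config` and `multiple_experiments` are names taken from this README:

```python
# config.detection_config.py -- sketch of the single vs. multiple experiments toggle.
# Only post_level_execution_config and multiple_experiments are documented names;
# everything else here is illustrative.
post_level_execution_config = {
    # ... dataset, model, and path settings ...
    "multiple_experiments": False,  # False -> run post_level__experiment.py
                                    # True  -> run post_level__multiple_experiments.py
}
```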
In the `hate_networks` module we create hate-promoting networks using data from social networks that captures engagement between users, i.e., mentions and retweets.
Using this module you can create an unsupervised segmentation of users into various communities, based on the set configuration.
The unsupervised methods are topic models (LDA/NMF) and Word2Vec modeling based on the users' texts.
Using these methods you are able to color the users in the reconstructed user network.
You can also color the users using the predictions created by the user classifier from the `detection` module.
This is done by setting the parameter `plot_supervised_networks` to `True` in `general_conf` under the `hate_networks_config.py` file, and by using the `user_pred` parameter under `path_conf` for each dataset.
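A rough sketch of how these settings might look in `hate_networks_config.py`; the names `general_conf`, `path_conf`, `plot_supervised_networks`, and `user_pred` come from this README, while the dict layout, the dataset key, and the file path are assumptions:

```python
# hate_networks_config.py -- illustrative sketch only; the layout, dataset key,
# and prediction-file path below are hypothetical.
general_conf = {
    "plot_supervised_networks": True,  # color users by the detection module's user classifier
    # ... other general settings ...
}

path_conf = {
    "my_dataset": {  # hypothetical dataset key
        "user_pred": "path/to/user_predictions.csv",  # hypothetical path to the user-level predictions
        # ... other per-dataset paths ...
    },
}
```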
Important notes:
- The entry point for this module is the function `main()` under the file `main.py`. To execute the code from the root dir of the project run: `python hate_networks/main.py`
- The execution of this code is configured using the configuration dict under the `config` module, in the file `hate_networks_config.py`.
- The outputs of the constructed network and the relevant files that are created with it will be saved under `hate_networks/outputs/{data_name}_networks`.
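Since the entry point is the `main()` function in `hate_networks/main.py`, it can presumably also be called from Python; the sketch below assumes `main()` takes no arguments, which this README does not confirm:

```python
# Programmatic invocation sketch; assumes hate_networks.main.main() takes no arguments.
from hate_networks.main import main

if __name__ == "__main__":
    main()  # builds the networks according to hate_networks_config.py
```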