TOPIC: Comparison of Approaches to Text Classification
The project was developed on Arch Linux and should work on any standard distribution, as long as all dependencies are installed.
Install the following software (in most distributions the package names in the repositories should be the same):
- wget
- git
- shell
- python>=3.7 (tested on python3.7.2)
- pip
- aspell with English, German and French dictionaries
Pip is usually packaged as python-pip or python3-pip. It is important to install the pip version for python3, and to have Python of version at least 3.7. Version 3.6 or below (python3 in Debian Stretch) will not work.
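To verify that the required versions are in place (assuming the interpreter is available as python3):

python3 --version        # must report 3.7 or newer
python3 -m pip --version # confirms pip belongs to the python3 interpreter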
Aspell and its dictionaries are usually packaged as aspell and aspell-{en,fr,de}.
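On Debian or Ubuntu, for example, everything above can usually be installed in one step (the package names below are an assumption and may differ in your distribution):

sudo apt install wget git python3 python3-pip aspell aspell-en aspell-fr aspell-de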
Optionally, the project can be run in a virtual environment. In the following command, python3 can be replaced with the intended Python interpreter; its version must be at least 3.7.
virtualenv venv --python=python3
source venv/bin/activate
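The environment stays active only in the current shell session; to leave it later, run:

deactivate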
Clone the repository and change working directory to the root:
git clone https://github.com/knezi/NPRG045
cd NPRG045
Install python dependencies from pip:
pip install `cat pip_deps`
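If several Python versions coexist on the system, invoking pip through the interpreter makes sure the dependencies are installed for the right one (an equivalent form of the command above):

python3 -m pip install `cat pip_deps`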
Everything is operated via make. Make expects the Yelp dataset to be extracted in ../data/dataset.
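For example, assuming the downloaded Yelp archive is a tar file named yelp_dataset.tar (the actual file name depends on the download):

mkdir -p ../data/dataset                    # create the expected location
tar xf yelp_dataset.tar -C ../data/dataset  # extract the dataset there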
Next create the directory graphs:
mkdir graphs
To run the pipeline with sample data, execute the following line. Note that this does not require aspell, because the sample data is already preprocessed.
make run_sample
For a full run:
make run
It will first preprocess and denormalize the data and then process it. A repeated run will only rerun the second part. make run runs all experiments in the directory experiments.
The resulting data can be found in graphs/current_timestamp.
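For instance, the directory of the most recent run can be located with:

ls -t graphs | head -n 1   # newest results directory first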