The following steps describe the setup to run the code locally.
Run these lines one after another in your command line:
pip install fasttext~=0.9.2 javalang~=0.13.0 pycparser~=2.21 comment-parser~=1.2.4 esprima~=4.0.1 XlsxWriter~=3.0.1 spacy~=3.1.1 pydantic~=1.8.2 typing-extensions~=4.2.0 nltk~=3.2.5 numpy~=1.22.3 sklearn~=0.0 scikit-learn~=1.1.1 pandas~=1.1.5 joblib~=1.1.0 autograd~=1.3 torch~=1.13.1 transformers~=4.26.1 scipy~=1.8.1 pyemd~=0.5.1 gensim~=3.6.0
python -m spacy download it_core_news_lg
python -m spacy download en_core_web_lg
python -m nltk.downloader stopwords
python -m nltk.downloader punkt
python -m nltk.downloader wordnet
- fasttext is the dependency for the fastText word embedding library
- javalang is the Java AST parser
- pycparser is the C AST parser
- comment-parser is used to extract comments from C code
- esprima is used for JSP parsing
- XlsxWriter is needed to create Excel files (to write evaluation results)
- spacy is needed for its lemmatizer
- nltk is needed for its stopwords, tokenizer and lemmatizer/stemmer
- the others are common scientific Python libraries that are compatible with the packages listed above
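Once the install commands have run, a quick import check can confirm that the packages and the two spaCy models are visible to the interpreter. The following is only a minimal sketch: the helper `missing_modules` and the list `REQUIRED_MODULES` are hypothetical names, and the import names are assumptions, since they can differ from the pip names (for example `comment_parser` and `xlsxwriter`). It does not check the NLTK data downloads.

```python
from importlib.util import find_spec

# Assumed import names for the pip packages above, plus the two spaCy models,
# which are installed as regular Python packages.
REQUIRED_MODULES = [
    "fasttext", "javalang", "pycparser", "comment_parser", "esprima",
    "xlsxwriter", "spacy", "nltk", "sklearn", "torch", "transformers",
    "gensim", "it_core_news_lg", "en_core_web_lg",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located by the import system."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the modules can be found, not that their versions match the pins in the pip command.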
The fastText model files (for English and Italian) are not included in this repository. The model files can be found on the fastText website. For the paper we used cc.en.300.bin and cc.it.300.bin.
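As a rough illustration of how the downloaded files could be loaded, the sketch below uses `fasttext.load_model` from the fastText Python package. The helper names and the `model_dir` parameter are hypothetical; where this repository actually expects the `.bin` files is not stated here, so adjust the directory accordingly.

```python
from pathlib import Path

# File names as used for the paper; the directory is an assumption.
FASTTEXT_MODELS = {"en": "cc.en.300.bin", "it": "cc.it.300.bin"}


def fasttext_model_path(lang, model_dir="."):
    """Return the path of the model file for `lang`, or fail with a clear error."""
    path = Path(model_dir) / FASTTEXT_MODELS[lang]
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download it from the fastText website first"
        )
    return path


def load_fasttext(lang, model_dir="."):
    """Load the fastText model for `lang` (each file needs several GB of RAM)."""
    import fasttext  # imported lazily so the path check works without it

    return fasttext.load_model(str(fasttext_model_path(lang, model_dir)))
```

For example, `load_fasttext("it", "models")` would load `models/cc.it.300.bin` if that file exists.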
The UniXcoder model files are also not included. They can be downloaded from the Hugging Face Hub under the model name microsoft/unixcoder-base.
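One possible way to fetch the model is through the `transformers` auto classes, which download and cache the files on first use. This is a sketch under assumptions: `load_unixcoder` is a hypothetical helper, and the code in this repository may load the model differently (for instance from a local directory or through a custom wrapper class).

```python
UNIXCODER_NAME = "microsoft/unixcoder-base"


def load_unixcoder(name=UNIXCODER_NAME):
    """Download (or read from the local cache) the tokenizer and model weights."""
    # Imported lazily; requires the transformers and torch packages listed above.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

The first call needs network access; later calls reuse the local Hugging Face cache.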