This repository is used to develop the LinuxDownloader of the SemanGit project. To use it, download the zip and execute ./LinuxDownloader.
For the main project repository, please visit the SemanGit repository.
The Downloader guides you through all important processing steps and ensures fault-tolerant generation of the dataset. The steps are:
1. Confirmation of license agreements
2. Installation of dependencies (curl, pigz, jre, zipper)
3. Selection of datasets to generate
4. Download (automatic)
5. Unzipping (automatic)
6. Conversion (automatic)
7. Merging the dataset (automatic)
Options include:
- -h, -help - prints the help file
- -skip_install - skips the dependency checks and the installation of packages
- -keep_everything - keeps the output of all intermediate steps (4-7: download, unzipping, conversion, merging). Attention: this results in a huge storage overhead.
- -output_dir=<absolute_path> - changes the storage location for all steps and the final output
- -converter_options="-option1 -option2 ... " - passes options to the converter. A list of available options can be found in the folder Converter.
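
As a rough illustration of how the options above combine, the following sketch assembles a command line that skips the dependency checks and redirects all output. The function name build_downloader_cmd and the path /data/semangit are hypothetical and only serve the example; the flags are the ones listed above. The command is printed rather than executed, so it can be inspected first.

```shell
#!/usr/bin/env bash
# Sketch only: the function name and the example path are hypothetical.
# The flags themselves are the ones documented in the options list.

# Print a LinuxDownloader command line that skips the dependency checks
# and stores all output below the given absolute directory.
build_downloader_cmd() {
  output_dir="$1"
  # -output_dir expects an absolute path, so reject anything else early.
  case "$output_dir" in
    /*) ;;
    *) echo "output directory must be an absolute path" >&2; return 1 ;;
  esac
  printf '%s\n' "./LinuxDownloader -skip_install -output_dir=${output_dir}"
}

# Show the command instead of running it.
build_downloader_cmd "/data/semangit"
```

The sketch does not handle paths containing spaces. Further flags such as -keep_everything or -converter_options="..." can be appended to the printed command in the same way; the available converter options are listed in the folder Converter.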