digipathos_downloader
digipathos_downloader is a Python package for downloading Embrapa's Digipathos Dataset. In case you didn't know, the dataset contains images and descriptions of diseases affecting several plants. The project is based on george-un's work. Kudos to him! Without the logic he implemented, this project would not have been possible.
This library fetches the paths of the original zipped files, downloads them, and unzips them.
Available functions:
- create_basic_folder_structure: creates a folder for the dataset (dataset_dir/) and another one for the downloads (tmp/). It calls create_dir to do so.
- fetch_zips_table: fetches the zipped files' metadata.
- download_zips: uses the metadata to download the files themselves. Calls download_zip several times.
- validate_downloads: checks that all scheduled downloads completed correctly.
- unpack_zips: unzips the files in the tmp/ folder. Calls unpack_zip several times.
- remove_tmp_dir: deletes the downloads folder.
- get_dataset: orchestrates the whole download end-to-end.
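The local half of this pipeline (folder creation, validation, unpacking and cleanup) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's implementation: make_folders, validate, unpack_all and remove_tmp are hypothetical stand-ins for create_basic_folder_structure, validate_downloads, unpack_zips and remove_tmp_dir; the expected archive names are passed in by the caller rather than coming from fetch_zips_table; and the network download step is left out entirely.

```python
from __future__ import annotations

import shutil
import zipfile
from pathlib import Path


def make_folders(base: Path) -> tuple[Path, Path]:
    """Hypothetical stand-in for create_basic_folder_structure."""
    dataset_dir = base / "dataset_dir"  # unzipped files end up here
    tmp_dir = base / "tmp"  # downloaded archives land here
    dataset_dir.mkdir(parents=True, exist_ok=True)
    tmp_dir.mkdir(parents=True, exist_ok=True)
    return dataset_dir, tmp_dir


def validate(tmp_dir: Path, expected_names: list[str]) -> list[str]:
    """Hypothetical check: return expected archives that are missing or not valid zips."""
    bad = []
    for name in expected_names:
        path = tmp_dir / name
        if not path.is_file() or not zipfile.is_zipfile(path):
            bad.append(name)
    return bad


def unpack_all(tmp_dir: Path, dataset_dir: Path) -> None:
    """Extract every .zip found in tmp_dir into dataset_dir."""
    for archive in sorted(tmp_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dataset_dir)


def remove_tmp(tmp_dir: Path) -> None:
    """Delete the downloads folder and everything in it."""
    shutil.rmtree(tmp_dir)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as base:
        dataset_dir, tmp_dir = make_folders(Path(base))
        # Simulate a finished download with a tiny fake archive.
        with zipfile.ZipFile(tmp_dir / "leaf.zip", "w") as zf:
            zf.writestr("leaf/img001.jpg", b"fake image bytes")
        assert validate(tmp_dir, ["leaf.zip"]) == []
        unpack_all(tmp_dir, dataset_dir)
        remove_tmp(tmp_dir)
        print(sorted(p.name for p in dataset_dir.rglob("*")))
```

Validating before unpacking means a truncated or missing archive is reported by name instead of surfacing later as an extraction error.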
Not available:
- main (currently broken and untested): the entry point called during CLI use. Inherited from the original project.
Currently, CLI use is not available. The original project was meant to be run from the command line; since this one is meant to be used from code, there is no plan to add CLI support.
To install the library on a Linux-based system, run:
pip install git+https://github.com/mtxslv/digipathos_downloader
If you have Poetry installed on your system, you can instead run:
poetry add git+https://github.com/mtxslv/digipathos_downloader
Once the library is installed, you can get the dataset with:
from pathlib import Path
from digipathos_downloader import download
my_folder = Path.cwd()  # this is the folder you are in (the current working directory)
dataset_dir = my_folder / 'dataset_dir'  # here is where the unzipped files will be
tmp_dir = my_folder / 'tmp'  # this folder holds the downloads and is deleted once the process is over
# get the dataset
download.get_dataset(str(dataset_dir),
                     str(tmp_dir))
This is enough to download the dataset to dataset_dir.
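To check what actually landed in dataset_dir after get_dataset returns, a small helper can count the extracted files by extension. summarize_dataset is a hypothetical helper, not part of digipathos_downloader; it assumes only that the unzipped files live somewhere under dataset_dir, and the extensions it reports depend on what the archives contain.

```python
from collections import Counter
from pathlib import Path


def summarize_dataset(dataset_dir: Path) -> Counter:
    """Hypothetical helper: count files under dataset_dir by lower-case extension."""
    return Counter(
        p.suffix.lower() or "<none>"  # files without an extension are grouped together
        for p in Path(dataset_dir).rglob("*")
        if p.is_file()  # skip the directories themselves
    )


if __name__ == "__main__":
    target = Path.cwd() / "dataset_dir"
    if target.is_dir():
        for ext, count in summarize_dataset(target).most_common():
            print(f"{ext}: {count}")
    else:
        print(f"{target} not found; run get_dataset first")
```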
For examples of how to use the other functions, see the tests folder and the functions' docstrings.
Your feedback is much appreciated 🫂.
If you build something interesting with the library, please consider showing it to the world. Don't be shy! You can tag my LinkedIn or GitHub account.
If you find any bugs or unexpected behaviour, please open an issue on the project. I'll answer it ASAP 😉
That said, happy coding!