TeraSort dataset creator for Python

This is a Python implementation of TeraGen built on Lithops. It generates the input dataset for the sort benchmark. The dataset is created in parallel by FaaS functions launched through Lithops and is stored in object storage. It was inspired by the Hadoop implementation and this Spark implementation.
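To make the dataset format concrete, here is a minimal sketch of what a TeraGen-style generator produces: fixed 100-byte records that each start with a 10-byte key. It is not the code in this repository. The names `make_record` and `make_partition`, the constant filler payload, and the per-partition seed are assumptions made for illustration.

```python
import random
import string

RECORD_SIZE = 100  # bytes per record in the sort benchmark
KEY_SIZE = 10      # the first 10 bytes of each record are the sort key


def make_record(rng, ascii_only=False):
    """Build one 100-byte record: a random key followed by filler bytes."""
    if ascii_only:
        # Assumption: "printable" is narrowed to letters and digits here.
        alphabet = string.ascii_letters + string.digits
        key = "".join(rng.choices(alphabet, k=KEY_SIZE)).encode("ascii")
    else:
        key = bytes(rng.getrandbits(8) for _ in range(KEY_SIZE))
    # Hypothetical payload: a constant filler keeps the sketch short.
    payload = b"x" * (RECORD_SIZE - KEY_SIZE)
    return key + payload


def make_partition(num_records, seed, ascii_only=False):
    """Generate one partition; a distinct seed per worker keeps partitions different."""
    rng = random.Random(seed)
    return b"".join(make_record(rng, ascii_only) for _ in range(num_records))
```

Each worker could call something like `make_partition` with its own seed and upload the resulting bytes as one object, which is what lets the partitions be generated independently.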

Install dependencies

The only dependency is Lithops. You can install it using pip:

pip3 install lithops

or using the requirements.txt file:

pip3 install -r requirements.txt

Set up

You need to set up the config file for Lithops. You can find a template in the Lithops repository.

Usage

You can run the teragen.py script using the following command:

python3 teragen.py -s <size> -b <bucket> -k <key> -p <partitions> -c <config_file>

Parameters

The script takes the following parameters:

-s: Size of the dataset to generate, either with a unit suffix (for example 100k, 5m, 10g, 1t) or as a plain number of bytes.
-b: Name of the bucket where the dataset is stored.
-k: Key name prefix for the files created.
-p: Number of partition files to create. Lithops launches one worker per partition.
-c: Path to the Lithops config file.
--ascii: Use only printable characters in the dataset. Default: False
--localhost: Execute the function locally using processes. Default: False
--unique-file: Create a single file instead of multiple files. Uses S3 multipart upload and requires S3 as the configured Lithops storage backend. Default: False
-h: Show the help message.
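As an illustration of how the -s and -p values relate, the following sketch converts a size string into bytes and splits the resulting record count across partitions. It is not the script's actual code. The helper names `parse_size` and `records_per_partition` are hypothetical, and the decimal multipliers (k = 1000) are an assumption, since the script may use 1024-based units instead. The 100-byte record size comes from the sort benchmark.

```python
import re

# Assumption: suffixes are decimal multipliers; the real script may use powers of 1024.
_MULTIPLIERS = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}
RECORD_SIZE = 100  # bytes per record in the sort benchmark


def parse_size(text):
    """Convert a size such as '100k', '5m', or '2048' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([kmgt]?)", text.strip().lower())
    if match is None:
        raise ValueError(f"invalid size: {text!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]


def records_per_partition(total_bytes, partitions):
    """Split the record count as evenly as possible across the partitions."""
    total_records = total_bytes // RECORD_SIZE
    base, extra = divmod(total_records, partitions)
    # The first `extra` partitions each receive one additional record.
    return [base + 1 if i < extra else base for i in range(partitions)]
```

Under these assumptions, `records_per_partition(parse_size("1k"), 3)` spreads 10 records over 3 partitions, so no partition differs from another by more than one record.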
