This repository experiments with Rust parallelization for data analysis, mainly using the Polars library, and compares it against Python pandas as an informal benchmark.
The data comes from data/foods.parquet, an example dataset from the Polars library.
The first step is to run data_dup.sh, which duplicates the file 800 times into the ./data/ directory. Duplicating the base data creates a scenario where working in parallel makes sense:
- ./data_dup.sh
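The contents of data_dup.sh are not shown here, so the following is only a minimal Python sketch of the same idea: copying one Parquet file many times so there is enough input for parallel work to pay off. The function name duplicate_file and the foods_0000.parquet naming scheme are assumptions made for illustration, not the script's actual behavior.

```python
import shutil
from pathlib import Path


def duplicate_file(source: Path, copies: int) -> list[Path]:
    """Copy `source` `copies` times into its own directory and return the new paths."""
    created = []
    for i in range(copies):
        # Hypothetical naming scheme: foods_0000.parquet, foods_0001.parquet, ...
        target = source.with_name(f"{source.stem}_{i:04d}{source.suffix}")
        shutil.copyfile(source, target)
        created.append(target)
    return created


if __name__ == "__main__":
    src = Path("data/foods.parquet")
    # Only copy when the example file is actually present.
    if src.exists():
        print(f"created {len(duplicate_file(src, 800))} copies")
```

Writing the copies next to the source keeps a single directory that both the Rust and the Python pipeline can scan.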
Run Rust and collect results:
- cargo run --release # compiles the project and runs it once
- time ./target/release/data_pipeline
Run the equivalent Python project and collect results:
- python main.py
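A single timed run can be noisy, so one option is to repeat each command and keep the median wall-clock time. The sketch below uses only the Python standard library; the helper name median_wall_time and the default of 5 runs are assumptions for illustration and are not part of this repository.

```python
import statistics
import subprocess
import sys
import time


def median_wall_time(command: list[str], runs: int = 5) -> float:
    """Run `command` `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is not timed silently.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in
    # ["./target/release/data_pipeline"] or [sys.executable, "main.py"].
    elapsed = median_wall_time([sys.executable, "-c", "pass"], runs=3)
    print(f"median wall time: {elapsed:.3f}s")
```

Unlike the shell's time, this reports only wall-clock time, not the user and sys breakdown.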
Current results on an Intel 12th Gen i7-1270P (16 threads):
Python:
real 0m2.362s
user 0m2.328s
sys 0m1.465s
Rust:
real 0m0.167s
user 0m0.191s
sys 0m0.278s
Rust is about 14 times faster than Python in wall-clock (real) time under these conditions.
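The speedup figure follows directly from the real times listed above. As a small worked check, the sketch below divides the Python time by the Rust time, and also compares total CPU time (user + sys), which accounts for work spread across threads. The helper name speedup is hypothetical.

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Timings copied from the results above (seconds).
python_real, python_user, python_sys = 2.362, 2.328, 1.465
rust_real, rust_user, rust_sys = 0.167, 0.191, 0.278

wall = speedup(python_real, rust_real)
cpu = speedup(python_user + python_sys, rust_user + rust_sys)

print(f"wall-clock speedup: {wall:.1f}x")  # 14.1x
print(f"CPU-time ratio: {cpu:.1f}x")       # 8.1x
```

The wall-clock ratio is the one quoted above; the CPU-time ratio is smaller because it counts the time spent on every thread rather than elapsed time.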