A multi-level dataflow profiler to capture I/O calls from workflows.
- Linux kernel newer than 5.6 (see the kernel documentation)
- BCC > v0.30.0
- cmake
- Python 3.10
- hydra-core>=1.2.0
Test Requirements
- OpenMPI
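The kernel and Python requirements above can be checked before building. The following is a minimal sketch, not part of datacrumbs: the helper names `parse_kernel_version` and `kernel_is_new_enough` are hypothetical, the kernel comparison is done on major.minor only, and treating Python 3.10 as a minimum rather than an exact version is an assumption.

```python
import platform
import re
import sys


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_new_enough(release: str, minimum: tuple[int, int] = (5, 6)) -> bool:
    """Return True when the kernel's major.minor is strictly newer than `minimum`."""
    return parse_kernel_version(release) > minimum


if __name__ == "__main__":
    release = platform.release()
    print(f"kernel {release}: ok={kernel_is_new_enough(release)}")
    # Assumption: Python 3.10 is treated as the minimum supported version.
    python_ok = sys.version_info[:2] >= (3, 10)
    print(f"python {sys.version_info.major}.{sys.version_info.minor}: ok={python_ok}")
```

This only inspects version strings; it does not verify that BCC or OpenMPI are installed.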
An image for development on Chameleon is available, named HARI-UBUNTU-22.04.04-BCC.
git clone https://github.com/hariharan-devarajan/datacrumbs.git
cd datacrumbs
mkdir build
cd build
cmake ..
make -j
sudo pip install -r requirements.txt
The profiler needs to run as root.
sudo su
export PYTHONPATH=<PATH to datacrumbs>
cd <PATH to datacrumbs Root>
python3 datacrumbs/main.py
Once the profiler is loaded, it will wait for applications to connect.
Check the open file descriptor limit with ulimit -n and raise it if needed
ulimit -n 1048576
Increase the probe limit within BCC
export BCC_PROBE_LIMIT=1048576
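Both limits can be confirmed from Python before starting the profiler. This is a minimal sketch using the standard `resource` module; the helper names `meets` and `check_limits` are hypothetical, and the threshold 1048576 is simply the value used in the commands above. It has to run in the same shell session as the profiler, since `ulimit` and `export` only affect that shell and its children.

```python
import os
import resource

REQUIRED = 1048576  # value used in the ulimit and export commands above


def meets(limit: int, required: int = REQUIRED) -> bool:
    """True when a limit is unlimited or at least `required`."""
    return limit == resource.RLIM_INFINITY or limit >= required


def check_limits(environ=os.environ) -> dict:
    """Report whether the descriptor limit and BCC_PROBE_LIMIT look sufficient."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    probe_raw = environ.get("BCC_PROBE_LIMIT", "")
    # An unset or non-numeric variable is reported as 0.
    probe = int(probe_raw) if probe_raw.isdigit() else 0
    return {
        "nofile_soft": soft,
        "nofile_ok": meets(soft),
        "bcc_probe_limit": probe,
        "bcc_probe_limit_ok": probe >= REQUIRED,
    }


if __name__ == "__main__":
    for key, value in check_limits().items():
        print(f"{key}: {value}")
```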
Once the profiler has started, you can run the application code.
cd <PATH to datacrumbs Root>
cd tests/scripts
./run.sh
The profiler output is created in the directory where the profiler runs.
cc@ebpf:~/datacrumbs$ head -n 5 profile.pfw
[
{"pid": 30545, "tid": 30545, "name": "__libc_malloc [libc.so.6]", "cat": "[libc.so.6]", "ph": "C", "ts": 0.0, "args": {"count": 21, "time": 0.000198116}}
{"pid": 30545, "tid": 30545, "name": "cfree [libc.so.6]", "cat": "[libc.so.6]", "ph": "C", "ts": 0.0, "args": {"count": 2, "time": 1.9788e-05}}
{"pid": 30545, "tid": 30545, "name": "[unknown]", "cat": "unknown", "ph": "C", "ts": 0.0, "args": {"count": 1, "time": 1.3094e-05}}
{"pid": 30545, "tid": 30545, "name": "__libc_malloc [libc.so.6]", "cat": "[libc.so.6]", "ph": "C", "ts": 24000000.0, "args": {"count": 149, "time": 0.000503765}}
The output uses the Chrome Tracing format and can be viewed in Perfetto.
The profiler output can be analyzed using Dask distributed analysis. Please refer to the notebook.
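For a quick look at a trace without setting up Dask, the file can also be summarized with the Python standard library. The sketch below is not the datacrumbs analysis code. It assumes the layout shown in the sample output: an opening `[` line followed by one JSON object per line, each carrying `args.count` and `args.time`. The function names `summarize` and `print_summary` are hypothetical.

```python
import json
from collections import defaultdict


def summarize(lines):
    """Sum args.count and args.time per event name from .pfw lines."""
    totals = defaultdict(lambda: {"count": 0, "time": 0.0})
    for raw in lines:
        # Tolerate bracket lines and trailing commas between events.
        line = raw.strip().rstrip(",")
        if not line or line in ("[", "]"):
            continue
        event = json.loads(line)
        args = event.get("args", {})
        totals[event["name"]]["count"] += args.get("count", 0)
        totals[event["name"]]["time"] += args.get("time", 0.0)
    return dict(totals)


def print_summary(path):
    """Print per-function totals, largest accumulated time first."""
    with open(path) as handle:
        summary = summarize(handle)
    ranked = sorted(summary.items(), key=lambda item: item[1]["time"], reverse=True)
    for name, stats in ranked:
        print(f"{name}: count={stats['count']} time={stats['time']:.9f}")
```

Calling `print_summary("profile.pfw")` from the directory where the profiler ran lists each traced function with its accumulated count and time.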