The Run:ai Model Streamer is a Python SDK for streaming tensors from SafeTensors files to GPU memory with high concurrency. It provides an API for loading SafeTensors files and building AI models, so models can be loaded seamlessly.
For documentation click here
For benchmarks click here
Install the package
pip install runai_model_streamer
And stream tensors
from runai_model_streamer import SafetensorsStreamer

file_path = "model.safetensors"

with SafetensorsStreamer() as streamer:
    streamer.stream_file(file_path)
    for name, tensor in streamer.get_tensors():
        gpu_tensor = tensor.to('cuda:0')
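The streamed tensors can then be gathered into a state dict and loaded into a model. A minimal sketch, assuming PyTorch is installed and that model.safetensors holds weights for a matching module (the nn.Linear shape below is an illustrative assumption, not part of the package):

import torch.nn as nn
from runai_model_streamer import SafetensorsStreamer

# Illustrative model; in practice, use your own architecture whose
# parameter names match the tensor names in the checkpoint.
model = nn.Linear(16, 4).to('cuda:0')

state_dict = {}
with SafetensorsStreamer() as streamer:
    streamer.stream_file("model.safetensors")
    for name, tensor in streamer.get_tensors():
        # Move each tensor to GPU memory as it arrives.
        state_dict[name] = tensor.to('cuda:0')

model.load_state_dict(state_dict)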
Our repository is built using a devcontainer (Further reading)
The following commands should be run inside the dev container
Note
You can install the devcontainer-cli tool (Further reading) and run every command with the prefix devcontainer exec --workspace-folder . [COMMAND]
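For example, to build from the host without opening a shell in the container:

devcontainer exec --workspace-folder . make build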
Build
make build
Note
We build libstreamers3.so and statically link it against libssl, libcrypto, and libcurl. If you would like to dynamically link against your system libraries instead, run USE_SYSTEM_LIBS=1 make build
Note
On a successful build, the .whl files will be at py/runai_model_streamer/dist/<PACKAGE_FILE> and py/runai_model_streamer_s3/dist/<PACKAGE_FILE>
Run tests
make test
Install locally
pip3 install py/runai_model_streamer py/runai_model_streamer_s3
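As a quick sanity check (an illustrative step, not part of the documented workflow), the import used in the example above should succeed:

python3 -c "from runai_model_streamer import SafetensorsStreamer"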
Important
For the C++ library to run, you need to install libcurl4 and libssl1.1
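On Debian/Ubuntu, for example, these can usually be installed with apt (package availability, particularly of libssl1.1, varies by release):

apt-get install libcurl4 libssl1.1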