% ramalama 1
ramalama - Simple management tool for working with AI Models
ramalama [options] command
RamaLama : The goal of RamaLama is to make AI boring.
On first run, RamaLama inspects your system for GPU support, falling back to CPU support if no GPUs are present. It then uses a container engine such as Podman or Docker to pull the appropriate OCI image with all of the software necessary to run an AI Model for your system's setup. This eliminates the need for users to configure the system for AI themselves. After this initialization, RamaLama runs the AI Models within a container based on that OCI image.
RamaLama first pulls AI Models from model registries. It then starts a chatbot or a REST API service from a single, simple command. Models are treated similarly to the way that Podman or Docker treat container images.
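For example, the following commands run a model as an interactive chatbot or serve it as a REST API; RamaLama pulls the model first if it is not already in local storage. The `granite` shortname used here is defined in the shortnames example later in this page, and any other supported model specification works the same way.

$ ramalama run granite
$ ramalama serve granite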
RamaLama supports multiple AI model registry types, called transports. Supported transports:
| Transports | Web Site |
|---|---|
| HuggingFace | huggingface.co |
| Ollama | ollama.com |
| OCI Container Registries | opencontainers.org |
| | Examples: quay.io, Docker Hub, and Artifactory |
RamaLama uses the Ollama registry transport by default. Use the RAMALAMA_TRANSPORT environment variable to modify the default. For example, `export RAMALAMA_TRANSPORT=huggingface` changes RamaLama to use the HuggingFace transport.
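As a minimal sketch of the effect: with the transport set to HuggingFace, an unprefixed model name resolves against that registry. The model path is the same one used in the pull example below.

$ export RAMALAMA_TRANSPORT=huggingface
$ ramalama pull afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf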
Individual model transports can be modified when specifying a model via the `huggingface://`, `oci://`, or `ollama://` prefix.

ramalama pull huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf
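The other prefixes work the same way. In this sketch, `ollama://tinyllama` matches a shortname target from the example below, while the `oci://` path is a hypothetical registry location used only for illustration:

$ ramalama pull ollama://tinyllama
$ ramalama pull oci://quay.io/yourrepo/yourmodel:latest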
To make it easier for users, RamaLama uses shortname files, which contain alias names for fully specified AI Models, allowing users to specify the shorter names when referring to models. RamaLama reads shortnames.conf files if they exist. These files contain a list of name/value pairs that specify the model. The following table lists the order in which RamaLama reads the files. Any duplicate names override previously defined shortnames.
| Shortnames type | Path |
|---|---|
| Distribution | /usr/share/ramalama/shortnames.conf |
| Administrators | /etc/ramalama/shortnames.conf |
| Users | $HOME/.config/ramalama/shortnames.conf |
$ cat /usr/share/ramalama/shortnames.conf
[shortnames]
"tiny" = "ollama://tinyllama"
"granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"granite:7b" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"ibm/granite" = "huggingface://instructlab/granite-7b-lab-GGUF/granite-7b-lab-Q4_K_M.gguf"
"merlinite" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
"merlinite:7b" = "huggingface://instructlab/merlinite-7b-lab-GGUF/merlinite-7b-lab-Q4_K_M.gguf"
...
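With these shortnames in place, a model can be referenced by its alias; for example, `tiny` resolves to `ollama://tinyllama`. Users can also define their own aliases in the per-user file listed above. The `mytiny` entry below is an illustrative assumption, not a shipped shortname:

$ ramalama run tiny
$ cat $HOME/.config/ramalama/shortnames.conf
[shortnames]
"mytiny" = "huggingface://afrideva/Tiny-Vicuna-1B-GGUF/tiny-vicuna-1b.q2_k.gguf"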
ramalama [GLOBAL OPTIONS]
--container
run RamaLama in the default container (default: True); use the environment variable RAMALAMA_IN_CONTAINER=false to change the default
--dryrun
show container runtime command without executing it (default: False)
--help, -h
show this help message and exit
--nocontainer
do not run RamaLama in the default container (default: False)
--runtime
specify the runtime to use; valid options are 'llama.cpp' and 'vllm' (default: llama.cpp)
--store
store AI Models in the specified directory (default rootless: $HOME/.local/share/ramalama, default rootful: /var/lib/ramalama)
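A short usage sketch combining the global options above with a command; the store path and model name are illustrative:

$ ramalama --nocontainer run tiny
$ ramalama --store $HOME/models list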
| Command | Description |
|---|---|
| ramalama-containers(1) | list all RamaLama containers |
| ramalama-list(1) | list all downloaded AI Models |
| ramalama-login(1) | login to remote registry |
| ramalama-logout(1) | logout from remote registry |
| ramalama-pull(1) | pull AI Models from Model registries to local storage |
| ramalama-push(1) | push AI Models from local storage to remote registries |
| ramalama-rm(1) | remove AI Models from local storage |
| ramalama-run(1) | run specified AI Model as a chatbot |
| ramalama-serve(1) | serve REST API on specified AI Model |
| ramalama-stop(1) | stop named container that is running AI Model |
| ramalama-version(1) | display version of RamaLama |
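As a sketch of a typical session built from these commands, using the `tiny` shortname from the example above:

$ ramalama pull tiny
$ ramalama list
$ ramalama run tiny
$ ramalama rm tiny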
Aug 2024, Originally compiled by Dan Walsh [email protected]