source instead of exec in run-readme-pr-macos.yml #1476

Open · mikekgfb wants to merge 36 commits into pytorch:main from mikekgfb:patch-43

Changes from all commits · 36 commits
b716bf6  Update run-readme-pr-macos.yml (mikekgfb, Jan 24, 2025)
b0deb2a  Update run-docs (mikekgfb, Jan 24, 2025)
35dbb95  Merge branch 'main' into patch-43 (Jack-Khuu, Jan 24, 2025)
b7af6b9  Update README.md (mikekgfb, Jan 24, 2025)
6ea3d55  Update multimodal.md (mikekgfb, Jan 24, 2025)
278b3fc  Merge branch 'pytorch:main' into patch-43 (mikekgfb, Jan 24, 2025)
2e6b5ae  Update ADVANCED-USERS.md (mikekgfb, Jan 24, 2025)
049418d  Merge branch 'main' into patch-43 (mikekgfb, Jan 24, 2025)
52fd00b  Update native-execution.md (mikekgfb, Jan 25, 2025)
76f7edf  Update run-readme-pr-macos.yml (mikekgfb, Jan 25, 2025)
da9a92a  Update run-readme-pr-mps.yml (mikekgfb, Jan 25, 2025)
3e4ad3d  Update ADVANCED-USERS.md (mikekgfb, Jan 25, 2025)
c2cb227  Update run-readme-pr-macos.yml (mikekgfb, Jan 25, 2025)
72702f0  Update run-readme-pr-mps.yml (mikekgfb, Jan 25, 2025)
170729b  Merge branch 'main' into patch-43 (mikekgfb, Jan 27, 2025)
79c4a23  Update run-docs (mikekgfb, Jan 28, 2025)
ed702af  Create cuda-32.json (mikekgfb, Jan 28, 2025)
286bb08  Create mobile-32.json (mikekgfb, Jan 28, 2025)
e04d175  Merge branch 'main' into patch-43 (mikekgfb, Jan 30, 2025)
0e21e95  Update run-readme-pr.yml (mikekgfb, Jan 30, 2025)
e901c03  Update install_requirements.sh (mikekgfb, Jan 31, 2025)
684816a  Update install_requirements.sh (mikekgfb, Jan 31, 2025)
11dd083  Merge branch 'pytorch:main' into patch-43 (mikekgfb, Jan 31, 2025)
b3c4b9e  Update README.md (mikekgfb, Jan 31, 2025)
ead5b6a  Update run-docs (mikekgfb, Jan 31, 2025)
835ae0e  Update run-readme-pr-macos.yml (mikekgfb, Jan 31, 2025)
30f6ba8  Update run-readme-pr-mps.yml (mikekgfb, Jan 31, 2025)
5e65126  Update run-readme-pr.yml (mikekgfb, Jan 31, 2025)
d8dcb7b  Merge branch 'main' into patch-43 (mikekgfb, Jan 31, 2025)
8519a44  Update run-docs (mikekgfb, Jan 31, 2025)
d5b3607  Update run-readme-pr.yml (mikekgfb, Jan 31, 2025)
f15bc15  Update run-docs (mikekgfb, Feb 1, 2025)
2a18f0d  Update run-readme-pr.yml (mikekgfb, Feb 1, 2025)
30746fc  Update run-readme-pr.yml (mikekgfb, Feb 2, 2025)
fb4e0dd  Update run-readme-pr-macos.yml (mikekgfb, Feb 2, 2025)
7786b84  Update run-readme-pr-linuxaarch64.yml (mikekgfb, Feb 2, 2025)
8 changes: 6 additions & 2 deletions .ci/scripts/run-docs
@@ -8,13 +8,16 @@ fi

# Pre-initialize variables
filepath=""
parameters="--replace 'llama3:stories15M,-l3:-l2' --suppress huggingface-cli,HF_TOKEN"
# cuda supports padding, so no need to replace quantization for now.
# otherwise add: 'cuda.json:cuda-32.json' to replace rules
parameters="--replace llama3:stories15M,-l3:-l2,mobile.json:mobile-32.json --suppress huggingface-cli,HF_TOKEN"
script_name="./run-${1}.sh" # Dynamically initialize script name

# Use a case statement to handle the $1 argument
case "$1" in
"readme")
filepath="README.md"
parameters="--replace llama3.1:stories15M,-l3:-l2,mobile.json:mobile-32.json --suppress huggingface-cli,HF_TOKEN"
;;
"quantization")
filepath="docs/quantization.md"
@@ -63,5 +66,6 @@ echo "::group::Run $1"
echo "*******************************************"
cat "$script_name"
echo "*******************************************"
bash -x "$script_name"
set -x
. "$script_name"
echo "::endgroup::"
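The change above is the heart of this PR: `bash -x "$script_name"` becomes `set -x` followed by `. "$script_name"`, so the generated script is sourced into the current shell instead of executed in a child process, and any environment the script sets up survives for whatever runs next. A minimal sketch of the difference, where child.sh is a hypothetical stand-in whose entire body is `export CHILD_DONE=1`:

```
#!/bin/bash
bash -x child.sh                # child process: tracing comes from the -x flag
echo "${CHILD_DONE:-unset}"     # prints "unset"; the child's exports died with it

set -x                          # sourcing takes no shell flags, so enable tracing first
. child.sh                      # runs in this shell, no new process
echo "${CHILD_DONE:-unset}"     # prints "1"; the sourced script mutated this shell
```
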
21 changes: 20 additions & 1 deletion .github/workflows/run-readme-pr-linuxaarch64.yml
@@ -23,6 +23,9 @@ jobs:
uname -a
echo "::endgroup::"

which pip || true
which pip3 || true
which conda || true
TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs readme

echo "::group::Completion"
@@ -44,7 +47,11 @@
echo "::group::Print machine info"
uname -a
echo "::endgroup::"


which pip || true
which pip3 || true
which conda || true

TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs quantization

test-gguf-cpu:
@@ -62,6 +69,10 @@
uname -a
echo "::endgroup::"

which pip || true
which pip3 || true
which conda || true

TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs gguf

echo "::group::Completion"
@@ -84,6 +95,10 @@
uname -a
echo "::endgroup::"

which pip || true
which pip3 || true
which conda || true

TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs advanced

echo "::group::Completion"
@@ -106,6 +121,10 @@
uname -a
echo "::endgroup::"

which pip || true
which pip3 || true
which conda || true

TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs evaluation

echo "::group::Completion"
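The repeated `which pip || true` probes added to these jobs are diagnostics hardened against errexit: CI step shells commonly run with bash's `-e` flag (GitHub's bash steps default to `bash -e`), so probing for a missing binary without `|| true` would abort the job. A small sketch of the pattern, assuming nothing about which tools are installed:

```
#!/bin/bash
set -e                  # mirrors the CI default; any non-zero status kills the script
which pip || true       # forced to status 0, so absence is merely logged
which pip3 || true
which conda || true
echo "still running"    # reached even if none of the tools exist
```
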
24 changes: 17 additions & 7 deletions .github/workflows/run-readme-pr-macos.yml
@@ -33,8 +33,13 @@ jobs:
sysctl machdep.cpu.core_count
echo "::endgroup::"

which pip || true
which pip3 || true
which conda || true

echo "using workaround for #1416 and #1315 by setting torchchat device explicitly"
TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs readme
export TORCHCHAT_DEVICE=cpu
. .ci/scripts/run-docs readme

echo "::group::Completion"
echo "tests complete"
@@ -70,8 +75,9 @@
echo "::endgroup::"

echo "using workaround for #1416 and #1315 by setting torchchat device explicitly"
TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs quantization

export TORCHCHAT_DEVICE=cpu
. .ci/scripts/run-docs quantization

echo "::group::Completion"
echo "tests complete"
echo "*******************************************"
@@ -106,7 +112,8 @@
echo "::endgroup::"

echo "using workaround for #1416 and #1315 by setting torchchat device explicitly"
TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs gguf
export TORCHCHAT_DEVICE=cpu
# .ci/scripts/run-docs gguf

echo "::group::Completion"
echo "tests complete"
@@ -141,7 +148,8 @@
echo "::endgroup::"

echo "using workaround for #1416 and #1315 by setting torchchat device explicitly"
TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs advanced
export TORCHCHAT_DEVICE=cpu
. .ci/scripts/run-docs advanced

echo "::group::Completion"
echo "tests complete"
@@ -209,7 +217,8 @@
sysctl machdep.cpu.core_count
echo "::endgroup::"

.ci/scripts/run-docs multimodal
# metadata does not install properly on macos
# .ci/scripts/run-docs multimodal

echo "::group::Completion"
echo "tests complete"
@@ -243,7 +252,8 @@
sysctl machdep.cpu.core_count
echo "::endgroup::"

.ci/scripts/run-docs native
echo ".ci/scripts/run-docs native DISABLED"
# .ci/scripts/run-docs native

echo "::group::Completion"
echo "tests complete"
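Note the paired change in each job above: the one-shot prefix form `TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs readme` becomes an explicit `export` before sourcing. A prefix assignment scopes the variable to a single command's environment, which no longer fits once the script runs inside the current shell; `export` makes the setting stick for everything that follows. A sketch of the distinction:

```
#!/bin/bash
FOO=1 bash -c 'echo "child sees: $FOO"'   # prefix form: FOO lives only in that command
echo "after prefix: ${FOO:-unset}"        # prints "unset"

export FOO=1                              # exported: set in this shell and inherited
echo "after export: $FOO"                 # prints "1"; sourced scripts see it too
```
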
8 changes: 5 additions & 3 deletions .github/workflows/run-readme-pr-mps.yml
@@ -81,7 +81,7 @@ jobs:
sysctl machdep.cpu.core_count
echo "::endgroup::"

.ci/scripts/run-docs gguf
# .ci/scripts/run-docs gguf

echo "::group::Completion"
echo "tests complete"
@@ -162,7 +162,8 @@
sysctl machdep.cpu.core_count
echo "::endgroup::"

.ci/scripts/run-docs multimodal
# metadata does not install properly on macos
# .ci/scripts/run-docs multimodal

echo "::group::Completion"
echo "tests complete"
@@ -189,7 +190,8 @@
sysctl machdep.cpu.core_count
echo "::endgroup::"

.ci/scripts/run-docs native
echo ".ci/scripts/run-docs native DISABLED"
# .ci/scripts/run-docs native

echo "::group::Completion"
echo "tests complete"
44 changes: 34 additions & 10 deletions .github/workflows/run-readme-pr.yml
@@ -19,10 +19,21 @@ jobs:
gpu-arch-version: "12.4"
timeout: 60
script: |
echo "::group::Print machine info"
echo "::group::Print machine info and try install pip and/or pip3"
set -x
which pip || true
which pip3 || true
which conda || true
apt-get install pip3 pip || true
which pip || true
which pip3 || true
which conda || true
uname -a
echo "::endgroup::"

which pip || true
which pip3 || true
which conda || true
.ci/scripts/run-docs readme

echo "::group::Completion"
@@ -41,8 +52,13 @@
gpu-arch-version: "12.4"
timeout: 60
script: |
echo "::group::Print machine info"
echo "::group::Print machine info and try install pip and/or pip3"
set -x
apt-get install pip3 pip || true
uname -a
which pip || true
which pip3 || true
which conda || true
echo "::endgroup::"

TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs readme
@@ -63,7 +79,9 @@
gpu-arch-version: "12.4"
timeout: 60
script: |
echo "::group::Print machine info"
echo "::group::Print machine info and try install pip and/or pip3"
set -x
apt-get install pip3 pip || true
uname -a
echo "::endgroup::"

@@ -85,7 +103,9 @@
gpu-arch-version: "12.4"
timeout: 60
script: |
echo "::group::Print machine info"
echo "::group::Print machine info and try install pip and/or pip3"
set -x
apt-get install pip3 pip || true
uname -a
echo "::endgroup::"

@@ -106,7 +126,8 @@
uname -a
echo "::endgroup::"

.ci/scripts/run-docs gguf
# failing
# .ci/scripts/run-docs gguf

echo "::group::Completion"
echo "tests complete"
@@ -128,7 +149,8 @@
uname -a
echo "::endgroup::"

TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs gguf
# failing
# TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs gguf

echo "::group::Completion"
echo "tests complete"
@@ -151,7 +173,8 @@
uname -a
echo "::endgroup::"

.ci/scripts/run-docs advanced
# failing
# .ci/scripts/run-docs advanced

echo "::group::Completion"
echo "tests complete"
@@ -174,7 +197,8 @@
uname -a
echo "::endgroup::"

TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs advanced
# failing
# TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs advanced

echo "::group::Completion"
echo "tests complete"
@@ -196,7 +220,7 @@
uname -a
echo "::endgroup::"

.ci/scripts/run-docs evaluation
# .ci/scripts/run-docs evaluation

echo "::group::Completion"
echo "tests complete"
@@ -218,7 +242,7 @@
uname -a
echo "::endgroup::"

TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs evaluation
# TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs evaluation

echo "::group::Completion"
echo "tests complete"
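The `apt-get install pip3 pip || true` line added to these jobs likely no-ops on stock Debian/Ubuntu images, where pip ships in the `python3-pip` package and apt needs a refreshed index first; the `|| true` hides the failure either way. A hedged variant (package name assumed from standard Ubuntu, not confirmed against these runner images):

```
#!/bin/bash
apt-get update || true
apt-get install -y python3-pip || true   # provides pip3, and pip on recent Ubuntu
which pip pip3 || true                   # confirm what actually landed on PATH
```
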
7 changes: 6 additions & 1 deletion README.md
@@ -90,10 +90,11 @@ cd torchchat
python3 -m venv .venv
source .venv/bin/activate
./install/install_requirements.sh
mkdir exportedModels
```
[skip default]: end

[shell default]: ./install/install_requirements.sh
[shell default]: mkdir exportedModels; ./install/install_requirements.sh

## Commands

@@ -238,7 +239,9 @@ python3 torchchat.py server llama3.1
```
[skip default]: end

<!--
[shell default]: python3 torchchat.py server llama3.1 & server_pid=$! ; sleep 90 # wait for server to be ready to accept requests
-->

In another terminal, query the server using `curl`. Depending on the model configuration, this query might take a few minutes to respond.

@@ -279,7 +282,9 @@ curl http://127.0.0.1:5000/v1/chat/completions \

[skip default]: end

<!--
[shell default]: kill ${server_pid}
-->

</details>

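The hidden `[shell default]` lines in this README diff drive the server example in CI: launch the server in the background, capture its PID, sleep while it warms up, then kill it once the curl check has run. The same pattern in isolation, with `python3 -m http.server` standing in for the real torchchat server:

```
#!/bin/bash
python3 -m http.server 8000 & server_pid=$!   # background the server, record its PID
sleep 5                                       # crude readiness wait, as in the README
curl -sf http://127.0.0.1:8000/ > /dev/null && echo "server answered"
kill ${server_pid}                            # tear down so the step can exit cleanly
```
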
14 changes: 13 additions & 1 deletion docs/ADVANCED-USERS.md
@@ -177,6 +177,8 @@ preparatory step:
You can set these variables as follows for the exemplary model15M
model from Andrej Karpathy's tinyllamas model family:

[shell default]: pip install wget

```
MODEL_NAME=stories15M
MODEL_DIR=~/checkpoints/${MODEL_NAME}
@@ -185,6 +187,16 @@ MODEL_OUT=~/torchchat-exports

mkdir -p ${MODEL_DIR}
mkdir -p ${MODEL_OUT}

# Change to the MODEL_DIR directory
pushd ${MODEL_DIR}

# Download the files for stories15M using wget
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.pt
wget https://github.com/karpathy/llama2.c/raw/refs/heads/master/tokenizer.model

# Go back to the original directory
popd
```

When we export models with AOT Inductor for servers and desktops, and
@@ -335,7 +347,7 @@ tests against the exported model with the same interface, and support
additional experiments to confirm model quality and speed.

```
python3 torchchat.py generate --device [ cuda | cpu ] --dso-path ${MODEL_NAME}.so --prompt "Once upon a time"
python3 torchchat.py generate --device [ cuda | cpu ] --checkpoint-path ${MODEL_PATH} --dso-path ${MODEL_NAME}.so --prompt "Once upon a time"
```


2 changes: 2 additions & 0 deletions docs/multimodal.md
@@ -111,3 +111,5 @@ One of the goals of torchchat is to support various execution modes for every mo
- **[ExecuTorch](https://github.com/pytorch/executorch)**: On-device (Edge) inference

In addition, we are in the process of integrating with [lm_evaluation_harness](https://github.com/EleutherAI/lm-evaluation-harness) for multimodal model evaluation.

[end default]: end
3 changes: 2 additions & 1 deletion docs/native-execution.md
@@ -83,6 +83,7 @@ python3 torchchat.py export stories15M --output-dso-path ./model.so
We can now execute the runner with:

[shell default]: pip install wget

```
curl -OL https://github.com/karpathy/llama2.c/raw/master/tokenizer.model
./cmake-out/aoti_run ./model.so -z ./tokenizer.model -l 2 -i "Once upon a time"
@@ -109,7 +110,7 @@ installed ExecuTorch, running the commands below will build the
runner, without re-installing ExecuTorch from source:

```
# Pull submodules (re2, abseil) for Tiktoken
# Pull submodules re2 and abseil for Tiktoken
git submodule sync
git submodule update --init
