Small Demo #303

Open
reedkotler opened this issue Oct 1, 2023 · 12 comments

@reedkotler

What would be really helpful is a small test case that can train in 30 minutes on a modest machine. It does not have to produce a useful model but is something that one can see end to end without days of training. I'm willing to help make the test case if I can get some help.

@mtrofin
Collaborator

mtrofin commented Oct 2, 2023

Yup, and we could even set it up in the CI as a nightly. Do you have a project in mind to play "corpus"? If not, llvm itself could be it (if that wouldn't make things more confusing, with llvm playing 2 roles).

@reedkotler
Author

reedkotler commented Oct 2, 2023 via email

@mtrofin
Collaborator

mtrofin commented Oct 2, 2023

Right, so we could use llvm itself as the corpus donor. Sticking to inlining for size - because regalloc would need profiles (and I think for end to end demo-ing, inline for size is a fine example). These are the steps:

  • git clone llvm to, for instance, /work/llvm-project, cd /work/llvm-project, mkdir tflite-build && cd tflite-build. cmake will need to additionally have -DLLVM_ENABLE_PROJECTS=clang. The goal of this step is to build the clang we'll use for training, but we'll also use this clang for corpus collection. ninja clang llvm-objcopy (we need objcopy to extract the corpus)
  • cd .. && mkdir corpus-build && cd corpus-build
  • cmake -GNinja -DCMAKE_BUILD_TYPE=MinSizeRel -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_CXX_COMPILER=/work/llvm-project/tflite-build/bin/clang++ -DCMAKE_C_COMPILER=/work/llvm-project/tflite-build/bin/clang -DCMAKE_CXX_FLAGS="-Xclang=-fembed-bitcode=all" -DCMAKE_C_FLAGS="-Xclang=-fembed-bitcode=all" ../llvm. We don't bother trying to tune it carefully for size - the goal is just to get to a corpus. Note this generates a compile_commands.json in that build dir.
  • ninja opt llc -> this is so we have some objects built.
  • assuming you git clone-d ml-compiler-opt under /work/ml-compiler-opt: cd /work/ml-compiler-opt, then PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/tools/extract_ir.py --input /work/llvm-project/corpus-build/compile_commands.json --input_type json --llvm_objcopy_path /work/llvm-project/tflite-build/bin/llvm-objcopy --output_dir /tmp/corpus
  • This extracts the corpus into /tmp/corpus; after that, the training steps are the same as in the demos - i.e. collect a default trace and so on (a rough sketch follows below).
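For reference, a rough sketch of those follow-up steps, adapted from the inlining demo (demo/demo.md). The script names, gin configs, and flags below are taken from that demo and may have drifted, so treat this as an outline rather than the exact commands; the clang used here is assumed to be the training-capable one built in the first step (with the demo's TFLite/"development mode" cmake setup), and llvm-size is assumed to be built alongside it since the size reward uses it:

cd /work/ml-compiler-opt
export CORPUS=/tmp/corpus

# Trace the default (heuristic) inliner's decisions over the corpus.
PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/tools/generate_default_trace.py \
  --data_path=$CORPUS \
  --output_path=/tmp/default_trace \
  --gin_files=compiler_opt/rl/inlining/gin_configs/common.gin \
  --gin_bindings=clang_path="'/work/llvm-project/tflite-build/bin/clang'" \
  --gin_bindings=llvm_size_path="'/work/llvm-project/tflite-build/bin/llvm-size'" \
  --sampling_rate=0.2

# Warm-start a policy from that trace via behavioral cloning, then run RL training.
PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/rl/train_bc.py \
  --root_dir=/tmp/warmstart \
  --data_path=/tmp/default_trace \
  --gin_files=compiler_opt/rl/inlining/gin_configs/behavioral_cloning_nn_agent.gin

PYTHONPATH=$PYTHONPATH:. python3 compiler_opt/rl/train_locally.py \
  --root_dir=/tmp/output \
  --data_path=$CORPUS \
  --gin_files=compiler_opt/rl/inlining/gin_configs/ppo_nn_agent.gin \
  --gin_bindings=clang_path="'/work/llvm-project/tflite-build/bin/clang'" \
  --gin_bindings=llvm_size_path="'/work/llvm-project/tflite-build/bin/llvm-size'" \
  --gin_bindings=train_eval.warmstart_policy_dir=\"/tmp/warmstart/saved_policy\"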

@reedkotler
Author

reedkotler commented Oct 2, 2023 via email

@pshung

pshung commented Oct 24, 2023

What training performance could I expect if only using LLVM as the training corpus?

@mtrofin
Collaborator

mtrofin commented Oct 24, 2023

What do you mean by "training performance": time it takes to train a model? Or model effectiveness (i.e. how much that model can shrink binaries)? Either way, I think @reedkotler did this recently (llvm as corpus), perhaps he can comment on both.

Fuchsia's case discussed in the demo used to take about half a day to train, but when we doubled the feature count, the training time doubled too. IIRC they get ~3% shrinkage in their overall image.

@pshung

pshung commented Oct 31, 2023

Thanks for your instructions about using LLVM as a training corpus. I was able to run the inlining training.
However, the LLVM corpus includes only about 2080 modules, so I wonder how the size reduction and the generalization ability compare against the performance figures mentioned in the paper (which reports using about 28000 IR modules to reach that performance).
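(For reference, one quick way to count the modules in an extracted corpus - this assumes extract_ir.py wrote a corpus_description.json with a "modules" list next to the extracted bitcode, which is what current versions do; adjust the path and key if your setup differs:)

python3 -c "import json; print(len(json.load(open('/tmp/corpus/corpus_description.json'))['modules']))"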

@mtrofin
Collaborator

mtrofin commented Nov 1, 2023

In general, we observed that having more modules from more diverse projects, during training, would help a model generalize better, but just like with manual heuristics, without trying it out, it's hard to tell what to expect for a specific case.

@ioana-ghiban-arm

ioana-ghiban-arm commented Sep 24, 2024

Hello!

What would be the steps for 'Deploying and using the new policy' when using LLVM itself as the corpus donor? I have (hopefully) trained the optimized model from the warmstart model, but in $OUTPUT_DIR I only see a policy dir, no saved_policy; I'm not sure whether that is why I can't build the release. I configured it this way:
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_INLINER_MODEL_PATH=$OUTPUT_DIR/policy -DTENSORFLOW_AOT_PATH=${TENSORFLOW_AOT_PATH} $LLVM_SRCDIR/llvm
It seems there is no model in policy.
Another deviation from the demo instructions in my setup is that I hardcoded $TENSORFLOW_AOT_PATH to be able to generate the ninja files.

Any suggestions would be much appreciated. Thanks.

@mtrofin
Collaborator

mtrofin commented Sep 24, 2024

[...] but in $OUTPUT_DIR I only see policy dir, no saved_policy [...]

let's focus on this first. How long did train_locally.py take? (it should be a good number of hours); if no log is visible, can you add --alsologtostderr and see what it dumps - it should report compiling a non-zero number of modules at each step.

(My suspicion is that there may be an issue that makes each compile step fail => no actual training => no model)
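(A quick way to tell whether training actually produced anything, assuming the demo's usual output layout where intermediate policies land under $OUTPUT_DIR/policy and the final exported one under $OUTPUT_DIR/saved_policy:)

ls $OUTPUT_DIR/policy          # intermediate policies, written as training progresses
ls $OUTPUT_DIR/saved_policy    # final exported policy; only appears once training completes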

@ioana-ghiban-arm

ioana-ghiban-arm commented Oct 9, 2024

Indeed, train_locally.py didn't run for long enough. The time it usually takes to create saved_policy seems to be around 10 hrs. However, I see the same performance (no binary size change) with this final saved_policy as I did with an intermediate one from one of the folders under policy. As mentioned here, in the case of using clang for both training and corpus collection, I've only noticed a difference in the size of the llvm-config binary, which is marginally larger when created by the clang built with -DLLVM_INLINER_MODEL_PATH=/path/to/model/saved_policy.
Do you have any suggestions for tweaking my setup? I could share the steps I've taken, but that might get too verbose for an issue comment. I've basically filled in the steps I was missing from the directions above with what seemed most sensible from either the inlining or regalloc demos.

@mtrofin
Collaborator

mtrofin commented Oct 9, 2024

OK, that's weird. Let's first make sure the use side of things - i.e. how the model is ingested and used - is set up right. Then we can look at the training side.

I tried the published size model, here are my exact steps:

For brevity, I used my paths - I have a git repo for llvm under /work/llvm-project and I have a python 3.10 env set up under /work/python3.10, so those paths need replacing. The toolchain used to bootstrap clang shouldn't matter.

  1. Build the compiler we'll use to then build other binaries
cd /tmp
wget https://github.com/google/ml-compiler-opt/releases/download/inlining-Oz-v1.1/inlining-Oz-99f0063-v1.1.tar.gz
tar xvfz inlining-Oz-99f0063-v1.1.tar.gz
ls /tmp/model
cd /work/llvm-project
git checkout main && git pull && git checkout 665457815f11118f7e755a471f33606c8562a4be
mkdir build && cd build
cmake -GNinja -DCMAKE_BUILD_TYPE=Release ../llvm  -DLLVM_ENABLE_PROJECTS=clang  -DTENSORFLOW_AOT_PATH=/work/python3.10/lib/python3.10/site-packages/tensorflow -DLLVM_INLINER_MODEL_PATH=/tmp/model
ninja clang
  2. Build the baseline
cd ../ && mkdir build-base && cd build-base
cmake -GNinja -DCMAKE_BUILD_TYPE=MinSizeRel ../llvm  -DLLVM_ENABLE_PROJECTS=clang  -DTENSORFLOW_AOT_PATH=/work/python3.10/lib/python3.10/site-packages/tensorflow -DLLVM_INLINER_MODEL_PATH=/tmp/model -DCMAKE_C_COMPILER=/work/llvm-project/build/bin/clang -DCMAKE_CXX_COMPILER=/work/llvm-project/build/bin/clang++ -DCMAKE_EXPORT_COMPILE_COMMANDS=On

Check compile_commands.json to make sure it's using the previously built clang, then ninja clang (well, just ninja I guess - but I only built clang for validation)

  3. Build the experiment
cd ../ && mkdir build-exp && cd build-exp
cmake -GNinja -DCMAKE_BUILD_TYPE=MinSizeRel ../llvm  -DLLVM_ENABLE_PROJECTS=clang  -DTENSORFLOW_AOT_PATH=/work/python3.10/lib/python3.10/site-packages/tensorflow -DLLVM_INLINER_MODEL_PATH=/tmp/model -DCMAKE_C_COMPILER=/work/llvm-project/build/bin/clang -DCMAKE_CXX_COMPILER=/work/llvm-project/build/bin/clang++ -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DCMAKE_C_FLAGS="-mllvm -enable-ml-inliner=release" -DCMAKE_CXX_FLAGS="-mllvm -enable-ml-inliner=release"

I'd check compile_commands.json again to make sure it's using the previously built clang and that the flags are right (i.e. we compile with -mllvm -enable-ml-inliner=release). ninja clang again.
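(A quick sanity check, assuming the directory names above and run from /work/llvm-project - the grep patterns are just illustrative:)

grep -m1 'bin/clang++' build-exp/compile_commands.json                 # should point at /work/llvm-project/build/bin/clang++
grep -c 'enable-ml-inliner=release' build-exp/compile_commands.json    # should be non-zero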

$ ls -l build-base/bin/clang-20
-rwxr-x--- 1 mtrofin primarygroup 179609320 Oct  9 07:57 build-base/bin/clang-20

$ ls -l build-exp/bin/clang-20
-rwxr-x--- 1 mtrofin primarygroup 158697880 Oct  9 08:05 build-exp/bin/clang-20


so that's about 12% size savings.
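(That figure is just the ratio of the two listings above:)

python3 -c "print(1 - 158697880 / 179609320)"   # ~0.116, i.e. roughly 12% smaller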

A possible gotcha: are you passing -mllvm -enable-ml-inliner=release? Are you building with -Os or -Oz? (the latter matters less; the former is critical)
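(A minimal way to confirm the flag path works with the step-1 clang built above - test.cpp here is just a stand-in for any source file:)

/work/llvm-project/build/bin/clang++ -Oz -mllvm -enable-ml-inliner=release -c test.cpp -o test.o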
