How to visualize network? #2501

Closed
youki-sada opened this issue Nov 26, 2024 · 7 comments
Labels
question (Further information is requested), triaged (Issue has been triaged by maintainers)

Comments

@youki-sada

Is there any tool or option to visualize the TRT engine of LLMs? I believe TREx doesn't support LLMs, and trtllm-build --visualize_network doesn't work either.

@hello-11 added the question and triaged labels on Dec 2, 2024
@wili-65535

There are several ways to visualize the network.

  1. Using trtllm-build --log_level=verbose, you can get detailed per-layer information from the output log by searching for "Engine Layer Information:".

  2. Using trtllm-build --visualize_network, but what is the exact error message you see? Is it something like "'str' object has no attribute 'name'"?
    That is a known issue; we will fix it in a later commit. For now you can work around it as follows (a minimal sketch of why this works is shown after this list):
    1. Find the file network.py in the installation directory of tensorrt_llm, e.g. "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/network.py".
    2. Find the function "to_onnx".
    3. Add one line path = Path(path) after the line trt_network = self.trt_network.
    4. Run trtllm-build --visualize_network again.

  3. benchmarks/python/benchmark.py can print per-layer engine information via the --dump_layer_info parameter.
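
For clarity, here is a minimal, self-contained sketch of why that single added line fixes the error (this is not the actual tensorrt_llm source; the function name is made up): --visualize_network apparently passes the output path as a plain str, and Path(path) coerces it so that Path-only attributes such as .name work, while being a no-op if the argument is already a Path.

from pathlib import Path

def save_graph(path):              # hypothetical stand-in for Network.to_onnx()
    path = Path(path)              # the added workaround line; harmless if path is already a Path
    return path.name               # without the coercion, a plain-str path raises
                                   # AttributeError: 'str' object has no attribute 'name'

save_graph("/tmp/visualize_demo")  # now accepts both str and Path arguments

If you are unsure where the installed network.py lives, python3 -c "import tensorrt_llm.network as m; print(m.__file__)" prints its location.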

@youki-sada

@wili-65535
Thank you. As for solutions 1 and 2, I could get layer information by following your procedure. However, I would prefer the graph visualization from --visualize_network so that I can understand the TRT computational graph more easily.

  2. Using trtllm-build --visualize_network, but what is the exact error message you see? Is it something like "'str' object has no attribute 'name'"?

trtllm-build --checkpoint trt_models/llama3.2-1b-hf_fp16 --output_dir /trt_engines/tllm_llama3.2_1b_inst_fp16 --gemm_plugin auto --max_input_len 2048 --max_batch_size 1 --visualize_network did not produce an ONNX or SVG file.

$ ls /trt_engines/tllm_llama3.2_1b_inst_fp16
config.json  rank0.engine
[TensorRT-LLM] TensorRT-LLM version: 0.14.0
[12/03/2024-02:35:55] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set gpt_attention_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set gemm_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set nccl_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set lookup_plugin to None.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set lora_plugin to None.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set moe_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set context_fmha to True.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set remove_input_padding to True.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set reduce_fusion to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set enable_xqa to True.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set tokens_per_block to 64.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set multiple_profiles to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set paged_state to True.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set streamingllm to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set use_fused_mlp to True.
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.producer = {'name': 'modelopt', 'version': '0.19.0'}
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.bias = False
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.rotary_pct = 1.0
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.rank = 0
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.decoder = llama
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.rmsnorm = True
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.lm_head_bias = False
[12/03/2024-02:35:56] [TRT-LLM] [I] Compute capability: (8, 6)
[12/03/2024-02:35:56] [TRT-LLM] [I] SM count: 82
[12/03/2024-02:35:56] [TRT-LLM] [I] SM clock: 2100 MHz
[12/03/2024-02:35:56] [TRT-LLM] [I] int4 TFLOPS: 705
[12/03/2024-02:35:56] [TRT-LLM] [I] int8 TFLOPS: 352
[12/03/2024-02:35:56] [TRT-LLM] [I] fp8 TFLOPS: 0
[12/03/2024-02:35:56] [TRT-LLM] [I] float16 TFLOPS: 176
[12/03/2024-02:35:56] [TRT-LLM] [I] bfloat16 TFLOPS: 176
[12/03/2024-02:35:56] [TRT-LLM] [I] float32 TFLOPS: 88
[12/03/2024-02:35:56] [TRT-LLM] [I] Total Memory: 24 GiB
[12/03/2024-02:35:56] [TRT-LLM] [I] Memory clock: 9751 MHz
[12/03/2024-02:35:56] [TRT-LLM] [I] Memory bus width: 384
[12/03/2024-02:35:56] [TRT-LLM] [I] Memory bandwidth: 936 GB/s
[12/03/2024-02:35:56] [TRT-LLM] [I] NVLink is active: False
[12/03/2024-02:35:56] [TRT-LLM] [I] PCIe speed: 2500 Mbps
[12/03/2024-02:35:56] [TRT-LLM] [I] PCIe link width: 16
[12/03/2024-02:35:56] [TRT-LLM] [I] PCIe bandwidth: 5 GB/s
[12/03/2024-02:35:56] [TRT-LLM] [I] Set dtype to float16.
[12/03/2024-02:35:56] [TRT-LLM] [I] Set paged_kv_cache to True.
[12/03/2024-02:35:56] [TRT-LLM] [W] Overriding paged_state to False
[12/03/2024-02:35:56] [TRT-LLM] [I] Set paged_state to False.
[12/03/2024-02:35:56] [TRT-LLM] [W] max_seq_len is scaled to 4194304 by rotary scaling 32.0
[12/03/2024-02:35:56] [TRT-LLM] [I] max_seq_len is not specified, using deduced value 4194304
[12/03/2024-02:35:56] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width.

[12/03/2024-02:35:56] [TRT-LLM] [W] padding removal and fMHA are both enabled, max_input_len is not required and will be ignored
[12/03/2024-02:36:06] [TRT] [I] [MemUsageChange] Init CUDA: CPU +15, GPU +0, now: CPU 1245, GPU 263 (MiB)
[12/03/2024-02:36:09] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2133, GPU +396, now: CPU 3533, GPU 659 (MiB)
[12/03/2024-02:36:09] [TRT-LLM] [I] Set nccl_plugin to None.
[12/03/2024-02:36:09] [TRT-LLM] [E] Failed to import graphviz, please install graphviz to enable Network.to_dot()
[12/03/2024-02:36:09] [TRT-LLM] [I] Total time of constructing network from module object 13.18428659439087 seconds
[12/03/2024-02:36:09] [TRT-LLM] [I] Total optimization profiles added: 1
[12/03/2024-02:36:09] [TRT-LLM] [I] Total time to initialize the weights in network Unnamed Network 0: 00:00:00
[12/03/2024-02:36:09] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
[12/03/2024-02:36:09] [TRT] [W] Unused Input: position_ids
[12/03/2024-02:36:09] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[12/03/2024-02:36:09] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[12/03/2024-02:36:09] [TRT] [I] Compiler backend is used during engine build.
[12/03/2024-02:36:13] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[12/03/2024-02:36:13] [TRT] [I] Detected 16 inputs and 1 output network tensors.
[12/03/2024-02:36:30] [TRT] [I] Total Host Persistent Memory: 46400 bytes
[12/03/2024-02:36:30] [TRT] [I] Total Device Persistent Memory: 0 bytes
[12/03/2024-02:36:30] [TRT] [I] Max Scratch Memory: 67141632 bytes
[12/03/2024-02:36:30] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 256 steps to complete.
[12/03/2024-02:36:30] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 19.9799ms to assign 17 blocks to 256 nodes requiring 469768704 bytes.
[12/03/2024-02:36:30] [TRT] [I] Total Activation Memory: 469768192 bytes
[12/03/2024-02:36:35] [TRT] [I] Total Weights Memory: 3030520448 bytes
[12/03/2024-02:36:35] [TRT] [I] Compiler backend is used during engine execution.
[12/03/2024-02:36:35] [TRT] [I] Engine generation completed in 25.5894 seconds.
[12/03/2024-02:36:35] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 32 MiB, GPU 2891 MiB
[12/03/2024-02:36:37] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 11314 MiB
[12/03/2024-02:36:37] [TRT-LLM] [I] Total time of building Unnamed Network 0: 00:00:28
[12/03/2024-02:36:37] [TRT] [I] Serialized 27 bytes of code generator cache.
[12/03/2024-02:36:37] [TRT] [I] Serialized 138954 bytes of compilation cache.
[12/03/2024-02:36:37] [TRT] [I] Serialized 9 timing cache entries
[12/03/2024-02:36:37] [TRT-LLM] [I] Timing cache serialized to model.cache
[12/03/2024-02:36:37] [TRT-LLM] [I] Build phase peak memory: 11316.69 MB, children: 25.25 MB
[12/03/2024-02:36:37] [TRT-LLM] [I] Serializing engine to /trt_engines/tllm_llama3.2_1b_inst_fp16/rank0.engine...
[12/03/2024-02:36:40] [TRT-LLM] [I] Engine serialized. Total time: 00:00:02

@wili-65535

Could you find a file "rank0.onnx" somewhere after running the command above? It might not be in the directory of the output engine.

@youki-sada

No, I couldn't find it with the commands below.

$ find . -name rank0.onnx
$ find /trt_engines -name rank0.onnx

@wili-65535

How about find / -name rank0.onnx? The path of the output file is hard-coded in the current tensorrt_llm, and I'm not sure where it ends up if you are not building from source.

@youki-sada

I still couldn't find it.

find / -name 'rank0.onnx' 2>/dev/null

@youki-sada

I successfully visualized my LLMs by using trtexec and NVIDIA Nsight Deep Learning Designer. @wili-65535 Thank you for your kind support.
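
For anyone landing on this issue later, a sketch of that flow (flag names are from trtexec --help; the exact command may differ, and a TensorRT-LLM engine may additionally need its plugin library loaded into trtexec):

$ trtexec --loadEngine=/trt_engines/tllm_llama3.2_1b_inst_fp16/rank0.engine \
    --profilingVerbosity=detailed --exportLayerInfo=rank0_layers.json

The exported per-layer JSON can then be fed to engine-inspection tools, while NVIDIA Nsight Deep Learning Designer can render the network graph itself (for example from an ONNX export of the model).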
