How to visualize network? #2501

Closed
youki-sada opened this issue Nov 26, 2024 · 7 comments
Labels
question (Further information is requested), triaged (Issue has been triaged by maintainers)

Comments

@youki-sada

Is there any tool or option to visualize the TRT engine of LLMs? I believe TREx doesn't support LLMs, and trtllm-build --visualize_network doesn't work either.

@hello-11 added the question and triaged labels on Dec 2, 2024
@wili-65535

There are several ways to visualize the network.

  1. Using trtllm-build --log_level=verbose, you can get detailed per-layer information from the output log by searching for "Engine Layer Information:".

  2. Using trtllm-build --visualize_network, but what is the exact error message you see? Is it something like "'str' object has no attribute 'name'"?
    That is a known issue; we will fix it in a later commit. For now you can work around it as follows (a minimal sketch of why this works is shown after this list):
    1. Find the file network.py in the installation directory of tensorrt_llm, e.g. "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/network.py".
    2. Find the function "to_onnx".
    3. Add one line path = Path(path) after the line trt_network = self.trt_network.
    4. Run trtllm-build --visualize_network again.

  3. benchmarks/python/benchmark.py can print per-layer engine information via the --dump_layer_info parameter.
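
For clarity, here is a minimal, self-contained sketch of why that single added line fixes the error (this is not the actual tensorrt_llm source; the function name is made up): --visualize_network apparently passes the output path as a plain str, and Path(path) coerces it so that Path-only attributes such as .name work, while being a no-op if the argument is already a Path.

from pathlib import Path

def save_graph(path):              # hypothetical stand-in for Network.to_onnx()
    path = Path(path)              # the added workaround line; harmless if path is already a Path
    return path.name               # without the coercion, a plain-str path raises
                                   # AttributeError: 'str' object has no attribute 'name'

save_graph("/tmp/visualize_demo")  # now accepts both str and Path arguments

If you are unsure where the installed network.py lives, python3 -c "import tensorrt_llm.network as m; print(m.__file__)" prints its location.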

@youki-sada

@wili-65535
Thank you. As for solutions 1 and 2, I could get layer information by following your procedure. However, I would prefer the graph visualization from --visualize_network so that I can understand the TRT computational graph more easily.

  2. Using trtllm-build --visualize_network, but what is the exact error message you see? Is it something like "'str' object has no attribute 'name'"?

trtllm-build --checkpoint trt_models/llama3.2-1b-hf_fp16 --output_dir /trt_engines/tllm_llama3.2_1b_inst_fp16 --gemm_plugin auto --max_input_len 2048 --max_batch_size 1 --visualize_network did not produce an ONNX or SVG file.

$ ls /trt_engines/tllm_llama3.2_1b_inst_fp16
config.json  rank0.engine
[TensorRT-LLM] TensorRT-LLM version: 0.14.0
[12/03/2024-02:35:55] [TRT-LLM] [I] Set bert_attention_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set gpt_attention_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set gemm_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set gemm_swiglu_plugin to None.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set fp8_rowwise_gemm_plugin to None.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set nccl_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set lookup_plugin to None.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set lora_plugin to None.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set moe_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set mamba_conv1d_plugin to auto.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set low_latency_gemm_plugin to None.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set context_fmha to True.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set bert_context_fmha_fp32_acc to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set remove_input_padding to True.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set reduce_fusion to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set enable_xqa to True.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set tokens_per_block to 64.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set use_paged_context_fmha to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set use_fp8_context_fmha to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set multiple_profiles to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set paged_state to True.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set streamingllm to False.
[12/03/2024-02:35:55] [TRT-LLM] [I] Set use_fused_mlp to True.
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.producer = {'name': 'modelopt', 'version': '0.19.0'}
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.bias = False
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.rotary_pct = 1.0
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.rank = 0
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.decoder = llama
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.rmsnorm = True
[12/03/2024-02:35:55] [TRT-LLM] [W] Implicitly setting LLaMAConfig.lm_head_bias = False
[12/03/2024-02:35:56] [TRT-LLM] [I] Compute capability: (8, 6)
[12/03/2024-02:35:56] [TRT-LLM] [I] SM count: 82
[12/03/2024-02:35:56] [TRT-LLM] [I] SM clock: 2100 MHz
[12/03/2024-02:35:56] [TRT-LLM] [I] int4 TFLOPS: 705
[12/03/2024-02:35:56] [TRT-LLM] [I] int8 TFLOPS: 352
[12/03/2024-02:35:56] [TRT-LLM] [I] fp8 TFLOPS: 0
[12/03/2024-02:35:56] [TRT-LLM] [I] float16 TFLOPS: 176
[12/03/2024-02:35:56] [TRT-LLM] [I] bfloat16 TFLOPS: 176
[12/03/2024-02:35:56] [TRT-LLM] [I] float32 TFLOPS: 88
[12/03/2024-02:35:56] [TRT-LLM] [I] Total Memory: 24 GiB
[12/03/2024-02:35:56] [TRT-LLM] [I] Memory clock: 9751 MHz
[12/03/2024-02:35:56] [TRT-LLM] [I] Memory bus width: 384
[12/03/2024-02:35:56] [TRT-LLM] [I] Memory bandwidth: 936 GB/s
[12/03/2024-02:35:56] [TRT-LLM] [I] NVLink is active: False
[12/03/2024-02:35:56] [TRT-LLM] [I] PCIe speed: 2500 Mbps
[12/03/2024-02:35:56] [TRT-LLM] [I] PCIe link width: 16
[12/03/2024-02:35:56] [TRT-LLM] [I] PCIe bandwidth: 5 GB/s
[12/03/2024-02:35:56] [TRT-LLM] [I] Set dtype to float16.
[12/03/2024-02:35:56] [TRT-LLM] [I] Set paged_kv_cache to True.
[12/03/2024-02:35:56] [TRT-LLM] [W] Overriding paged_state to False
[12/03/2024-02:35:56] [TRT-LLM] [I] Set paged_state to False.
[12/03/2024-02:35:56] [TRT-LLM] [W] max_seq_len is scaled to 4194304 by rotary scaling 32.0
[12/03/2024-02:35:56] [TRT-LLM] [I] max_seq_len is not specified, using deduced value 4194304
[12/03/2024-02:35:56] [TRT-LLM] [W] remove_input_padding is enabled, while opt_num_tokens is not set, setting to max_batch_size*max_beam_width.

[12/03/2024-02:35:56] [TRT-LLM] [W] padding removal and fMHA are both enabled, max_input_len is not required and will be ignored
[12/03/2024-02:36:06] [TRT] [I] [MemUsageChange] Init CUDA: CPU +15, GPU +0, now: CPU 1245, GPU 263 (MiB)
[12/03/2024-02:36:09] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +2133, GPU +396, now: CPU 3533, GPU 659 (MiB)
[12/03/2024-02:36:09] [TRT-LLM] [I] Set nccl_plugin to None.
[12/03/2024-02:36:09] [TRT-LLM] [E] Failed to import graphviz, please install graphviz to enable Network.to_dot()
[12/03/2024-02:36:09] [TRT-LLM] [I] Total time of constructing network from module object 13.18428659439087 seconds
[12/03/2024-02:36:09] [TRT-LLM] [I] Total optimization profiles added: 1
[12/03/2024-02:36:09] [TRT-LLM] [I] Total time to initialize the weights in network Unnamed Network 0: 00:00:00
[12/03/2024-02:36:09] [TRT-LLM] [I] Build TensorRT engine Unnamed Network 0
[12/03/2024-02:36:09] [TRT] [W] Unused Input: position_ids
[12/03/2024-02:36:09] [TRT] [W] [RemoveDeadLayers] Input Tensor position_ids is unused or used only at compile-time, but is not being removed.
[12/03/2024-02:36:09] [TRT] [I] Global timing cache in use. Profiling results in this builder pass will be stored.
[12/03/2024-02:36:09] [TRT] [I] Compiler backend is used during engine build.
[12/03/2024-02:36:13] [TRT] [I] [GraphReduction] The approximate region cut reduction algorithm is called.
[12/03/2024-02:36:13] [TRT] [I] Detected 16 inputs and 1 output network tensors.
[12/03/2024-02:36:30] [TRT] [I] Total Host Persistent Memory: 46400 bytes
[12/03/2024-02:36:30] [TRT] [I] Total Device Persistent Memory: 0 bytes
[12/03/2024-02:36:30] [TRT] [I] Max Scratch Memory: 67141632 bytes
[12/03/2024-02:36:30] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 256 steps to complete.
[12/03/2024-02:36:30] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 19.9799ms to assign 17 blocks to 256 nodes requiring 469768704 bytes.
[12/03/2024-02:36:30] [TRT] [I] Total Activation Memory: 469768192 bytes
[12/03/2024-02:36:35] [TRT] [I] Total Weights Memory: 3030520448 bytes
[12/03/2024-02:36:35] [TRT] [I] Compiler backend is used during engine execution.
[12/03/2024-02:36:35] [TRT] [I] Engine generation completed in 25.5894 seconds.
[12/03/2024-02:36:35] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 32 MiB, GPU 2891 MiB
[12/03/2024-02:36:37] [TRT] [I] [MemUsageStats] Peak memory usage during Engine building and serialization: CPU: 11314 MiB
[12/03/2024-02:36:37] [TRT-LLM] [I] Total time of building Unnamed Network 0: 00:00:28
[12/03/2024-02:36:37] [TRT] [I] Serialized 27 bytes of code generator cache.
[12/03/2024-02:36:37] [TRT] [I] Serialized 138954 bytes of compilation cache.
[12/03/2024-02:36:37] [TRT] [I] Serialized 9 timing cache entries
[12/03/2024-02:36:37] [TRT-LLM] [I] Timing cache serialized to model.cache
[12/03/2024-02:36:37] [TRT-LLM] [I] Build phase peak memory: 11316.69 MB, children: 25.25 MB
[12/03/2024-02:36:37] [TRT-LLM] [I] Serializing engine to /trt_engines/tllm_llama3.2_1b_inst_fp16/rank0.engine...
[12/03/2024-02:36:40] [TRT-LLM] [I] Engine serialized. Total time: 00:00:02

@wili-65535

Could you find a file "rank0.onnx" somewhere after running the command above? It might not be in the directory of the output engine.

@youki-sada

No, I couldn't find it with the commands below.

$ find . -name rank0.onnx
$ find /trt_engines -name rank0.onnx

@wili-65535

How about find / -name rank0.onnx? The path of the output file is hard-coded in the current tensorrt_llm, and I'm not sure where it ends up if you are not building from source.

@youki-sada

I still couldn't find it.

find / -name 'rank0.onnx' 2>/dev/null

@youki-sada

I successfully visualized my LLMs by using trtexec and NVIDIA Nsight Deep Learning Designer. @wili-65535 Thank you for your kind support.
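
For anyone landing on this issue later, a sketch of that flow (flag names are from trtexec --help; the exact command may differ, and a TensorRT-LLM engine may additionally need its plugin library loaded into trtexec):

$ trtexec --loadEngine=/trt_engines/tllm_llama3.2_1b_inst_fp16/rank0.engine \
    --profilingVerbosity=detailed --exportLayerInfo=rank0_layers.json

The exported per-layer JSON can then be fed to engine-inspection tools, while NVIDIA Nsight Deep Learning Designer can render the network graph itself (for example from an ONNX export of the model).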
