Description
Bumps the Dockerfile to TRTLLM v0.12.
Important information
This bump differs from earlier ones: there are some things I could not test yet because of technical issues installing tensorrt_llm. The breaking changes require new binaries, and the install appeared to be missing a binary that I could not provide. As a result, our nightly container moves to v0.12 prematurely and may have some bugs for a day or so. I will fix all remaining issues and test as quickly as possible.
To clarify, the remaining bugs relate to how the code handles tp_size, pp_size, etc. It should work regardless, but I need to double-check.
The other breaking changes, including use_custom_all_reduce and max_output_len, are non-blocking: they do not affect functionality, but I will clean up the code later.
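For illustration only, here is a minimal sketch of what the later cleanup could look like, assuming the build options are held in a plain dict before being passed on. The helper name `split_build_options` and the `DEPRECATED_OPTIONS` set are hypothetical and are not taken from this PR or from the tensorrt_llm API.

```python
# Hypothetical sketch: not code from this PR and not a tensorrt_llm API.
# Assumes build options live in a plain dict before being forwarded.

# Option names this PR description flags as affected by breaking changes.
DEPRECATED_OPTIONS = {"use_custom_all_reduce", "max_output_len"}


def split_build_options(options):
    """Return (kept, dropped): options to forward, and flagged names removed."""
    kept = {k: v for k, v in options.items() if k not in DEPRECATED_OPTIONS}
    dropped = sorted(k for k in options if k in DEPRECATED_OPTIONS)
    return kept, dropped


kept, dropped = split_build_options(
    {"tp_size": 2, "pp_size": 1, "max_output_len": 512}
)
```

Returning the dropped names separately, instead of silently discarding them, makes it possible to log a warning until the callers are cleaned up.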