forked from vllm-project/vllm
sync with 0.7.2 #315
Merged
Conversation
dtrifiro commented on Feb 7, 2025
- 0.7.2 changelog: https://github.com/vllm-project/vllm/releases/v0.7.2
- Dockerfile.ubi: bump flashinfer to v0.2.0.post2
Fix typo: the word "evolved" was mistyped. Signed-off-by: Vicente Herrera <[email protected]>
Fix vllm-project#12647. The `get_quant_method` of `moe_wna16` always returns an MoE method, a GPTQ-based linear method, or an AWQ-based linear method, even when the target module is an attention layer. https://github.com/vllm-project/vllm/blob/baeded25699f9f4851843306f27f685c4d4ee7c5/vllm/attention/layer.py#L86-L92 Signed-off-by: Jinzhen Lin <[email protected]>
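The fix described in this commit can be sketched as follows. All class and method names below are hypothetical stand-ins, not vLLM's actual API: the point is that `get_quant_method` returns `None` for attention layers, so they fall back to the unquantized 16-bit path instead of receiving an MoE or linear quant method.

```python
# Minimal sketch (hypothetical names): a quant config whose get_quant_method
# returns None for attention layers so they stay unquantized (16-bit),
# and a quant method for MoE / linear layers otherwise.

class Attention:          # stand-in for vllm.attention.layer.Attention
    pass

class FusedMoE:           # stand-in for the fused MoE layer
    pass

class MoeWNA16Config:
    def get_quant_method(self, layer, prefix: str):
        if isinstance(layer, Attention):
            return None                   # fall back to the unquantized path
        if isinstance(layer, FusedMoE):
            return "moe_wna16_method"     # placeholder for the MoE quant method
        return "linear_quant_method"      # placeholder for GPTQ/AWQ linear method

config = MoeWNA16Config()
print(config.get_quant_method(Attention(), "model.layers.0.self_attn"))  # None
```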
I noticed during testing that I was getting a lot of these deprecation warnings about `lora_local_path`:

```
DeprecationWarning: The 'lora_local_path' attribute is deprecated and will be removed in a future version. Please use 'lora_path' instead.
```

The check used for emitting this warning was always True, even when the parameter was not actually specified, because the field is always present in `__struct_fields__`. We should be checking for a non-None value instead. Signed-off-by: Russell Bryant <[email protected]>
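The bug can be reduced to a small sketch (a minimal stand-in class, not vLLM's actual `LoRARequest`): membership in `__struct_fields__` is True for every declared field whether or not the caller set it, while a non-None check only fires when a value was actually provided.

```python
# Sketch of the bug: checking field *presence* in __struct_fields__ is
# always True; the fix checks the field's *value* instead.

class LoRARequest:
    # msgspec-style structs expose their declared field names here
    __struct_fields__ = ("lora_name", "lora_path", "lora_local_path")

    def __init__(self, lora_name, lora_path=None, lora_local_path=None):
        self.lora_name = lora_name
        self.lora_path = lora_path
        self.lora_local_path = lora_local_path

req = LoRARequest("adapter")  # deprecated parameter never specified

# Buggy check: fires even though lora_local_path was never set.
buggy = "lora_local_path" in req.__struct_fields__          # always True

# Fixed check: only warn when a non-None value was actually provided.
fixed = req.lora_local_path is not None                     # False here

print(buggy, fixed)  # True False
```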
A small optimization to avoid creating a new `ConstantList` every time `request.kv_block_hashes` is used. Signed-off-by: Woosuk Kwon <[email protected]>
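The optimization can be sketched as follows, with minimal stand-ins for `ConstantList` and the request object (the shapes here are hypothetical, not vLLM's actual classes): cache the read-only wrapper on first access so repeated reads return the same object instead of allocating a new one each time.

```python
# Sketch: cache the ConstantList wrapper instead of rebuilding it per access.
from functools import cached_property

class ConstantList:
    """Minimal read-only view over a list."""
    def __init__(self, items):
        self._items = items
    def __getitem__(self, i):
        return self._items[i]
    def __len__(self):
        return len(self._items)

class Request:
    def __init__(self):
        self._kv_block_hashes = []

    @cached_property
    def kv_block_hashes(self):
        # Built once and reused; previously every access allocated a wrapper.
        return ConstantList(self._kv_block_hashes)

req = Request()
assert req.kv_block_hashes is req.kv_block_hashes  # same cached object
```

Because the wrapper holds a reference to the underlying list rather than a copy, later appends to `_kv_block_hashes` remain visible through the cached view.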
…anager (vllm-project#12608) As mentioned in RFC vllm-project#12254, this PR achieves the task: combine allocate_slots and append_slots. There should be no functionality change, except that decode now also raises an exception when num_tokens is zero (as prefill does); the unit tests are updated accordingly. @comaniac @rickyyx @WoosukKwon @youkaichao @heheda12345 @simon-mo Signed-off-by: Shawn Du <[email protected]>
Signed-off-by: Kunshang Ji <[email protected]>
…lm-project#12628)

- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44 — Author: Russell Bryant <[email protected]>, Date: Fri Jan 31 14:18:24 2025 -0500

Add SPDX license headers to python source files. This commit adds SPDX license headers to python source files, as recommended to the project by the Linux Foundation. These headers provide a concise, human- and machine-readable way of communicating license information for each source file. They help avoid any ambiguity about the license of the code and can also be used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies; having these headers in place helps that tool do its job. More information can be found on the SPDX site: https://spdx.dev/learn/handling-license-info/

commit 5a1cf1c — Author: Russell Bryant <[email protected]>, Date: Fri Jan 31 14:36:32 2025 -0500

Check for SPDX headers using pre-commit.

Signed-off-by: Russell Bryant <[email protected]>
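As a rough sketch of what such a pre-commit check verifies (a hypothetical re-implementation, not the project's actual hook): each Python source file must begin with an SPDX license identifier comment, optionally preceded by a shebang line.

```python
# Hypothetical sketch of an SPDX header check like the one pre-commit runs.
SPDX_PREFIX = "# SPDX-License-Identifier:"

def has_spdx_header(source: str) -> bool:
    """Return True if the file starts with an SPDX identifier comment
    (allowing a shebang on the first line)."""
    for line in source.splitlines()[:2]:
        if line.startswith(SPDX_PREFIX):
            return True
    return False

good = "# SPDX-License-Identifier: Apache-2.0\nimport os\n"
bad = "import os\n"
print(has_spdx_header(good), has_spdx_header(bad))  # True False
```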
…ct#12667) As more and more people try DeepSeek models with multi-node inference, vllm-project#7815 comes up more frequently. Let's give a clear message to users. Signed-off-by: youkaichao <[email protected]>
sgl_moe_align_block_size is based on: sgl-project/sglang@ded9fcd moe_align_block_size is based on: sgl-project/sglang@ba5112f Signed-off-by: Yang Chen <[email protected]>
…oject#12669) When people use DeepSeek models, they find that they need to resolve a cv2 version conflict; see https://zhuanlan.zhihu.com/p/21064432691. I added the check and made all imports of `cv2` lazy. Signed-off-by: youkaichao <[email protected]>
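The lazy-import pattern can be sketched like this (the `lazy_import` helper is hypothetical; vLLM's actual implementation may differ): nothing is imported until the loader is first called, so merely importing the package no longer triggers `cv2`'s version conflicts.

```python
# Hypothetical sketch of the lazy-import pattern used for cv2.
import importlib

def lazy_import(name: str):
    """Return a zero-argument loader that imports `name` on first call
    and returns the cached module on every later call."""
    cache = {}
    def load():
        if name not in cache:
            cache[name] = importlib.import_module(name)
        return cache[name]
    return load

# cv2 is NOT imported here; it loads only when get_cv2() is first called.
get_cv2 = lazy_import("cv2")
```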
…roject#12666) Thanks @kylesayrs for catching this!
…llm-project#12570) Fix to AWQ quant loading of the new R1 model. The new optimized MoE kernels for a large number of experts, `moe_wn16`, use AWQ quant, which requires the attention layers to be in 16-bit. The current merge has broken this, and `get_quant_method` must return None for it to work correctly again.

Signed-off-by: Srikanth Srinivas, Harry Mellor, Beim, mgoin, npanpaliya, Aleksandr Malyshev, Lucas Wilkinson, simon-mo, Cody Yu, Chen Zhang, Tyler Michael Smith, Ryan N, Brian Dellabetta, Jee Jee Li, Rahul Tuli, Russell Bryant, Vicente Herrera, Jinzhen Lin, Woosuk Kwon, Shawn Du, Kunshang Ji, youkaichao
Co-authored-by: Harry Mellor, Beim, Robert Shaw, mgoin, simon-mo, Nishidha, Lucas Wilkinson, Aleksandr Malyshev, Woosuk Kwon, Michael Goin, Zhuohan Li, Tyler Michael Smith, Alexander Matveev, Roger Wang, Cody Yu, Chen Zhang, Kevin H. Luu, Ryan Nguyen, Brian Dellabetta, fade_away, weilong.yu, Jee Jee Li, Eldar Kurtic, Rahul Tuli, Russell Bryant, Vicente Herrera, Jinzhen Lin, Shawn Du, Kunshang Ji, youkaichao
Fixes problems like vllm-project#12635, vllm-project#12636, and vllm-project#12565. Signed-off-by: youkaichao <[email protected]>
Signed-off-by: youkaichao <[email protected]>
# Adds support for `transformers` as a backend

Following huggingface/transformers#35235, a bunch of models should already be supported, and we are ramping up support for more. Thanks @Isotr0py for the TP support, and @hmellor for his help as well! This includes:
- `trust_remote_code=True` support: any model on the Hub that implements attention the correct way can be natively supported!
- tensor parallel support

Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
Co-authored-by: Isotr0py <[email protected]>
Co-authored-by: Harry Mellor <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Co-authored-by: Michael Goin <[email protected]>
…#12694) Signed-off-by: Russell Bryant <[email protected]>
…aled mm (vllm-project#12696) Signed-off-by: Tyler Michael Smith <[email protected]>
Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Cody Yu <[email protected]>
…project#12415) Signed-off-by: Cody Yu <[email protected]>
…ct#12621) Signed-off-by: Russell Bryant <[email protected]>
…fig (vllm-project#12710) Signed-off-by: mgoin <[email protected]>
Signed-off-by: Thomas Parnell <[email protected]>
…essed Tensors configs (vllm-project#12711)
Signed-off-by: Hongxia Yang <[email protected]> Co-authored-by: Matthew Wong <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
…oject#12553) Signed-off-by: DarkLight1337 <[email protected]> Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Isotr0py <[email protected]>
…or_pytorch'' for --tensor-parallel-size more than 1 (vllm-project#12546)
Signed-off-by: youkaichao <[email protected]>
Merged via CLI script
Signed-off-by: Lu Fang <[email protected]>
Signed-off-by: youkaichao <[email protected]>
This pull request was approved by: dtrifiro
Force-pushed from ae28d3e to 6ec0863
groenenboomj pushed a commit that referenced this pull request on Feb 24, 2025:
…with the existing tuning, prior to moving all the way forward to release/3.2.x; Using the correct hipblaslt version in the name (#315)