update mscale calculation to keep back compatible with previous phi models #20
Closed
zelinms wants to merge 1175 commits into xiaoxiawu-microsoft:main from wenxcs-msft:dev/zelin/msft-phimoe-mscale
Commits
This pull request is big! We're only showing the most recent 250 commits
Commits on Jul 31, 2024
Commits on Aug 1, 2024
[Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (vllm-project#6954)
Commits on Aug 2, 2024
Commits on Aug 3, 2024
Commits on Aug 4, 2024
[Bugfix] [SpecDecode] Default speculative_draft_tensor_parallel_size to 1 when using MLPSpeculator (vllm-project#7105)
Commits on Aug 5, 2024
[Speculative decoding] Add periodic log with time spent in proposal/scoring/verification (vllm-project#6963)
Commits on Aug 6, 2024
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (vllm-project#4942)
Commits on Aug 7, 2024
[Misc] Refactor linear layer weight loading; introduce BasevLLMParameter and weight_loader_v2 (vllm-project#5874)
Commits on Aug 8, 2024
Commits on Aug 9, 2024
Commits on Aug 10, 2024
Commits on Aug 11, 2024
Commits on Aug 12, 2024
[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (vllm-project#7208)
Commits on Aug 13, 2024
Commits on Aug 14, 2024
Commits on Aug 15, 2024
Commits on Aug 16, 2024
[Bugfix][Hardware][AMD][Frontend] add quantization param to embedding checking method (vllm-project#7513)
Commits on Aug 17, 2024
Commits on Aug 18, 2024
Commits on Aug 19, 2024
Commits on Aug 20, 2024
[Bugfix] use StoreBoolean instead of type=bool for --disable-logprobs-during-spec-decoding (vllm-project#7665)