forked from NVIDIA/Megatron-LM
WIP: UL2 merge #23
Open
RaymondLi0 wants to merge 144 commits into multi-query-attention from ul2-merge
Conversation
…ke all arguments keyword to avoid warnings.
Remove old merge tool. See merge request ADLR/megatron-lm!433
Added a flag to switch between PyTorch and ring-exchange p2p. See merge request ADLR/megatron-lm!434
Support for all masks in the fused kernel + avoiding an in-place operation in the backward pass. See merge request ADLR/megatron-lm!435
Fix a size-mismatch bug. See merge request ADLR/megatron-lm!438
Timing levels. See merge request ADLR/megatron-lm!436
Fixed the grad scaler warning so it only prints for fp16. See merge request ADLR/megatron-lm!441
Fixed the grad scaler warning for bf16. See merge request ADLR/megatron-lm!442
Memory safety checks were incorrect for the tokens_to_generate=0 case. See merge request ADLR/megatron-lm!447
Update state_dict arguments for recent PyTorch versions. See merge request ADLR/megatron-lm!432
The LICENSE file says everything is 3-clause BSD, which is what we want, but at some point the Apache license was added to the top of some files and that proliferated. This commit removes the Apache license from any files that we own the copyright to. It also updates the copyright year and removes the unnecessary coding=utf-8 line.
Clean up licensing. See merge request ADLR/megatron-lm!451
Also merged in some changes from Apex.
Since the normal distribution is unbounded, we cannot have `max_ngrams` set to a bounded value.
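For context, a minimal sketch of what this implies for UL2-style span-length sampling (the function and parameter names below are illustrative, not the ones in this PR): when span lengths are drawn from a normal distribution around a mean span length, the distribution has no upper bound, so a fixed `max_ngrams`-style cap would truncate it; only the lower end needs clamping.

```python
import numpy as np

def sample_span_length(mean_ngrams: float, std: float, rng: np.random.Generator) -> int:
    """Illustrative sketch: draw a span length from an (unbounded) normal
    distribution. Only the lower end is clamped; no `max_ngrams` cap applies."""
    length = rng.normal(loc=mean_ngrams, scale=std)
    return max(1, int(round(length)))

rng = np.random.default_rng(0)
lengths = [sample_span_length(mean_ngrams=3.0, std=1.0, rng=rng) for _ in range(5)]
print(lengths)  # e.g. [3, 3, 4, 3, 2] -- values above any fixed cap remain possible
```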
"Filtered" means tokens that are not `cls_id` or `sep_id` tokens. This slightly improves the calculated statistics for long sequences and greatly improves them for very short sequences.
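As an illustration of the "filtered" notion (the helper below is hypothetical, not the PR's actual code): special tokens such as `cls_id` and `sep_id` are excluded before counting, which matters most when the sequence is very short and the special tokens make up a large fraction of it.

```python
def filtered_length(token_ids, cls_id, sep_id):
    """Count tokens excluding [CLS]/[SEP]-style special tokens (hypothetical helper)."""
    return sum(1 for t in token_ids if t not in (cls_id, sep_id))

# For a very short sequence the correction is large: 5 raw tokens vs. 3 filtered.
print(filtered_length([101, 7, 8, 9, 102], cls_id=101, sep_id=102))  # 3
```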
Via an extra "private" argument.
The GPT tokenizer does not handle the difference between UL2 tokens and other special tokens well. This should be fine, since nothing currently assumes that UL2 tokens are distinct from other special tokens (although other tokenizers implement it that way). In general, `additional_special_token_ids` is new for the GPT tokenizer, so there is no backward-compatibility trouble.
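A hedged sketch of the idea behind `additional_special_token_ids` (illustrative only; this does not mirror Megatron's actual tokenizer classes): UL2 mode tokens are appended to the vocabulary and exposed through a single property, with no separate bookkeeping that would distinguish them from other special tokens.

```python
class ToyGPTTokenizer:
    """Illustrative wrapper: UL2 tokens are registered as plain additional
    special tokens (hypothetical class, not Megatron's implementation)."""

    def __init__(self, vocab: dict[str, int]):
        self.vocab = dict(vocab)
        self._additional_special_tokens: list[str] = []

    def add_additional_special_tokens(self, tokens: list[str]) -> None:
        for tok in tokens:
            if tok not in self.vocab:
                self.vocab[tok] = len(self.vocab)
            self._additional_special_tokens.append(tok)

    @property
    def additional_special_token_ids(self) -> list[int]:
        return [self.vocab[t] for t in self._additional_special_tokens]

tok = ToyGPTTokenizer({"hello": 0, "world": 1})
tok.add_additional_special_tokens(["[R]", "[S]", "[X]"])  # UL2 denoiser mode tokens
print(tok.additional_special_token_ids)  # [2, 3, 4]
```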
Not always strictly necessary; this is only important for the decoder-only case. However, we don't bother checking for this since it's also queried in the `UL2Dataset`.
Usually we do not iterate through all indices, so we can save quite some time if `max_ngrams` is large.
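A rough sketch of the kind of early exit this refers to (variable names are hypothetical): once the masking budget is reached, the loop over candidate indices stops instead of scanning every remaining index, which saves time when `max_ngrams` is large.

```python
def select_spans(candidate_starts, span_lengths, max_masked_tokens):
    """Pick spans until the masking budget is hit, then stop early
    (hypothetical helper, not the PR's actual implementation)."""
    selected = []
    masked = 0
    for start, length in zip(candidate_starts, span_lengths):
        if masked >= max_masked_tokens:
            break  # early exit: no need to walk the remaining indices
        selected.append((start, length))
        masked += length
    return selected

print(select_spans(range(0, 100, 10), [3] * 10, max_masked_tokens=7))
# [(0, 3), (10, 3), (20, 3)] -- stops once the budget is reached
```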
This PR is based on NVIDIA#268
In addition:
TODO: getting around 30% reduced throughput with UL2.