Add util for saving and decoding loss spikes #1044
base: master
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1044      +/-   ##
==========================================
- Coverage   78.53%   78.25%   -0.29%
==========================================
  Files         187      191       +4
  Lines       17336    17784     +448
==========================================
+ Hits        13615    13916     +301
- Misses       3721     3868     +147
You need to format your commits to pass the atorch-pre-commit check.
Please write the documentation in English and in a readable format.
class TokenLossSpike(LossSpikeBase):
    def save_loss(self, file_name, cur_loss, cur_iter, losses_str, sample_infos_str):
What is the relationship between cur_loss and losses_str, and between cur_iter and sample_infos_str?
What do losses_str and sample_infos_str mean in the context of model training?
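To make the question concrete, here is one hypothetical reading of the signature (an assumption, not the PR's confirmed format): cur_loss and cur_iter describe the whole training step, while losses_str and sample_infos_str are comma-joined per-sample values aligned by position. Under that assumption, save_loss could append one JSON line per spike. The JSON-lines layout, the `sep` parameter, and the `<dataset>-<index>` sample-info strings are illustrative choices only.

```python
import json
import os
import tempfile


def save_loss(file_name, cur_loss, cur_iter, losses_str, sample_infos_str, sep=","):
    """Append one loss-spike record to file_name as a JSON line.

    Hypothetical format: losses_str and sample_infos_str are sep-joined lists
    where the i-th loss belongs to the i-th sample info.
    """
    losses = [float(x) for x in losses_str.split(sep)]
    sample_infos = sample_infos_str.split(sep)
    if len(losses) != len(sample_infos):
        raise ValueError("losses_str and sample_infos_str must have the same length")
    record = {
        "iter": cur_iter,  # training step at which the spike occurred
        "loss": cur_loss,  # step-level loss, e.g. the batch average
        "samples": [
            {"info": info, "loss": loss} for info, loss in zip(sample_infos, losses)
        ],
    }
    with open(file_name, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


# Example: one spike at step 100 with two samples in the batch.
path = os.path.join(tempfile.mkdtemp(), "loss_spikes.jsonl")
save_loss(path, 5.75, 100, "2.0,9.5", "ds0-12,ds1-7")
```

Keeping the per-sample losses next to their sample infos in one record is what would later let a decoder find the worst sample in a spiking step.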
data = tokenizer.decode(data)
return ds, data, max_loss
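The decode step in the fragment above (tokenizer.decode followed by returning ds, data, max_loss) can be sketched as follows, assuming a hypothetical JSON-lines spike file whose records hold a "samples" list of {"info": "<dataset>-<index>", "loss": ...} entries. decode_spike, CharTokenizer, and the in-memory datasets dict are illustrative stand-ins; any tokenizer exposing decode(token_ids), such as a Hugging Face tokenizer, would fit the same slot.

```python
import json
import os
import tempfile


class CharTokenizer:
    """Toy stand-in for a real tokenizer: each token id is a Unicode code point."""

    def decode(self, token_ids):
        return "".join(chr(t) for t in token_ids)


def decode_spike(file_name, fetch, tokenizer):
    """Return (ds, text, max_loss) for the highest-loss sample in file_name.

    Hypothetical file format: one JSON object per line, each with a "samples"
    list of {"info": "<dataset>-<index>", "loss": float} entries.
    """
    best = None
    with open(file_name, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            for sample in json.loads(line)["samples"]:
                if best is None or sample["loss"] > best["loss"]:
                    best = sample
    if best is None:
        raise ValueError(f"no samples recorded in {file_name}")
    ds = best["info"].rsplit("-", 1)[0]
    data = fetch(best["info"])     # user-supplied lookup: token ids of that sample
    data = tokenizer.decode(data)  # token ids -> readable text
    return ds, data, best["loss"]


# Example: a toy in-memory dataset and a one-record spike file.
datasets = {"ds0": {12: [104, 105]}, "ds1": {7: [115, 112, 105, 107, 101]}}


def fetch(each_sample_info):
    ds_name, index = each_sample_info.rsplit("-", 1)
    return datasets[ds_name][int(index)]


path = os.path.join(tempfile.mkdtemp(), "loss_spikes.jsonl")
with open(path, "w", encoding="utf-8") as f:
    record = {"iter": 100, "loss": 5.75,
              "samples": [{"info": "ds0-12", "loss": 2.0}, {"info": "ds1-7", "loss": 9.5}]}
    f.write(json.dumps(record) + "\n")

ds, text, max_loss = decode_spike(path, fetch, CharTokenizer())
```

Passing fetch in as an argument keeps the decoder independent of how a given training job stores its data.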
    def fetch(self, each_sample_info):
Since fetch is a user-defined method, either:
1. define it in the base class as an abstract method, or
2. have the class constructor take a fetch_func parameter supplied by the user.
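Both designs the reviewer suggests can be sketched in a few lines of Python. SpikeBase, InMemorySpike, and SpikeWithCallback are hypothetical names standing in for the PR's classes; only fetch and fetch_func come from the discussion.

```python
from abc import ABC, abstractmethod
from typing import Callable, Sequence


class SpikeBase(ABC):
    """Option 1 (hypothetical): the base class declares fetch as abstract."""

    @abstractmethod
    def fetch(self, each_sample_info: str) -> Sequence[int]:
        """Return the token ids of the sample described by each_sample_info."""


class InMemorySpike(SpikeBase):
    """A user subclass; it can only be instantiated because it implements fetch."""

    def __init__(self, dataset):
        self.dataset = dataset

    def fetch(self, each_sample_info: str) -> Sequence[int]:
        return self.dataset[int(each_sample_info)]


class SpikeWithCallback:
    """Option 2 (hypothetical): the user passes fetch_func to the constructor."""

    def __init__(self, fetch_func: Callable[[str], Sequence[int]]):
        if not callable(fetch_func):
            raise TypeError("fetch_func must be callable")
        self.fetch_func = fetch_func

    def fetch(self, each_sample_info: str) -> Sequence[int]:
        # Delegate to the user-provided lookup.
        return self.fetch_func(each_sample_info)
```

With the abstract-method design, forgetting to implement fetch fails at instantiation with a TypeError; the fetch_func design lets users plug in a data source without subclassing.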
Does this PR still need updates and merging? If so, please reply. This PR will be closed at the end of the month if there is no response. Thanks a lot.
What changes were proposed in this pull request?
Adds a utility for recording and parsing loss spikes.
Why are the changes needed?
So that more people running model training can use it.