Add a utility for saving and decoding loss spikes #1044

Open
wants to merge 7 commits into
base: master
from


haikuotiankong1212

What changes were proposed in this pull request?

This PR adds a tool for recording loss spikes and parsing (decoding) the recorded data.

Why are the changes needed?

So that more people who train models can use it.
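For readers new to the idea, here is a minimal sketch of what recording and decoding loss spikes could look like. Everything in it is hypothetical and is not the PR's code: `LossSpikeRecorder`, `decode_spikes`, the JSON-lines file format, and the spike rule (a loss above `threshold` times the mean of the last `window` losses) are all assumptions. The `fetch_tokens` and `decode` callables stand in for a dataset lookup and a tokenizer's decode method.

```python
# Hypothetical sketch; not the PR's implementation.
import json
import os
import tempfile
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class LossSpikeRecorder:
    path: str                 # JSON-lines file that spike records are appended to (assumed format)
    threshold: float = 2.0    # assumed rule: spike if loss > threshold * recent mean
    window: int = 100         # number of recent losses in the running mean
    history: List[float] = field(default_factory=list)

    def is_spike(self, loss: float) -> bool:
        if not self.history:
            return False
        recent = self.history[-self.window:]
        return loss > self.threshold * (sum(recent) / len(recent))

    def update(self, iteration: int, loss: float,
               sample_losses: Sequence[float], sample_ids: Sequence[str]) -> bool:
        """Append a record for this step if it is a spike; return whether it was."""
        spike = self.is_spike(loss)
        if spike:
            record = {"iter": iteration, "loss": loss,
                      "sample_losses": list(sample_losses),
                      "sample_ids": list(sample_ids)}
            with open(self.path, "a", encoding="utf-8") as f:
                f.write(json.dumps(record) + "\n")
        self.history.append(loss)
        return spike


def decode_spikes(path: str,
                  fetch_tokens: Callable[[str], List[int]],
                  decode: Callable[[List[int]], str]) -> List[Tuple[int, str, float, str]]:
    """For each recorded spike, return (iter, worst sample id, its loss, decoded text)."""
    results = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            losses = rec["sample_losses"]
            worst = max(range(len(losses)), key=lambda i: losses[i])
            sample_id = rec["sample_ids"][worst]
            results.append((rec["iter"], sample_id, losses[worst],
                            decode(fetch_tokens(sample_id))))
    return results


if __name__ == "__main__":
    vocab = {0: "hello", 1: "world"}          # toy tokenizer vocabulary
    dataset = {"s0": [0], "s1": [0, 1]}       # toy sample id -> token ids
    spike_file = os.path.join(tempfile.mkdtemp(), "spikes.jsonl")
    recorder = LossSpikeRecorder(spike_file)
    for it, loss in enumerate([1.0, 1.1, 0.9, 5.0]):
        recorder.update(it, loss, [loss * 0.5, loss * 1.5], ["s0", "s1"])
    print(decode_spikes(spike_file, dataset.__getitem__,
                        lambda ids: " ".join(vocab[i] for i in ids)))
```

Keeping the per-sample losses next to the sample identifiers is what lets the decode step go from "the loss spiked at step N" to "this is the text that caused it".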


codecov bot commented Mar 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.25%. Comparing base (3157af7) to head (dd27cbf).
Report is 244 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1044      +/-   ##
==========================================
- Coverage   78.53%   78.25%   -0.29%     
==========================================
  Files         187      191       +4     
  Lines       17336    17784     +448     
==========================================
+ Hits        13615    13916     +301     
- Misses       3721     3868     +147     


@workingloong
Collaborator

You need to format your commits so that they pass the atorch-pre-commit check.

Collaborator


Please write the documentation in English and in a readable format.



class TokenLossSpike(LossSpikeBase):
    def save_loss(self, file_name, cur_loss, cur_iter, losses_str, sample_infos_str):
Collaborator


What is the relationship between cur_loss and losses_str, and between cur_iter and sample_infos_str?
What do losses_str and sample_infos_str mean in model training?
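Passing per-sample data as pre-joined strings is what makes this question necessary. One possible reading, assumed here because the PR does not state it: `losses_str` holds the per-sample losses of the current batch and `sample_infos_str` holds the matching sample identifiers, both comma-joined and aligned by index, while `cur_loss` is the batch loss at step `cur_iter`. Under that assumption, a hypothetical helper `parse_loss_record` makes the pairing explicit and fails loudly on a mismatch:

```python
# Hypothetical helper; the string format below is an assumption, not the PR's documented contract.
from typing import List, Tuple


def parse_loss_record(losses_str: str, sample_infos_str: str,
                      sep: str = ",") -> List[Tuple[str, float]]:
    """Pair each per-sample loss with the sample it came from.

    ASSUMPTION: both strings are `sep`-joined lists aligned by index.
    """
    losses = [float(x) for x in losses_str.split(sep)]
    infos = sample_infos_str.split(sep)
    if len(losses) != len(infos):
        raise ValueError(f"{len(losses)} losses but {len(infos)} sample infos")
    return list(zip(infos, losses))


if __name__ == "__main__":
    print(parse_loss_record("0.5,7.5", "s0,s1"))
```

Stating this contract in the `save_loss` docstring, or accepting lists instead of strings, would answer the question directly.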

data = tokenizer.decode(data)
return ds, data, max_loss

def fetch(self, each_sample_info):
Collaborator


Since fetch is a user-defined method, either:
- define it in the base class as an abstract method, or
- have the class initializer take a parameter (fetch_func) that the user provides.
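Both options can be sketched with Python's standard `abc` module. The classes below are simplified, hypothetical stand-ins, not the PR's code (`DictLossSpike` and `FuncLossSpike` are invented names), and they assume `fetch` takes one sample identifier and returns that sample's token ids:

```python
# Hypothetical sketch of the two options suggested in review; not the PR's implementation.
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class LossSpikeBase(ABC):
    """Simplified stand-in for the PR's base class."""

    @abstractmethod
    def fetch(self, each_sample_info: str) -> List[int]:
        """Option 1: every subclass must define how to load a sample's token ids."""


class DictLossSpike(LossSpikeBase):
    """Option 1 in use: a subclass implements fetch itself (here, from an in-memory dict)."""

    def __init__(self, samples: Dict[str, List[int]]):
        self.samples = samples

    def fetch(self, each_sample_info: str) -> List[int]:
        return self.samples[each_sample_info]


class FuncLossSpike(LossSpikeBase):
    """Option 2: the user injects fetch_func when constructing the instance."""

    def __init__(self, fetch_func: Callable[[str], List[int]]):
        self.fetch_func = fetch_func

    def fetch(self, each_sample_info: str) -> List[int]:
        return self.fetch_func(each_sample_info)


if __name__ == "__main__":
    print(DictLossSpike({"s0": [1, 2]}).fetch("s0"))
    print(FuncLossSpike(lambda info: [len(info)]).fetch("abc"))
```

Option 1 makes a missing `fetch` fail at instantiation time with a `TypeError`; option 2 lets users plug in a lambda or closure without writing a subclass.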

@BalaBalaYi
Collaborator

Does this PR still need to be updated and merged? If so, please reply. This PR will be closed at the end of the month if there is no response. Thanks a lot.

Labels
None yet
Projects
None yet

4 participants