[FEATURE] Update D2V, AutoTokenizer, and pretraining scripts #155
Conversation
Codecov Report
Attention: Patch coverage is …

@@            Coverage Diff             @@
##              dev     #155      +/-   ##
==========================================
- Coverage   97.81%   97.31%   -0.51%
==========================================
  Files          80       84       +4
  Lines        4349     4650     +301
==========================================
+ Hits         4254     4525     +271
- Misses         95      125      +30

View full report in Codecov by Sentry.
EduNLP/ModelZoo/hf_model/hf_model.py (Outdated)

bert_config = AutoConfig.from_pretrained(pretrained_model_dir)
if init:
    logger.info(f'Load AutoModel from checkpoint: {pretrained_model_dir}')
    self.bert = AutoModel.from_pretrained(pretrained_model_dir)
Change this to something like self.model? AutoModel should not be constrained to BERT.
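A minimal sketch of the rename being suggested, assuming the wrapper takes pretrained_model_dir and an init flag as in the snippet above; the class name HfAutoModelWrapper and the from_config fallback branch are illustrative, not the code in this PR:

import logging

import torch.nn as nn
from transformers import AutoConfig, AutoModel

logger = logging.getLogger(__name__)


class HfAutoModelWrapper(nn.Module):  # illustrative class name
    def __init__(self, pretrained_model_dir: str, init: bool = True):
        super().__init__()
        self.config = AutoConfig.from_pretrained(pretrained_model_dir)
        if init:
            logger.info(f'Load AutoModel from checkpoint: {pretrained_model_dir}')
            # generic attribute name instead of self.bert: the backbone may be
            # BERT, RoBERTa, or any other encoder AutoModel can resolve
            self.model = AutoModel.from_pretrained(pretrained_model_dir)
        else:
            # build an untrained model of the same architecture from the config
            self.model = AutoModel.from_config(self.config)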
EduNLP/ModelZoo/hf_model/hf_model.py (Outdated)

bert_config = AutoConfig.from_pretrained(pretrained_model_dir)
if init:
    logger.info(f'Load AutoModel from checkpoint: {pretrained_model_dir}')
    self.bert = AutoModel.from_pretrained(pretrained_model_dir)
same here
EduNLP/Pretrain/auto_vec.py (Outdated)

    pass


def finetune_edu_auto_model(
Should it be something like pretrain_hf_auto_model? It is only used for HuggingFace models. Also, isn't this domain pretraining rather than fine-tuning?
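A sketch of what the proposed rename could look like; the signature, argument names, and defaults below are guesses at the intent of this function, not the actual EduNLP API:

from typing import List, Optional


def pretrain_hf_auto_model(items: List[dict], output_dir: str,
                           pretrained_model: str = "bert-base-chinese",
                           eval_items: Optional[List[dict]] = None,
                           **train_params):
    """Domain-adaptive pretraining of a HuggingFace AutoModel on educational
    items (continued language-model training), as opposed to fine-tuning on a
    downstream prediction task."""
    ...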
And should the file name be hf_auto_vec? It is not "auto" for our own educational models.
And should pretrain_bert and finetune_bert_for_xx behave the same as these auto functions? Maybe with these auto functions we can delete bert_vec; not sure whether that is better. Or we could keep that file but have it directly reuse the auto functions, as in the sketch below.
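One possible shape for that last option, with bert_vec.py reduced to thin wrappers; every name below (including finetune_hf_auto_model_for_property_prediction and the import path) is an assumption based on this discussion, not the code in this PR:

# EduNLP/Pretrain/bert_vec.py -- hypothetical thin-wrapper version
from EduNLP.Pretrain.auto_vec import (  # assumed module path and names
    pretrain_hf_auto_model,
    finetune_hf_auto_model_for_property_prediction,
)


def pretrain_bert(items, output_dir, pretrained_model="bert-base-chinese", **kwargs):
    # delegate to the generic HuggingFace auto function with a BERT default
    return pretrain_hf_auto_model(items, output_dir,
                                  pretrained_model=pretrained_model, **kwargs)


def finetune_bert_for_property_prediction(items, output_dir, **kwargs):
    return finetune_hf_auto_model_for_property_prediction(items, output_dir, **kwargs)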
EduNLP/Pretrain/elmo_vec.py (Outdated)

-__all__ = ["ElmoTokenizer", "ElmoDataset", "train_elmo", "train_elmo_for_property_prediction",
-           "train_elmo_for_knowledge_prediction"]
+__all__ = ["ElmoTokenizer", "ElmoDataset", "pretrain_elmo", "pretrain_elmo_for_property_prediction",
These should be finetune_elmo_for_xxx.
Maybe that's my task lol
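What the comment seems to be asking for, i.e. keeping pretrain_ only for the base language model and finetune_ for the downstream-task trainers; the exact names are my reading of the suggestion, not the merged code:

__all__ = ["ElmoTokenizer", "ElmoDataset", "pretrain_elmo",
           "finetune_elmo_for_property_prediction",
           "finetune_elmo_for_knowledge_prediction"]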
The test coverage seems to drop a lot. Try adding more tests for your new code.
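A minimal pytest-style sketch of the kind of test that could help recover coverage; it only exercises the tokenizer path and goes through transformers directly, whereas the real suite would call EduNLP's new auto tokenizer/model wrappers and use a small local fixture checkpoint:

from transformers import AutoTokenizer


def test_auto_tokenizer_produces_input_ids():
    # downloads a public checkpoint; a local fixture would be preferable in CI
    tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
    encoded = tokenizer("已知三角形 ABC 的三边长分别为 3, 4, 5")
    assert len(encoded["input_ids"]) > 0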
Thanks for sending a pull request!
Please make sure you click the link above to view the contribution guidelines,
then fill out the blanks below.
Description
(Brief description of what this PR is about)
What does this implement/fix? Explain your changes.
...
Pull request type
Changes
Does this close any currently open issues?
N/A
Any relevant logs, error output, etc?
N/A
Checklist
Before you submit a pull request, please make sure you have the following:
Essentials
Comments