UDA Example #343
base: master
The whole thing still doesn't use Forte, and there is no clear documentation on how to use UDA.
If users look at this example, they won't know how to incorporate UDA. There is no explanation in the README and no comments in main.py. All they see is 500 lines of code, and they don't know where to start or which parts are important.
Comments on the previous PR have not been addressed (for example, calling subprocess from Python), and Forte is not used at all. Why would I need this framework instead of going to Google's code?
Codecov Report
@@            Coverage Diff             @@
##           master     #343      +/-   ##
==========================================
- Coverage   81.87%   81.78%   -0.10%
==========================================
  Files         187      187
  Lines       11941    11940       -1
==========================================
- Hits         9777     9765      -12
- Misses       2164     2175      +11
Continue to review full report at Codecov.
I think we should place all data augmentation examples in the same folder: https://github.com/asyml/forte/tree/master/examples/data_augmentation
This PR is still of very low quality and cannot be merged in its current state. For example, where is the step that generates the back-translation data? How can users do it on their own?
Here is a list of problems:
- There are no variable type annotations.
- There are no instructions on how to get the back-translation data.
- A lot of code is copied directly from Google.
- None of the docstrings follow our standard.
- Although you mentioned switching to the training team's format, nothing has changed; there is still a lot of ad hoc code, such as `InputFeatures` and `InputExample`.
If this is the quality of the work, there's no reason a user would choose this over Google's implementation.
I only see a `back_trans` folder with the requirements file. I guess you need to refer to Google's code?
Another option is to document, step by step, how to get the back-translation data using Google's code, rather than copying the code over.
Again, most docstrings do not follow our standard.
Sure. It's still a work in progress. I will add instructions on how to install
This PR fixes #293.
Description of changes
This PR adds an example that uses UDA to train a text classifier on the IMDB text classification dataset. Please see the README for details.
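For readers new to the method: UDA (Unsupervised Data Augmentation) trains on a supervised cross-entropy loss over labeled examples plus a consistency loss that pushes the model's predictions on an unlabeled example and on its augmented version (here, a back translation) to agree. The following is a minimal, framework-free sketch of that combined objective computed on raw logits. The function names and the weight `lam` are hypothetical and not taken from this PR's code, and UDA's additional techniques (prediction sharpening, confidence masking, training signal annealing) are deliberately omitted.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one example's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Supervised loss for a single labeled example."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q); terms where p is zero contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def uda_loss(
    labeled_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    unlabeled_logits: Sequence[Sequence[float]],
    augmented_logits: Sequence[Sequence[float]],
    lam: float = 1.0,  # hypothetical weight on the consistency term
) -> float:
    """Hypothetical UDA-style objective: supervised CE + lam * consistency KL."""
    supervised = sum(
        cross_entropy(lg, y) for lg, y in zip(labeled_logits, labels)
    ) / len(labels)
    # The prediction on the original unlabeled example acts as the target;
    # in a real autodiff framework it would be treated as a constant (no gradient).
    consistency = sum(
        kl_divergence(softmax(orig), softmax(aug))
        for orig, aug in zip(unlabeled_logits, augmented_logits)
    ) / len(unlabeled_logits)
    return supervised + lam * consistency
```

When the model predicts the same distribution for an unlabeled sentence and its back translation, the consistency term is zero and only the supervised loss remains; disagreement between the two predictions increases the loss.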
Test Conducted
Performed experiments with and without UDA, under both supervised and semi-supervised settings.