
UDA Example #343 (Open)

jrxk wants to merge 91 commits into master
Conversation

@jrxk (Collaborator) commented Dec 19, 2020

This PR fixes #293.

Description of changes
This PR adds an example that uses UDA (Unsupervised Data Augmentation) to train a text classifier on the IMDB text classification dataset. Please see the README for details.
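For readers unfamiliar with the technique: UDA combines an ordinary supervised loss on labeled data with a consistency loss that pushes the model's predictions on an augmented (e.g. back-translated) unlabeled example toward its predictions on the original. A minimal sketch of that objective with toy logits instead of a real model (all names here are illustrative, not from this PR's code):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def uda_loss(sup_logits, sup_labels, unsup_logits, aug_logits, weight=1.0):
    # Supervised term: cross-entropy on the labeled batch.
    sup = sum(cross_entropy(l, y)
              for l, y in zip(sup_logits, sup_labels)) / len(sup_labels)
    # Consistency term: KL between predictions on the original unlabeled
    # input (treated as a fixed target in the real method) and on its
    # augmented, e.g. back-translated, version.
    cons = sum(kl_divergence(softmax(o), softmax(a))
               for o, a in zip(unsup_logits, aug_logits)) / len(unsup_logits)
    return sup + weight * cons
```

When the augmented predictions match the originals, the consistency term vanishes and the loss reduces to the supervised cross-entropy.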

Test Conducted
Performed experiments with/without UDA, under supervised and semi-supervised settings.

@jrxk jrxk added the data_aug Features on data augmentation label Dec 19, 2020
@jrxk jrxk self-assigned this Dec 19, 2020
@hunterhector (Member) left a comment

The whole thing is still non-Forte, without clear documentation of how to use UDA.

If a user looks at this example, they wouldn't know how to incorporate UDA. There is no explanation in the README and no comments in main.py. All they can see is 500 lines of code, with no idea where to start or which parts are important.

Previous comments on the last PR are not addressed, such as calling subprocess from Python, and there is no use of Forte at all. Why would anyone need this framework instead of going to Google's code?

Review comments (outdated, resolved):
  examples/text_classification/download_imdb.py
  examples/text_classification/download_imdb.py
  examples/text_classification/utils/imdb_format.py
  examples/text_classification/utils/data_utils.py
@codecov bot commented Dec 24, 2020

Codecov Report

Merging #343 (4c1b6fb) into master (900cf93) will decrease coverage by 0.09%.
The diff coverage is 54.16%.


@@            Coverage Diff             @@
##           master     #343      +/-   ##
==========================================
- Coverage   81.87%   81.78%   -0.10%     
==========================================
  Files         187      187              
  Lines       11941    11940       -1     
==========================================
- Hits         9777     9765      -12     
- Misses       2164     2175      +11     
Impacted Files                                        Coverage Δ
forte/data/readers/largemovie_reader.py               77.35% <52.17%> (-20.69%) ⬇️
tests/forte/data/readers/largemovie_reader_test.py    97.29% <100.00%> (-0.21%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@hunterhector (Member) left a comment

I think we'd better place all data aug examples in the same folder: https://github.com/asyml/forte/tree/master/examples/data_augmentation

Review comments (outdated, resolved):
  examples/text_classification/utils/data_utils.py
  examples/text_classification/utils/data_utils.py
@hunterhector (Member) left a comment

This PR is still at very low quality and cannot be merged in its current state. For example, where is the step to generate the back translation data? How can users do it on their own?

Here is a list of problems:

  1. There are no variable type annotations.
  2. There are no instructions on how to get the back translation data.
  3. A lot of code is copied directly from Google.
  4. None of the docstrings follow our standard.
  5. After mentioning using the training team's format, nothing has changed: there is still a lot of ad-hoc code such as InputFeatures and InputExample.

If that's the quality of this work, there's no reason a user would use this instead of Google's implementation.
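To make points 1 and 4 concrete, here is what a small helper with type annotations and a docstring in the project's standard (assumed here to be Google style) might look like; `pad_sequence` is a made-up illustration, not code from this PR:

```python
def pad_sequence(ids: list[int], max_len: int, pad_id: int = 0) -> list[int]:
    """Pads or truncates a token id sequence to a fixed length.

    Args:
        ids: Token ids for one example.
        max_len: Target sequence length.
        pad_id: Id used to fill padding positions.

    Returns:
        A list of exactly `max_len` token ids.
    """
    # Append padding then cut to length, which also handles truncation.
    return (ids + [pad_id] * max_len)[:max_len]
```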

Review comments:
  examples/data_augmentation/uda/config_data.py (outdated, resolved)
  examples/data_augmentation/uda/utils/data_utils.py (outdated, resolved)
  examples/data_augmentation/uda/main.py (resolved)
  examples/data_augmentation/uda/utils/imdb_format.py (outdated, resolved)
  examples/data_augmentation/uda/utils/data_utils.py (outdated, resolved)
  examples/data_augmentation/uda/main.py (outdated, resolved)
@hunterhector (Member) left a comment

I only see the back_trans folder with the requirements file; I guess users would still need to refer to Google's code?

An alternative is to document, step by step, how to get the back translation from Google, rather than copying the code over.

Again, most docstrings do not follow our standard.
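For reference, the round-trip idea behind back translation fits in a few lines; `translate` below is a placeholder for whatever translation model the documented steps end up using (e.g. a tensor2tensor checkpoint), not a real API:

```python
def back_translate(text: str, translate, pivot: str = "fr") -> str:
    """Paraphrases `text` by round-trip translation through a pivot language.

    `translate(text, src=..., tgt=...)` is a hypothetical callable standing
    in for an actual translation model.
    """
    # Translate into the pivot language, then back into English; the
    # round trip introduces lexical and syntactic variation.
    pivot_text = translate(text, src="en", tgt=pivot)
    return translate(pivot_text, src=pivot, tgt="en")
```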

Review comments (outdated, resolved):
  forte/data/readers/largemovie_reader.py
  examples/data_augmentation/uda/utils/model_utils.py
@jrxk (Collaborator, Author) commented Dec 29, 2020

> I only see the back_trans folder with the requirements file; I guess users would still need to refer to Google's code?
>
> An alternative is to document, step by step, how to get the back translation from Google, rather than copying the code over.
>
> Again, most docstrings do not follow our standard.

Sure. It's still a work in progress. I will add instructions on how to install tensor2tensor with the right versions of TensorFlow and how to use the pre-trained translation models. Thanks!
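Those install instructions might boil down to something like the following sketch; the version pins are an assumption (tensor2tensor's translation models target TensorFlow 1.x) and should be confirmed against the example's README once it lands:

```shell
# Illustrative pins only; verify against the example's README.
pip install "tensorflow==1.15.*"   # tensor2tensor targets TF 1.x
pip install tensor2tensor
```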

Labels
data_aug Features on data augmentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create a text classifier for the IMDB large movie dataset
3 participants