
How can I use this model on a Chinese dataset? #8

Open

f617452296 opened this issue Feb 17, 2020 · 13 comments

Comments

@f617452296

Can this model be helpful on a Chinese dataset?

@ekQ
Collaborator

ekQ commented Feb 17, 2020

We haven't looked into this, but you could try initializing the model with BERT-Base, Chinese.

@qiuhuiGithub

qiuhuiGithub commented Mar 18, 2020

I tested the model on a Chinese GEC task and it works fine.

@f617452296
Author

f617452296 commented Mar 18, 2020 via email

May I have your email address to ask some questions?

@qiuhuiGithub

You can use any Chinese BERT model by simply replacing the BERT path, and it works fine.
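For concreteness, here is a minimal sketch of that swap (the paths, and the assumption that vocab_size sits at the top level of the config, are mine, not from this thread):

```python
# Sketch, not from this thread: point LaserTagger at BERT-Base, Chinese and
# keep vocab_size in configs/lasertagger_config.json in sync with its vocab.
# BERT_DIR is a placeholder for wherever the checkpoint was unpacked.
import json

BERT_DIR = "chinese_L-12_H-768_A-12"          # BERT-Base, Chinese checkpoint dir
CONFIG_PATH = "configs/lasertagger_config.json"

# vocab.txt lists one token per line; counting lines gives the vocabulary
# size (21128 for BERT-Base, Chinese).
with open(f"{BERT_DIR}/vocab.txt", encoding="utf-8") as f:
    vocab_size = sum(1 for _ in f)

with open(CONFIG_PATH, encoding="utf-8") as f:
    config = json.load(f)

config["vocab_size"] = vocab_size             # replaces the English vocab size

with open(CONFIG_PATH, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)
```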

@ekQ
Collaborator

ekQ commented Mar 18, 2020

I tested the model on a Chinese GEC task and it works fine.

Good to know this!

@varepsilon

I tested the model on a Chinese GEC task and it works fine.

Is it a public dataset? If so, could you share a link?

@qiuhuiGithub

Is it a public dataset? If so, could you share a link?

http://tcci.ccf.org.cn/conference/2018/taskdata.php The second task is the GEC task.

@f617452296
Author

http://tcci.ccf.org.cn/conference/2018/taskdata.php The second task is the GEC task.

Could you please tell me how to run this model on the GEC task, e.g., step 1 (Phrase Vocabulary Optimization) and step 2 (Converting Target Texts to Tags)?

@qiuhuiGithub

qiuhuiGithub commented Mar 20, 2020

Could you please tell me how to run this model on the GEC task, e.g., step 1 (Phrase Vocabulary Optimization) and step 2 (Converting Target Texts to Tags)?

First, I suggest you read run_wikisplit_experiment.sh in the project. You can run LaserTagger simply by adapting that script. Here is an example; a conversion sketch follows this list.

  • Convert your data into the WikiSplit format, e.g., "I like you \t I love you".
  • Change all the paths in the script to yours.
  • Change vocab_size in configs/lasertagger_config.json, because the vocabulary size is different for Chinese BERT.
  • Run the script step by step.

Best wishes.
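As a sketch of the first bullet (the file names below are hypothetical placeholders, not from this thread), writing parallel source/target sentences into the tab-separated WikiSplit-style format could look like this:

```python
# Sketch: join a source file and a target file, one sentence per line each,
# into the "source \t target" TSV layout used by the WikiSplit experiment.
def write_wikisplit_tsv(src_path, tgt_path, out_path):
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt, \
         open(out_path, "w", encoding="utf-8") as out:
        for source, target in zip(src, tgt):
            source, target = source.strip(), target.strip()
            if source and target:             # skip empty/misaligned pairs
                out.write(f"{source}\t{target}\n")

# Hypothetical file names for the NLPCC 2018 GEC data:
write_wikisplit_tsv("gec_train.src", "gec_train.tgt", "train.tsv")
```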

@f617452296
Author


It helps a lot! Thank you!

@f617452296
Author

By the way, I'd like to know whether the training data from http://tcci.ccf.org.cn/conference/2018/taskdata.php needs to be segmented into words, or whether I can feed a whole sentence into the model. Thanks!

@qiuhuiGithub

By the way, I'd like to know whether the training data needs to be segmented into words, or whether I can feed a whole sentence into the model. Thanks!

The input to Chinese BERT is separate tokens, so you should cut the sentence into separate tokens first, as in the sketch below.
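A minimal sketch of one such split (illustrative, not from this thread; Chinese BERT's vocabulary is largely character-level, so per-character splitting is a reasonable default):

```python
# Sketch: insert spaces between CJK characters so each Chinese character
# becomes its own whitespace-separated token, while keeping ASCII runs intact.
def segment_chinese(text: str) -> str:
    tokens, buffer = [], []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":        # basic CJK Unified Ideographs block
            if buffer:
                tokens.append("".join(buffer))
                buffer = []
            tokens.append(ch)
        elif ch.isspace():
            if buffer:
                tokens.append("".join(buffer))
                buffer = []
        else:
            buffer.append(ch)                 # accumulate non-CJK runs (e.g. ASCII)
    if buffer:
        tokens.append("".join(buffer))
    return " ".join(tokens)

print(segment_chinese("我喜欢你 BERT 模型"))  # -> "我 喜 欢 你 BERT 模 型"
```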

@Ivy-C-85

The input to Chinese BERT is separate tokens, so you should cut the sentence into separate tokens.

Hi, I also tested the GEC task, but my model didn't work well. It didn't actually correct anything; it just deleted every difference, and even some identical parts, between the source and target texts. I used jieba to cut my sentences and thought everything was set up fine, but the results were pretty bad. Could you please tell me whether you had the same problem, and which score you used?
