Correct way for data split #4

Open
hsujh073 opened this issue Aug 21, 2020 · 4 comments

@hsujh073

Hi.
I want to run some code with the FriendsQA dataset, and I found that in the JSON files episodes 21-22 are the test set and the episodes from 23 onward are the development set, which differs from what is written in the README.

So which one is the correct way to split the dataset? Thanks.

@jdchoi77
Member

@hsujh073 I believe episodes 21-22 should be the development set and episodes 23+ the test set, as written in the paper:
https://www.aclweb.org/anthology/2020.acl-main.505.pdf

@FrankLicm could you please verify this and fix the typos if any? Thanks.
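
For readers who want to check the JSON files against the intended split, the rule stated in this comment can be sketched as below. This is only an illustration: `split_for_episode` is a hypothetical helper, not part of the repository, and treating every episode below 21 as training data is an assumption that this thread does not state.

```python
def split_for_episode(episode: int) -> str:
    """Return the intended split name for an episode number.

    Dev (21-22) and test (23+) follow the split described in this thread.
    """
    if episode < 1:
        raise ValueError("episode numbers are expected to start at 1")
    if episode >= 23:
        return "test"
    if episode >= 21:
        return "dev"
    # Assumption: the thread does not say where episodes 1-20 go;
    # they are treated as training data here.
    return "train"
```

Comparing this label with the file each episode actually appears in would show whether the dev and test files are swapped.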

@arianakc
Contributor

Hi, I am sorry. I think I may have made a mistake when naming the generated split files, so I have actually forgotten which set I used to get the results in the paper. The split I originally proposed is indeed that 21-22 should be the development set and 23+ the test set.

Also, this data split was generated from the full data for version 1.0 at upload time, to keep it consistent with version 1.0. Because of an issue with my previous laptop, I lost the original split files from my experiments, for which I had deleted some invalid questions. The development environment is lost as well, so I am afraid I cannot do any further work on this repo.

The typo here, I think, is only that the names of the dev and test files should be exchanged, for both versions 1.0 and 2.0. Thanks.
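
Until the names are fixed in the repository, the exchange suggested above can be applied to a local copy. The following is a minimal sketch, assuming both files sit in the same directory; `swap_files` is a hypothetical helper, and the actual file names in the repository are not given in this thread.

```python
import os
import tempfile


def swap_files(path_a: str, path_b: str) -> None:
    """Exchange the names of two existing files."""
    for path in (path_a, path_b):
        if not os.path.isfile(path):
            raise FileNotFoundError(path)
    directory = os.path.dirname(os.path.abspath(path_a))
    # Reserve a unique temporary name next to the first file.
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    os.replace(path_a, tmp)     # a -> tmp (overwrites the empty placeholder)
    os.replace(path_b, path_a)  # b -> a
    os.replace(tmp, path_b)     # tmp -> b
```

It would be called once per dataset version with the paths of the dev and test files, for example `swap_files("dev.json", "test.json")`, where both names are placeholders.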

@hsujh073
Author

OK. Thank you.

@jdchoi77
Member

@FrankLicm you still have access to this repo, so please fix the names when you have time. Thanks!
