Hold-out data #23

Open
florianhartig opened this issue May 7, 2021 · 1 comment

Comments

@florianhartig
Member

Hi @MaximilianPi, I'm just looking at the ML datasets. What do we want to do with the hold-out data for those? Should we add them here and hide them, or keep them with the submission server in the ML repo?

@MaximilianPi
Member

Hi,
there are two options:
a) two versions (full and without hold-out) of the datasets in the EcoData package, as we did with the titanic and the plant-pollinator database. However, the students were confused by the two different versions, and one even said it is disappointing that the hold-outs are theoretically available in EcoData.
b) as you said, only one version of the wine, nasa, and flower datasets, without the hold-outs (the hold-outs are also available in this separate submission server repo: https://github.com/MaximilianPi/submission_server).

If you want to use the wine, nasa, and flower datasets for other courses (e.g. the stats courses) I don't think it is a big problem if half of the data is missing, right?

I am in favour of b), or of hiding them in EcoData.
