Train, dev, test split for the data #3

BillyZhang24kobe · 2023-10-13T17:36:27Z

Hello,

In the paper for this dataset, you mention that the data is split into train, dev, and test sets. Do you have any documentation on exactly how the splits are defined? I downloaded the dataset from the official website (https://geodiverse-data-collection.cs.princeton.edu/), but the only metadata file appears to be 'index.csv', which does not specify the train/val/test split. Any pointers are welcome! Thanks!

hassony2 · 2023-11-22T14:43:55Z

Hi @BillyZhang24kobe,

I am also looking into this :)
It looks like the splits are defined in load_data.
If my understanding is correct, to report numbers comparable to Table 6, we need to run prep_geode_38 to generate the per-region files, passing 'index.csv' in place of the metadata file and using its 'object' and 'file_path' columns instead of the 'script_name' and 'file_name' fields.
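
In case it helps anyone else, here is a minimal sketch of one way to do the column substitution without touching the prep code: write a copy of 'index.csv' whose header uses the names the script reads. The mapping below is only my reading of the code, `rename_index_columns` and the output filename `index_renamed.csv` are hypothetical, and I have not verified that this reproduces the paper's partition.

```python
import csv

# Assumed mapping (my reading, not confirmed by the authors):
# column in index.csv -> field name read by the prep script.
COLUMN_MAP = {"object": "script_name", "file_path": "file_name"}


def rename_index_columns(src, dst, column_map=COLUMN_MAP):
    """Copy CSV rows from src to dst, renaming header fields per column_map.

    src and dst are open text file objects. Returns the new header.
    """
    reader = csv.DictReader(src)
    header = reader.fieldnames or []
    missing = [name for name in column_map if name not in header]
    if missing:
        raise KeyError(f"index file lacks expected columns: {missing}")

    # Columns not listed in column_map keep their original names.
    new_header = [column_map.get(name, name) for name in header]
    writer = csv.DictWriter(dst, fieldnames=new_header, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({column_map.get(k, k): v for k, v in row.items()})
    return new_header


if __name__ == "__main__":
    # Hypothetical paths: adjust to wherever the download was extracted.
    with open("index.csv", newline="") as src, \
            open("index_renamed.csv", "w", newline="") as dst:
        rename_index_columns(src, dst)
```

The renamed file could then be given to the prep script in place of the original metadata file. Editing the two field names directly in the script would work just as well.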

@vramaswamy94, thank you for contributing such a nice dataset :)
Would you be able to confirm whether my understanding is correct? It would be great if you could provide the generated region-specific pickle files, to avoid any risk of using a train/val/test partition that differs from the one in your paper. Would you be able to share these?

Have a great day!
