
diverse data sets helps most when our target is a general data set, so when it's not diverse, the loss should be higher #15

brando90 opened this issue Feb 8, 2024 · 0 comments


brando90 commented Feb 8, 2024

one idea for an experiment:

  1. control for alignment, e.g., compare data sets with roughly the same alignment (two figures: one low alignment, one high alignment)
  2. then plot CE eval loss on the y-axis and the diversity coefficient on the x-axis. Do we see that models trained on more diverse data sets perform better?

This is mainly to back up our claim that "quality == diversity if target == general data set"
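A minimal sketch of the proposed analysis (all numbers and bucket names below are made up for illustration): within each fixed-alignment bucket, check whether CE eval loss falls as the diversity coefficient grows, i.e., whether the correlation is negative.

```python
# Hypothetical sketch: control for alignment by bucketing models, then
# correlate diversity coefficient with CE eval loss within each bucket.
# A negative correlation would support "more diverse -> lower loss".
# All data points here are invented for illustration.

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# (diversity coefficient, CE eval loss) pairs per alignment bucket
buckets = {
    "low alignment":  [(0.10, 3.2), (0.15, 3.0), (0.20, 2.9), (0.25, 2.7)],
    "high alignment": [(0.10, 2.8), (0.15, 2.6), (0.20, 2.5), (0.25, 2.3)],
}

for name, points in buckets.items():
    divs, losses = zip(*points)
    r = pearson(divs, losses)
    print(f"{name}: corr(div coeff, CE eval loss) = {r:.2f}")
```

In a real run, each point would be one trained model, and the two buckets would become the two figures described above (one per alignment level).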

@brando90 brando90 changed the title diverse data sets helps most when our target is a general data set diverse data sets helps most when our target is a general data set, so when it's not diverse, the loss should be higher Feb 8, 2024