Finding correspondence between metadata #8

Open
ryanwebster90 opened this issue Apr 5, 2023 · 0 comments
ryanwebster90 commented Apr 5, 2023

Right now, it is possible to download a de-duplicated set w.r.t. metadata corresponding to a particular feature set (e.g. ViT-H-14). Unfortunately, if you have already downloaded LAION-2B with the original metadata, you will need to "map" the metadata from your existing set onto the de-duplicated version of that set.

Thus two things should be done:

  1. Release a full set of de-duplicated metadata (or de-duplicated up to some redundancy factor).
  2. Add functionality for easily finding the correspondence between large sets of URLs.

In fact, the function from the 2nd point would already allow some de-duplication by itself, since we found duplicate URLs in the metadata (despite LAION's attempt to de-duplicate the URLs themselves).
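As a rough illustration of the 2nd point, here is a minimal sketch of URL-based correspondence. It is not code from this repository: the function names `index_urls`, `find_correspondence` and `duplicate_urls` are hypothetical, it assumes URLs are compared as exact strings, and it assumes both URL lists fit in memory (at LAION-2B scale one would likely need to hash and shard the URLs instead).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def index_urls(urls: Iterable[str]) -> Dict[str, List[int]]:
    """Map each URL to the list of row indices at which it appears."""
    index: Dict[str, List[int]] = defaultdict(list)
    for row, url in enumerate(urls):
        index[url].append(row)
    return dict(index)


def find_correspondence(
    existing_urls: Iterable[str], dedup_urls: Iterable[str]
) -> List[Optional[int]]:
    """For each row of the existing metadata, return the row of the
    de-duplicated metadata with the same URL, or None if it was dropped.

    Assumption: exact string equality defines a match.
    """
    first_row: Dict[str, int] = {}
    for row, url in enumerate(dedup_urls):
        # Keep the first occurrence if the de-duplicated set still repeats a URL.
        first_row.setdefault(url, row)
    return [first_row.get(url) for url in existing_urls]


def duplicate_urls(urls: Iterable[str]) -> Dict[str, List[int]]:
    """Return only the URLs that occur more than once, with their row indices."""
    return {url: rows for url, rows in index_urls(urls).items() if len(rows) > 1}


# Toy example with made-up URLs.
existing = ["http://a/1.jpg", "http://b/2.jpg", "http://a/1.jpg", "http://c/3.jpg"]
dedup = ["http://c/3.jpg", "http://a/1.jpg"]

mapping = find_correspondence(existing, dedup)  # [1, None, 1, 0]
repeats = duplicate_urls(existing)              # {"http://a/1.jpg": [0, 2]}
```

The same index also covers the URL-only de-duplication mentioned above: any URL returned by `duplicate_urls` appears in more than one row of the metadata.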
