ci: check size also on main branch #16

Closed
wants to merge 14 commits into from

Conversation

KennethEnevoldsen
Contributor

We just had an error on main caused by file size, which the tests did not catch.

@KennethEnevoldsen
Contributor Author

@orionw it seems like the tests and HF don't agree.

When I check one of the files, it also seems fine:

7.1M    results/multilingual-e5-small/e4ce9877abf3edfe10b0d82785e83bdcb973e22e/FloresBitextMining.json
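A minimal sketch of this kind of size check, for anyone who wants to reproduce it locally. It assumes a 10 MiB limit (the thread only mentions "10+MB") and JSON files under a results folder; `find_oversized_files` and `MAX_BYTES` are hypothetical names, not the repo's actual test.

```python
from pathlib import Path

# Assumed limit: the exact threshold enforced by HF is not stated in this thread.
MAX_BYTES = 10 * 1024 * 1024


def find_oversized_files(root, max_bytes=MAX_BYTES):
    """Return (path, size_in_bytes) for every JSON file under root above max_bytes."""
    oversized = []
    for path in sorted(Path(root).rglob("*.json")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size > max_bytes:
            oversized.append((path, size))
    return oversized
```

Calling `find_oversized_files("results")` on the working tree only inspects the files currently checked out, so it cannot see large files that exist only in earlier commits.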

@KennethEnevoldsen
Contributor Author

I am not entirely sure what the best solution is here. We can reduce the precision further, or remove "languages" from the results dict (which would require changes in results loading to add it back in based on metadata). Lastly (and this is probably my suggestion), we can remove unused metrics (e.g. precision).

for reference, the inner result dict looks like:

{'main_score': 0.5, 'hf_subset': 'en-de', 'languages': ['eng-Latn', 'deu-Latn'], "metric1": ...}
# for flores:
{"accuracy": 0.794466, "f1": 0.747831, "hf_subset": "tum_Latn-som_Latn", "languages": ["tum-Latn", "som-Latn"], "main_score": 0.747831, "precision": 0.727899, "recall": 0.794466}
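A rough sketch of what removing unused metrics could look like on one inner result dict. Which keys count as unused is an assumption here (only "precision" is named as an example), and `strip_unused_metrics` is a hypothetical helper, not an existing mteb function.

```python
import json

# Assumption: only "precision" is dropped; the real list of unused metrics is undecided.
UNUSED_KEYS = {"precision"}


def strip_unused_metrics(scores, unused_keys=UNUSED_KEYS):
    """Return a copy of one inner result dict without the keys in unused_keys."""
    return {key: value for key, value in scores.items() if key not in unused_keys}


# The Flores entry quoted above.
example = {
    "accuracy": 0.794466,
    "f1": 0.747831,
    "hf_subset": "tum_Latn-som_Latn",
    "languages": ["tum-Latn", "som-Latn"],
    "main_score": 0.747831,
    "precision": 0.727899,
    "recall": 0.794466,
}
slimmed = strip_unused_metrics(example)

# Characters saved per entry once the dict is serialized again.
saved = len(json.dumps(example)) - len(json.dumps(slimmed))
```

The saving per entry is small, so this mainly pays off for tasks with very many subsets.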

We previously also discussed splitting up the file, which would cause problems with the current loading of MTEBResults objects. If we do so, we need a solution in mteb.load_results() that glues the files together before loading (which seems slightly frustrating).
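To illustrate the gluing step, here is a sketch that merges already-parsed part files back into one dict. It assumes each part carries the same top-level metadata plus a "scores" mapping from split name to a list of inner result dicts; that layout and the name `merge_result_parts` are assumptions for illustration, and nothing here is part of mteb.load_results().

```python
def merge_result_parts(parts):
    """Merge part dicts: metadata is taken from the first part,
    and the per-split score lists are concatenated in order."""
    if not parts:
        raise ValueError("need at least one part")
    # Copy everything except the scores from the first part.
    merged = {key: value for key, value in parts[0].items() if key != "scores"}
    merged["scores"] = {}
    for part in parts:
        for split, entries in part.get("scores", {}).items():
            merged["scores"].setdefault(split, []).extend(entries)
    return merged
```

The merged dict could then be handed to the existing loading code unchanged, which is the main appeal of doing the gluing before loading.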

@orionw
Contributor

orionw commented Aug 12, 2024

Thanks for looking at this @KennethEnevoldsen! I think this could be trying to sync an older commit perhaps? Did we ever not squash merge? I don't see how HF could say it's 10+MB when everything else says 7MB.

If it was an intermediate commit which had 10MB+ files, it will error like this. We could just remove that commit from this repo.
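One way to confirm this would be to list every blob in the full history and filter by size. The sketch below wraps `git rev-list --objects --all` and `git cat-file --batch-check` from Python; the 10 MiB default and the function names are assumptions, and it has not been run against this repo.

```python
import subprocess


def parse_batch_check(output, max_bytes):
    """Parse '<type> <sha> <size> <path>' lines and keep blobs above max_bytes."""
    big = []
    for line in output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue
        size = int(parts[2])
        if size > max_bytes:
            path = parts[3] if len(parts) > 3 else ""
            big.append((size, parts[1], path))
    # Largest blobs first.
    return sorted(big, reverse=True)


def large_blobs_in_history(repo=".", max_bytes=10 * 1024 * 1024):
    """Return (size, sha, path) for every blob in any commit above max_bytes."""
    # All objects reachable from any ref, as '<sha> <path>' lines.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Annotate each object with its type and size; %(rest) echoes the path back.
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(checked, max_bytes)
```

If `large_blobs_in_history()` returns anything while the working tree check passes, the oversized file lives in an intermediate commit, which would match the behaviour described here.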

@KennethEnevoldsen
Contributor Author

KennethEnevoldsen commented Aug 12, 2024

Ahh, right, that is probably it. Didn't consider that. What is the best approach to resolve this?


3 participants