fix models path #25

Merged
merged 22 commits into embeddings-benchmark:main on Sep 10, 2024

Conversation

@Samoed (Contributor) commented Sep 5, 2024

Since #18, you can't load results for models that are in the author__model folder format because the script only looks for folders that match the model name. For example:

from datasets import load_dataset
ds = load_dataset("mteb/results", "e5-small-v2")

This would result in an error because there is only an intfloat__e5-small-v2 folder. However, models without a revision still work since their folder names haven't changed.

I've also added a test to load all model results. I'm not sure how long this will take to run on CI, but locally, with a hard-coded local JSON, it took 2 minutes. Currently, it is failing due to errors like the one mentioned above.

@KennethEnevoldsen, could you please take a look at this?

@Samoed (Contributor, Author) commented Sep 5, 2024

Interesting. Locally I can run make test, but CI fails

@KennethEnevoldsen (Contributor) commented Sep 6, 2024

This sounds like a change that should be fixed within mteb's load function. Will add a fix there and then return to this issue.

@Samoed (Contributor, Author) commented Sep 6, 2024

I don't think so, because this repo doesn't use mteb to work with the data; the source of the problem is a wrong paths.json.

Comment on lines 34 to 38:

@pytest.mark.parametrize("model", MODELS)
def test_load_results_from_datasets(model):
    """Ensures that all models can be imported from dataset"""
    path = Path(__file__).parent.parent / "results.py"
    ds = load_dataset(str(path.absolute()), model, trust_remote_code=True)

Why is this required? Does the leaderboard load it using datasets?

@KennethEnevoldsen (Contributor) commented:

@Samoed if you address a comment, please mark it as resolved - it makes it easier for me to see what you have fixed.

Also, it seems like there are still unresolved files.

@Samoed (Contributor, Author) commented Sep 6, 2024

For now I have resolved only the __init__ comment and was waiting for the tests, but they didn't run. The other comments need your review.

Samoed and others added 3 commits September 6, 2024 18:29
@KennethEnevoldsen (Contributor) commented:

> The other comments need your review.

Took a look over it again. Are you referring to:

> I don't think so, because this repo doesn't use mteb to work with the data; the source of the problem is a wrong paths.json.

Are you saying that the leaderboard repo CI assumes something about the file structure of this repo, and that is what is causing the issue?

The current file format is definitely the one we want to stick with, so we should update the leaderboard code to match if that is the case.

@Samoed (Contributor, Author) commented Sep 6, 2024

> Are you saying that the leaderboard repo CI assumes something about the file structure of this repo, and that is what is causing the issue?

A bit, yes. Right now paths.json can contain two versions of model result file paths:

  1. before the author was added to the model name:
    https://github.com/embeddings-benchmark/results/blob/45604e5b10b383a06d8725125e485273211da8e1/paths.json#L592-L596
  2. and after that:
    https://github.com/embeddings-benchmark/results/blob/45604e5b10b383a06d8725125e485273211da8e1/paths.json#L305-L309

The results repo only looks for models that are defined in the split, so right now, for bge-m3, only the result files without a revision are found, while the others with a revision can't be imported. My PR merges these results by looking up the model_name after the author name, so results in the new format can be used together with results in the old format.
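
A minimal sketch (not the PR's actual code) of the kind of suffix matching described above; the results directory layout and function name are assumptions:

from pathlib import Path


def find_model_folders(results_dir: Path, model_name: str) -> list[Path]:
    """Match old-style folders ("e5-small-v2") as well as new-style
    "author__model" folders ("intfloat__e5-small-v2") for a bare model name."""
    matches = []
    for folder in results_dir.iterdir():
        if not folder.is_dir():
            continue
        # The part after "__" is the model name; old-style folders have no "__".
        if folder.name == model_name or folder.name.split("__")[-1] == model_name:
            matches.append(folder)
    return matches

With this kind of lookup, both results/e5-small-v2 and results/intfloat__e5-small-v2 would resolve to the same model.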

@KennethEnevoldsen (Contributor) commented:

Hmm right, getting a better hang of the code here - the integration with mteb is quite poor, which leads to a few issues.

So ideal situation:

from datasets import load_dataset
ds = load_dataset("mteb/results", "intfloat/e5-small-v2") # using just e5-small-v2 can lead to multiple matches

and those results should be compatible with

results = mteb.load_results(models=["intfloat/e5-small-v2"])

For splits etc., this duplicates information already within mteb. A benchmark should only be specified once (and that is within the package). Splits etc. should all be derived from mteb.

I don't see any reason why we should filter it within this repository (I might be missing something).


That is the ideal case. We might need to take some shortcuts here.

@Samoed (Contributor, Author) commented Sep 6, 2024

I understand your idea and agree that it should be implemented this way. However, it will require more time to integrate the old results into the new format, and some refactoring of the leaderboard will be needed

@Samoed (Contributor, Author) commented Sep 6, 2024

Now the tests fail because of huggingface/datasets#7141

@KennethEnevoldsen (Contributor) left a comment

> I understand your idea and agree that it should be implemented this way. However, it will require more time to integrate the old results into the new format, and some refactoring of the leaderboard will be needed

Thanks. Yep, it definitely requires more changes - I think this looks reasonable as is, though we need to resolve the failing tests.

@orionw since you are more familiar with the leaderboard integration, can I ask you to review this PR as well?

@orionw (Contributor) left a comment

Thanks for this PR @Samoed!! There was definitely a problem before that you fixed, thanks! I may be having a hard time following this PR, but it seems like we just make it so that any shortened model name also works, so we don't miss any or have duplicates.

If so, did we make sure we kept all the existing results? I see the GritLM no-instruction one got deleted, for example.

That's my only concern, otherwise LGTM.

@KennethEnevoldsen (Contributor) commented:

@Samoed we still have to resolve the failing test before we can merge this in. Also please respond to the comments.

You also write:

> Currently, it is failing due to errors like the one mentioned above.

Which, when I read it along with the failing tests, suggests there is only partly a solution here. In that case, please describe what is stopping it from becoming a finished solution.

@Samoed (Contributor, Author) commented Sep 9, 2024

I've added a test to load all model results by splits, but the loader takes paths.json from the HF repository, where the paths are broken. To make this test pass, it needs to be fixed in the HF repository.

TIL: I wrote replies to all your comments, but I didn't know I needed to press the submit button in the files tab to make them visible for you

@Samoed (Contributor, Author) commented Sep 9, 2024

Currently, there is one model that won't be fixed by my PR: text-embedding-preview-0409-768. At the moment, there are two folders with the same model name, but the author name is separated by a "." instead. I think this should be standardized, but I'm not sure which model this corresponds to.

  1. https://github.com/embeddings-benchmark/results/tree/main/results/google-gecko.text-embedding-preview-0409/no_revision_available
  2. https://github.com/embeddings-benchmark/results/tree/main/results/google-gecko-256.text-embedding-preview-0409/no_revision_available

@KennethEnevoldsen (Contributor) commented:

> I've added a test to load all model results by splits, but the loader takes paths.json from the HF repository, where the paths are broken. To make this test pass, it needs to be fixed in the HF repository.

Thanks for the clarification, I should have noted that. It seems like a test that will never fail when we want it to (when we introduce an error, it will use the current working version), and once that error is merged, the next (working) PR will fail. Shouldn't it just reference the paths.json within the PR?

> TIL: I wrote replies to all your comments, but I didn't know I needed to press the submit button in the files tab to make them visible for you

Ahh we have all been there :)

Re names, we should use '__' to separate the author from the model name. For now, let us treat 256 as part of the author (though my guess is that it refers to the embed size), so:

google-gecko-256.text-embedding-preview-0409/no_revision_available -> google-gecko-256__text-embedding-preview-0409/no_revision_available
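
A minimal sketch of such a one-off rename for the two folders mentioned in this thread (assuming a local checkout with a results/ directory; not part of this PR):

from pathlib import Path

results_dir = Path("results")  # assumed local checkout layout
for name in [
    "google-gecko.text-embedding-preview-0409",
    "google-gecko-256.text-embedding-preview-0409",
]:
    src = results_dir / name
    if src.is_dir():
        # Split on the first "." so the part before it becomes the author,
        # e.g. google-gecko-256.text-... -> google-gecko-256__text-...
        author, _, model = name.partition(".")
        src.rename(results_dir / f"{author}__{model}")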

Re: text-embedding-preview-0409-768

I can't seem to find that folder in the results repo?

@Samoed (Contributor, Author) commented Sep 9, 2024

> I can't seem to find that folder in the results repo?

I can't find it either, but I think it might be google-gecko.text-embedding-preview-0409. However, I'm not sure. I found these models in the leaderboard config: https://github.com/embeddings-benchmark/leaderboard/blob/12fdafc4eb41fb4e4dddfc6a6a202f13e23b734a/model_meta.yaml#L814-L831

> Shouldn't it just reference the paths.json within the PR?

I thought the same regarding #24, but I believe that when HF tries to download the repository, it will only download results.py and README, not the paths file. So, the paths file is being downloaded by the results file at runtime, and I'm not sure how to improve this.
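
One way such a runtime download could look (a hedged sketch; whether results.py actually uses huggingface_hub like this is an assumption):

import json

from huggingface_hub import hf_hub_download

# Fetch paths.json from the dataset repo at runtime, since loading via
# load_dataset("mteb/results") only ships the loading script, not the
# data files sitting next to it in the repository.
paths_file = hf_hub_download(
    repo_id="mteb/results", filename="paths.json", repo_type="dataset"
)
with open(paths_file) as f:
    paths = json.load(f)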

Commits added:
- Revert "rename" (this reverts commit e252add)
- rename
@KennethEnevoldsen (Contributor) commented Sep 9, 2024

Let us delete text-embedding-preview-0409-768 for now. The results are not there and the two on the leaderboard fall under different names.

> I thought the same regarding #24, but I believe that when HF tries to download the repository, it will only download results.py and README, not the paths file. So, the paths file is being downloaded by the results file at runtime, and I'm not sure how to improve this.

hmm, not sure either; the code:

ds = load_dataset(str(path.absolute()), model, trust_remote_code=True)

does seem to follow the docs. It does say legacy, though. If we can't get it to work, we should just remove the test.

@Samoed (Contributor, Author) commented Sep 9, 2024

> Let us delete text-embedding-preview-0409-768 for now. The results are not there and the two on the leaderboard fall under different names.
>
> > I thought the same regarding #24, but I believe that when HF tries to download the repository, it will only download results.py and README, not the paths file. So, the paths file is being downloaded by the results file at runtime, and I'm not sure how to improve this.
>
> hmm, not sure either; the code:
>
> ds = load_dataset(str(path.absolute()), model, trust_remote_code=True)
>
> does seem to follow the docs. It does say legacy, though. If we can't get it to work, we should just remove the test.

Yes, this will work for local development if the full repo is provided, but when using load_dataset("mteb/results") the code will give an error, because it won't have the results file.

@orionw (Contributor) commented Sep 9, 2024

Thanks for the response @Samoed, I understand now.

> google-gecko.text-embedding-preview-0409

I have been confused by the difference between Google Gecko and this text-embedding-004 model on the Arena. Google's naming conventions make this very hard to understand.

@KennethEnevoldsen (Contributor) commented:

Thanks @Samoed for taking the time on this - I will merge it in now to allow the other PRs to be resolved as well

@KennethEnevoldsen merged commit 16d7a28 into embeddings-benchmark:main on Sep 10, 2024
2 checks passed