Leaderboard 2.0: added performance x n_parameters plot + more benchmark info #1437

x-tabdeveloping · 2024-11-11T15:53:31Z

I added an interactive performance vs. number of parameters plot as the first thing people see when selecting a benchmark. #1396
I also added some info on the benchmarks to the benchmark description as Niklas requested here: #1317

Here's a screenshot:

I also bumped the Gradio version, as I thought it might fix certain things, but two burning problems remain, for which I opened issues in Gradio (gradio-app/gradio#9938, gradio-app/gradio#9937)

isaac-chung

This is beautiful! Nice work. I love the current layout.
I see that there's already an open issue on the formatting. Hope we get a response soon.

Samoed · 2024-11-11T16:54:18Z

That's very beautiful!
Here are a few small UI suggestions: maybe move the citation section to the bottom of the page (it could even be collapsible) and switch the order of the table and plot so the table appears at the top, after the search bar. What do you think?

x-tabdeveloping · 2024-11-12T07:25:41Z

Hey @Samoed thanks! I have been there :D After deliberation I thought having the citation up close to the benchmark description makes more sense, since it is more visually linked to the specific benchmark and also fills a gap that would otherwise be there. I also prefer having the plot before the table, since it communicates the same information while being easier to interpret visually.

I'm open to changing it if enough people think we should rearrange things.

Muennighoff

Looks amazing!

x-tabdeveloping · 2024-11-12T07:43:04Z

@Muennighoff @Samoed @isaac-chung @KennethEnevoldsen I would also like to hear your take on whether we should be dark or light theme by default, cause in the case that we want to go dark I can also make the plot with dark background and light text.

Samoed · 2024-11-12T07:44:58Z

I thought Gradio used the system theme by default, which I think is the better option. If not, I would prefer a dark theme.

x-tabdeveloping · 2024-11-12T07:47:41Z

Alright, we can stick with the default. It just looks a bit weird to have a light plot against a dark background and vice versa.

Muennighoff · 2024-11-12T08:08:42Z

FYI somehow got the error below when trying to start the LB in a space, but maybe just me?

Traceback (most recent call last):
  File "/home/user/app/app.py", line 5, in <module>
    from mteb.leaderboard.app import demo
  File "/usr/local/lib/python3.10/site-packages/mteb/leaderboard/__init__.py", line 3, in <module>
    from mteb.leaderboard.app import demo
  File "/usr/local/lib/python3.10/site-packages/mteb/leaderboard/app.py", line 79, in <module>
    summary_table, per_task_table = scores_to_tables(default_scores)
  File "/usr/local/lib/python3.10/site-packages/mteb/leaderboard/table.py", line 138, in scores_to_tables
    model_metas.map(lambda m: format_n_parameters(m.n_parameters)),
  File "/usr/local/lib/python3.10/site-packages/pandas/core/series.py", line 4700, in map
    new_values = self._map_values(arg, na_action=na_action)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
  File "/usr/local/lib/python3.10/site-packages/pandas/core/algorithms.py", line 1743, in map_array
    return lib.map_infer(values, mapper, convert=convert)
  File "lib.pyx", line 2972, in pandas._libs.lib.map_infer
  File "/usr/local/lib/python3.10/site-packages/mteb/leaderboard/table.py", line 138, in <lambda>
    model_metas.map(lambda m: format_n_parameters(m.n_parameters)),
  File "/usr/local/lib/python3.10/site-packages/mteb/leaderboard/table.py", line 36, in format_n_parameters
    n_zeros = math.log10(n_million)
ValueError: math domain error

x-tabdeveloping · 2024-11-12T12:20:20Z

hmm strange enough. Maybe some model had model size -1 or None? Can you make an issue on this? @Muennighoff

x-tabdeveloping · 2024-11-12T12:28:11Z

Nvm I got this, will fix in next PR
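
For reference, `math.log10` raises `ValueError: math domain error` for zero or negative input, which is consistent with the guess above that some model had a size of -1 or 0 (a `None` would normally raise a `TypeError` earlier instead). Below is a minimal sketch of a guarded formatter. Only the name `format_n_parameters` and the `math.log10(n_million)` call appear in the traceback; the function name `format_n_parameters_safe`, the `"Unknown"` placeholder, and the M/B output format are assumptions made for illustration, not the actual mteb implementation or the fix that was later merged.

```python
import math
from typing import Optional


def format_n_parameters_safe(n_parameters: Optional[float]) -> str:
    """Hypothetical guarded formatter (illustrative only, not mteb's code)."""
    # None, NaN, zero, and negative sentinels such as -1 have no usable
    # logarithm, so return a placeholder instead of calling math.log10.
    if n_parameters is None:
        return "Unknown"
    if isinstance(n_parameters, float) and math.isnan(n_parameters):
        return "Unknown"
    if n_parameters <= 0:
        return "Unknown"

    n_million = n_parameters / 1e6
    # Safe now: n_million is strictly positive.
    n_zeros = math.log10(n_million)
    if n_zeros >= 3:
        # A thousand million or more: report in billions.
        return f"{n_million / 1000:.1f}B"
    return f"{n_million:.0f}M"
```

With a guard like this, a model with missing or sentinel size metadata would show a placeholder in the table rather than crashing the leaderboard at import time.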

x-tabdeveloping added 9 commits November 11, 2024 08:38

Added elementary speed/performance plot

8177d54

Refactored table formatting code

bc68553

Bumped Gradio version

0fb1b2d

Added more general info to benchmark description markdown block

c88eb83

Adjusted margin and range on plot

781ae93

Made hover information easier to read on plot

6f322f8

Made range scaling dynamic in plot

e8f497c

Moved citation next to benchmark description

7deaf64

Made titles in benchmark info bold

b338658

x-tabdeveloping requested review from KennethEnevoldsen, isaac-chung and Muennighoff and removed request for KennethEnevoldsen November 11, 2024 15:53

isaac-chung approved these changes Nov 11, 2024


Muennighoff approved these changes Nov 12, 2024


x-tabdeveloping merged commit 76c2112 into main Nov 12, 2024
10 checks passed

x-tabdeveloping deleted the plot branch November 12, 2024 07:54
