
Home: SGM on intersection of benchmarks solved by all #114

Open
Tracked by #95
siddharth-krishna opened this issue Feb 21, 2025 · 6 comments · May be fixed by #118
Comments

@siddharth-krishna
Contributor

siddharth-krishna commented Feb 21, 2025

SGM Runtime is currently very misleading on #107 because CBC solves far fewer benchmarks than HiGHS:

Image

It would be good to add a toggle/dropdown with the options "Compute SGM using TO values / penalizing TO by a factor of <> / only on intersection of solved benchmarks", and to add the following tooltip to explain the choices:

When calculating SGM, there are many choices of what to do about benchmarks that error or time out. "Using TO values" will assign them the time-out value (for runtime) or the maximum memory used when running the benchmark instance (for memory). "Penalizing TO by a factor of X" will take the TO/max value from the previous option and multiply it by a factor of X. "Intersection of solved benchmarks" will filter the benchmark instances to those solved by all solvers before computing SGM, so that there are no error or time-out values to consider.
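The three choices above could be sketched as follows. This is a minimal illustration, not the dashboard's actual code: the key names (`benchmark`, `solver`, `status`, `runtime`) and the SGM shift of 10 are assumptions.

```python
import math

def sgm(times, shift=10.0):
    # Shifted geometric mean; the shift value is an assumption (10 is common).
    return math.exp(sum(math.log(t + shift) for t in times) / len(times)) - shift

def sgm_by_solver(rows, mode="to_values", factor=5.0):
    # rows: dicts with hypothetical keys "benchmark", "solver", "status", "runtime".
    if mode == "intersection":
        # Keep only benchmarks solved (status "ok") by every solver.
        solvers = {r["solver"] for r in rows}
        solved = {}
        for r in rows:
            if r["status"] == "ok":
                solved.setdefault(r["benchmark"], set()).add(r["solver"])
        common = {b for b, s in solved.items() if s == solvers}
        rows = [r for r in rows if r["benchmark"] in common]
    result = {}
    for solver in sorted({r["solver"] for r in rows}):
        times = []
        for r in rows:
            if r["solver"] != solver:
                continue
            t = r["runtime"]
            if mode == "penalize" and r["status"] != "ok":
                t *= factor  # multiply the recorded TO value by the factor
            times.append(t)
        result[solver] = sgm(times)
    return result
```

In "intersection" mode the filtered row set also gives the benchmark count needed for the "SGM Runtime on N benchmarks" column header.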

The code will also need to be updated to compute the SGM appropriately based on what the user selects.

For the "Intersection of solved benchmarks" case, the "SGM Runtime" column header should display the number of benchmarks in the intersection. E.g. "SGM Runtime on 5 benchmarks"

@jacek-oet
Collaborator

@siddharth-krishna
When a user selects ‘penalizing TO by a factor of’, we display an input field like the one shown in the image, correct?

Image

@siddharth-krishna
Contributor Author

Yes, thank you. The default value of X can be 5.

While you are updating this code, can you also keep in mind that in the next run of benchmarks we will have some benchmarks with e.g. a TO of 1h (S, M, ...) and some with a TO of 10h (R). The results CSV file should have the appropriate TO value in the "Runtime (s)" field (either 1h or 10h), so can we make sure that the code also uses the TO value given in the same row in the CSV instead of using a single TO value for all benchmarks? Thank you.

@jacek-oet
Collaborator

@siddharth-krishna
Do you mean that if a benchmark result is 'TO', we will map the corresponding runtime value based on the size column, or that in the next run there will be an additional 'TO' column containing the time value? If it's a new column, I'll update the code when it's available.

@siddharth-krishna
Contributor Author

Sorry for not being super clear. What I mean is that for rows in the CSV file where Status == TO, the Runtime (s) column will have the TO value that was used while running that benchmark. Note that different benchmarks will have different TO values in the next benchmark run. So the CSV file will look like e.g.:

Benchmark,Size,Solver,Solver Version,Solver Release Year,Status,Termination Condition,Runtime (s),Memory Usage (MB),Objective Value,Max Integrality Violation,Duality Gap
pypsa-eur-sec,2-24h,glpk,5.0,2020,TO,Timeout,600,134.748,,,
pypsa-eur-sec,2-24h,highs,5.0,2020,ok,optimal,32,134.748,,,
genx-real,2-24h,glpk,5.0,2020,TO,Timeout,3600,134.748,,,
genx-real,2-24h,highs,5.0,2020,ok,optimal,2345,134.748,,,

In the above example, the first benchmark TO-ed on glpk after 600s, while the second benchmark TO-ed after 3600s. So when penalizing by a factor of e.g. 5, you would do 600*5 for the first benchmark on glpk and 3600*5 for the second. Does that make sense?

@jacek-oet
Collaborator

@siddharth-krishna
Does this apply only to results with status 'TO'? For other warnings or errors, do we still use a default runtime value of 600? And will the Timeout field on the dashboard/home display the maximum TO value from the CSV file?

Image

@siddharth-krishna
Contributor Author

siddharth-krishna commented Feb 25, 2025

Great questions.

Does this apply only to results with status 'TO'? For other warnings or errors, do we still use a default runtime value of 600?

Unfortunately, right now I think the status warning is using the actual runtime and not the TO value in the CSV. I'll open an issue to fix this (see #123). Perhaps a cleaner solution is to add a Timeout column to the CSV so that each row can record the TO value used while running it? What do you think?

For this PR, just use the Runtime (s) value for non-ok statuses, and we'll fix things later, maybe?

And will the Timeout field on the dashboard/home display the maximum TO value from the CSV file?

I've opened an issue to discuss this, I think the answer is to update the design: #122. For now let's keep the hardcoded runtime with a TODO that points to this issue?
