
Home: SGM on intersection of benchmarks solved by all #114

Open
Tracked by #95
siddharth-krishna opened this issue Feb 21, 2025 · 6 comments · May be fixed by #118
Comments

@siddharth-krishna
Contributor

siddharth-krishna commented Feb 21, 2025

SGM Runtime is currently very misleading on #107 because CBC solves far fewer benchmarks than HiGHS:

Image

It would be good to add a toggle/dropdown with the options "Compute SGM using TO values / penalizing TO by a factor of <> / only on intersection of solved benchmarks", and to add the following tooltip to explain the choices:

When calculating SGM, there are many choices of what to do about benchmarks that error or time out. "Using TO values" will assign them the time-out value (for runtime) or the maximum memory used when running the benchmark instance (for memory). "Penalizing TO by a factor of X" will take the TO/max value from the previous option and multiply it by a factor of X. "Intersection of solved benchmarks" will filter the benchmark instances to those solved by all solvers before computing SGM, so that there are no error or time-out values to consider.
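The three choices above could be sketched as follows. This is a minimal illustration, not the dashboard's actual code: the key names (`benchmark`, `solver`, `status`, `runtime`) and the SGM shift of 10 are assumptions.

```python
import math

def sgm(times, shift=10.0):
    # Shifted geometric mean; the shift value is an assumption (10 is common).
    return math.exp(sum(math.log(t + shift) for t in times) / len(times)) - shift

def sgm_by_solver(rows, mode="to_values", factor=5.0):
    # rows: dicts with hypothetical keys "benchmark", "solver", "status", "runtime".
    if mode == "intersection":
        # Keep only benchmarks solved (status "ok") by every solver.
        solvers = {r["solver"] for r in rows}
        solved = {}
        for r in rows:
            if r["status"] == "ok":
                solved.setdefault(r["benchmark"], set()).add(r["solver"])
        common = {b for b, s in solved.items() if s == solvers}
        rows = [r for r in rows if r["benchmark"] in common]
    result = {}
    for solver in sorted({r["solver"] for r in rows}):
        times = []
        for r in rows:
            if r["solver"] != solver:
                continue
            t = r["runtime"]
            if mode == "penalize" and r["status"] != "ok":
                t *= factor  # multiply the recorded TO value by the factor
            times.append(t)
        result[solver] = sgm(times)
    return result
```

In "intersection" mode the filtered row set also gives the benchmark count needed for the "SGM Runtime on N benchmarks" column header.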

The code will also need to be updated to compute the SGM appropriately based on what the user selects.

For the "Intersection of solved benchmarks" case, the "SGM Runtime" column header should display the number of benchmarks in the intersection. E.g. "SGM Runtime on 5 benchmarks"

@jacek-oet
Collaborator

@siddharth-krishna
When a user selects ‘penalizing TO by a factor of’, we display an input field like the one shown in the image, correct?

Image

@siddharth-krishna
Contributor Author

Yes, thank you. The default value of X can be 5.

While you are updating this code, can you also keep in mind that in the next run of benchmarks we will have some benchmarks with e.g. a TO of 1h (S, M, ...) and some with a TO of 10h (R). The results CSV file should have the appropriate TO value in the "Runtime (s)" field (either 1h or 10h), so can we make sure that the code also uses the TO value given in the same row in the CSV instead of using a single TO value for all benchmarks? Thank you.

@jacek-oet
Collaborator

@siddharth-krishna
Do you mean that if a benchmark result is 'TO', we will map the corresponding runtime value based on the size column, or that in the next run there will be an additional 'TO' column containing the time value? If it's a new column, I'll update the code when it's available.

@siddharth-krishna
Contributor Author

Sorry for not being super clear. What I mean is that for rows in the CSV file where Status == TO, the Runtime (s) column will have the TO value that was used while running that benchmark. Note that different benchmarks will have different TO values in the next benchmark run. So the CSV file will look like e.g.:

Benchmark,Size,Solver,Solver Version,Solver Release Year,Status,Termination Condition,Runtime (s),Memory Usage (MB),Objective Value,Max Integrality Violation,Duality Gap
pypsa-eur-sec,2-24h,glpk,5.0,2020,TO,Timeout,600,134.748,,,
pypsa-eur-sec,2-24h,highs,5.0,2020,ok,optimal,32,134.748,,,
genx-real,2-24h,glpk,5.0,2020,TO,Timeout,3600,134.748,,,
genx-real,2-24h,highs,5.0,2020,ok,optimal,2345,134.748,,,

In the above example, the first benchmark TO-ed on glpk after 600s, while the second benchmark TO-ed after 3600s. So when penalizing by a factor of e.g. 5, you would do 600*5 for the first benchmark on glpk and 3600*5 for the second. Does that make sense?

@jacek-oet
Collaborator

@siddharth-krishna
Does this apply only to results with status 'TO'? For other warnings or errors, do we still use a default runtime value of 600? And will the Timeout field on the dashboard/home display the maximum TO value from the CSV file?

Image

@siddharth-krishna
Contributor Author

siddharth-krishna commented Feb 25, 2025

Great questions.

Does this apply only to results with status 'TO'? For other warnings or errors, do we still use a default runtime value of 600?

Unfortunately, right now I think the status warning is using the actual runtime and not the TO value in the CSV. I'll open an issue to fix this (see #123). Perhaps a cleaner solution is to add a Timeout column to the CSV so that each row can record the TO value used while running it? What do you think?

For this PR, just use the Runtime (s) value for non-ok statuses, and we'll fix things later, maybe?

And will the Timeout field on the dashboard/home display the maximum TO value from the CSV file?

I've opened an issue to discuss this, I think the answer is to update the design: #122. For now let's keep the hardcoded runtime with a TODO that points to this issue?
