Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tools to test and debug JuMP models #3664

Open
odow opened this issue Feb 3, 2024 · 9 comments
Open

Tools to test and debug JuMP models #3664

odow opened this issue Feb 3, 2024 · 9 comments

Comments

@odow
Copy link
Member

odow commented Feb 3, 2024

Despite making it easy for users to formulate and solve a wide range of optimization problems, JuMP provides little support for the users who make mistakes, or tools for advanced users to debug problematic models. Moreover, in our experience the majority of (expensive) human programmer time is spent, not in formulating or solving a model, but in the debugging and testing stage of development.

Common questions that developers ask us during the debugging and testing stage of development include:

  • How do I find out why an optimization model is infeasible or unbounded?
  • How do I know if the model has numerical issues?
  • How to identify parts of the model that could be improved?

The goal of this project is to develop an automated analysis framework that can identify common problems to reduce the time spent by developers in the debugging and testing phase. There are a range of problems which are trivial to identify and fix. We suggest that the correct approach is not to fix the problems automatically (as the AMPL presolve does), but to educate the user on how to write better models.

Ideas

The deliverable for this issue is a Julia package that exposes an API for linting JuMP and MathOptInterface models. See below for a set of initial ideas.

Summary statistics

Print summary statistics of the model such as the number of variables and constraints, whether bridging is used, and a hash of the problem data.

Motivation: Simple statistics are often helpful to give an idea of the model. If the number of variables or constraints is very large, it may be difficult to solve, and if the number of variables or constraints does not match what the user expected, they may have made a mistake in the formulation.

Prior art: Most solvers provide this information. It is trivial to implement.

Coefficient analysis

For each constraint type, print statistics such as the number of constraints, the range of left- and right-hand side coefficients, and the density of the coefficients (number of terms / number of variables in the model), and whether the function is convex or non-convex.

Motivation: Poorly scaled problems have coefficients that vary by many orders of magnitude. Poor problem scaling is the most common cause of numerical problems as many common solution algorithms include a decomposition of the constraint matrix whose stability depends on the condition number of that matrix. Poorly scaled problems are usually an indication of user-error, for example, using the wrong units so a coefficient is off by a factor of 10^N. Once problematic coefficients are identified, they can usually be fixed by scaling the variables and input coefficients.

Prior art: Some solvers provide this information. Oscar has implemented it before in SDDP.jl: https://github.com/odow/SDDP.jl/blob/9e88c6ba2a7dacc0f842b9be3f8073acfd55073e/src/print.jl#L191-L384

Degeneracy

Identify constraints in the problem that are linearly dependent.

Motivation: Most solution algorithms require a linearly independent set of constraints. Because linearly dependent constraints are commonly provided by users, most solvers have detection routines for them. However, they are usually a sign that a model’s formulation can be improved.

Prior art: Alex Dowling has a paper and implementation https://github.com/adowling2/DegeneracyHunter.jl

Incidence analysis

Identify nonlinear models in which the Jacobian is not full rank. This problem is closely related to the degeneracy problem.

Motivation: Under- or over-constrained systems of nonlinear equations cause solvers trouble, and they are typically a sign of user-error since physical systems are typically not over- or under-constrained.

Prior art: Robby Parker has an implementation of this for Pyomo https://pyomo.readthedocs.io/en/stable/contributed_packages/incidence/index.html, and an in-development version of the code for JuMP.

Bounds given as constraints

Identify constraints of the form @constraint(model, l <= x <= u). These should instead be specified as @variable(model, l <= x <= u).

Motivation: The former adds a new linear constraint row to the problem. The latter sets a variable’s bounds. Solvers typically have specialised support for variable bounds. If the solver doesn’t have specialised support for variable bounds, JuMP will reformulate into the constraint version.

Prior art: Most solvers have a presolve that does this, but it’s a sign of user-error that could be improved.

Variables that do not appear in constraints

Variables that do not appear in constraints are generally a sign of user-error. They might have forgotten to include the variable, or they could make the model more performant by omitting the variable in the first place.

Starting point analysis

Starting points which violate domain restrictions like log(x) when x = 0 to start are common. Therefore, we should analyse the feasibility of a given starting point, and potentially that of the first- and second-derivatives as well.

Domain analysis

Nonlinear programs often include terms like log(x) where x >= 0. This is a common cause of domain violations because the solver may try x = 0. It’s usually better to add an explicit lower bound on x of x >= 0.01.

Convexity analysis

Proving convexity is difficult. But we could test feasible trial points for a proof of nonconvexity. A full disciplined convex programming package is out-of-scope for this project.

Prior art: CVXPY has an implementation: https://github.com/cvxpy/cvxpyanalyzer/blob/master/analyzer/convexity_checker.py

Irreducible infeasible subsystem

If the problem is infeasible we could try and find a reduced subsystem that is still infeasible. This does not need to be the minimal subsystem (a notoriously difficult problem to solve), even a smaller system can be helpful.

Prior art: Python-MIP includes an implementation:
https://github.com/coin-or/python-mip/blob/6044fc8f0414d71430e94b8a08573b695dc35b5a/mip/conflict.py#L15

Dual feasibility report

We have primal_feasibility_report, but we need a way to verify the feasibility of dual solutions

This could also include a way to check whether the primal and dual objectives match, and whether objective_value(model) matches value(objective_function(model)). See jump-dev/HiGHS.jl#223

Hessian analysis

See jump-dev/MathOptInterface.jl#2527

We should provide a tool that determines the density of the Hessian, and whether user-level reformulations could improve performance.

MIP solution refiner

See https://youtu.be/rKcdF4Fgl-g?feature=shared&t=2535

@odow
Copy link
Member Author

odow commented Sep 2, 2024

I'm going to point to jump-dev/HiGHS.jl#223 as well.

I think we need some way to generically assess solution quality, because solvers lie to us. cc @sstroemer

@odow odow changed the title Tools to test JuMP models Tools to test and debug JuMP models Sep 3, 2024
@odow
Copy link
Member Author

odow commented Sep 3, 2024

I've updated the description of this issue with my notes about a JuMP linter.

@odow
Copy link
Member Author

odow commented Sep 7, 2024

cc @Robbybp: this is the issue I was talking about this morning

@sstroemer
Copy link

Thanks for the great summary - just wanted to say that I've some (the ones that relate to LPs) of those points partially implemented - but only based on the LP matrix data struct. If this isn't already done anyways (since you're always so fast... 😅) when I return in Oct I'll clean it up and link it here!

@joaquimg
Copy link
Member

A long time ago, in a galaxy far, far away I started sketching this:https://github.com/joaquimg/FeasibilityOptInterface.jl/tree/jg/first/src

The main feedback is to try using Dualization.jl to check dual stuff.

Might be nice checking complementarity constraints as well

@joaquimg
Copy link
Member

I don't know how this would be done, but we can at least have a manual/tutorial section telling the user about:

Clearing the solver after a failed (sometimes infeasible) optimization.
In the case of HiGHS it is: Highs_clearSolver. In other solvers, the solution might be rebuild. Other solutions might be: "Try another method".

@odow
Copy link
Member Author

odow commented Oct 14, 2024

Obligatory link to where I automatically try to reset the optimizer in SDDP
https://github.com/odow/SDDP.jl/blob/2fd1e8d89b562df97fa23597b36b6c3a18c185a5/src/algorithm.jl#L295-L322

@odow
Copy link
Member Author

odow commented Nov 6, 2024

I'll also leave a link to this section of the TimeFold docs, which discuss testing:
https://docs.timefold.ai/timefold-solver/latest/constraints-and-score/score-calculation#constraintStreamsTesting

@odow
Copy link
Member Author

odow commented Jan 12, 2025

This tooling should also have some way of tracking/profiling where JuMP spends its time: #3910

Pyomo has a built in profiler: see pg 129 of https://www.osti.gov/servlets/purl/1771935

The other place to look for inspiration is cvxpy, which prints a log of "compiling to x" and the time, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants