
Fix low-rank convergence criterion #547

Merged: 8 commits from feature/better-converged into main on Jul 3, 2024

Conversation

@michalk8 (Collaborator) commented Jun 5, 2024

closes #495

codecov bot commented Jun 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.39%. Comparing base (c6fb25c) to head (ef479a8).
Report is 33 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #547      +/-   ##
==========================================
+ Coverage   89.09%   89.39%   +0.30%     
==========================================
  Files          70       71       +1     
  Lines        7427     7556     +129     
  Branches     1051     1080      +29     
==========================================
+ Hits         6617     6755     +138     
+ Misses        659      650       -9     
  Partials      151      151              
| Files with missing lines | Coverage Δ |
|---|---|
| src/ott/solvers/linear/sinkhorn_lr.py | 98.65% <100.00%> (+<0.01%) ⬆️ |
| src/ott/solvers/quadratic/gromov_wasserstein_lr.py | 81.60% <100.00%> (+0.05%) ⬆️ |

... and 2 files with indirect coverage changes

@michalk8 michalk8 added the bug Something isn't working label Jun 5, 2024
@michalk8 michalk8 requested a review from marcocuturi June 5, 2024 16:09
@marcocuturi (Contributor) left a comment

This LGTM!

I am convinced that we need to get rid of the (1 / gamma**2) rescaling in line 60 of sinkhorn_lr.py. This rescaling might make sense for a theoretical analysis, but it does not make sense for a practical convergence check.

Tagging @meyerscetbon, who might have an opinion. For instance, setting a large gamma, e.g. 100 instead of 10, rescales the same criterion "optimistically" (i.e. makes it smaller) by a factor of 1e2! With the default threshold of 1e-3 that we use, this makes no sense, and might explain erratic behavior.

```diff
@@ -687,7 +682,10 @@ def one_iteration(
       lambda: state.reg_ot_cost(ot_prob, epsilon=self.epsilon),
       lambda: jnp.inf
     )
-    error = state.compute_error(previous_state)
+    error = jax.lax.cond(
```
good catch!
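
For readers following the diff above, here is a minimal, hypothetical sketch of the jax.lax.cond guard pattern it introduces; the condition, the helper name and the surrounding signature are assumptions for illustration, not the exact code merged in this PR.

```python
# Hypothetical sketch: compute the convergence error only once the solver is
# allowed to stop (e.g. past min_iterations); otherwise return jnp.inf so the
# `converged` flag cannot be raised too early (cf. issue #495).
import jax
import jax.numpy as jnp


def guarded_error(iteration, min_iterations, state, previous_state):
  return jax.lax.cond(
      iteration >= min_iterations,                  # only test convergence once stopping is allowed
      lambda: state.compute_error(previous_state),  # actual error (sum of the KL terms)
      lambda: jnp.inf,                              # otherwise report "not converged yet"
  )
```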

@marcocuturi (Contributor)

I suggest to:

  • remove this 1/gamma^2 factor from the error. With the default gamma, this was dividing the criterion by 100, and hence effectively applying a very loose threshold of 1e-1 (see the sketch below).
  • keep the default threshold parameter in LRSinkhorn. It currently inherits the 1e-3 setting from Sinkhorn. Once gamma^2 is removed, this sounds like a reasonable value (although we might want to make it size-dependent, as should soon be done for Sinkhorn as well).
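
A small numerical illustration of the first bullet (plain Python, not OTT code; gamma, the threshold and the error value are assumed for the example):

```python
# Illustrative numbers only: with the default gamma = 10, dividing the error by
# gamma**2 = 100 makes the effective tolerance 100x looser than intended.
gamma = 10.0       # assumed default step size
threshold = 1e-3   # default threshold inherited from Sinkhorn
error = 5e-2       # hypothetical sum of the three KL divergences

print(error / gamma**2 <= threshold)  # True: "converged" under the old, rescaled criterion
print(error <= threshold)             # False: not converged once the 1/gamma**2 factor is removed
```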

@meyerscetbon (Collaborator)

The convergence criterion (as currently implemented) only makes “theoretical” sense when the gradient step is constant throughout the iterations. In that case, for each choice of gamma, we could adapt the desired approximation error (by considering gamma^2 * eps), as Marco suggested. This would be equivalent to the current procedure, and we would additionally eliminate an indeterminacy.

However, since we are using a rescaled gradient step, we should, in theory, monitor (1/gamma_k^2) * (err_Q + err_R + err_g). If gamma_k^2 converges towards a constant (close to 1), this becomes equivalent to monitoring only the errors. We could also apply the same idea as in the constant-step case, by comparing against gamma_k^2 * eps instead.

I hope this helps a little.
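
For reference, the two stopping rules discussed in this thread can be written side by side (a sketch of the notation only, with err_Q, err_R, err_g denoting the three KL terms and eps the threshold):

```latex
\frac{1}{\gamma_k^{2}}\left(\mathrm{err}_Q + \mathrm{err}_R + \mathrm{err}_g\right) \le \varepsilon
\quad\Longleftrightarrow\quad
\mathrm{err}_Q + \mathrm{err}_R + \mathrm{err}_g \le \gamma_k^{2}\,\varepsilon
```

For a constant gamma the two forms differ only by the fixed factor gamma^2 applied to the tolerance, which is the equivalence referred to above and the reason the discussion settles on monitoring the plain sum of errors against the 1e-3 threshold.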

@marcocuturi (Contributor)

Thanks @meyerscetbon! I am not sure what you advocate, though, practically speaking :) It seems to me that the sum of the 3 KLs is the more natural quantity to monitor, independently of the chosen step size. Is there any intuitive reason why the termination criterion should be scaled with the step size? For instance, if the first gamma = 1000 (a small step size), we would in principle "converge" immediately, after one iteration. This seems counterintuitive to me.

@meyerscetbon (Collaborator)

I was trying to say that in the proof we look at the convergence of (1/gamma_k^2) * err (where err is the sum of the KL terms), and gamma_k is not assumed to be constant (we only assume it is constant at the end of the proof, in order to obtain a sufficient condition for convergence). There might therefore be cases where the convergence of (1/gamma_k^2) * err does not imply the convergence of err. However, I think we can safely assume that gamma_k converges towards a constant, so we can monitor only the error term, as you suggested.

I was also trying to say that when gamma is constant across the iterations, rescaling the error by 1/gamma^2 or not is equivalent in terms of convergence, so we can, as you suggested, remove 1/gamma^2 from the criterion and monitor only the error term.

Sorry for the confusion.

@giovp (Contributor) commented Jun 17, 2024

Thanks for the discussion, all, very interesting, and thanks @michalk8 for the fix. I've just run some of the failing tests we have in moscot, and the convergence flag is now set correctly (returning True when it should).

@michalk8 michalk8 merged commit 7cfd393 into main Jul 3, 2024
12 checks passed
@michalk8 michalk8 deleted the feature/better-converged branch July 3, 2024 15:43
Labels: bug (Something isn't working)
Projects: None yet
Development: Successfully merging this pull request may close these issues: converged flag compatibility with min_iterations logic
5 participants