Refactor: use Lebesgue integrals and non-negative divergence functions #174

Open
wants to merge 89 commits into base: master
Conversation

@RemyDegenne (Owner) commented Nov 3, 2024

New design for f-divergences

We change the definition of f-divergences to use Lebesgue integrals and functions ℝ≥0∞ → ℝ≥0∞ whose required properties are bundled in a structure.

Closes #154

Old design

Before the refactor, the definition of the f-divergence was as follows.

def fDiv (f : ℝ → ℝ) (μ ν : Measure α) : EReal :=
  if ¬ Integrable (fun x ↦ f ((∂μ/∂ν) x).toReal) ν then ⊤
  else ∫ x, f ((∂μ/∂ν) x).toReal ∂ν + derivAtTop f * μ.singularPart ν .univ

Then most results assumed that f was convex and continuous on [0,∞).
Since f is only ever used composed with a Radon-Nikodym derivative, as in f ((∂μ/∂ν) x).toReal, it would be more natural to use ℝ≥0∞ for its domain. But if we did so we would lose the ability (in Mathlib) to talk about its derivatives, which is essential for some of our proofs. We thus settled for ℝ. For the codomain, we used ℝ in a Bochner integral, with the idea that the divergence should be allowed to take negative values: the Kullback-Leibler divergence expressed as ∫ x, llr μ ν x ∂μ takes negative values if the measures don't have the same total mass.

Here are some issues with the current design:

  • The integral uses the value of f at zero, f 0 : ℝ. The math definition requires the value at 0 to equal the limit of f at 0 from the right. If that limit is finite this is fine: we can simply require f to be continuous at 0. However, if the limit at 0 is infinite, our current definition cannot encode the math definition. That has not been an issue so far, but it prevents us from writing desirable statements such as the invariance of fDiv under taking the "dual" of f (see issue #25, "Skew symmetry of hellingerDiv should be generalized").
  • If we keep the current approach of a real-valued function inside an EReal-valued divergence, fixing the above would turn the f-divergence into an unwieldy sum of three terms, with integrals over subsets of the space (see the proposal in issue #154, "Refactors of f-divergences").
  • fDiv takes values in EReal, which is painful to work with, and we never need the negative infinity.
  • The use of a Bochner integral forces us to deal with integrability conditions.
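
To make the first point concrete, here is an illustration (hypothetical helper names, not code from this PR). The "dual" of f is g x = x * f (1/x); for the KL function f x = x * log x, the dual simplifies to -log x on (0, ∞), which tends to +∞ at 0 from the right, so no real value g 0 : ℝ can encode the right limit:

```lean
-- Illustration only: the dual of the KL function.
noncomputable def klFun (x : ℝ) : ℝ := x * Real.log x

-- For x > 0: klDual x = x * ((1/x) * log (1/x)) = -log x,
-- which tends to +∞ as x → 0⁺. The old `f : ℝ → ℝ` design has no
-- way to set `klDual 0` to that infinite limit, so a duality like
-- `fDiv klDual μ ν = fDiv klFun ν μ` cannot even be stated.
noncomputable def klDual (x : ℝ) : ℝ := x * klFun (1 / x)
```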

New design

The new definition is this.

def fDiv (f : DivFunction) (μ ν : Measure α) : ℝ≥0∞ :=
  ∫⁻ x, f ((∂μ/∂ν) x) ∂ν + f.derivAtTop * μ.singularPart ν .univ

A DivFunction is defined as follows.

structure DivFunction where
  toFun : ℝ≥0∞ → ℝ≥0∞
  one : toFun 1 = 0
  rightDerivOne : rightDeriv (fun x : ℝ ↦ (toFun (ENNReal.ofReal x)).toReal) 1 = 0
  convexOn' : ConvexOn ℝ≥0 univ toFun
  -- the continuity everywhere but 0 and ∞ is implied by the convexity
  continuous' : Continuous toFun
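
To give an idea of how a concrete divergence fits into this structure, here is a sketch (proofs omitted, and not necessarily how this PR builds it) of the recentered KL function x ↦ x * log x - x + 1, which vanishes at 1 and has derivative 0 there, packaged as a DivFunction:

```lean
-- Illustrative sketch: the recentered KL function as a `DivFunction`.
-- The real formula is lifted to ℝ≥0∞ with `ENNReal.ofReal`, sending ∞ to ∞.
noncomputable example : DivFunction where
  toFun := fun x ↦ if x = ∞ then ∞
    else ENNReal.ofReal (x.toReal * Real.log x.toReal - x.toReal + 1)
  one := sorry            -- 1 * log 1 - 1 + 1 = 0
  rightDerivOne := sorry  -- the derivative log x vanishes at 1
  convexOn' := sorry
  continuous' := sorry
```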

derivAtTop is also redesigned to take values in ℝ≥0∞.
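
One plausible shape for such an ℝ≥0∞-valued derivAtTop (illustrative only, not necessarily the definition adopted in this PR) is the limit of the slopes f x / x, taken as a limsup so that it is always defined:

```lean
-- Sketch: the asymptotic slope of f, valued in ℝ≥0∞.
-- For convex f the slopes are eventually monotone, so the limsup
-- agrees with the limit; it can be ∞, which ℝ≥0∞ accommodates.
open Filter in
noncomputable def derivAtTopSketch (f : ℝ≥0∞ → ℝ≥0∞) : ℝ≥0∞ :=
  limsup (fun n : ℕ ↦ f n / n) atTop
```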

Why we can use ℝ≥0∞ → ℝ≥0∞ after all

  • domain: we previously used ℝ to be able to talk about derivatives and to use some convexity lemmas from Mathlib (notably Jensen's inequality). This is needed only in specific places. The new approach is to use ℝ≥0∞ → ℝ≥0∞ everywhere in integral computations, and to define from f a real function f.realFun : ℝ → ℝ to use in the places where derivatives and convexity are needed:
def DivFunction.realFun (f : DivFunction) : ℝ → ℝ := (fun x : ℝ ↦ (f (ENNReal.ofReal x)).toReal)
  • codomain: we use ℝ≥0∞ to be able to integrate with Lebesgue integrals and not worry about integrability. That means that our f-divergences have to be nonnegative, so the KL definition discussed above cannot work. However, since any f-divergence (in the math sense) is invariant under adding a + b*(x-1) to f when the measures are probability measures, we can subtract f 1 + rightDeriv f 1 * (x - 1) from the function to obtain another one with the same f-divergence on probability measures, but whose f-divergence is nonnegative for all finite measures. We enforce this in our new definition of f-divergences through the fields one and rightDerivOne of DivFunction.
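
The recentering described in the codomain point can be sketched as follows (hypothetical helper; the right derivative at 1 is passed in as a parameter rather than computed):

```lean
-- Illustration: recenter a convex `f : ℝ → ℝ` around 1. Since the
-- f-divergence of probability measures is unchanged by adding
-- `a + b * (x - 1)` to f, `recenter f (rightDeriv f 1)` has the same
-- f-divergence as f on probability measures; it vanishes at 1, has
-- right derivative 0 there, and is nonnegative by convexity.
def recenter (f : ℝ → ℝ) (f'₁ : ℝ) : ℝ → ℝ :=
  fun x ↦ f x - f 1 - f'₁ * (x - 1)
```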

What we gain, what we lose

Gain:

  • We don't need many if branches to deal with integrability conditions, and we don't need separate lemmas for the cases where the divergences are infinite.
  • We can do computations in ℝ≥0∞ instead of EReal, which is a big gain in usability.
  • We can have functions with infinite limit at 0 without splitting another term from the integral.

Lose:

  • The definitions of KL and other divergences look a bit more exotic than before. For example, a kl which is equal to an f-divergence is
def kl (μ ν : Measure α) : ℝ≥0∞ :=
  if μ ≪ ν ∧ Integrable (llr μ ν) μ
    then ENNReal.ofReal (∫ x, llr μ ν x ∂μ + (ν .univ).toReal - (μ .univ).toReal)
    else ∞
  • While we no longer have to deal with integrability in proofs about abstract f-divergences or about concrete divergences like KL, for concrete divergences given by a real function we now have to prove non-negativity side conditions, because subtraction on ℝ≥0∞ is truncated.
  • The Hellinger divergence for a = 0 can't be an f-divergence any more because of its discontinuity at 0. We have to do a special case for it if we want to define it in the old way. Currently the new code has the split only at the level of the Rényi divergence.
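
The non-negativity side conditions mentioned above have roughly the following shape for kl (a hypothetical proof obligation with assumed names, proof omitted, not a lemma from this PR):

```lean
-- To peel `ENNReal.ofReal` off the `kl` definition above, one must
-- show the real-valued expression inside it is nonnegative. This is
-- exactly the nonnegativity of the recentered KL divergence.
example (μ ν : Measure α) [IsFiniteMeasure μ] [IsFiniteMeasure ν]
    (hμν : μ ≪ ν) (h_int : Integrable (llr μ ν) μ) :
    0 ≤ ∫ x, llr μ ν x ∂μ + (ν .univ).toReal - (μ .univ).toReal := by
  sorry
```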

TODO

  • Setting rightDeriv f.realFun 1 = 0 is too restrictive. The conjugate x * f (1/x) will not be a DivFunction unless f actually has a derivative at 1 (because the right derivative of the conjugate at 1 is determined by the left derivative of f at 1). That is not the case for the function that gives the total variation distance, for example. Once the first refactor builds, we should replace the rightDeriv condition by a constraint on the subderivative: 0 ∈ ∂f(1).
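
A sketch of that planned weakening (field and helper names assumed; leftDeriv is the one-sided analogue of the rightDeriv used above):

```lean
-- Hypothetical revision of `DivFunction`: replace `rightDerivOne` by a
-- subderivative condition `0 ∈ ∂f(1)`, i.e. the one-sided derivatives
-- of the real version of f bracket 0 at 1. This is weaker than
-- `rightDeriv ... 1 = 0` and survives taking the conjugate `x * f (1/x)`
-- even when f is not differentiable at 1.
structure DivFunction' where
  toFun : ℝ≥0∞ → ℝ≥0∞
  one : toFun 1 = 0
  subderivOne :
    leftDeriv (fun x : ℝ ↦ (toFun (ENNReal.ofReal x)).toReal) 1 ≤ 0 ∧
    0 ≤ rightDeriv (fun x : ℝ ↦ (toFun (ENNReal.ofReal x)).toReal) 1
  convexOn' : ConvexOn ℝ≥0 univ toFun
  continuous' : Continuous toFun
```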

@RemyDegenne RemyDegenne added the WIP label Nov 3, 2024
RemyDegenne and others added 30 commits November 9, 2024 11:31
the old statement was false; I needed to add the hypothesis `x ≠ ∞`, and I also fixed the dependencies
`hadDeriv...` corrected to `hasDeriv...`
I had to strengthen the hypothesis `0 ≤ x` to `0 < x`; with the former hypothesis the result is false
consequences of the stricter hypothesis of `ConvexOn.nonneg_of_todo`
I had to add the hypothesis `c = 0 → a ≠ 1`, because if `c = 0` and `a = 1` the result is false