Refactor: use Lebesgue integrals and non-negative divergence functions #174

Open
wants to merge 89 commits into base: master
Conversation

@RemyDegenne (Owner) commented Nov 3, 2024

New design for f-divergences

We change the definition of f-divergences to use Lebesgue integrals and functions ℝ≥0∞ → ℝ≥0∞ whose required properties are bundled in a structure.

Closes #154

Old design

Before the refactor, the definition of the f-divergence was as follows.

def fDiv (f : ℝ → ℝ) (μ ν : Measure α) : EReal :=
  if ¬ Integrable (fun x ↦ f ((∂μ/∂ν) x).toReal) ν then ⊤
  else ∫ x, f ((∂μ/∂ν) x).toReal ∂ν + derivAtTop f * μ.singularPart ν .univ

Then most results assumed that f was convex and continuous on [0,∞).
Since f is only ever used composed with a Radon-Nikodym derivative, as in f ((∂μ/∂ν) x).toReal, it would be more natural to use ℝ≥0∞ for its domain. But if we did so we would lose the ability (in Mathlib) to talk about its derivatives, which is essential for some of our proofs. We thus settled for ℝ. For the codomain, we used ℝ in a Bochner integral, with the idea that the divergence should be allowed to take negative values: the Kullback-Leibler divergence expressed as ∫ x, llr μ ν x ∂μ takes negative values if the measures don't have the same total mass.

Here are some issues with the current design:

  • The integral uses the value of f at zero, f 0 : ℝ. The math definition requires the value at 0 to equal the limit of f at 0 from the right. If that limit is finite this is fine: we can simply require f to be continuous at 0. However, if the limit at 0 is infinite, our current definition cannot encode the math definition. That has not been an issue so far, but it prevents us from writing desirable statements such as the invariance of fDiv under taking the "dual" of f (see issue #25, "Skew symmetry of hellingerDiv should be generalized").
  • If we keep the current approach of a real-valued function inside an EReal-valued divergence, fixing the above would turn the f-divergence into an unwieldy sum of three terms, with integrals over subsets of the space (see the proposal in issue #154, "Refactors of f-divergences").
  • fDiv takes values in EReal, which is painful to work with, and we never need the negative infinity.
  • The use of a Bochner integral forces us to deal with integrability conditions.
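
To make the first point concrete, here is an illustration (hypothetical helper names, not code from this PR). The "dual" of f is g x = x * f (1/x); for the KL function f x = x * log x, the dual simplifies to -log x on (0, ∞), which tends to +∞ at 0 from the right, so no real value g 0 : ℝ can encode the right limit:

```lean
-- Illustration only: the dual of the KL function.
noncomputable def klFun (x : ℝ) : ℝ := x * Real.log x

-- For x > 0: klDual x = x * ((1/x) * log (1/x)) = -log x,
-- which tends to +∞ as x → 0⁺. The old `f : ℝ → ℝ` design has no
-- way to set `klDual 0` to that infinite limit, so a duality like
-- `fDiv klDual μ ν = fDiv klFun ν μ` cannot even be stated.
noncomputable def klDual (x : ℝ) : ℝ := x * klFun (1 / x)
```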

New design

The new definition is this.

def fDiv (f : DivFunction) (μ ν : Measure α) : ℝ≥0∞ :=
  ∫⁻ x, f ((∂μ/∂ν) x) ∂ν + f.derivAtTop * μ.singularPart ν .univ

A DivFunction is defined as follows.

structure DivFunction where
  toFun : ℝ≥0∞ → ℝ≥0∞
  one : toFun 1 = 0
  rightDerivOne : rightDeriv (fun x : ℝ ↦ (toFun (ENNReal.ofReal x)).toReal) 1 = 0
  convexOn' : ConvexOn ℝ≥0 univ toFun
  -- the continuity everywhere but 0 and ∞ is implied by the convexity
  continuous' : Continuous toFun
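
To give an idea of how a concrete divergence fits into this structure, here is a sketch (proofs omitted, and not necessarily how this PR builds it) of the recentered KL function x ↦ x * log x - x + 1, which vanishes at 1 and has derivative 0 there, packaged as a DivFunction:

```lean
-- Illustrative sketch: the recentered KL function as a `DivFunction`.
-- The real formula is lifted to ℝ≥0∞ with `ENNReal.ofReal`, sending ∞ to ∞.
noncomputable example : DivFunction where
  toFun := fun x ↦ if x = ∞ then ∞
    else ENNReal.ofReal (x.toReal * Real.log x.toReal - x.toReal + 1)
  one := sorry            -- 1 * log 1 - 1 + 1 = 0
  rightDerivOne := sorry  -- the derivative log x vanishes at 1
  convexOn' := sorry
  continuous' := sorry
```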

derivAtTop is also redesigned to take values in ℝ≥0∞.
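
One plausible shape for such an ℝ≥0∞-valued derivAtTop (illustrative only, not necessarily the definition adopted in this PR) is the limit of the slopes f x / x, taken as a limsup so that it is always defined:

```lean
-- Sketch: the asymptotic slope of f, valued in ℝ≥0∞.
-- For convex f the slopes are eventually monotone, so the limsup
-- agrees with the limit; it can be ∞, which ℝ≥0∞ accommodates.
open Filter in
noncomputable def derivAtTopSketch (f : ℝ≥0∞ → ℝ≥0∞) : ℝ≥0∞ :=
  limsup (fun n : ℕ ↦ f n / n) atTop
```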

Why we can use ℝ≥0∞ → ℝ≥0∞ after all

  • domain: we previously used ℝ to be able to talk about derivatives and to use some convexity lemmas from Mathlib (notably Jensen's inequality). This is needed only in specific places. The new approach is to use ℝ≥0∞ → ℝ≥0∞ everywhere in integral computations, and to define from f a real function f.realFun : ℝ → ℝ to use in the places where derivatives and convexity are needed:
def DivFunction.realFun (f : DivFunction) : ℝ → ℝ := (fun x : ℝ ↦ (f (ENNReal.ofReal x)).toReal)
  • codomain: we use ℝ≥0∞ to be able to integrate with Lebesgue integrals and not worry about integrability. That means that our f-divergences have to be nonnegative, so the KL definition discussed above cannot work. However, since any f-divergence (in the math sense) is invariant under adding a + b*(x-1) to f when the measures are probability measures, we can subtract f 1 + rightDeriv f 1 * (x - 1) from the function to obtain another one with the same f-divergence on probability measures, but whose f-divergence is nonnegative for all finite measures. We enforce this in our new definition of f-divergences through the fields one and rightDerivOne of DivFunction.
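
The recentering described in the codomain point can be sketched as follows (hypothetical helper; the right derivative at 1 is passed in as a parameter rather than computed):

```lean
-- Illustration: recenter a convex `f : ℝ → ℝ` around 1. Since the
-- f-divergence of probability measures is unchanged by adding
-- `a + b * (x - 1)` to f, `recenter f (rightDeriv f 1)` has the same
-- f-divergence as f on probability measures; it vanishes at 1, has
-- right derivative 0 there, and is nonnegative by convexity.
def recenter (f : ℝ → ℝ) (f'₁ : ℝ) : ℝ → ℝ :=
  fun x ↦ f x - f 1 - f'₁ * (x - 1)
```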

What we gain, what we lose

Gain:

  • We don't need many if branches to deal with integrability conditions, and we don't need separate lemmas for the cases where the divergences are infinite.
  • We can do computations in ℝ≥0∞ instead of EReal, which is a big gain in usability.
  • We can have functions with infinite limit at 0 without splitting another term from the integral.

Lose:

  • The definitions of KL and other divergences look a bit more exotic than before. For example, a kl which is equal to an f-divergence is
def kl (μ ν : Measure α) : ℝ≥0∞ :=
  if μ ≪ ν ∧ Integrable (llr μ ν) μ
    then ENNReal.ofReal (∫ x, llr μ ν x ∂μ + (ν .univ).toReal - (μ .univ).toReal)
    else ∞
  • While we no longer have to deal with integrability in proofs about abstract f-divergences or about concrete divergences like KL, for concrete divergences given by a real function we now have to prove non-negativity side conditions, because subtraction on ℝ≥0∞ is truncated.
  • The Hellinger divergence for a = 0 can't be an f-divergence any more because of its discontinuity at 0. We have to do a special case for it if we want to define it in the old way. Currently the new code has the split only at the level of the Rényi divergence.
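
The non-negativity side conditions mentioned above have roughly the following shape for kl (a hypothetical proof obligation with assumed names, proof omitted, not a lemma from this PR):

```lean
-- To peel `ENNReal.ofReal` off the `kl` definition above, one must
-- show the real-valued expression inside it is nonnegative. This is
-- exactly the nonnegativity of the recentered KL divergence.
example (μ ν : Measure α) [IsFiniteMeasure μ] [IsFiniteMeasure ν]
    (hμν : μ ≪ ν) (h_int : Integrable (llr μ ν) μ) :
    0 ≤ ∫ x, llr μ ν x ∂μ + (ν .univ).toReal - (μ .univ).toReal := by
  sorry
```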

TODO

  • Setting rightDeriv f.realFun 1 = 0 is too restrictive. The conjugate x * f (1/x) will not be a DivFunction unless f actually has a derivative at 1 (because the right derivative of the conjugate at 1 is determined by the left derivative of f at 1). That is not the case for the function that gives the total variation distance, for example. Once the first refactor builds, we should replace the rightDeriv condition by a constraint on the subderivative: 0 ∈ ∂f(1).
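
A sketch of that planned weakening (field and helper names assumed; leftDeriv is the one-sided analogue of the rightDeriv used above):

```lean
-- Hypothetical revision of `DivFunction`: replace `rightDerivOne` by a
-- subderivative condition `0 ∈ ∂f(1)`, i.e. the one-sided derivatives
-- of the real version of f bracket 0 at 1. This is weaker than
-- `rightDeriv ... 1 = 0` and survives taking the conjugate `x * f (1/x)`
-- even when f is not differentiable at 1.
structure DivFunction' where
  toFun : ℝ≥0∞ → ℝ≥0∞
  one : toFun 1 = 0
  subderivOne :
    leftDeriv (fun x : ℝ ↦ (toFun (ENNReal.ofReal x)).toReal) 1 ≤ 0 ∧
    0 ≤ rightDeriv (fun x : ℝ ↦ (toFun (ENNReal.ofReal x)).toReal) 1
  convexOn' : ConvexOn ℝ≥0 univ toFun
  continuous' : Continuous toFun
```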

@RemyDegenne RemyDegenne added the WIP label Nov 3, 2024
RemyDegenne and others added 30 commits November 9, 2024 11:31
the old statement was false; I needed to add the hypothesis `x ≠ ∞`, and I also fixed the dependencies
`hadDeriv...` corrected to `hasDeriv...`
I had to strengthen the hypothesis `0 ≤ x` to `0 < x`; with the former hypothesis the result is false
consequences of the stricter hypothesis of `ConvexOn.nonneg_of_todo`
I had to add the hypothesis `c = 0 → a ≠ 1`, because if `c = 0` and `a = 1` the result is false