Consider implementing metalog size distributions #585

smk78 · 2023-10-24T15:58:22Z

Following discussions at CanSAS-2023, @Kohlbrecher suggests SasView follow the lead of SASfit and implement its size distributions in terms of a Metalog Distribution:

https://en.wikipedia.org/wiki/Metalog_distribution
https://doi.org/10.1107/S1600576722009037, Section 5.1 & Fig 4

This very flexible distribution could overcome the limitations imposed on fit solutions by the shapes of singular (or composites of singular) distributions.

The Metalog Distribution is defined through its quantile function (the inverse of the Cumulative Distribution Function); it belongs to the family of quantile-parameterized distributions (QPDs).

Many 'conventional' distribution functions also have analytic quantile functions, for example:

https://en.wikipedia.org/wiki/Quantile_function#Simple_example
https://en.wikipedia.org/wiki/Log-normal_distribution, under 'Mode, Median, Quantiles'

It is also possible that the QPD approach could be more numerically stable, because it leads to a proper integral (over 0 to 1), unlike the improper integrals (over 0 to inf) of traditional size distributions.
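
A minimal sketch of what "defined through its quantile function" means in practice, using the first four terms of the unbounded metalog series as given on the Wikipedia page linked above, plus the log transform for a lower-bounded (size-like) variable. The function names and coefficient values are illustrative assumptions; nothing here is taken from SASfit or SasView.

```python
import math


def metalog_quantile(y, a):
    """Unbounded metalog quantile M(y) for 2 to 4 coefficients a = (a1, a2, ...)."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie strictly between 0 and 1")
    if not 2 <= len(a) <= 4:
        raise ValueError("this sketch supports 2 to 4 coefficients")
    logit = math.log(y / (1.0 - y))
    # Basis functions of the first four metalog terms.
    basis = [1.0, logit, (y - 0.5) * logit, (y - 0.5)]
    return sum(ai * bi for ai, bi in zip(a, basis))


def log_metalog_quantile(y, a, lower=0.0):
    """Semi-bounded (log) metalog: lower + exp(M(y)), so values stay above `lower`."""
    return lower + math.exp(metalog_quantile(y, a))
```

With the hypothetical coefficients `a = (math.log(100.0), 0.1)`, the median `log_metalog_quantile(0.5, a)` is 100 up to rounding, because both the logit term and `(y - 0.5)` vanish at `y = 0.5`.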

lucas-wilkins · 2023-10-24T16:24:07Z

I think the quantile approach will be good for 1D polydispersity, but it gets weird, complicated or both when it comes to multiple dimensions.

Kohlbrecher · 2023-10-24T16:33:48Z

I am not sure why you think it gets more complicated when multiple parameters have a distribution. Each parameter with a distribution is simply replaced by its quantile distribution function (QDF). If you have multiple independent parameters, each is replaced by a separate QDF, and the multiple integral has to be calculated over the hypercube [0,1]^n.
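
A sketch of the two-parameter case, assuming independent parameters and a tensor-product Gauss-Legendre rule on the unit square. The helper name, the Gaussian quantiles, and the toy integrand `r**2 * l` are illustrative assumptions, not SasView or SASfit code. Note that the cost grows as `n**d` for `d` distributed parameters.

```python
import numpy as np
from statistics import NormalDist


def average_over_unit_square(func, quantile_a, quantile_b, n=64):
    """Average func(a, b) over two independent parameters via their quantiles."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    y = 0.5 * (nodes + 1.0)   # map nodes from [-1, 1] to (0, 1)
    w = 0.5 * weights         # rescale weights for the unit interval
    a = np.array([quantile_a(float(yi)) for yi in y])
    b = np.array([quantile_b(float(yi)) for yi in y])
    values = func(a[:, None], b[None, :])   # evaluate on the n x n grid
    return float(w @ values @ w)
```

For example, with `NormalDist(100.0, 20.0).inv_cdf` for a radius and `NormalDist(400.0, 40.0).inv_cdf` for a length, the average of `r**2 * l` can be compared with the closed form `(100**2 + 20**2) * 400`.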

pkienzle · 2025-02-24T15:36:37Z

I played with integration using equally spaced quantile steps a long while back. I was expecting higher accuracy for the same number of evaluations since it is effectively importance sampling for ∫p(x)f(q;x) dx. Even on a normal distribution it couldn't compete with the linear integration points and I didn't pursue it further. [edit: I had written Gauss-Lobatto before, but that is only used for the orientation integrals.]

Looking at the simple example of a sphere, f(q;r) = (4/3 π r³ Δρ j₁(qr)/(qr))² = C ((qr)² j₁(qr))². Looking at the first term of the Taylor expansion this grows as (qr)⁶ so our polydispersity integral around a single q is q⁶∫ exp(-Δr²/2σ²) r⁶ dr. That is, the right tail will be too heavy for importance sampling to work well.

Try with q=1/10, r=100, σ=20 over ±3σ without the gaussian:

plot (r^2 j₁(r/10)/10)^2 from 40 to 160

with quantile based spacing the right side of the function will be undersampled.

Here's the plot again, scaled by the gaussian:

plot exp(-(r-100)^2/(2*20^2)) (r^2 j₁(r/10)/10)^2 from 40 to 160
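
The same comparison can be sketched numerically. The snippet below is an illustration under the parameter values quoted above (q = 1/10, mean 100, σ = 20), not the code used for the plots: it computes which fraction of the Gaussian-weighted integrand on [40, 160] lies above a cut radius, and which fraction of equal-probability quantile midpoints lies above the same cut. All function names are made up for this example.

```python
import numpy as np
from statistics import NormalDist

MEAN, SIGMA, Q = 100.0, 20.0, 0.1   # values from the comment above


def sphere_term(r, q=Q):
    """(sin x - x cos x)^2 with x = q r, proportional to (r^2 j1(q r))^2."""
    x = q * r
    return (np.sin(x) - x * np.cos(x)) ** 2


def integrand_fraction_above(r_cut, lo=40.0, hi=160.0, n=20001):
    """Fraction of the Gaussian-weighted integrand on [lo, hi] lying above r_cut."""
    r = np.linspace(lo, hi, n)
    g = np.exp(-0.5 * ((r - MEAN) / SIGMA) ** 2) * sphere_term(r)
    cell = 0.5 * (g[1:] + g[:-1]) * np.diff(r)   # trapezoid areas per cell
    mid = 0.5 * (r[1:] + r[:-1])                 # cell midpoints
    return float(cell[mid > r_cut].sum() / cell.sum())


def quantile_point_fraction_above(r_cut, n=100):
    """Fraction of equal-probability quantile midpoints lying above r_cut."""
    dist = NormalDist(MEAN, SIGMA)
    points = np.array([dist.inv_cdf((i + 0.5) / n) for i in range(n)])
    return float(np.mean(points > r_cut))
```

Comparing `integrand_fraction_above(120.0)` with `quantile_point_fraction_above(120.0)` is one way to quantify how much of the integral sits where the quantile-spaced points are sparse.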

pkienzle · 2025-02-24T15:45:01Z

Note that we may want to go beyond +3σ (r = 160) on the high side:

plot log10(exp(-(r-100)^2/(2*20^2)) (r^2 j1(r/10)/10)^2) from 4 to 240

Kohlbrecher · 2025-02-25T10:31:17Z

To compare the different integration strategies over a size distribution, it might be useful to test against an available analytical solution. I could imagine that a gamma distribution of spheres, instead of a Gaussian distribution, might be useful here. I found a paper by Andre Heinemann (https://doi.org/10.1107/S0021889800013248), who supplies an analytical expression for core-shell particles with a gamma size distribution, where the gamma size distribution is parametrized in terms of a mean radius $R_{mean}$ and variance $\sigma^2$ (eq. 15). As the distribution seems to be parametrized in terms similar to those of the integration algorithm in SasView, this might be a useful test model. The gamma distribution (another name for it is the Schulz distribution, already available in SasView) might also be a nice test case because, in addition to the mean, the mode, the variance and higher moments of the distribution function are analytically available.

As far as I understand, the SasView internal integration routine makes use of an algorithm on a fixed grid, like Gauss-Legendre or Gauss-Lobatto. Both algorithms rely on well-chosen grid points, which for these two strategies are related to the roots of Legendre polynomials. The boundaries of the integration interval are chosen manually via the GUI as $[R_{mean}-n \sigma;R_{mean}+n \sigma]$. It was mentioned in the previous comment that one must take care to avoid clipping effects: because of the $r^6$ dependency of the form factor of spherical particles, the integral is dominated by the larger particles. My first suggestion would be to test the loss of precision caused by clipping the integration interval from $[0;\infty)$ to $[R_{mean}-n \sigma;R_{mean}+n \sigma]$.
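
A rough sketch of such a clipping test, assuming a Gaussian weight and the low-q limit in which the sphere form factor grows as $r^6$. The function names and default values (mean 100, σ 20) are illustrative assumptions; at larger q the growth with r is slower, so this only probes the most demanding case.

```python
import numpy as np


def weighted_integral(lo, hi, mean, sigma, power, n=20001):
    """Trapezoid estimate of the integral of r^power * Gaussian(r) over [lo, hi]."""
    r = np.linspace(lo, hi, n)
    g = r**power * np.exp(-0.5 * ((r - mean) / sigma) ** 2)
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(r)))


def captured_fraction(n_sigma, mean=100.0, sigma=20.0, power=6):
    """Share of the r^power-weighted integral kept by clipping to mean +/- n_sigma * sigma."""
    lo = max(0.0, mean - n_sigma * sigma)
    hi = mean + n_sigma * sigma
    clipped = weighted_integral(lo, hi, mean, sigma, power)
    # Reference: a wide interval standing in for [0, inf).
    reference = weighted_integral(0.0, mean + 10.0 * sigma, mean, sigma, power)
    return clipped / reference
```

Calling `captured_fraction(n)` for n = 3, 4, 5 shows how quickly the truncation loss shrinks as the interval is widened.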

Instead of clipping the semi-infinite interval $[0,+\infty)$ one often tries a variable transformation like $r = (1-t)/t$, which converts the semi-infinite interval into the half-open interval $(0,1]$. However, quadrature rules that evaluate the function at the interval boundaries, like Gauss-Lobatto, might then no longer be suitable (Gauss-Legendre nodes lie strictly inside the interval); otherwise the case $t=0$ needs to be handled as a special case, or instead of $(0,1]$ one tries [np.finfo(float).eps;1].
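
A minimal sketch of this substitution with a Gauss-Legendre rule, whose nodes are interior so that $t=0$ is never evaluated. The function name and the test integrand in the usage note are illustrative assumptions.

```python
import numpy as np


def integrate_semi_infinite(g, n=64):
    """Approximate the integral of g(r) over [0, inf) using r = (1 - t) / t."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    t = 0.5 * (nodes + 1.0)   # nodes mapped from [-1, 1] to (0, 1)
    w = 0.5 * weights
    r = (1.0 - t) / t
    # |dr/dt| = 1 / t^2 is the Jacobian of the substitution.
    return float(np.sum(w * g(r) / t**2))
```

As a sanity check, `integrate_semi_infinite(lambda r: r * np.exp(-r))` should be close to 1, the exact value of that integral.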

In the past I thought this would be the solution for calculating size-smeared form factors. However, tests have shown that the transformation works well for broad distributions, but for very narrow distributions the integrand is zero almost everywhere in the interval $(0;1]$ except in a very tiny range, which is typically difficult to catch with a low number of grid points. Therefore, I initially went back to clipping the integration interval. As already mentioned, one needs to take care that, due to the $r^6$ dependency of the form factor of spheres, the interval clipping does not lead to truncation effects. SasView supplies distributions $w(r)$ such as Gaussian, LogNorm, Schulz (which I think is identical to the gamma distribution mentioned above) and some more. All these distributions have in common that their variance and mode or mean are known analytically. Furthermore, if one generates a new distribution out of them, defined as $p(r,\alpha) = w(r)*r^\alpha / \int_0^\infty w(r)*r^\alpha dr$, these distributions are also analytically known, as are their mode, mean and variance. The integration boundaries can therefore be chosen as $[R_{mode,p}-n \sigma_p;R_{mode,p}+n \sigma_p]$, where $R_{mode,p}$ and $\sigma_p^2$ are the mode and variance of $p(r,\alpha)$ instead of $w(r)$, i.e. the integration looks like:
$$\int_{R_{mode,p}-n \sigma_p}^{R_{mode,p}+n \sigma_p} w(r) F^2(h,r) dr $$, where $h$ is the scattering vector.
The disadvantage here is that one needs information about the scaling ($\alpha$) of the form factor, i.e., whether it scales with $r^{\alpha=6}$ for spheres, with $r^{\alpha=4}$ for cylinders, or with $d^{\alpha=2}$ for planar systems with a thickness ($d$) distribution.
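
For the gamma (Schulz) case the shifted bounds can be written down directly, since $w(r) r^\alpha$ is again a gamma distribution with the shape increased by $\alpha$ and the same scale. The sketch below assumes the mean/sigma parametrization with shape $k = (R_{mean}/\sigma)^2$ and scale $\theta = \sigma^2/R_{mean}$; the function name is hypothetical and this is not how SasView currently chooses its limits.

```python
import math


def schulz_weighted_bounds(mean, sigma, alpha, n_sigma=3.0):
    """Integration limits from the mode and width of p(r, alpha) ~ w(r) * r^alpha."""
    shape = (mean / sigma) ** 2    # gamma shape k of w(r)
    scale = sigma**2 / mean        # gamma scale theta of w(r)
    shape_p = shape + alpha        # w(r) * r^alpha is gamma with shape k + alpha
    mode_p = max(shape_p - 1.0, 0.0) * scale
    sigma_p = math.sqrt(shape_p) * scale
    lower = max(0.0, mode_p - n_sigma * sigma_p)
    upper = mode_p + n_sigma * sigma_p
    return lower, upper, mode_p, sigma_p
```

For `mean=100`, `sigma=20`, `alpha=6` this gives shape 25, scale 4, and hence a mode of $p$ at $(25 + 6 - 1) \cdot 4 = 120$, so the interval is centred one $\sigma$ above the mean of $w(r)$.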

What the above-mentioned distributions additionally have in common is that an analytical expression is available for their quantile distribution function $Q(y)=r$. I must admit that so far I have only tested this, albeit successfully: by using them one does not need to know anything about how the form factor scales with the distributed parameter, and one can handle both very broad and very narrow distributions without clipping issues.

Performance comparison tests against fixed-grid algorithms need more caution, especially when one mainly uses adaptive integration routines to control the relative and absolute error of the size smearing.

Coming back to the initially suggested model for testing and comparing the different strategies: the above-mentioned case might be useful, as the formulas given by Andre Heinemann are analytical and can be calculated to arbitrary precision.

Kohlbrecher · 2025-02-25T19:34:34Z

You mentioned you "played with integration using equally spaced quantile steps". Actually, what I suggested was not to use the quantile distribution function to generate a grid. I suggested a full change of variables for the size distribution integral. Instead of integrating over the size parameter, $\int_0^\infty w(r) F^2(h,r) dr$, the suggested integration is over the cumulative probability $y$, using the substitution $r=Q(y)$ with $Q(y)$ being the quantile distribution function of $w(r)$, so that $w(r) dr = dy$ and the integral to be solved becomes $\int_0^1 F^2(h,Q(y)) dy$. Please do not be confused that I now use $h$ for the scattering vector. I did so because in the literature the character $Q()$ is used for the quantile distribution function and $q()$ for the quantile density function. The new integral can now be treated with standard quadrature algorithms like Gauss-Legendre or with adaptive algorithms.
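
A minimal sketch of this change of variables, assuming a log-normal $w(r)$ (whose quantile function is $Q(y) = R_{median} \exp(s\,\Phi^{-1}(y))$), a homogeneous sphere for $F^2$, and a fixed Gauss-Legendre rule on (0, 1). Function names and defaults are illustrative assumptions; this is not the SASfit implementation, which as far as described above favours adaptive routines.

```python
import math
import numpy as np
from statistics import NormalDist


def sphere_f2(h, r, contrast=1.0):
    """Squared sphere amplitude (4 pi contrast (sin x - x cos x) / h^3)^2, x = h r."""
    x = h * r
    return (4.0 * math.pi * contrast * (np.sin(x) - x * np.cos(x)) / h**3) ** 2


def lognormal_quantile(y, median, s):
    """Q(y) = median * exp(s * Phi^{-1}(y)) for a log-normal w(r)."""
    z = np.array([NormalDist().inv_cdf(float(yi)) for yi in y])
    return median * np.exp(s * z)


def smeared_intensity(h, median, s, n=128):
    """Approximate the integral of F^2(h, Q(y)) over y in (0, 1)."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    y = 0.5 * (nodes + 1.0)   # interior nodes, so y = 0 and y = 1 are never hit
    w = 0.5 * weights
    return float(np.sum(w * sphere_f2(h, lognormal_quantile(y, median, s))))
```

One simple check is the low-$h$ limit, where $F^2 \to (4\pi/3)^2 r^6$ for unit contrast and the log-normal moment $\langle r^6 \rangle = R_{median}^6 \exp(18 s^2)$ is known in closed form; the fixed-grid result can be compared against that value for a narrow distribution.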

smk78 added enhancement SasModels Infrastructure labels Oct 24, 2023

pkienzle mentioned this issue Feb 25, 2025

point spacing of polydispersity should not always be linear. #633

Open
