Consider implementing metalog size distributions #585

Open
smk78 opened this issue Oct 24, 2023 · 6 comments
Comments

@smk78
Contributor

smk78 commented Oct 24, 2023

Following discussions at CanSAS-2023, @Kohlbrecher suggests SasView follow the lead of SASfit and implement its size distributions in terms of a Metalog Distribution:

https://en.wikipedia.org/wiki/Metalog_distribution
https://doi.org/10.1107/S1600576722009037, Section 5.1 & Fig 4

This very flexible distribution could overcome the limitations imposed on fit solutions by the shapes of singular (or composites of singular) distributions.

The Metalog Distribution is a quantile-parameterized distribution (QPD): it is defined through its quantile function, i.e. the inverse of the Cumulative Distribution Function.
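As a rough illustration of what "defined through its quantile function" means, here is a minimal sketch of the first four terms of the unbounded metalog quantile function, following the form given in the Wikipedia article linked above, plus the semi-bounded (log) variant that keeps sizes positive. The function names and the coefficient values are made up for illustration; this is not SASfit or SasView code.

```python
import math

def metalog_quantile(y, a):
    """Unbounded metalog quantile function using 2 to 4 coefficients.

    y is the cumulative probability (0 < y < 1); a is the coefficient list.
    """
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie strictly between 0 and 1")
    if not 2 <= len(a) <= 4:
        raise ValueError("this sketch supports 2 to 4 coefficients")
    logit = math.log(y / (1.0 - y))
    q = a[0] + a[1] * logit
    if len(a) >= 3:
        q += a[2] * (y - 0.5) * logit
    if len(a) >= 4:
        q += a[3] * (y - 0.5)
    return q

def log_metalog_quantile(y, a, lower=0.0):
    """Semi-bounded (log) metalog: returned values stay above `lower`."""
    return lower + math.exp(metalog_quantile(y, a))

# Illustrative coefficients only: median radius 100, moderate width.
coeffs = [math.log(100.0), 0.2]
radii = [log_metalog_quantile(y, coeffs) for y in (0.1, 0.5, 0.9)]
print(radii)
```

With only two coefficients the unbounded form reduces to a logistic quantile function; the extra terms add skewness and tail-shape control.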

Many 'conventional' distribution functions also have analytic quantile functions, for example:

https://en.wikipedia.org/wiki/Quantile_function#Simple_example
https://en.wikipedia.org/wiki/Log-normal_distribution, under 'Mode, Median, Quantiles'

It is also possible that the QPD approach could be more numerically stable, because it is a proper integral (over the interval 0 to 1), unlike the improper integrals (over 0 to infinity) of traditional size distributions.
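A small sketch of the idea, assuming a log-normal distribution with illustrative parameters: the mean radius is computed as a proper integral of the quantile function over (0, 1) and compared with the closed-form log-normal mean. The midpoint rule and the helper names are choices made for this example only.

```python
import math
from statistics import NormalDist

def lognormal_quantile(y, mu, sigma):
    """Q(y) = exp(mu + sigma * Phi^{-1}(y)) for 0 < y < 1."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(y))

def mean_via_quantile(mu, sigma, n=2000):
    """Midpoint rule for the integral of Q(y) over the unit interval."""
    h = 1.0 / n
    return h * sum(lognormal_quantile((i + 0.5) * h, mu, sigma) for i in range(n))

mu, sigma = math.log(100.0), 0.2         # illustrative values
approx = mean_via_quantile(mu, sigma)
exact = math.exp(mu + 0.5 * sigma ** 2)  # closed-form log-normal mean
print(approx, exact)
```

The midpoint rule never touches y = 0 or y = 1, where the quantile function diverges, so no special handling of the endpoints is needed here.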

@lucas-wilkins
Contributor

I think the quantile approach will be good for 1D polydispersity, but it gets weird, complicated or both when it comes to multiple dimensions.

@Kohlbrecher

I am not sure why you think it gets more complicated when multiple parameters have a distribution. Each parameter that has a distribution is simply replaced by its QDF (quantile distribution function). If you have multiple independent parameters, each is replaced by a separate QDF, and the multiple integral has to be calculated over the hypercube $[0,1]^n$.
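For two independent parameters, the hypercube integral can be sketched as a tensor-product Gauss-Legendre rule on the unit square. The two quantile functions (uniform and exponential) and the toy integrand `a**2 * b` are assumptions picked only because the exact answer is known; they stand in for real model parameters and a real form factor.

```python
import numpy as np

def gauss_legendre_unit(n):
    """Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1]."""
    x, w = np.polynomial.legendre.leggauss(n)
    return 0.5 * (x + 1.0), 0.5 * w

def uniform_quantile(y, lo, hi):
    return lo + (hi - lo) * y

def exponential_quantile(y, mean):
    return -mean * np.log1p(-y)

def average_over_unit_square(f, quantile_a, quantile_b, n=32):
    """Average f(a, b) over two independent distributions via their quantiles."""
    y, w = gauss_legendre_unit(n)
    a = quantile_a(y)[:, None]   # first parameter along rows
    b = quantile_b(y)[None, :]   # second parameter along columns
    return float(np.sum(w[:, None] * w[None, :] * f(a, b)))

result = average_over_unit_square(
    lambda a, b: a ** 2 * b,
    lambda y: uniform_quantile(y, 1.0, 3.0),
    lambda y: exponential_quantile(y, 2.0),
)
expected = (13.0 / 3.0) * 2.0    # E[a^2] * E[b] for these two distributions
print(result, expected)
```

The cost grows as `n` to the power of the number of distributed parameters, which is the same scaling as a tensor-product grid in the original parameter space.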

@pkienzle
Contributor

pkienzle commented Feb 24, 2025

I played with integration using equally spaced quantile steps a long while back. I was expecting higher accuracy for the same number of evaluations since it is effectively importance sampling for ∫p(x)f(q;x) dx. Even on a normal distribution it couldn't compete with the linear integration points and I didn't pursue it further. [edit: I had written Gauss-Lobatto before, but that is only used for the orientation integrals.]

Looking at the simple example of a sphere, f(q;r) = (4/3 π r³ Δρ · 3 j₁(qr)/(qr))² = C ((qr)² j₁(qr))², where C is constant at fixed q. The first term of the Taylor expansion grows as (qr)⁶, so our polydispersity integral around a single q is proportional to q⁶∫ exp(-Δr²/2σ²) r⁶ dr. That is, the r⁶ weighting makes the right tail of the integrand too heavy for importance sampling based on p(x) alone to work well.
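The r⁶ growth can be checked numerically with a small sketch of the sphere intensity. The function names are made up for this example and the contrast is set to 1; at small qr, doubling the radius should scale the intensity by nearly 2⁶ = 64.

```python
import math

def sphere_j1(x):
    """Spherical Bessel function j1(x) = (sin x - x cos x) / x^2."""
    return (math.sin(x) - x * math.cos(x)) / x ** 2

def sphere_intensity(q, r, delta_rho=1.0):
    """Squared scattering amplitude of a homogeneous sphere."""
    x = q * r
    volume = 4.0 / 3.0 * math.pi * r ** 3
    return (volume * delta_rho * 3.0 * sphere_j1(x) / x) ** 2

q = 1e-4  # small enough that qr << 1 for the radii below
ratio = sphere_intensity(q, 200.0) / sphere_intensity(q, 100.0)
print(ratio)
```

At the q = 1/10, r = 100 values used in the plots, qr is about 10, so the integrand oscillates and the r⁶ law applies only to the low-q limit.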

Try with q=1/10, r=100, σ=20 over ±3σ without the gaussian:

plot (r^2 j₁(r/10)/10)^2 from 40 to 160

Image

With quantile-based spacing, the right side of the function will be undersampled.
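To make the undersampling concrete, the following sketch counts how many of 20 sample points fall above r = 140 (mean + 2σ) for equal-probability quantile spacing versus linear spacing over ±3σ. The node count and the use of bin midpoints for the quantile grid are assumptions for this illustration, not the scheme used in any existing code.

```python
from statistics import NormalDist

mean, sigma, n = 100.0, 20.0, 20
dist = NormalDist(mean, sigma)

# Equal-probability spacing: midpoints of n equally probable bins.
quantile_nodes = [dist.inv_cdf((i + 0.5) / n) for i in range(n)]

# Linear spacing over mean +/- 3 sigma, i.e. 40 to 160.
lo, hi = mean - 3.0 * sigma, mean + 3.0 * sigma
linear_nodes = [lo + (hi - lo) * i / (n - 1) for i in range(n)]

threshold = mean + 2.0 * sigma
n_quantile_high = sum(r > threshold for r in quantile_nodes)
n_linear_high = sum(r > threshold for r in linear_nodes)
print(n_quantile_high, n_linear_high)
```

With these settings none of the quantile-spaced points lies above 140, while four of the linearly spaced points do, even though that region still carries weight once the integrand grows steeply with r.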

Here's the plot again, scaled by the gaussian:

plot exp(-(r-100)^2/(2*20^2)) (r^2 j₁(r/10)/10)^2 from 40 to 160

Image

@pkienzle
Contributor

Note that we may want to go beyond 3σ (r = 160) on the high side:

plot log10(exp(-(r-100)^2/(2*20^2)) (r^2 j1(r/10)/10)^2) from 4 to 240

Image

@Kohlbrecher

Kohlbrecher commented Feb 25, 2025

To compare the different integration strategies over a size distribution, it might be useful to test against an available analytical solution. I could imagine that, instead of a Gaussian distribution, a gamma distribution of spheres might be useful in this case. I found a paper by Andre Heinemann (https://doi.org/10.1107/S0021889800013248), who supplied an analytical expression for core-shell particles with a gamma size distribution, where the gamma size distribution is parametrized in terms of a mean radius $R_{mean}$ and variance $\sigma^2$ (eq. 15). As the distribution seems to be parametrized in terms similar to those of the integration algorithm in SasView, this might be a useful test model. The gamma distribution (another name for it is the Schulz distribution, already available in SasView) might also be a nice test case because, next to the mean value, the mode and the variance as well as higher moments of the distribution function are analytically available.

As far as I understand, the SasView internal integration routines make use of algorithms on a fixed grid, like Gauss-Legendre or Gauss-Lobatto. Both algorithms rely on well-chosen grid points, which for these two strategies are related to the roots of Legendre polynomials. The boundaries of the integration interval are chosen manually via the GUI as $[R_{mean}-n \sigma;R_{mean}+n \sigma]$. It has been mentioned in the previous comment that one must take care to avoid clipping effects caused by the $r^6$ dependency of the form factor of spherical particles, as the integral is dominated by larger particles. My first suggestion would be to test the loss of precision due to clipping the integration interval from $[0;\infty)$ to $[R_{mean}-n \sigma;R_{mean}+n \sigma]$.
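A first estimate of the clipping loss can be sketched for the low-q limit, where the integrand is a Gaussian times $r^6$. The code below computes the fraction of the full integral captured by $[R_{mean}-n\sigma; R_{mean}+n\sigma]$; the midpoint rule, the reference upper limit of $R_{mean}+10\sigma$ and the parameter values are assumptions made for this example.

```python
import math

def gaussian_r6(r, mean, sigma):
    """Unnormalised Gaussian weight multiplied by r^6 (low-q sphere limit)."""
    return math.exp(-0.5 * ((r - mean) / sigma) ** 2) * r ** 6

def midpoint(f, lo, hi, n=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def captured_fraction(mean, sigma, n_sigma):
    """Fraction of the r^6-weighted integral inside mean +/- n_sigma * sigma."""
    f = lambda r: gaussian_r6(r, mean, sigma)
    full = midpoint(f, 0.0, mean + 10.0 * sigma)   # stand-in for [0, inf)
    lo = max(0.0, mean - n_sigma * sigma)
    return midpoint(f, lo, mean + n_sigma * sigma) / full

for n_sigma in (3, 4, 5, 6):
    print(n_sigma, captured_fraction(100.0, 20.0, n_sigma))
```

For mean 100 and σ = 20, clipping at ±3σ leaves out somewhat more than 1% of the integral, almost all of it on the high-r side.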

Instead of clipping the semi-infinite interval $[0,+\infty)$ one often tries a variable transformation like $r = (1-t)/t$, which converts the semi-infinite interval into the half-open interval $(0,1]$. However, quadrature algorithms that evaluate the function at the interval boundaries, like Gauss-Lobatto, might then no longer be suitable (Gauss-Legendre uses only interior nodes and is not affected). Otherwise the case $t=0$ needs to be handled as a special case, or instead of $(0,1]$ one integrates over `[np.finfo(float).eps, 1]`.
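The transformation can be sketched with a Gauss-Legendre rule, whose nodes lie strictly inside the interval so that $t=0$ is never evaluated. The helper name and the test integrand $e^{-r}$ (exact integral 1) are choices for this example only.

```python
import numpy as np

def integrate_semi_infinite(g, n=64):
    """Approximate the integral of g over [0, inf) using r = (1 - t) / t."""
    x, w = np.polynomial.legendre.leggauss(n)
    t = 0.5 * (x + 1.0)   # interior nodes in (0, 1)
    w = 0.5 * w
    r = (1.0 - t) / t
    # |dr/dt| = 1 / t^2 is the Jacobian of the substitution.
    return float(np.sum(w * g(r) / t ** 2))

value = integrate_semi_infinite(lambda r: np.exp(-r))
print(value)
```

The same function can be pointed at a narrow, normalised Gaussian centred at a large radius to see how few nodes land inside the peak after the transformation.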

In the past I thought this would be the solution for calculating size-smeared form factors. However, tests have shown that the transformation works well for broad distributions, but for very narrow distributions the integrand is zero almost everywhere in the interval $(0,1]$ except in a very tiny range, which is typically difficult to catch with a low number of grid points. Therefore, I initially went back to the strategy of clipping the integration interval. As mentioned already, one needs to take care that, due to the $r^6$ dependency of the form factor of spheres, the interval clipping does not lead to truncation effects. SasView supplies the distributions $w(r)$ Gaussian, LogNorm, Schulz (which I think is identical to the gamma distribution mentioned above) and some more. All these distributions have in common that their variance and mode or mean are known analytically. Furthermore, if one generates a new distribution out of them, defined as $p(r,\alpha) = w(r) r^\alpha / \int_0^\infty w(r) r^\alpha dr$, these distributions as well as their mode, mean and variance are also known analytically. The integration boundaries can therefore be chosen as $[R_{mode,p}-n \sigma_p;R_{mode,p}+n \sigma_p]$, where $R_{mode,p}$ and $\sigma_p^2$ are the mode and variance of $p(r,\alpha)$ instead of $w(r)$, i.e. the integration looks like:
$$\int_{R_{mode,p}-n \sigma_p}^{R_{mode,p}+n \sigma_p} w(r) F^2(h,r) dr,$$
where $h$ is the scattering vector.
The disadvantage here is that one needs information about the scaling ($\alpha$) of the form factor, i.e. whether it scales with $r^{\alpha=6}$ as for spheres, with $r^{\alpha=4}$ as for cylinders, or, for planar systems with a distribution of the thickness $d$, with $d^{\alpha=2}$.
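For a Gaussian $w(r)$, the shifted integration window can be sketched as follows. The mode of $w(r) r^\alpha$ follows from setting the derivative of $-(r-\mu)^2/(2\sigma^2) + \alpha \ln r$ to zero, which gives $r^2 - \mu r - \alpha\sigma^2 = 0$. The mean and variance of $p(r,\alpha)$ are taken from the raw Gaussian moments, which assumes integer $\alpha$ and neglects the truncation at $r=0$; function names and parameter values are illustrative.

```python
import math

def weighted_mode(mu, sigma, alpha):
    """Mode of exp(-(r - mu)^2 / (2 sigma^2)) * r^alpha for r > 0."""
    return 0.5 * (mu + math.sqrt(mu ** 2 + 4.0 * alpha * sigma ** 2))

def gaussian_raw_moments(mu, sigma, k_max):
    """Raw moments E[r^k], k = 0..k_max, of a Gaussian on the whole real line."""
    m = [1.0, mu]
    for k in range(1, k_max):
        m.append(mu * m[k] + k * sigma ** 2 * m[k - 1])
    return m

def weighted_mean_sigma(mu, sigma, alpha):
    """Mean and standard deviation of p(r, alpha), truncation at r = 0 neglected."""
    m = gaussian_raw_moments(mu, sigma, alpha + 2)
    mean_p = m[alpha + 1] / m[alpha]
    var_p = m[alpha + 2] / m[alpha] - mean_p ** 2
    return mean_p, math.sqrt(var_p)

mu, sigma, alpha, n = 100.0, 20.0, 6, 3   # illustrative values
mode_p = weighted_mode(mu, sigma, alpha)
mean_p, sigma_p = weighted_mean_sigma(mu, sigma, alpha)
print(mode_p, mean_p, sigma_p)
print("window:", mode_p - n * sigma_p, mode_p + n * sigma_p)
```

For μ = 100, σ = 20 and α = 6 the mode moves from 100 to 120, so the window is centred a full σ higher than the one based on $w(r)$ alone.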

What the above-mentioned distributions additionally have in common is that an analytical expression is available for their quantile distribution function $Q(y)=r$. I must admit that I have so far only tested this empirically, though successfully: by using the quantile functions one does not need to know anything about how the form factor scales with the distributed parameter, and one can handle both very broad and very narrow distributions without clipping issues.

Performance comparisons with fixed-grid algorithms need extra caution, especially when one mainly uses adaptive integration routines to control the relative and absolute error of the size smearing.

Coming back to the model initially suggested for testing and comparing the different strategies: the above-mentioned case might be useful, as the formulae given by Andre Heinemann are analytical and can be calculated to arbitrary precision.

@Kohlbrecher

Kohlbrecher commented Feb 25, 2025

You mentioned you "played with integration using equally spaced quantile steps". Actually, what I suggested was not to use the quantile distribution function to generate a grid. I suggested a full change of variables for the size distribution integral. Instead of integrating over the size parameter, $\int_0^\infty w(r) F^2(h,r) dr$, the suggested integration is over the cumulative probability $y$, using the substitution $r=Q(y)$ with $Q(y)$ being the quantile distribution function of $w(r)$, so that the integral to be solved becomes $\int_0^1 F^2(h,Q(y)) dy$. Please do not be confused that I now use $h$ for the scattering vector. I did so because in the literature the character $Q()$ is used for the quantile distribution function and $q()$ for the quantile density function. The new integral can now be treated with standard quadrature algorithms like Gauss-Legendre or with adaptive algorithms.
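A minimal sketch of this change of variables, assuming a Gaussian $w(r)$ (chosen only because the Python standard library provides its quantile function) and a homogeneous sphere with unit contrast. It compares a brute-force trapezoid integration over $r$ with Gauss-Legendre quadrature of $\int_0^1 F^2(h,Q(y)) dy$. Function names, node counts and parameter values are assumptions for illustration, not SASfit or SasView code.

```python
import numpy as np
from statistics import NormalDist

def sphere_intensity(h, r, delta_rho=1.0):
    """F^2(h, r) for a homogeneous sphere; h is the scattering vector."""
    x = h * r
    j1 = (np.sin(x) - x * np.cos(x)) / x ** 2
    volume = 4.0 / 3.0 * np.pi * r ** 3
    return (volume * delta_rho * 3.0 * j1 / x) ** 2

def smeared_direct(h, mean, sigma, n=20001, n_sigma=8.0):
    """Trapezoid rule for the integral of w(r) F^2(h, r) over a wide r range."""
    r = np.linspace(max(mean - n_sigma * sigma, 1e-6), mean + n_sigma * sigma, n)
    w = np.exp(-0.5 * ((r - mean) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    f = w * sphere_intensity(h, r)
    dr = r[1] - r[0]
    return float(dr * (np.sum(f) - 0.5 * (f[0] + f[-1])))

def smeared_quantile(h, mean, sigma, n=200):
    """Gauss-Legendre rule for the integral of F^2(h, Q(y)) over (0, 1)."""
    x, wts = np.polynomial.legendre.leggauss(n)
    y = 0.5 * (x + 1.0)                                 # nodes inside (0, 1)
    dist = NormalDist(mean, sigma)
    r = np.array([dist.inv_cdf(float(v)) for v in y])   # r = Q(y)
    return float(0.5 * np.sum(wts * sphere_intensity(h, r)))

h, mean, sigma = 0.1, 100.0, 20.0   # the values used earlier in this thread
direct = smeared_direct(h, mean, sigma)
via_quantile = smeared_quantile(h, mean, sigma)
print(direct, via_quantile)
```

Note that the weight $w(r)$ no longer appears in the second integrand: it is absorbed into the substitution, since $dy = w(r) dr$.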
