
add Python scripts to BYM12 case study #220

Open
mitzimorris opened this issue Sep 11, 2023 · 5 comments

Comments

@mitzimorris
Member

In the case study knitr/car-iar-poisson there are scripts for fitting the BYM2 model using CmdStanR.
Write a Jupyter notebook (preferably Colab-compatible) that works with CmdStanPy, plotnine, and whatever spatial libraries you need.

See https://mc-stan.org/users/documentation/case-studies/radon_cmdstanpy_plotnine.html for examples of plotnine usage,
and https://github.com/mitzimorris/ljubljiana_lecture/blob/main/data_prep_spatial_maps.ipynb for preparing spatial data and maps.
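
For reference, a minimal CmdStanPy run of the Scotland model could look like the sketch below; the file names are assumptions based on the case-study directory, not verified paths.

import os
from cmdstanpy import CmdStanModel

# Hypothetical paths into knitr/car-iar-poisson; adjust to the actual repo layout.
stan_dir = "knitr/car-iar-poisson"
model = CmdStanModel(stan_file=os.path.join(stan_dir, "bym2.stan"))

# CmdStanPy accepts a dict or a path to a JSON data file.
fit = model.sample(
    data=os.path.join(stan_dir, "scotland_data.json"),
    chains=4,
    parallel_chains=4,
    seed=1234,
)
print(fit.summary())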

@farhanreynaldo

Hi,

I would like to work on this issue. As a start, can I create a draft PR covering the "Fitting the Model to the Scotland Lip Cancer Dataset" and "Bigger data: from 56 counties in Scotland to 1921 census tracts in New York City" sections?

@mitzimorris
Member Author

yes, please do!

@mitzimorris
Member Author

I put a few comments in the notebook.

Also, why call this the BYM12 model? BYM2 is what it's been called in all the papers.

Computing the scaling factor in Python without R-INLA is a challenge, but doable.

The idea is that you need only the diagonal elements of the inverse of the precision matrix, which is constructed from the adjacency matrix, and then the geometric mean of that vector. This is done in https://github.com/stan-dev/example-models/blob/master/knitr/car-iar-poisson/fit_scotland_bym2.R

# Build the adjacency matrix using INLA library functions
adj.matrix = sparseMatrix(i = nbs$node1, j = nbs$node2, x = 1, symmetric = TRUE)

# The ICAR precision matrix (note! this is singular)
Q = Diagonal(nbs$N, rowSums(adj.matrix)) - adj.matrix

# Add a small jitter to the diagonal for numerical stability (optional but recommended)
Q_pert = Q + Diagonal(nbs$N) * max(diag(Q)) * sqrt(.Machine$double.eps)

# Compute the diagonal elements of the covariance matrix subject to the
# constraint that the entries of the ICAR sum to zero.
# See the inla.qinv function help for further details.
Q_inv = inla.qinv(Q_pert, constr = list(A = matrix(1, 1, nbs$N), e = 0))

# Compute the geometric mean of the variances, which are on the diagonal of Q_inv
scaling_factor = exp(mean(log(diag(Q_inv))))

The challenge for a Python implementation is finding an equivalent of R-INLA's inla.qinv function, i.e., the (partial) inverse of a sparse precision matrix. Dan Simpson, in a blog post on sparse matrices in JAX, presents his version of what INLA is doing:

https://dansblog.netlify.app/posts/2022-05-20-to-catch-a-derivative-first-youve-got-to-think-like-a-derivative/to-catch-a-derivative-first-youve-got-to-think-like-a-derivative#computing-the-partial-inverse
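
For problem sizes like Scotland's 56 counties, or even a few thousand census tracts, a direct dense translation of the R chunk above is also workable. A rough NumPy/SciPy sketch follows; the function name is hypothetical, and it assumes 0-based edge indices (the Stan data uses 1-based, so subtract 1 first).

import numpy as np
import scipy.sparse as sp

def bym2_scaling_factor(node1, node2, N):
    """Geometric mean of the constrained ICAR marginal variances.
    A dense stand-in for the inla.qinv step; not the case study's own code."""
    # Symmetric adjacency matrix from the edge lists
    A = sp.coo_matrix((np.ones(len(node1)), (node1, node2)), shape=(N, N))
    A = A + A.T
    # ICAR precision matrix (singular)
    Q = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A
    # Small jitter on the diagonal for numerical stability
    Q_pert = Q + sp.eye(N) * Q.diagonal().max() * np.sqrt(np.finfo(float).eps)
    # Dense inverse of the perturbed precision (fine for a few thousand nodes)
    Sigma = np.linalg.inv(Q_pert.toarray())
    # Apply the sum-to-zero constraint: the standard linear-constraint correction
    # Sigma - Sigma 1 (1' Sigma 1)^{-1} 1' Sigma, with 1 a column of ones
    ones = np.ones((N, 1))
    Sigma_c = Sigma - Sigma @ ones @ np.linalg.inv(ones.T @ Sigma @ ones) @ ones.T @ Sigma
    # Geometric mean of the marginal variances on the diagonal
    return float(np.exp(np.mean(np.log(np.diag(Sigma_c)))))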

Another possibility was suggested by ChatGPT:

Probabilistic or approximate methods: for large matrices, exact computation of the inverse's diagonal may not be necessary, depending on the application. Stochastic estimation can give a good approximation with less computational effort; randomized probing requires only a modest number of sparse linear solves rather than one solve per column of the inverse.

Python Implementation with Approximation (Example)

Here’s a brief look at how you might implement a stochastic estimation method to approximate the diagonal of the inverse:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def approximate_diagonal_inverse(A, num_samples=100):
    # Stochastic (probing) estimator of diag(inv(A)):
    # for random probe vectors v, E[v * (A^{-1} v)] equals the diagonal of A^{-1}.
    size = A.shape[0]
    numer = np.zeros(size)
    denom = np.zeros(size)
    for _ in range(num_samples):
        v = np.random.randn(size)   # random probe vector
        w = spla.spsolve(A, v)      # w = A^{-1} v via one sparse solve
        numer += w * v              # element-wise products
        denom += v * v
    return numer / denom

# Example: a random sparse matrix made strongly diagonally dominant (hence invertible)
size = 2000
A = sp.random(size, size, density=0.01) + sp.eye(size) * size
A = A.tocsr()

# Approximate the diagonal of the inverse
diagonal_of_inv_approx = approximate_diagonal_inverse(A)
print("Approximate diagonal of the inverse:", diagonal_of_inv_approx)

This uses a randomized method to approximate the diagonal entries, significantly reducing the number of linear system solutions required.

It would be worth seeing if this method provides answers that are close enough to the R code to be useful.
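
As a first sanity check, before comparing against the R output, the estimator can be compared to a dense inverse of the toy matrix above; this snippet assumes the definitions from the preceding code block.

# Compare the stochastic estimate to the exact diagonal of the inverse
# (dense inversion is still feasible at size 2000).
exact = np.diag(np.linalg.inv(A.toarray()))
approx = approximate_diagonal_inverse(A, num_samples=500)
print("max relative error:", np.max(np.abs(approx - exact) / np.abs(exact)))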

@farhanreynaldo

Sorry, there's a typo in the PR title; I can't fix the same typo in this issue's title, though.

The ChatGPT implementation of the diagonal of the inverse doesn't yield values close enough to the R code's. So I implemented it myself as a line-by-line translation of the R code chunk above. The resulting scaling factor is almost identical to the one from the R code.

For the next step, I plan to improve the map visualization and fit the BYM2 model to the NYC data.

@farhanreynaldo

I have modified the map visualization using Folium and fitted the BYM2 model to the NYC data. While working on this, I encountered several issues:

  1. In the "Adding an ICAR component (spatial smoothing only)" section, the Stan model requires a covariate x, but the script does not include this covariate.

  2. The color scale of the NYC choropleth plot in the original notebook only goes up to 15, whereas the data range up to 300. I suspect the counts are being truncated somewhere in the plotting step, but I couldn't find the code that produces all the visualizations. (See the sketch after this list for a choropleth with explicit bins.)
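
One way to rule out a capped color scale is to pass explicit bins spanning the full range of counts. A hedged Folium sketch, with placeholder file and column names that are not the notebook's own:

import folium
import pandas as pd

# Placeholder inputs; file and column names here are assumptions.
df = pd.read_csv("nyc_injuries.csv")           # columns: tract_id, injury_count
nyc_geojson = "nyc_census_tracts.geojson"      # census-tract polygons

m = folium.Map(location=[40.7128, -74.0060], zoom_start=10, tiles="cartodbpositron")

# Quantile-based bins that cover the full range of counts, so the color scale
# is not capped below the data maximum (~300 in the NYC data).
bins = sorted(set(df["injury_count"].quantile([0, 0.25, 0.5, 0.75, 0.9, 1.0])))

folium.Choropleth(
    geo_data=nyc_geojson,
    data=df,
    columns=["tract_id", "injury_count"],
    key_on="feature.properties.tract_id",
    fill_color="YlOrRd",
    bins=bins,
    legend_name="Traffic injury count",
).add_to(m)

m.save("nyc_injuries.html")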
