
run_sbc won't sample from inferred Direct Posterior #1372

Closed
paarth-dudani opened this issue Jan 20, 2025 · 12 comments · Fixed by #1490
Labels: bug (Something isn't working), hackathon, urgent

Comments

@paarth-dudani

Describe the bug
I am trying to set up an SBC pipeline. I trained a few independent SNPE estimates in parallel using concurrent.futures. However, run_sbc won't sample from my inferred DirectPosterior.

To Reproduce

  1. Versions
    Python version: 3.9.13
    SBI version: 0.23.1

  2. Minimal code:

import numpy as np
from sbi.analysis import run_sbc

num_sbc_samples = 200  # choose the number of sbc runs, should be on the order of 100s

t = np.linspace(0, 4, n)  # n (number of time points) is defined elsewhere
input_array = t
initial_condition = 2.0

# 2D SBC: prior over (beta, sigma), defined elsewhere
thetas = unifrom_beta_sigma.sample((num_sbc_samples,))
xs = simulator_2D(thetas, num_sbc_samples, t, initial_condition)

# results[0] is one of the 3 direct posteriors
num_posterior_samples = 1_000
num_workers = 1
ranks, dap_samples = run_sbc(
    thetas,
    xs,
    results[0],
    num_posterior_samples=num_posterior_samples,
    num_workers=num_workers,
)

Code for context:

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor(max_workers=3) as executor:
        futures = [
            executor.submit(
                SNE_training_basic_multi_round,
                neural_net, proposal, simulator, observed_data,
                learning_rate, num_rounds, num_sim,
                initial_condition, hidden_features,
            )
            for i in array_indices
        ]
    results = [future.result() for future in futures]
  3. Error message
    There is no error message, but the sampler won't sample.

Expected behavior
Fast sampling

paarth-dudani added the bug label on Jan 20, 2025
@janfb (Contributor) commented Jan 20, 2025

Hi @paarth-dudani thanks for reporting this.

run_sbc does nothing to the posterior other than sampling from it for the SBC ranking. Two questions:

  1. Have you checked that the posterior you obtained from your parallelization routine is "healthy", i.e., that it is fast to sample from and has no leakage? (A quick check is sketched below.)
  2. Is simulator_2D the same simulator you used for training SNPE?
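
A minimal sketch of such a health check (assuming results[0] is your DirectPosterior and xs[0] is one of the simulated observations):

import time

start = time.time()
samples = results[0].sample((1_000,), x=xs[0])
print(f"Drew 1000 samples in {time.time() - start:.1f} s")
# If the posterior leaks mass outside the prior support, sbi would normally
# emit a low-acceptance warning during this call.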

Cheers,
Jan

@paarth-dudani (Author)

Hi @janfb

  1. The posterior is indeed quick to sample from. I don't know exactly how to check for leakage; if you mean the warning described in the FAQ on the sbi website, I get no such warning. The training is smooth.
  2. Yes, it is the same simulator that I used for both training and the SBC procedure.

I see the same behavior for the other parallel estimates as well. What should I do?

@janfb (Contributor) commented Jan 23, 2025

OK, good to know that the posterior itself is fast to sample from.

Then it's indeed something within the run_sbc method. Can you please try setting use_batched_sampling=False (it is True by default)? I can imagine that, because you have 2D data and are using an embedding network, passing the entire batch of 200 xs at once to the density estimator might be too much.
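
For example (a sketch reusing the variable names from the snippet above):

ranks, dap_samples = run_sbc(
    thetas,
    xs,
    results[0],
    num_posterior_samples=num_posterior_samples,
    num_workers=num_workers,
    use_batched_sampling=False,  # sample each x separately instead of in one batch
)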

Does this help?

@paarth-dudani (Author)

No, unfortunately it does not help.

Just as a clarification: my data is still 1D; I am simply also inferring the standard deviation of the noise, which is why my prior is 2D.

As a result, I have also not used an embedding network in this example.

As you can see in the screenshot below, it takes too long; I see no output for minutes.

[Screenshot of the stalled run_sbc call omitted.]

@patriglesias

I am experiencing the same problem with the new version: no error message, but the sampler won't sample.

@janfb (Contributor) commented Jan 28, 2025

OK, interesting! Can one of you please share a minimal code example? Or at least some details on the use case, the sbi method, the hyperparameters, etc.?

@michaeldeistler (Contributor)

I have also just heard from another user who is experiencing this issue: slow sampling, but no warning.

@rdgao (Contributor) commented Feb 14, 2025

This was somebody else's issue that I tried to help with, but I'll try to describe it here:

  • NPE with MAF, 50 parameters (tried NSF as well)
  • tried various dimensionality for summary statistics (1 up to ~400)

They report that at some parameter-space dimensionality (around 35), this problem arises: NPE trains, but is very slow to sample, with no warnings. It does, however, produce samples, and is faster on a higher-end GPU, but still on the order of 10 s per sample.

Manually sampling the density estimator via:

net = posterior.density_estimator
net.sample(...)  # raw draws from the net, bypassing the prior-bounds rejection step

is much faster, basically at "normal" speed. But we did not check the quality of the samples extensively, i.e., whether some subset of the 50 parameters often falls outside the prior bounds.

Not sure if it's the same problem as the OP's, since the OP reports that sampling from the posterior is normal but SBC is slow, whereas here posterior sampling itself is slow.

dgedon self-assigned this on Mar 18, 2025
@dgedon (Collaborator) commented Mar 19, 2025

Trying to reproduce this: with the following code, batched sampling gets "stuck" without a warning for some seeds. So the problem is not unique to run_sbc or to the method within get_posterior_samples_on_batch.

import torch
from sbi.inference import NPE
from sbi.utils import BoxUniform

# Set the seed for reproducibility
torch.manual_seed(41)

if __name__ == "__main__":
    # define a simulator that shifts theta by small uniform noise.
    def simulator(theta):
        return theta + 0.1 * torch.rand_like(theta)

    dim = 2
    num_simulations = 10
    num_posterior_samples = 100

    # draw parameters from a uniform prior.
    prior = BoxUniform(low=-1.0 * torch.ones(dim), high=1.0 * torch.ones(dim))
    theta = prior.sample((num_simulations,))
    # simulate data
    x = simulator(theta)

    # choose sbi method and train
    inference = NPE()
    inference.append_simulations(theta, x).train()

    # do batched inference on multiple observations; x_o lies far outside the
    # training data, so the rejection-sampling acceptance rate is ~0%.
    num_observations = 100
    x_o = 50 + torch.randn(num_observations, dim)

    # posterior samples
    posterior = inference.build_posterior(
        prior=prior,
        sample_with="direct",
    )
    posterior_samples = posterior.sample_batched(
        (num_posterior_samples,),
        x=x_o,
    )
    print(posterior_samples.shape)

The issue seems to lie in the rejection sampling, specifically here: the warning (below the if-condition) is only raised once at least 1000 samples have already been accepted. So if the acceptance rate is constantly 0%, the warning is never triggered.

if (
    num_sampled_total.min().item() > 1000
    and min_acceptance_rate < warn_acceptance
    and not leakage_warning_raised
):

I would suggest replacing the first line of the if-condition with one of the following:

  1. Remove it completely. Then we always trigger the check in the first round, but may not yet have a good estimate of the acceptance rate.
  2. Gate on the number of samples we have drawn (proposed) so far. This way we ensure a good enough estimate of the acceptance rate; see the sketch after this list.
  3. Also trigger when the acceptance rate is exactly zero (or zero samples have been accepted).

I think 2. is the best option. @janfb @michaeldeistler, opinions?
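
A minimal sketch of option 2 (the counter name num_proposals_total is hypothetical; it would track all candidate draws, accepted or not):

if (
    num_proposals_total > 1000  # proposals drawn so far, not accepted samples
    and min_acceptance_rate < warn_acceptance
    and not leakage_warning_raised
):
    ...  # raise the low-acceptance warning; now fires even at a constant 0% rate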

@janfb (Contributor) commented Mar 19, 2025

Good catch!

Yes, option 2 sounds good. And does it help in your test case above?

@dgedon (Collaborator) commented Mar 19, 2025

Yes, it does trigger the warning in the test case above.

PR created in #1490

janfb linked a pull request (#1490) on Mar 19, 2025 that will close this issue
@janfb (Contributor) commented Mar 19, 2025

@paarth-dudani and @patriglesias this should be fixed now.
