recapitation attempts to use null genomes #357

Open
Hongjinwu opened this issue Dec 24, 2024 · 12 comments

@Hongjinwu

Hongjinwu commented Dec 24, 2024

edit: Scroll down for the underlying issue.

the code:
import pyslim
import tskit
import msprime

orig_ts = tskit.load("test_true.trees")

demography = msprime.Demography()
demography.add_population(name="pop_0", initial_size=10000)
demography.add_population(name="p1", initial_size=10000)
demography.add_population(name="p2", initial_size=10000)
demography.add_population(name="p3", initial_size=10000)
demography.add_population(name="p4", initial_size=10000)

rts = pyslim.recapitate(orig_ts, ancestral_Ne=1e4)
rts.dump("true_recapitated.trees")

the issue:
Traceback (most recent call last):
  File "/sim_data/sim28/recapitate.py", line 14, in <module>
    rts = pyslim.recapitate(orig_ts, ancestral_Ne=1e4)
  File "/home/miniconda3/lib/python3.9/site-packages/pyslim/methods.py", line 77, in recapitate
    raise ValueError(message)
ValueError: Not all roots of the provided tree sequence are at the time expected by recapitate(). This could happen if you've simplified in python before recapitating (fix: don't simplify first). If could also happen in other situations, e.g., you added new individuals without parents in SLiM during the course of the simulation with sim.addSubPop(), in which case you will probably need to recapitate with msprime.sim_ancestry(initial_state=ts, ...). (Expected root time: 27372; Observed root times: [0.0, 27372.0])

I also do not understand why there is a population named "pop_0" in the tree sequence, since "pop_0" does not exist in my SLiM simulation code:
1 first(){
sim.addSubpop("p1",50,haploid=T); //add population with size 50
sim.addSubpop("p2",50,haploid=T);
sim.addSubpop("p3",50,haploid=T);
sim.addSubpop("p4",50,haploid=T);
}
Does anyone know how to fix this?

@bhaller
Collaborator

bhaller commented Dec 24, 2024

Hi! I think pop_0 is just a placeholder occupying slot 0 in the populations table. According to the present design, p1 needs to be in slot 1, p2 in slot 2, etc., which leaves slot 0 empty, so we put a placeholder there. pop_0 should contain no individuals/nodes, and can simply be ignored. You are including it in your demography, however, with an initial size of 10,000; just don't do that? If that does not solve the problem, then please provide the complete SLiM code for your simulation (and please make it a minimal script that still reproduces the problem, stripped of all unnecessary cruft). @petrelharp might have more to say on this. (Peter: I tried to find mention of this problem in the pyslim doc, and did not succeed; perhaps it needs to be made more prominent?)
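One way to confirm that the placeholder population is empty is to count nodes per population. The following is a minimal sketch, not part of pyslim: the helper name `nodes_per_population` is hypothetical, and it assumes a tskit tree sequence whose population metadata, when present, decodes to a dict with a "name" key.

```python
from collections import Counter


def nodes_per_population(ts):
    """Map each population's name (or numeric id, if unnamed) to its node count.

    `ts` is expected to behave like a tskit.TreeSequence: only ts.nodes()
    and ts.populations() are used.
    """
    counts = Counter(node.population for node in ts.nodes())
    result = {}
    for pop in ts.populations():
        # Assumption: metadata is a dict with a "name" key when a schema is
        # present; anything else (None, raw bytes) falls back to the id.
        meta = pop.metadata if isinstance(pop.metadata, dict) else {}
        result[meta.get("name", pop.id)] = counts.get(pop.id, 0)
    return result
```

With the file from this thread, the call would be `nodes_per_population(tskit.load("test_true.trees"))`; a count of zero for slot 0 would be consistent with it being a placeholder.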

@Hongjinwu
Author

Thanks for the answer. The demography defined in the Python script is not used by the recapitate function (I forgot to comment it out). It was meant for msprime.sim_ancestry(), which fails with a different error, something like "infinite time for coalescent ...".
Also, I did not simplify the tree sequence before trying to recapitate.

@bhaller
Collaborator

bhaller commented Dec 24, 2024

OK. Please provide a complete, minimal SLiM script that can be used to reproduce this problem, and please also provide a correct and minimal Python script to reproduce the problem.

@Hongjinwu
Author

Hongjinwu commented Dec 24, 2024 via email

@Hongjinwu
Author

recapitate.py.zip
sweep_sim.slim.zip

Please find the scripts in the attachments. Thanks.
If I use msprime.sim_ancestry() to recapitate instead, it gives this error:
"msprime._msprime.LibraryError: Infinite waiting time until next simulation event."

@bhaller
Collaborator

bhaller commented Dec 25, 2024

Thanks. It is Christmas here; I imagine @petrelharp will have a look at this some time after that. :->

@Hongjinwu
Author

Hongjinwu commented Dec 25, 2024 via email

petrelharp changed the title from "pyslim.recapitate error" to "recapitation attempts to use null genomes" on Jan 3, 2025
@petrelharp
Contributor

Thanks for the report, @Hongjinwu - it turns out that recapitate is not dealing properly with haploids. This is a bug (I'm not yet sure how bad).

Here is a MWE (substantially pared down from your code):

initialize()
{
	initializeSLiMModelType("nonWF"); // non Wright Fisher model: useful to model haploids and HGT
	defineConstant("L", 1e5); // chromosome length
	defineConstant("Ne", 100);
	
	initializeTreeSeq();
	initializeMutationRate(6e-7);
	initializeRecombinationRate(0);
	initializeMutationType("m1", 1, "f", 0.0);
	initializeGenomicElementType("g1", m1, 1.0);
	initializeGenomicElement(g1, 0, L-1);
}


reproduction(NULL)
{
	nbOffspring = rpois(1, individual.tagF);
	for (i in seqLen(nbOffspring)) {
		subpop.addRecombinant(individual.genome1, NULL, NULL, NULL, NULL, NULL);
		// THIS WORKS:
		// subpop.addRecombinant(individual.genome1, NULL, NULL, individual.genome1, NULL, NULL);
	}
}

1 first() {
	sim.addSubpop("p1", 5, haploid=T);
	inds = p1.individuals;
	inds.tagF = rep(1.0, p1.individualCount);
}

early() {
	sim.recalculateFitness();

	inds = p1.individuals;
	inds[inds.age > 0].fitnessScaling = 0.0;
	inds = inds[inds.age == 0];
	fit = p1.cachedFitness(inds.index);

	inds.fitnessScaling = 1 / min(fit);
	inds.tagF = (Ne / size(inds)) * (1 / mean(fit)) * fit;
}

27 late() {
	sim.treeSeqOutput("test_true.trees");
	sim.simulationFinished();
}

and then

import pyslim
import tskit
import msprime

orig_ts = tskit.load("test_true.trees")

t = orig_ts.first()

root_times = set([orig_ts.node(r).time for r in t.roots])
print(f"root times: {root_times}")

non_null_root_times = set([orig_ts.node(r).time for r in t.roots if not orig_ts.node(r).metadata['is_null']])

print(f"non-null root times: {non_null_root_times}")
assert len(non_null_root_times) == 1

I'll follow up with a workaround for you.
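The check above looks only at the first tree, which is enough here because the recombination rate is 0. For a file with more than one tree, the same idea can be sketched over all trees. The helper name `root_times_by_kind` is hypothetical, and the sketch assumes every root node carries SLiM metadata with an `is_null` field, as in this file.

```python
def root_times_by_kind(ts):
    """Collect root times over all trees, split by the root node's is_null flag.

    Returns (null_times, non_null_times), each a set of node times.
    `ts` is expected to behave like a tskit.TreeSequence.
    """
    null_times, non_null_times = set(), set()
    for tree in ts.trees():
        for root in tree.roots:
            node = ts.node(root)
            # Assumption: node.metadata is a dict with an "is_null" key.
            if node.metadata["is_null"]:
                null_times.add(node.time)
            else:
                non_null_times.add(node.time)
    return null_times, non_null_times
```

If the null genomes are the only problem, `non_null_times` should hold a single value and `null_times` should account for the unexpected root time 0.0 reported in the error message.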

@petrelharp
Contributor

Okay - here's a workaround:

# remove null genomes
samples = [n for n in orig_ts.samples() if not orig_ts.node(n).metadata['is_null']]
simp_ts = orig_ts.simplify(samples, keep_input_roots=True)

rts = pyslim.recapitate(simp_ts, ancestral_Ne=1e1)

(Note I'm setting a small Ne just so it runs fast.)

@bhaller
Collaborator

bhaller commented Jan 3, 2025

Interesting. Note that the next SLiM will not use null genomes in a pure haploid model, but will still use them for the haploids in, say, a model of haplodiploidy. So both ways of recording haploids will be possible, depending on the model. Just injecting that into the conversation here, since it might influence the fix you decide to do.

@petrelharp
Contributor

Well, thankfully we have that sanity check in; otherwise this would be quite a bad and silent error!

petrelharp added a commit to petrelharp/pyslim that referenced this issue Jan 3, 2025
@Hongjinwu
Author

Thanks very much for the help!
