Returning the full ARG with ts.simplify() #3215
Replies: 14 comments
-
This isn't very easy right now, all right. @hyanwong, I think we have some hacks for this in sc2ts? Something like: identify the recombinants as the nodes that have more than one parent, and then feed them in as samples or something?
-
Yes, I think it would have to be a hack. You could either mark them as samples (and then unmark them afterwards, or maybe even use [...]

The advantage to the "mark samples" method is that it is reasonably clean. The advantage to the [...]

I'm not aware of any hacks in sc2ts that keep these in (apart from here, where we don't reset the sample flags). Mostly we just use [...]

Here's some (semi-untested) code for the two methods. The new `filter_nodes=False` option to simplify is very handy to check that the plots are sensible.

```python
import msprime
import numpy as np

arg = msprime.sim_ancestry(2, sequence_length=1e3, recombination_rate=0.001, record_full_arg=True)
re_nodes = np.where(arg.nodes_flags & msprime.NODE_IS_RE_EVENT)[0]
style = "".join(f".n{u} > .sym {{fill: red}}" for u in re_nodes)
arg.draw_svg(style=style)

# "sample" method
s1_arg = arg.simplify(np.concatenate((arg.samples(), re_nodes)), update_sample_flags=False, filter_nodes=False)
s1_arg.draw_svg(style=style)

# "individual" method
tables = arg.dump_tables()
individual_arr = tables.nodes.individual
for u in re_nodes:
    individual_arr[u] = tables.individuals.add_row()
tables.nodes.individual = individual_arr
tables.simplify(keep_unary_in_individuals=True, filter_nodes=False)
tables.nodes.individual = arg.nodes_individual  # set the individuals back to the original
s2_arg = tables.tree_sequence()
s2_arg.draw_svg(style=style)
```
-
A |
-
Thanks, @hyanwong! We were doing something similar to the "samples" method, except SLiM doesn't automatically flag the recombination nodes, so we identified nodes with multiple parents, which raised two questions
`def ts_to_ARG(ts):
-
There isn't. But it should be possible from the edge table arrays, more-or-less in a single pass right? You could do an
I guess you want the node itself
-
Ah, I was wrong. It depends how you represent the recombination event. In msprime, we create 2 nodes per recombination event (because it helps us to calculate the likelihood under the Hudson coalescent, see tskit-dev/msprime#1942). Quoting from there:
In this case you might want to keep the parents. In the SLiM case, my guess is that you have one recombination node per event, so you want the children. This is all a bit messy!
-
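@kitchensjn: I think this finds the nodes with multiple parents, doesn't it? Could you check my logic? If it's correct, I can add it as a Q&A to the discussions forum.

```python
import numpy as np

# `ts` is an existing tskit.TreeSequence.
# Collapse the edge table to unique (child, parent) pairs, so that a parent
# split over several edges for the same child is only counted once.
uniq_child_parent = np.unique(np.column_stack((ts.edges_child, ts.edges_parent)), axis=0)
# Count how many distinct parents each child has.
nd, count = np.unique(uniq_child_parent[:, 0], return_counts=True)
multiple_parents = nd[count > 1]
print(f"Nodes with multiple parents are {multiple_parents}")
```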
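One way to check the logic without a simulation is to run it on a tiny hand-written edge list. The sketch below uses plain NumPy arrays as stand-ins for `ts.edges_child` and `ts.edges_parent` (the toy data is made up for illustration) and shows why the (child, parent) pairs need to be de-duplicated first: a child whose single parent is split across two edges should not be reported.

```python
import numpy as np

# Toy edge columns (hypothetical data). Node 0 has the same parent, 3, on two
# separate edges; node 1 switches from parent 3 to parent 4.
edges_child = np.array([0, 0, 1, 1, 2])
edges_parent = np.array([3, 3, 3, 4, 4])

# Naive count: every edge is counted, so node 0 is wrongly reported.
nd, count = np.unique(edges_child, return_counts=True)
naive = nd[count > 1]

# De-duplicate (child, parent) pairs first, as in the snippet above.
pairs = np.unique(np.column_stack((edges_child, edges_parent)), axis=0)
nd, count = np.unique(pairs[:, 0], return_counts=True)
multiple_parents = nd[count > 1]

print(naive)             # [0 1]
print(multiple_parents)  # [1]
```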
-
Yup, this will return all of the nodes with more than one parent. Then checking that it identifies the parents with recombination flags should be something like:
-
Yes, although the recombination nodes created by |
-
Just to clarify, and please correct me if I've misunderstood: my code should work for tree sequences with the 2-RE-node encoding (msprime), as it returns the parents of the nodes with multiple parents. It is equivalent to [...] The final two lines of my code would not be needed for a tree sequence that uses a 1-RE-node encoding (SLiM, most likely). For those tree sequences, the array [...]
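To make the distinction between the two encodings concrete, here is a rough sketch on plain NumPy arrays standing in for the edge table columns. The function name `recombination_nodes`, its `two_node_encoding` argument, and the toy data are all hypothetical; this is not the code referred to above, just one way of expressing the same distinction.

```python
import numpy as np


def recombination_nodes(edges_child, edges_parent, two_node_encoding):
    """Return candidate recombination node IDs from edge columns.

    1-node encoding: the nodes with more than one distinct parent.
    2-node encoding: the parents of those nodes.
    """
    pairs = np.unique(np.column_stack((edges_child, edges_parent)), axis=0)
    child, n_parents = np.unique(pairs[:, 0], return_counts=True)
    multi = child[n_parents > 1]
    if not two_node_encoding:
        return multi
    # Keep the parent column of every pair whose child has multiple parents.
    return np.unique(pairs[np.isin(pairs[:, 0], multi), 1])


# Toy data: node 0 has parents 2 and 3; everything else has a single parent.
edges_child = np.array([0, 0, 1, 2, 3])
edges_parent = np.array([2, 3, 4, 4, 5])

one_node = recombination_nodes(edges_child, edges_parent, two_node_encoding=False)
two_node = recombination_nodes(edges_child, edges_parent, two_node_encoding=True)
print(one_node, two_node)
```

With the same edges, the 1-node reading picks out the child (node 0) and the 2-node reading picks out its two parents (nodes 2 and 3).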
-
There may not be a good reason for doing things this way in msprime with the two RE nodes, now that we can keep unary nodes more flexibly. @GertjanBisschop, can you comment? It would be good to make a decision here regarding how we record recombinations before we release the new additional nodes API. (This is an msprime issue though: can someone open an issue on msprime to discuss potentially changing how we record RE nodes for the new additional nodes API, please?)
-
No, you are right. I didn't read your code fully, sorry!
-
I opened an issue. As @hyanwong already mentioned, the 2-node vs 1-node recombination event encoding has been discussed before. I'm not entirely sure yet how the more flexible node recording would help resolve why we stuck with the 2-node encoding.
-
Just to note in passing that a large number of the ARG nodes that are not in a tree sequence are not recombination nodes, but common-ancestor-non-coalescent nodes. You would probably want to keep these too. I think a flexible thing would be to be able to pass a bit array of flags to [...]
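The node-selection half of that idea can already be sketched outside of simplify with a bitmask over the node flags. The constants below are assumed to mirror `msprime.NODE_IS_CA_EVENT` and `msprime.NODE_IS_RE_EVENT` (in real use, import them from msprime rather than redefining them); the helper name `nodes_matching` and the toy flags array are hypothetical.

```python
import numpy as np

# Assumed to mirror msprime.NODE_IS_CA_EVENT / msprime.NODE_IS_RE_EVENT;
# import the real constants from msprime in practice.
NODE_IS_CA_EVENT = 1 << 16
NODE_IS_RE_EVENT = 1 << 17


def nodes_matching(nodes_flags, mask):
    """Return IDs of nodes whose flags share at least one bit with mask."""
    return np.flatnonzero(np.asarray(nodes_flags) & mask)


# Toy flags column: two samples, two RE nodes, one CA node, one plain node.
flags = np.array(
    [1, 1, NODE_IS_RE_EVENT, NODE_IS_RE_EVENT, NODE_IS_CA_EVENT, 0],
    dtype=np.uint32,
)
keep = nodes_matching(flags, NODE_IS_RE_EVENT | NODE_IS_CA_EVENT)
print(keep)  # [2 3 4]
```

The resulting IDs could then be handed to either of the two methods discussed earlier in the thread.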
-
Is there a method when simplifying a tree sequence to remove all unary nodes except the recombination nodes (a middle ground between `ts.simplify(keep_unary=False)` and `ts.simplify(keep_unary=True)`)? We are working with the tree sequence output from a SLiM simulation with `initializeTreeSeq(retainCoalescentOnly=F)`, which contains lots of unary nodes, and we want to simplify it down to just the nodes that affect the ARG structure: the full ARG. As the output tree sequence from SLiM does not have marked recombination nodes, these would first need to be identified before simplifying. Copying @pderaje as he is working on this with me. (See MesserLab/SLiM#376 for the initial post, before determining it was better suited here.)