Phylogenetic Tree Visualization #58

Open
6 tasks
josiahseaman opened this issue Apr 7, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@josiahseaman
Member

josiahseaman commented Apr 7, 2020

For: Kaytie Innamorati @innamoratika

Goal: We want to show related individuals and blocks of individuals for viral sequences. We should either match what nextstrain.org offers or draw on its information.

  • Collaborate with #phylogeny to get the list of individuals and sequences (provenance matters)
  • Get a Newick tree (with distances) and put it into JSON format (Toggle compressed view #16) v15
  • Visualize tree aligned with matrix rows on the left of Schematize browser window
    • Toggle checkbox for turning tree on or off
  • Poke nextstrain.org for collaboration
  • Augment pipeline to preserve phylo name matching provenance.
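
A minimal sketch of the Newick-to-JSON step and of deriving a row order for aligning the tree with the matrix, assuming plain Newick with unquoted labels and optional branch lengths. The function names and the JSON field names (`name`, `length`, `children`) are hypothetical and are not taken from the Schematize or component_segmentation specification.

```python
import json


def parse_newick(text):
    """Parse a Newick string into nested dicts with name, length, children."""
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")
        # Read the (possibly empty) node label.
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        # Read the optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if pos != len(text):
        raise ValueError("trailing characters after tree")
    return root


def leaf_order(node):
    """Return leaf names in tree order, usable as the matrix row order."""
    if not node["children"]:
        return [node["name"]]
    names = []
    for child in node["children"]:
        names.extend(leaf_order(child))
    return names


tree = parse_newick("((A:0.1,B:0.2):0.05,C:0.3);")
print(json.dumps(tree, indent=2))
print(leaf_order(tree))
```

Sorting the matrix rows by `leaf_order` would keep each leaf of the drawn tree next to its own row; quoted labels and Newick comments are not handled here.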
josiahseaman added the enhancement label Apr 7, 2020
@tpook92

tpook92 commented Apr 7, 2020

@innamoratika
I used the HaploBlocker input to generate some phylogenetic trees by writing a wrapper function for the R package ape. Statistically not 100% sound, but fine for some initial testing.

Here is what an unrooted tree for the 169 SARS2 sequences looks like:
[image: phylo_total]

And here it is after removing the 7 most extreme outliers:
[image: phylo_162]

@innamoratika

@tpook92 Wonderful, thank you. I'll chat with the phylogeny folks in about 2 hours to discuss the genomes we want to include. I'm planning on using RAxML with some other programs, but will also use parts of the ape package to create a distance matrix and dendrogram.
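
To make the distance matrix and dendrogram step concrete, here is a rough Python sketch, assuming the sequences are already aligned to equal length. It is a stand-in for illustration only: it uses a plain p-distance and average-linkage (UPGMA) clustering, not the ape or RAxML methods mentioned above, and all function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites, skipping gap and N positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 0.0


def distance_matrix(seqs):
    """Map each unordered pair of names to its p-distance."""
    return {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }


def upgma(seqs):
    """Average-linkage clustering; returns Newick without branch lengths."""
    dist = distance_matrix(seqs)
    clusters = {name: [name] for name in seqs}  # Newick label -> leaf names

    def average(pair):
        a, b = pair
        total = sum(
            dist[frozenset((x, y))] for x in clusters[a] for y in clusters[b]
        )
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > 1:
        # Merge the two clusters with the smallest mean pairwise distance.
        a, b = min(combinations(clusters, 2), key=average)
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters)) + ";"


# Toy aligned sequences, made up for illustration.
example = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTGTACCA"}
print(distance_matrix(example))
print(upgma(example))
```

The Newick output omits branch lengths, so it only conveys topology; a real analysis would use a substitution model and a proper tree-building tool.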

@josiahseaman
Member Author

We'll need to change the specification for the v13 JSON format (graph-genome/component_segmentation#16). I think it's likely we'll only want one tree for the entire genome. We had discussed having a dendrogram for each gene or section of the genome. HaploBlocker makes a new row ordering per breakpoint, i.e. per recombination region, which is something the virus doesn't have. @innamoratika do you see any reason that we might want more than one dendrogram for the whole pangenome? Would you ever want to do it per gene? Would that be useful to researchers?

@subwaystation
Member

I think https://github.com/neherlab/pan-genome-visualization already has SNP and gene trees. I know Richard Neher is a recognized expert when it comes to viruses, so I expect doing it per gene makes sense. We might be able to learn something from that tool, too.

@tpook92

tpook92 commented Apr 7, 2020

I would assume that at the single-gene level most SARS2 variants are 100% identical. To get any differentiation between the sequences, I would assume that using as much information as possible (usually the whole genome) is the way to go.

When working on a more diverse set (e.g. including SARS1 / application in other species) single gene trees should be useful.
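
One quick way to check this assumption on a given alignment is to count distinct sequences inside a gene window versus across the whole genome. The sketch below assumes aligned sequences; the helper name, the toy sequences, and the window coordinates are hypothetical.

```python
def distinct_variants(seqs, start=0, end=None):
    """Count distinct sequences within the column range [start, end)."""
    return len({s[start:end] for s in seqs.values()})


# Toy alignment: the first four columns stand in for a single "gene".
toy = {"a": "AAAACCCC", "b": "AAAACCCT", "c": "AAAACCGT"}
print(distinct_variants(toy, 0, 4))  # window only
print(distinct_variants(toy))        # whole alignment
```

If the window count is 1 while the whole-genome count is close to the number of samples, a per-gene tree for that window would carry no signal.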
