The Sankey plot visualization has difficulties when taxonomic ranks are missing. This can be partly solved by introducing "unclassified" ranks based on the parent rank. However, this currently does not work properly for all levels. For example, using the `assembly.fasta` test file (https://github.com/EBI-Metagenomics/emg-viral-pipeline/blob/master/nextflow/test/assembly.fasta) produces this Sankey plot:
[Figure: Sankey plot produced by the current script]
but the correct plot is:
[Figure: expected Sankey plot]
As illustrated, the problem is that the subfamily Guernseyvirinae has neither a family nor an order rank, only a class (_Caudoviricetes_). The current script inserts "Unclassified Caudoviricetes" to fill the order rank, but the family rank is still missing, so the arrangement is wrong (see first figure).
I think we can fix this by:
a) introducing multiple "unclassified" (or better: "undefined"!) ranks
b) adding the rank level to the label (because labels must be unique)
For example, for Jerseyvirus the Sankey path would then be:
Caudoviricetes --> Undefined Caudoviricetes (Order) --> Undefined Caudoviricetes (Family) --> Guernseyvirinae --> Jerseyvirus
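A minimal Python sketch of proposals a) and b), assuming each contig's lineage is available as a dict mapping rank to name (or `None` when the rank is missing) and assuming the rank order `class, order, family, subfamily, genus`. The names `RANKS`, `fill_ranks` and `sankey_links` are hypothetical and not part of the pipeline. Each gap between defined ranks is filled with an "Undefined <nearest defined ancestor> (<Rank>)" label, and consecutive labels are counted as Sankey links:

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical rank order (assumption); the pipeline's actual rank list may differ.
RANKS = ["class", "order", "family", "subfamily", "genus"]


def fill_ranks(lineage: Dict[str, Optional[str]]) -> List[str]:
    """Return one label per rank between the first and last defined rank,
    filling each gap with 'Undefined <nearest defined ancestor> (<Rank>)'."""
    defined = [i for i, rank in enumerate(RANKS) if lineage.get(rank)]
    if not defined:
        return []
    first, last = defined[0], defined[-1]
    labels: List[str] = []
    ancestor = lineage[RANKS[first]]
    for rank in RANKS[first:last + 1]:
        name = lineage.get(rank)
        if name:
            labels.append(name)
            ancestor = name  # nearest defined ancestor for later gaps
        else:
            # The rank in parentheses keeps filler labels unique per level.
            labels.append(f"Undefined {ancestor} ({rank.capitalize()})")
    return labels


def sankey_links(lineages: List[Dict[str, Optional[str]]]) -> Counter:
    """Count (source, target) transitions between consecutive filled ranks."""
    links: Counter = Counter()
    for lineage in lineages:
        path = fill_ranks(lineage)
        for source, target in zip(path, path[1:]):
            links[(source, target)] += 1
    return links


if __name__ == "__main__":
    jerseyvirus = {"class": "Caudoviricetes", "order": None, "family": None,
                   "subfamily": "Guernseyvirinae", "genus": "Jerseyvirus"}
    print(" --> ".join(fill_ranks(jerseyvirus)))
```

Because the rank is part of the label, the two filler nodes for Jerseyvirus stay distinct; without it, both would be "Undefined Caudoviricetes" and the link between them would collapse into a self-loop. This sketch only fills gaps between defined ranks and leaves trailing missing ranks unfilled, which is a design choice, not necessarily what the pipeline does.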