inconsistent results from phyloP --branch and --subtree #60

Open
heavywatal opened this issue Dec 21, 2022 · 0 comments

According to `phyloP -h`, and if I understand it correctly, `--branch human` and `--subtree human` should produce the same results:

> "--subtree human" will create one partition
> consisting only of human and the branch leading to it and another
> partition consisting of the rest of the tree.
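The expectation can be sketched as a toy model of the partition rule quoted above. This is a hypothetical illustration, not phyloP's code: the tree, the internal node names `root` and `primates`, and the helper functions are made up, and the sketch assumes that `--branch X` puts only the branch above `X` into the test partition. Because a leaf has no branches below it, both options should select the same single branch:

```python
# Hypothetical toy tree, NOT the tree in hg38.phyloP20way.mod.
# Each node maps to its parent; the root maps to None.
# A branch is identified by the node at its lower end.
PARENT = {
    "root": None,
    "primates": "root",
    "hg38": "primates",
    "panTro4": "primates",
    "mm10": "root",
}


def children(node):
    """Return the direct children of a node."""
    return [n for n, p in PARENT.items() if p == node]


def subtree_partition(node):
    """Branches in the --subtree partition (as described in phyloP -h):
    the branch leading to `node` plus every branch below it."""
    branches = {node}
    for child in children(node):
        branches |= subtree_partition(child)
    return branches


def branch_partition(names):
    """Branches in the --branch partition (assumed): exactly the named branches."""
    return set(names)


for leaf in ("hg38", "mm10"):
    same = subtree_partition(leaf) == branch_partition([leaf])
    print(f"{leaf}: same partition? {same}")  # True for both leaves

# For an internal node the two options differ, because the subtree
# also contains the branches below that node.
print(subtree_partition("primates") == branch_partition(["primates"]))  # False
```

So for a leaf such as hg38 or mm10, any difference between the two outputs would have to come from something other than the choice of partition.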

However, the results sometimes differ. Here is a minimal set of commands and data that reproduces the problem:
attachment.tar.gz (~16 KB)

```sh
phyloP --method LRT --features features.bed --branch  hg38 hg38.phyloP20way.mod chrY.maf >hg38_branch.out
phyloP --method LRT --features features.bed --subtree hg38 hg38.phyloP20way.mod chrY.maf >hg38_subtree.out
phyloP --method LRT --features features.bed --branch  mm10 hg38.phyloP20way.mod chrY.maf >mm10_branch.out
phyloP --method LRT --features features.bed --subtree mm10 hg38.phyloP20way.mod chrY.maf >mm10_subtree.out
```
```diff
--- hg38_branch.out     2022-12-21 22:04:05.230447348 +0900
+++ hg38_subtree.out    2022-12-21 22:04:05.293524261 +0900
@@ -3,6 +3,6 @@
 chrY   7092767 7093122 bed.2   0.88850 0.88849 1.00000 0.00000 1.00000
 chrY   7106321 7106776 bed.3   0.87610 0.91348 0.00000 4.05759 0.00219
 chrY   7168887 7169338 bed.4   1.09471 1.11324 0.36497 1.25706 0.05642
-chrY   7169476 7184158 bed.5   1.27310 1.31838 0.94608 0.00513 0.45967
+chrY   7169476 7184158 bed.5   1.00000 1.00000 1.00000 0.00000 1.00000
 chrY   7185905 7186578 bed.6   0.95396 0.95396 1.00000 0.00000 0.50000
 chrY   7188655 7188967 bed.7   1.41867 1.45536 0.00000 4.45398 0.00142
--- mm10_branch.out     2022-12-21 22:04:05.695226790 +0900
+++ mm10_subtree.out    2022-12-21 22:04:05.729244354 +0900
@@ -1,8 +1,8 @@
 #chr   start   end     name    null_scale      alt_scale       alt_subscale    lnlratio        pval
 chrY   7087262 7087515 bed.1   0.66535 0.81125 0.47486 5.19760 0.00063
 chrY   7092767 7093122 bed.2   0.88850 0.95953 0.77027 1.30603 0.05303
-chrY   7106321 7106776 bed.3   0.87610 0.87610 0.90000 0.00000 1.00000
-chrY   7168887 7169338 bed.4   1.09471 1.09470 0.90000 0.00000 1.00000
-chrY   7169476 7184158 bed.5   1.27310 1.26506 0.90062 0.00486 0.46074
-chrY   7185905 7186578 bed.6   0.95396 0.95396 0.90000 0.00000 1.00000
-chrY   7188655 7188967 bed.7   1.41867 1.41867 0.90001 0.00000 1.00000
+chrY   7106321 7106776 bed.3   1.00000 1.00000 1.00000 0.00000 1.00000
+chrY   7168887 7169338 bed.4   1.00000 1.00000 1.00000 0.00000 1.00000
+chrY   7169476 7184158 bed.5   1.00000 1.00000 1.00000 0.00000 1.00000
+chrY   7185905 7186578 bed.6   1.00000 1.00000 1.00000 0.00000 1.00000
+chrY   7188655 7188967 bed.7   1.00000 1.00000 1.00000 0.00000 1.00000
```
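To list the disagreeing features without reading a line-based diff, both outputs can be keyed on the feature name. This is a minimal sketch, assuming the whitespace-separated column layout shown in the `#chr start end name null_scale alt_scale alt_subscale lnlratio pval` header line of the mm10 output; the function names and the tolerance are illustrative choices, not part of PHAST. The tolerance absorbs last-digit differences, since phyloP prints five decimals:

```python
import io


def read_features(handle):
    """Parse a phyloP --features table into {name: [null_scale, ..., pval]}."""
    rows = {}
    for line in handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the '#chr ...' header
        fields = line.split()
        rows[fields[3]] = [float(x) for x in fields[4:9]]
    return rows


def differing_features(a, b, tol=1e-4):
    """Names of features present in both tables whose numeric columns
    differ by more than tol."""
    return [
        name for name in a
        if name in b and any(abs(x - y) > tol for x, y in zip(a[name], b[name]))
    ]


# Rows copied from the mm10 diff above (bed.2 is unchanged, bed.3 differs).
HEADER = "#chr\tstart\tend\tname\tnull_scale\talt_scale\talt_subscale\tlnlratio\tpval\n"
branch = io.StringIO(
    HEADER
    + "chrY\t7092767\t7093122\tbed.2\t0.88850\t0.95953\t0.77027\t1.30603\t0.05303\n"
    + "chrY\t7106321\t7106776\tbed.3\t0.87610\t0.87610\t0.90000\t0.00000\t1.00000\n"
)
subtree = io.StringIO(
    HEADER
    + "chrY\t7092767\t7093122\tbed.2\t0.88850\t0.95953\t0.77027\t1.30603\t0.05303\n"
    + "chrY\t7106321\t7106776\tbed.3\t1.00000\t1.00000\t1.00000\t0.00000\t1.00000\n"
)
print(differing_features(read_features(branch), read_features(subtree)))  # ['bed.3']
```

On the real files, the two `StringIO` objects would be replaced by `open("mm10_branch.out")` and `open("mm10_subtree.out")`.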

The differences seem to arise in blocks that contain only two sequences (the block at 7169476 has only hg38 and panTro4) or that lack the target sequence (the block at 7106321 lacks mm10). The output values for such blocks should be blank/NA, so the 1.00000 values from `--subtree` make sense, and I suspect `--branch` is doing something wrong.
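One way to check this hypothesis against the MAF is to list, for each feature, the overlapping alignment blocks that have too few sequences or no row for the target species. This is a rough sketch under stated assumptions: the helper names, the `min_species=3` threshold (mirroring "only two sequences"), the made-up species `sp3`, and the toy alignment are all hypothetical, and phyloP itself may handle such columns differently. It assumes the reference (hg38) row is on the + strand, so its MAF start and size give a 0-based, half-open interval directly comparable to BED coordinates:

```python
from collections import namedtuple

# One MAF alignment block, located by its reference-species row.
Block = namedtuple("Block", ["ref_start", "ref_end", "species"])


def read_maf_blocks(lines, ref="hg38"):
    """Each 'a' line starts a block and each 's' line adds one sequence.
    Other line types (i, e, q) are ignored in this sketch."""
    raw = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            raw.append({"species": set(), "start": None, "end": None})
        elif fields[0] == "s" and raw:
            src, start, size = fields[1], int(fields[2]), int(fields[3])
            species = src.split(".", 1)[0]  # 'hg38.chrY' -> 'hg38'
            raw[-1]["species"].add(species)
            if species == ref:
                raw[-1]["start"], raw[-1]["end"] = start, start + size
    return [
        Block(b["start"], b["end"], frozenset(b["species"]))
        for b in raw
        if b["start"] is not None  # skip blocks without a reference row
    ]


def suspicious_blocks(blocks, feat_start, feat_end, target, min_species=3):
    """Blocks overlapping the BED interval [feat_start, feat_end) that have
    fewer than min_species sequences or no row for the target species."""
    return [
        b for b in blocks
        if b.ref_start < feat_end and b.ref_end > feat_start
        and (len(b.species) < min_species or target not in b.species)
    ]


# Toy MAF with made-up coordinates (not taken from chrY.maf).
TOY_MAF = """##maf version=1
a score=1.0
s hg38.chrY    100 5 + 1000 ACGTA
s panTro4.chrY  10 5 + 1000 ACGTA

a score=2.0
s hg38.chrY    200 4 + 1000 ACGT
s panTro4.chrY  20 4 + 1000 ACGT
s sp3.chrY      30 4 + 1000 ACGT

a score=3.0
s hg38.chrY    300 4 + 1000 ACGT
s panTro4.chrY  40 4 + 1000 ACGT
s mm10.chrY     50 4 + 1000 ACGT
"""

blocks = read_maf_blocks(TOY_MAF.splitlines())
for start, end in [(100, 105), (200, 204), (300, 304)]:
    hits = suspicious_blocks(blocks, start, end, target="mm10")
    print(start, end, [sorted(b.species) for b in hits])
```

The first toy feature is flagged for having two sequences, the second for lacking mm10, and the third not at all. For the real data, `gzip.open("chrY.maf.gz", "rt")` can be passed to `read_maf_blocks` in place of the toy lines, with the intervals taken from `features.bed`.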

The original data were downloaded from UCSC:
https://hgdownload.soe.ucsc.edu/goldenPath/hg38/phyloP20way/hg38.phyloP20way.mod
https://hgdownload.soe.ucsc.edu/goldenPath/hg38/multiz20way/maf/chrY.maf.gz
