Question about select_variance() #29

Open
kleejin opened this issue Sep 6, 2023 · 1 comment

Comments


kleejin commented Sep 6, 2023

Hi there. I have been exploring how Augur filters out genes, which led me to look more closely at the select_variance function. I ended up here because one of the cell types I expected Augur to rank highly (based on a previous DE gene analysis) was not performing well. When I dug into Augur's output, it looked like the most significant DE genes from that analysis were being filtered out at the default var_quantile threshold of 0.5. So I've been trying to understand why.

Looking more closely at select_variance(), the comments indicate that you intend to calculate the CV (sds / mean), but the actual code calculates a form of Z-score (mean / sds). If this is what you mean to calculate, can you explain a little more, conceptually, how this filters out the genes with the lowest variance? Using (sds / mean) makes more sense to me, but perhaps I am misunderstanding the goal of this step.
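To make the difference concrete, here is a toy sketch in Python. It is not Augur's actual R code. The gene names, the expression values, and the helpers `cv`, `inverse_cv`, and `keep_top` are all made up for illustration, and the plain quantile cutoff is only an assumption about how var_quantile is applied. It shows that ranking genes by mean / sd instead of sd / mean reverses the ordering, so the same cutoff keeps the opposite half of the genes.

```python
import statistics

# Hypothetical toy data: one list of expression values per gene.
expr = {
    "gene_a": [10.0, 10.0, 10.0, 11.0],  # high mean, very little spread
    "gene_b": [0.0, 8.0, 1.0, 15.0],     # highly variable
    "gene_c": [5.0, 6.0, 4.0, 5.0],      # moderate mean, little spread
    "gene_d": [1.0, 9.0, 2.0, 12.0],     # highly variable
}


def cv(values):
    """Coefficient of variation: sd / mean (large for variable genes)."""
    return statistics.stdev(values) / statistics.mean(values)


def inverse_cv(values):
    """mean / sd (large for stable genes); undefined when sd is zero."""
    return statistics.mean(values) / statistics.stdev(values)


def keep_top(scores, quantile=0.5):
    """Drop the lowest-scoring fraction `quantile` of genes, keep the rest."""
    ranked = sorted(scores, key=scores.get)  # ascending by score
    n_drop = int(len(ranked) * quantile)
    return set(ranked[n_drop:])


kept_by_cv = keep_top({gene: cv(v) for gene, v in expr.items()})
kept_by_inverse = keep_top({gene: inverse_cv(v) for gene, v in expr.items()})

print(sorted(kept_by_cv))       # ['gene_b', 'gene_d']
print(sorted(kept_by_inverse))  # ['gene_a', 'gene_c']
```

With sd / mean, the two variable genes survive the cutoff. With mean / sd, the two stable genes survive instead, which would be consistent with strongly varying DE genes being removed.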

Thank you for maintaining this tool and thank you for your help!

AlanTeoYueYang (Collaborator) commented

Hi @kleejin, thank you for your comment and sorry for the delay getting back to you. I looked into the issue and you are indeed correct that this was a bug. Having said that, the bug seems to have little to no effect on the relative rankings within the cell type prioritisation, although it does increase the AUC globally, which is not surprising. The bug fix you suggested will be implemented in an upcoming release. Alternatively, if you want, we can push the fix to a branch, or you can fix it locally and reinstall with devtools.

We implemented the fix and applied it to a dataset (Kang et al. 2018) mentioned in the paper (shown below).
[image]
