Question about using flashweave with sequencing data of multiple organisms #42

Open
AnnaClo opened this issue Nov 27, 2024 · 5 comments

Comments

@AnnaClo

AnnaClo commented Nov 27, 2024

Hi,
I need to construct a network using sequencing data for multiple organism groups, e.g. bacteria, fungi, protists, etc., each obtained by sequencing a different amplicon.
I understand that I could input this data combined in one table to FlashWeave. I am unsure whether and how FlashWeave would handle normalization for differing sequencing depths. Since each organism group comes from a different amplicon, it should be normalized independently of the others.
If I did the normalization outside FlashWeave, I would apply a CLR transformation to bacteria, fungi and protists independently.
What is your advice? How will the FlashWeave algorithms deal with this kind of data? Is this data suitable for FlashWeave? Should I apply CLR transformations before using FlashWeave?

Thank you in advance

@jtackm
Member

jtackm commented Dec 2, 2024

Hi Anna,

Yes, FlashWeave supports providing several tables to be normalized independently (inspired by exactly the use case you mentioned). It's unfortunately poorly documented, but can be used like this: learn_network([<bac_data_path>, <fungi_data_path>], meta_data_path; <kwargs...>). Please let me know if this works for you.
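
To illustrate what "normalized independently" means for readers following along, here is a minimal sketch in plain Julia. It is not FlashWeave's implementation: the helper `clr_rows`, the toy tables and the fixed pseudocount are all assumptions made for illustration, and FlashWeave's own adaptive pseudocount scheme works differently.

```julia
using Statistics  # stdlib, provides mean

# Centered log-ratio per sample (row).
# ASSUMPTION: a fixed pseudocount is added to every entry so that zeros can be
# logged; this is a simplification, not FlashWeave's adaptive scheme.
function clr_rows(counts::AbstractMatrix{<:Real}; pseudocount::Real=1.0)
    logged = log.(counts .+ pseudocount)
    return logged .- mean(logged, dims=2)
end

# Hypothetical toy tables: 3 samples (rows) x taxa (columns),
# with very different sequencing depths per amplicon.
bacteria = [120 0 30; 900 15 0; 40 5 5]
fungi    = [3 7; 0 12; 25 1]

# Normalize each amplicon on its own, then join the columns.
separate = hcat(clr_rows(bacteria), clr_rows(fungi))

# Contrast: pooling first mixes both amplicons into one per-sample mean.
pooled = clr_rows(hcat(bacteria, fungi))
```

In `separate`, each sample is centered within its own amplicon, so the deeper bacterial counts do not shift the fungal values. In `pooled`, they do.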

@AnnaClo
Author

AnnaClo commented Jan 23, 2025

Hi! I used FlashWeave for multiple organisms as you indicated and it worked wonderfully.
There is one more thing I would like to ask about. I would also like to include some non-sequencing variables as nodes in the network (e.g. pH, organic matter, bacterial biomass, ...). I see that it is possible to add a 'features' table, but I don't fully understand how FlashWeave would handle those variables. Ideally, I would apply a log (or log+1) transformation to the non-sequencing variables, rather than CLR with adaptive pseudocounts. Is that possible in FlashWeave? Alternatively, is there a function for normalizing the sequencing data outside learn_network? I would then provide the differently normalized tables to learn_network and run it with normalize=false.

@jtackm
Member

jtackm commented Jan 31, 2025

Hi Anna,

Metadata only undergoes the most basic preprocessing, separate from OTU normalization: discretization of the raw values if FlashWeave is run with sensitive=false, and special treatment of metadata zeros (via pseudocounts) with heterogeneous=true. Beyond that, you should pre-normalize your metadata if you have special requirements, and FlashWeave will then proceed to make these values compatible with the tests being used internally. Hope that helps!
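
As a rough sketch of the pre-normalization step mentioned above, the log+1 transform asked about earlier in the thread could be applied to metadata columns in plain Julia before the table is saved and passed to learn_network as metadata. The variable names, values and the choice of which columns to transform are hypothetical.

```julia
# Hypothetical environmental measurements, one value per sample.
meta = Dict(
    "pH" => [6.8, 7.2, 5.9],
    "organic_matter" => [0.0, 3.4, 12.1],
    "bacterial_biomass" => [150.0, 0.0, 980.0],
)

# Hypothetical choice of columns to transform; which variables need it is
# the analyst's decision, not a FlashWeave rule.
to_log = ["organic_matter", "bacterial_biomass"]

# log1p(x) = log(1 + x), which stays finite when x = 0.
meta_norm = Dict(
    name => (name in to_log ? log1p.(vals) : vals) for (name, vals) in meta
)
```

Columns not listed in `to_log` pass through unchanged, so the same table can mix transformed and untransformed variables.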

@AnnaClo
Author

AnnaClo commented Feb 3, 2025

Hi, thank you! So if I understand correctly, I would input the ASV tables as usual (raw, so that they undergo CLR normalization within FlashWeave) and the non-sequencing variables (which I pre-normalize outside FlashWeave) as metadata. In that case, both ASVs and metadata variables will be treated as nodes of the network, is that correct?

These are my network parameters:
alpha = 0.01,
fdr = true,
sensitive = true,
feed_forward = true,
max_k = 2,
max_tests = 10000000,
conv = 0.01,
make_sparse = true,
n_obs_min = 2

As a side note, I have a question about make_sparse. When I run this network with make_sparse=true, in some cases the verbose output reports sparse - false:
Run information:
sensitive - true
heterogeneous - false
max_k - 2
alpha - 0.01
sparse - false
workers - 1
OTUs - 20392
MVs - 0

I also get this warning:
-> multiple data sets provided, using separate normalization mode
┌ Warning: Adaptive CLR is inefficient with sparse data, using dense format for normalization
└ @ FlashWeave ~/.julia/packages/FlashWeave/j91Ng/src/preprocessing.jl:542

Why does this happen?

@jtackm
Member

jtackm commented Feb 17, 2025

Yes, ASVs and meta variables will all be nodes in the network. Regarding sparse: for your combination of flags, FlashWeave uses an adaptive clr normalization scheme which replaces 0s with adaptive pseudocounts. Hence, the table is no longer dominated by 0s and sparsity is turned off for efficiency. The warning you posted tries to convey that.
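
The effect described here can be sketched with Julia's SparseArrays standard library. This is only an illustration under stated assumptions: the toy table, the helper `stored_fraction` and the fixed pseudocount of 0.5 are hypothetical, and FlashWeave computes its adaptive pseudocounts differently.

```julia
using SparseArrays  # stdlib

# Hypothetical zero-heavy count table: 4 samples x 6 taxa.
counts = [0 0 5 0 0 1;
          0 3 0 0 0 0;
          7 0 0 0 2 0;
          0 0 0 4 0 0]

# Fraction of entries a sparse matrix would have to store explicitly.
stored_fraction(m) = nnz(sparse(m)) / length(m)

# Stand-in for pseudocount replacement: every zero becomes a positive value.
# ASSUMPTION: a fixed 0.5 replaces FlashWeave's adaptive pseudocounts here.
filled = ifelse.(counts .== 0, 0.5, float.(counts))

println(stored_fraction(counts))  # 0.25
println(stored_fraction(filled))  # 1.0
```

Once every zero has been replaced, a sparse container stores all entries and adds index overhead on top, which is consistent with the switch to a dense format reported in the warning.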
