Question about using flashweave with sequencing data of multiple organisms #42

AnnaClo · 2024-11-27T22:03:05Z

Hi,
I need to construct a network using sequencing data for multiple groups of organisms (e.g. bacteria, fungi, protists), each obtained by sequencing a different amplicon.
I understand that I could input this data combined in one table to FlashWeave. I'm unsure whether and how FlashWeave will handle the normalization for differing sequencing depth. Since each group comes from a different amplicon, it should be normalized independently of the other groups.
If I did the normalization outside FlashWeave, I would apply a CLR transformation to bacteria, fungi and protists independently.
What is your advice? How will the flashweave algorithms deal with this kind of data? Is this data suitable for flashweave? Should I apply clr transformations before using flashweave?

Thank you in advance

jtackm · 2024-12-02T09:56:35Z

Hi Anna,

Yes, FlashWeave supports providing several tables to be normalized independently (inspired by exactly the use case you mentioned). It's unfortunately poorly documented, but can be used like this: learn_network([<bac_data_path>, <fungi_data_path>], meta_data_path; <kwargs...>). Please let me know if this works for you.
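For readers wondering what "normalized independently" amounts to, here is a minimal sketch in plain Python (standard library only). It is not FlashWeave's implementation: it uses a fixed pseudocount of 1 instead of FlashWeave's adaptive pseudocounts, and the toy tables and the `clr` helper are hypothetical. It only shows that each amplicon table gets its own centered log-ratio before the tables are joined by sample.

```python
import math

def clr(row, pseudocount=1.0):
    """Centered log-ratio of one sample: log(x) minus the mean of log(x).

    Assumption: a fixed pseudocount is added to every count so that zeros
    can be logged. FlashWeave's adaptive scheme is different.
    """
    logs = [math.log(x + pseudocount) for x in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical toy counts: rows are samples, columns are taxa.
bacteria = [[10, 0, 90], [5, 5, 40]]
fungi = [[3, 1], [0, 8]]

# Each table is transformed on its own, so bacterial sequencing depth
# never enters the fungal log-ratios (and vice versa).
bacteria_clr = [clr(row) for row in bacteria]
fungi_clr = [clr(row) for row in fungi]

# Samples are matched by row, so the tables can be joined column-wise afterwards.
combined = [b + f for b, f in zip(bacteria_clr, fungi_clr)]
```

Within each table every transformed row sums to zero, which is the defining property of CLR. The same would not hold if the raw tables were concatenated first and transformed together.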

AnnaClo · 2025-01-23T18:16:54Z

Hi! I used FlashWeave for multiple organisms as you indicated and it worked wonderfully.
I still have one more question. I would also like to include some non-sequencing variables as nodes in the network (e.g. pH, organic matter, bacterial biomass). I see that it is possible to add a 'features' table, but I don't fully understand how FlashWeave would handle those variables. Ideally, I would apply a log (or log+1) transformation to the non-sequencing variables rather than CLR with adaptive pseudocounts. Is that possible in FlashWeave? Alternatively, is there a function for normalizing the sequencing data outside learn_network? I would then provide the differently normalized tables to learn_network and run it with normalize=false.

jtackm · 2025-01-31T16:34:33Z

Hi Anna,

Metadata only undergoes the most basic preprocessing, separate from OTU normalization: discretization of the raw values if FlashWeave is run with sensitive=false, and special treatment of metadata zeros (via pseudocounts) with heterogeneous=true. Beyond that, you should pre-normalize your metadata if you have special requirements; FlashWeave will then proceed to make these values compatible with the tests used internally. Hope that helps!
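As an illustration of pre-normalizing metadata before handing it to FlashWeave, the sketch below applies log(1 + x) to selected columns in plain Python. The column names, values and the `log1p_columns` helper are hypothetical, and leaving pH untransformed (it is already a log-scale quantity) is only one possible choice.

```python
import math

# Hypothetical metadata: one dict per sample, holding raw measurements.
metadata = [
    {"pH": 6.8, "organic_matter": 12.0, "bacterial_biomass": 0.0},
    {"pH": 7.4, "organic_matter": 3.5, "bacterial_biomass": 250.0},
]

# Columns to log-transform. pH is left as is in this sketch.
log_columns = ["organic_matter", "bacterial_biomass"]

def log1p_columns(rows, columns):
    """Return copies of the rows with log(1 + x) applied to the chosen columns."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the raw values stay untouched
        for col in columns:
            new_row[col] = math.log1p(row[col])
        out.append(new_row)
    return out

normalized = log1p_columns(metadata, log_columns)
```

The transformed rows would then be written to the metadata file passed as meta_data_path. Using log(1 + x) keeps zero measurements at zero, which avoids taking the log of 0.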

AnnaClo · 2025-02-03T11:31:12Z

Hi, thank you! So if I understand correctly, I would input the ASV tables as usual (raw, to undergo CLR normalization within FlashWeave) and the non-sequencing variables (pre-normalized outside FlashWeave) as metadata. In that case, both ASVs and metadata variables will be treated as nodes of the network, is that correct?

These are my network parameters:
alpha = 0.01,
fdr = true,
sensitive = true,
feed_forward = true,
max_k = 2,
max_tests = 10000000,
conv = 0.01,
make_sparse = true,
n_obs_min = 2

As a side note, I have a question about make_sparse. When I run this network with make_sparse=true, in some cases the verbose output shows sparse = false:
Run information:
sensitive - true
heterogeneous - false
max_k - 2
alpha - 0.01
sparse - false
workers - 1
OTUs - 20392
MVs - 0

I also get this warning:
-> multiple data sets provided, using separate normalization mode
┌ Warning: Adaptive CLR is inefficient with sparse data, using dense format for normalization
└ @ FlashWeave ~/.julia/packages/FlashWeave/j91Ng/src/preprocessing.jl:542

Why is this?

jtackm · 2025-02-17T09:25:25Z

Yes, ASVs and meta variables will all be nodes in the network. Regarding sparse: for your combination of flags, FlashWeave uses an adaptive clr normalization scheme which replaces 0s with adaptive pseudocounts. Hence, the table is no longer dominated by 0s and sparsity is turned off for efficiency. The warning you posted tries to convey that.
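To make the sparsity point concrete, here is a small plain-Python sketch. It again uses a fixed pseudocount of 1 rather than FlashWeave's adaptive pseudocounts, and the toy table and helper names are hypothetical. The effect it shows is the same, though: once zeros are replaced and the log-ratio is taken, no entry is zero any more, so a sparse storage format has nothing left to save.

```python
import math

def zero_fraction(table):
    """Fraction of entries in the table that are exactly zero."""
    values = [v for row in table for v in row]
    return sum(1 for v in values if v == 0) / len(values)

def clr_with_pseudocount(row, pseudocount=1.0):
    """CLR after adding a fixed pseudocount (an assumption of this sketch)."""
    logs = [math.log(x + pseudocount) for x in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical sparse count table: most entries are zero.
counts = [[0, 0, 0, 12], [0, 7, 0, 0], [3, 0, 0, 0]]
transformed = [clr_with_pseudocount(row) for row in counts]

before = zero_fraction(counts)       # 9 of 12 entries are zero
after = zero_fraction(transformed)   # former zeros are now negative log-ratios
```

Here `before` is 0.75 and `after` is 0.0: every former zero becomes a non-zero negative value, so the transformed table is fully dense.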
