Datatype of feature.tab not matrix after normalization #45

Open
pkirti33 opened this issue Apr 23, 2024 · 10 comments
Labels
Bug Fixed: This bug has been addressed and resolved in the latest update.
bug: Something isn't working

Comments

pkirti33 commented Apr 23, 2024

Describe the Bug
When I use mStat_normalize_data() with the Rarefy-TSS method, the mStat_validate_data() function no longer passes because it doesn't recognize feature.tab as a matrix (Rule 5). When I don't use mStat_normalize_data(), all the tests pass.

Example

The following code fails at step 5 (Rule 5 failed: feature.tab should be a matrix.)

MicrobiomeData <- list(feature.tab = otu_table_matrix, 
                       meta.dat = metadata_df, 
                       feature.ann = taxonomy_matrix)

#normalize the data using rarefaction and total sum scaling
MicrobiomeData <- mStat_normalize_data(data.obj = MicrobiomeData, method = "Rarefy-TSS")
MicrobiomeData$data.obj.norm$feature.tab <- as.matrix(MicrobiomeData$data.obj.norm$feature.tab)

mStat_validate_data(MicrobiomeData)

However, the following code passes all validations. Furthermore, when I rarefy the data with mStat_rarefy_data(data.obj = MicrobiomeData) prior to validation, all validations pass.

MicrobiomeData <- list(feature.tab = otu_table_matrix, 
                       meta.dat = metadata_df, 
                       feature.ann = taxonomy_matrix)

#normalize the data using rarefaction and total sum scaling
#MicrobiomeData <- mStat_normalize_data(data.obj = MicrobiomeData, method = "Rarefy-TSS")
#MicrobiomeData$data.obj.norm$feature.tab <- as.matrix(MicrobiomeData$data.obj.norm$feature.tab)

mStat_validate_data(MicrobiomeData)

Environment Information:

  • R Version: 4.2.3
  • Package Version: 1.1.3
pkirti33 added the "bug" label (Something isn't working) on Apr 23, 2024
@cafferychen777
Owner

Hi @pkirti33,

Thank you for bringing this issue to my attention. Indeed, it was a peculiar error where the feature.tab was not recognized as a matrix after applying the mStat_normalize_data() function with the Rarefy-TSS method. Although I couldn't pinpoint the exact cause of this anomaly, I've implemented a fix by adding a forceful conversion to matrix at the end of the normalization process.
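For readers following along, a forced conversion of this kind might look like the base-R sketch below. It is not the actual patch: `coerce_feature_tab` is a hypothetical helper, and the sketch assumes the normalized table came back as a data.frame, which the thread does not confirm.

```r
# Hypothetical helper (not part of MicrobiomeStat): return the feature
# table unchanged if it is already a matrix, otherwise coerce it.
coerce_feature_tab <- function(feature.tab) {
  if (is.matrix(feature.tab)) {
    return(feature.tab)
  }
  as.matrix(feature.tab)
}

# Toy data.frame standing in for a normalized table (assumed shape).
toy_df <- data.frame(S1 = c(0.2, 0.8), S2 = c(0.5, 0.5),
                     row.names = c("OTU1", "OTU2"))
toy_mat <- coerce_feature_tab(toy_df)

is.matrix(toy_df)   # FALSE: a data.frame is not a matrix
is.matrix(toy_mat)  # TRUE after coercion
```

Note that `as.matrix()` keeps the row and column names here, so the feature and sample labels survive the conversion.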

I've already pushed the update to the GitHub repository. It should be available in a few hours. Please update the MicrobiomeStat package then, and let me know if the problem persists or if there's anything else I can help you with.

Best regards,
Chen YANG

cafferychen777 added the "Bug Fixed" label (This bug has been addressed and resolved in the latest update.) on Apr 24, 2024
@pkirti33
Author

pkirti33 commented May 9, 2024

Hello,
Thank you for your prompt reply and help! I tried re-running my code, but the issue persists. My steps are below:

Detach and re-install MicrobiomeStat

detach("package:MicrobiomeStat", unload = TRUE)
devtools::install_github("cafferychen777/MicrobiomeStat")
library(MicrobiomeStat)

Create the MicrobiomeData object:

MicrobiomeData <- list(feature.tab = otu_table_matrix, 
                       meta.dat = metadata_df, 
                       feature.ann = taxonomy_matrix)
MicrobiomeData <- mStat_normalize_data(data.obj = MicrobiomeData, method = "Rarefy-TSS")
MicrobiomeData$data.obj.norm$feature.tab <- as.matrix(MicrobiomeData$data.obj.norm$feature.tab)
mStat_validate_data(MicrobiomeData)

The error is as follows:
Rule 1 passed: data.obj is a list.
Rule 2 passed: meta.dat has been converted to a data.frame.
Rule 3 passed: The row names of feature.tab match the row names of feature.ann.
Rule 4 passed: The order of rows in meta.dat has been adjusted to match feature.tab.
Error in mStat_validate_data(MicrobiomeData) :
Rule 5 failed: feature.tab should be a matrix.

@cafferychen777
Owner

Hi pkirti33,

Thanks for following up and providing more details. I apologize that the issue is still not resolved. Based on the error message, it seems the root cause is that the feature.tab object is not being recognized as a matrix after the mStat_normalize_data() step, even when converting it explicitly using as.matrix().

One potential workaround is to skip the explicit normalization step. In the current version of MicrobiomeStat, almost all the functions perform "Rarefy-TSS" normalization by default under the hood. So you may be able to get the expected results without needing to call mStat_normalize_data() directly.

Try this simplified workflow and see if it resolves the validation error:

MicrobiomeData <- list(feature.tab = otu_table_matrix, 
                       meta.dat = metadata_df, 
                       feature.ann = taxonomy_matrix)

mStat_validate_data(MicrobiomeData)

If the issue persists, please let me know. I'll do some further testing on my end to identify the underlying problem with mStat_normalize_data() converting the data type. In the meantime, hopefully skipping that step provides a temporary solution.

Best regards,
Caffery

@pkirti33
Author

Thank you for your help! I'll use your recommended solution for now.

@ctmlab4

ctmlab4 commented May 30, 2024

Hi all,
I am new to MicrobiomeStat, and I am having the same problem as @pkirti33.

"Error in mStat_validate_data(MicrobiomeData_rare) :
Rule 5 failed: feature.tab should be a matrix"

Is there any update, or an alternative to Rarefy-TSS?

Thank you so much!
Carla.

@cafferychen777
Owner

Hi @ctmlab4,

Thanks for reaching out regarding the issue you encountered with the mStat_validate_data() function after using mStat_normalize_data() with the "Rarefy-TSS" method.

As a workaround for now, you have two options:

  1. You can directly run other functions without any additional conversions.

  2. Alternatively, after running the mStat_normalize_data() function, you can convert the feature.tab element of the returned object to a matrix using as.matrix(). Here's an example:

MicrobiomeData_rare <- mStat_normalize_data(data.obj = MicrobiomeData, method = "Rarefy-TSS")
MicrobiomeData_rare$feature.tab <- as.matrix(MicrobiomeData_rare$feature.tab)
mStat_validate_data(MicrobiomeData_rare)

Either of these approaches should resolve the issue and allow the mStat_validate_data() function to pass all the validation rules.

We appreciate your patience and understanding. We are actively working on a more permanent solution to address this issue in a future update of the MicrobiomeStat package.

If you have any further questions or concerns, please don't hesitate to reach out.

Best regards,
Caffery

@bark9299

bark9299 commented Jun 5, 2024

Hi @cafferychen777,

I believe I am having a problem similar to the ones reported above. I converted my phyloseq object to a data.obj:
data.obj <- mStat_convert_phyloseq_to_data_obj(physeq_final_100k)

Then I used 'mStat_rarefy_data' to rarefy to a read depth of 100,000:
rarefied_data<- mStat_rarefy_data(data.obj = data.obj, depth = 100000)

Then I converted the feature.tab of rarefied_data to a matrix, after which 'mStat_validate_data(rarefied_data)' passed all the rules:
rarefied_data$feature.tab <- as.matrix(rarefied_data$feature.tab)
mStat_validate_data(rarefied_data)

Then I wanted to use 'mStat_calculate_alpha_diversity':
alpha_rarefied <- mStat_calculate_alpha_diversity(x = rarefied_data, alpha.name = c("shannon", "simpson", "observed_species"))
But I get the following error: "Error in colSums(x) : 'x' must be an array of at least two dimensions"

So then I try:
alpha_rarefied <- mStat_calculate_alpha_diversity(x = rarefied_data$feature.tab, alpha.name = c("shannon", "simpson", "observed_species"))
This looks like it runs properly, but when I run mStat_validate_data(alpha_rarefied), it throws an error:
"Rule 1 passed: data.obj is a list.
Rule 2 passed: meta.dat has been converted to a data.frame.
Rule 3 passed: The row names of feature.tab match the row names of feature.ann.
Rule 4 passed: The order of rows in meta.dat has been adjusted to match feature.tab.
Error in mStat_validate_data(alpha_rarefied) :
Rule 5 failed: feature.tab should be a matrix."

I also see this problem addressed in #7; however, reading that issue did not help me understand mine.

When I try another normalization method like "TSS":
TSS_data <- mStat_normalize_data(data.obj = data.obj, method = "TSS")
And I try to make it a matrix (note: to access "feature.tab" I first have to go through "$data.obj.norm", then "$feature.tab"):
TSS_data$data.obj.norm$feature.tab <- as.matrix(TSS_data$data.obj.norm$feature.tab)
mStat_validate_data(TSS_data)
'mStat_validate_data(TSS_data)' throws an error:

"Rule 1 passed: data.obj is a list.
Rule 2 passed: meta.dat has been converted to a data.frame.
Rule 3 passed: The row names of feature.tab match the row names of feature.ann.
Rule 4 passed: The order of rows in meta.dat has been adjusted to match feature.tab.
Error in mStat_validate_data(TSS_data) :
Rule 5 failed: feature.tab should be a matrix."

How do I tweak my code to be able to use different normalization methods with mStat_calculate_alpha_diversity? Should I use one of the other alpha diversity commands? Thank you for your help.

MicrobiomeStat version 1.2.0
R version 4.3.2

@cafferychen777
Owner

Hi @bark9299 @pkirti33 @ctmlab4 ,

I think I may have found the cause of the error. After normalizing the data using mStat_normalize_data(), you should use the $data.obj.norm element of the returned object instead of the original data.obj. For example:

norm.data.obj <- mStat_normalize_data(data.obj, "TSS")$data.obj.norm

Then, in subsequent function calls, use norm.data.obj instead of data.obj.

The reason for this is that during the normalization process, a new data.obj.norm (in the form of a list) is generated and stored within the original data.obj. Therefore, you need to replace the usage of the original data.obj with the newly generated data.obj.norm, rather than only using the new feature.tab.
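The nesting can be illustrated with a base-R mock. This is only a sketch: it does not call MicrobiomeStat, and it assumes the returned list has no top-level feature.tab of its own, with the normalized table sitting under `data.obj.norm` (consistent with the note above that feature.tab is reached through `$data.obj.norm`). The names `norm_result` and the toy values are hypothetical.

```r
# Mock of the structure described above (assumption: normalized data are
# nested under `data.obj.norm`; nothing here calls MicrobiomeStat).
norm_result <- list(
  data.obj.norm = list(
    feature.tab = matrix(c(0.2, 0.8, 0.5, 0.5), nrow = 2,
                         dimnames = list(c("OTU1", "OTU2"), c("S1", "S2"))),
    meta.dat = data.frame(group = c("a", "b"), row.names = c("S1", "S2"))
  )
)

# The wrapper has no top-level feature.tab, so `$` returns NULL and a
# matrix check on it fails, whatever was done to the nested table.
is.null(norm_result$feature.tab)    # TRUE
is.matrix(norm_result$feature.tab)  # FALSE

# The nested object is the one that carries the matrix.
norm.data.obj <- norm_result$data.obj.norm
is.matrix(norm.data.obj$feature.tab)  # TRUE
```

Under that assumption, calling `as.matrix()` on the nested table and then validating the outer list would still fail a matrix check, which matches the behaviour reported earlier in the thread.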

So your workflow should look something like this:

data.obj <- mStat_convert_phyloseq_to_data_obj(physeq_final_100k)
norm.data.obj <- mStat_normalize_data(data.obj, "TSS")$data.obj.norm
mStat_validate_data(norm.data.obj)
alpha_diversity <- mStat_calculate_alpha_diversity(x = norm.data.obj$feature.tab, alpha.name = c("shannon", "simpson", "observed_species"))

By using norm.data.obj consistently after the normalization step, the mStat_validate_data() function should pass all validation rules, and the mStat_calculate_alpha_diversity() function should work as expected.
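The earlier `colSums(x)` error is the message base R raises when `colSums()` receives something that is not a two-dimensional array, such as a whole list. The sketch below reproduces that with a toy object; whether mStat_calculate_alpha_diversity() calls `colSums()` directly on `x` is inferred from the reported error, not confirmed here, and `toy.data.obj` is a hypothetical stand-in.

```r
# Toy data object (assumed shape): a list holding a feature.tab matrix.
toy.data.obj <- list(
  feature.tab = matrix(c(10, 0, 5, 5), nrow = 2,
                       dimnames = list(c("OTU1", "OTU2"), c("S1", "S2")))
)

# Passing the whole list: colSums() rejects it, so capture the message.
list_error <- tryCatch(colSums(toy.data.obj),
                       error = function(e) conditionMessage(e))
list_error

# Passing the matrix element works and returns per-sample totals.
col_totals <- colSums(toy.data.obj$feature.tab)
col_totals
```

This is why the workflow above passes `norm.data.obj$feature.tab`, not the data object itself, as `x`.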

Please give this a try and let me know if it resolves the issues you were encountering. If you have any further questions or need additional assistance, don't hesitate to ask.

Best regards,
Caffery

@ctmlab4

ctmlab4 commented Jun 11, 2024

Hi Caffery,

I tried it and I could do it without any problems! Thank you very much for your help!

Kind regards,
Carla.

@bark9299

Hi @cafferychen777 ,

That worked for me as well. Thank you for your help and speedy reply!

Best,

E
