ensembldb and dplyr: filter, select #135

Open
Shicheng-Guo opened this issue May 6, 2022 · 7 comments

Comments

@Shicheng-Guo

Dear ensembldb team,

I use both ensembldb and dplyr widely and frequently in my analyses, and several functions conflict between the two packages, for example filter and select. It would be helpful to think about a way to handle these conflicts well.

error in evaluating the argument 'x' in selecting a method for function 'select': ensembldb::filter requires an 'EnsDb' object as input. To call the filter function from the stats or dplyr package use stats::filter and dplyr::filter instead.

Thanks.

Shicheng

@jorainer
Owner

Any suggestions (or even better pull request) solving this would be highly appreciated. The problem for me is that both the stats and dplyr packages define filter as a function (and also overwrite each other). A solution using a S4Generic would be maybe cleaner. The only solution I found is the one mentioned by the error message above, i.e. to use the package prefix to use any of the other filter functions that overwrite each other upon package loading (i.e. use dplyr::filter or stats::filter or ensembldb::filter).
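
For illustration, here is a minimal sketch of the package-prefix approach described above. It assumes dplyr is installed and uses the built-in mtcars data set as a stand-in for real data; neither comes from this thread. ensembldb::filter only appears in a comment so that the snippet runs without Bioconductor.

```r
# Assumption: dplyr is installed; mtcars is a built-in data set used as a stand-in.

# dplyr::filter keeps the rows of a data frame that match a condition.
# With the prefix, the call does not depend on the package loading order.
efficient <- dplyr::filter(mtcars, mpg > 30)

# stats::filter is an unrelated function: linear filtering of a numeric series.
# Here it computes a centred moving average of width 3.
smoothed <- stats::filter(1:10, rep(1 / 3, 3))

# ensembldb::filter(edb, ...) would be called the same way, with an EnsDb
# object as its first argument (not run here).
print(nrow(efficient))
print(smoothed)
```

The prefix works even when the package is not attached with library(), so it can also be used inside functions and packages without changing the search path.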

@cui-shuang

Hello, could you explain this solution in more detail? Can I see your code?
"The only solution I found is the one mentioned by the error message above, i.e. to use the package prefix to use any of the other filter functions that overwrite each other upon package loading (i.e. use dplyr::filter or stats::filter or ensembldb::filter)."

@cui-shuang

The code I run is as follows:
Signature <- FeatureSelect.V4(CellLines.matrix = NULL,
                              Heatmap = FALSE,
                              export = TRUE,
                              sigName = "MyReference00",
                              Stroma.matrix = RefData,
                              deltaBeta = 0.2,
                              FDR = 0.01,
                              MaxDMRs = 100,
                              Phenotype.stroma = RefPheno)

Then an error occurs:
“Error in filter(., adj.P.Val < FDR) :
ensembldb::filter requires an 'EnsDb' object as input. To call the filter function from the stats or dplyr package use stats::filter and dplyr::filter instead.”

@jorainer
Owner

The FeatureSelect.V4 function is not defined in ensembldb, so I cannot say how this error was generated or how to solve it. Which package is this function from? What might help is to load ensembldb before this other package. That way the filter function of ensembldb will be overwritten.

Example:

library(ensembldb)
library(dplyr)

would cause dplyr's filter function to overwrite the ensembldb filter function. Just calling filter will then use dplyr's version. If you want to use the filter function from ensembldb, you need to call ensembldb::filter explicitly.
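
To check which filter a bare call resolves to after loading, the search path can be inspected. The following is a small sketch, assuming dplyr is installed; ensembldb is left out so that it runs without Bioconductor, but if it were attached it would appear in the same listing.

```r
# Assumption: dplyr is installed. ensembldb is omitted so the snippet runs
# without Bioconductor; an attached ensembldb would be listed the same way.
library(dplyr)

# find() lists every attached package that provides an object named "filter",
# in search-path order. The first entry is what a bare filter() resolves to.
providers <- find("filter")
print(providers)

# conflicts() reports names that are defined in more than one attached place.
print("filter" %in% conflicts())
```

The package attached last sits earliest on the search path, which is why the order of the library() calls decides which filter is used.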

@cui-shuang

Thank you very much for your reply and suggestion. The problem has now been solved. The R package I used is MethylCIBERSORT. It seems that this package conflicts with other packages. I can run the code after uninstalling dplyr. Thank you again.

@mschubert

mschubert commented Aug 14, 2024

When the libraries are loaded in the following order, it makes sense that ensembldb hides the dplyr functions:

library(dplyr)
library(ensembldb)

A similar (and a bit less obvious) issue is when using an AnnotationHub record after library(dplyr):

# library(ensembldb) # uncomment to work around the error
library(dplyr)
ens106 = AnnotationHub::AnnotationHub()[["AH100643"]]

genes(ens106) |> as.data.frame() |> filter(gene_name == "BRAF") # or select(), rename()
# Error: ensembldb::filter requires an 'EnsDb' object as input.

Here, just using an EnsDb object will hide the dplyr functions. It would be great if this did not happen, but the workaround is the same as above (loading the ensembldb library first).

"Any suggestions (or even better pull request) solving this would be highly appreciated. A solution using a S4Generic would be maybe cleaner"

Maybe the Bioconductor folks would be able to provide one via BiocGenerics? With how widely dplyr is used nowadays, it would be great if this was working out of the box (and ensembldb is not the only Bioconductor package affected by this). This doesn't quite work, because rename is already a generic there and shows the same behavior.

@jorainer
Owner

The issue is mainly that in Bioconductor (and ensembldb) filter is defined as an S4 generic, while dplyr defines it as a function (or S3 generic, I can't remember). Thus, depending on the order in which the libraries are loaded, the previous function simply gets overwritten. One workaround would be to call the function explicitly, e.g. ensembldb::filter and dplyr::filter instead of just filter. That way you can also be sure which one gets used.
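
A variation on the explicit call, sketched here as one possible approach and not something proposed in this thread: pin the preferred functions as local aliases at the top of a script. Objects in the global environment are found before any attached package, so the aliases should win regardless of what gets attached later. The snippet assumes dplyr is installed and R >= 4.1 (for the native pipe), and uses mtcars as a stand-in data set.

```r
# Assumption: dplyr is installed; R >= 4.1 for the native pipe |>.
# Pin the dplyr verbs in the global environment. These aliases are looked up
# before any attached package, whatever order the packages are loaded in.
filter <- dplyr::filter
select <- dplyr::select
rename <- dplyr::rename

# Bare calls now use the dplyr versions.
four_cyl <- mtcars |> filter(cyl == 4) |> select(mpg, cyl)
print(dim(four_cyl))

# The ensembldb version then has to be called with its prefix, e.g.
# ensembldb::filter(edb, ...) with an EnsDb object (not run here).
```

The trade-off is that the aliases silently hide every other filter, select and rename in that session, so the prefix remains the clearer option in shared code.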
