Timeout error when using GO_analyse() #2

Open
AliciaHadingham opened this issue Sep 30, 2021 · 3 comments

Comments

@AliciaHadingham

AliciaHadingham commented Sep 30, 2021

I'm running the GO_analyse() command on my expression set and I keep getting a timeout error.

Here is my command and the output and error message I receive:

> ES_results <- GO_analyse(eSet = ES, #ExpressionSet of the Biobase package 
+                          f = "clusters_hc_k4" #A column name in phenodata used as the grouping factor for the analysis.
+                          )
First feature identifier in dataset: ENSG00000000003
Looks like Ensembl gene identifier.
Loading detected dataset hsapiens_gene_ensembl ...
Object of class 'Mart':
  Using the ENSEMBL_MART_ENSEMBL BioMart database
  Using the hsapiens_gene_ensembl datasetFetching ensembl_gene_id/go_id mappings from BioMart ...
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [www.ensembl.org:80] Operation timed out after 300015 milliseconds with 7860965 bytes received

I then tried to add the biomart dataset and name to see if it would help but it made no difference:

> ES_results <- GO_analyse(eSet = ES, #ExpressionSet of the Biobase package 
+                          f = "clusters_hc_k4", #A column name in phenodata used as the grouping factor for the analysis.
+                          biomart_dataset = "hsapiens_gene_ensembl", #human
+                          biomart_name="ENSEMBL_MART_ENSEMBL", ntree=100 #db name to hopefully speed up running as it keeps timing out
+                          )
Using biomart dataset hsapiens_gene_ensembl
First feature identifier in dataset: ENSG00000000003
Did not recognise microarray data.
ENSG00000000003 feature identifier in expression data cannot be resolved to a microarray. Assuming Ensembl gene 
identifiers.
Loading requested dataset hsapiens_gene_ensembl ...
Object of class 'Mart':
  Using the ENSEMBL_MART_ENSEMBL BioMart database
  Using the hsapiens_gene_ensembl datasetFetching ensembl_gene_id/go_id mappings from BioMart ...
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: [www.ensembl.org:443] Operation timed out after 300010 milliseconds with 7860965 bytes received

My ExpressionSet has dimensions 13485 features x 10 samples

Any suggestions on how I can get this function to run across my data?
The GOexpress package seems really good so I would really like to get it to work on my data!

@kevinrue
Owner

Argh - I recognise the issue (not specific to GOexpress) but haven't used GOexpress or biomaRt myself lately, so I can't offer an immediate workaround.

Oh yes, actually, if you are at least somewhat familiar with the biomaRt package, you can download the relevant gene annotation tables yourself and pass them to GOexpress through the arguments GO_genes, all_GO, and all_genes. See the vignette.
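
For illustration, here is a rough sketch of how those three tables might be built with biomaRt and handed to GO_analyse(). It is not taken from the vignette. The function names `split_go_table` and `fetch_annotations` and the file name are made up for this example, and the column names (`gene_id`, `go_id`, `name_1006`, `namespace_1003`, `external_gene_name`, `description`) follow my reading of `?GO_analyse`, so please check them against the documentation of your installed version.

```r
# Sketch only: helper names are hypothetical, column names should be
# verified against ?GO_analyse before use.

# Split one BioMart result (gene, GO id, GO name, GO namespace) into the
# gene-to-GO table and the table describing each GO term.
split_go_table <- function(bm) {
  # Genes without any GO annotation come back with an empty go_id
  keep <- !is.na(bm$go_id) & bm$go_id != ""
  bm <- bm[keep, , drop = FALSE]
  GO_genes <- unique(data.frame(gene_id = bm$ensembl_gene_id,
                                go_id = bm$go_id,
                                stringsAsFactors = FALSE))
  all_GO <- unique(bm[, c("go_id", "name_1006", "namespace_1003"),
                      drop = FALSE])
  list(GO_genes = GO_genes, all_GO = all_GO)
}

# Query BioMart for the genes in the expression set (needs network access
# and the biomaRt package).
fetch_annotations <- function(gene_ids, dataset = "hsapiens_gene_ensembl") {
  mart <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL", dataset = dataset)
  go <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "go_id", "name_1006", "namespace_1003"),
    filters = "ensembl_gene_id", values = gene_ids, mart = mart
  )
  genes <- biomaRt::getBM(
    attributes = c("ensembl_gene_id", "external_gene_name", "description"),
    filters = "ensembl_gene_id", values = gene_ids, mart = mart
  )
  names(genes)[names(genes) == "ensembl_gene_id"] <- "gene_id"
  c(split_go_table(go), list(all_genes = genes))
}

# Usage, with ES and "clusters_hc_k4" as in the report above:
# ann <- fetch_annotations(Biobase::featureNames(ES))
# saveRDS(ann, "goexpress_annotations.rds")  # local copy for the record
# ES_results <- GO_analyse(eSet = ES, f = "clusters_hc_k4",
#                          GO_genes = ann$GO_genes,
#                          all_GO = ann$all_GO,
#                          all_genes = ann$all_genes)
```

Restricting the query to the feature identifiers of the expression set keeps the download smaller than fetching the mappings for the whole genome, although a single request can still time out on a slow day.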

I've been meaning to deprecate the automated fetching of annotation in GOexpress for a long time now anyway, as this is unnecessarily painful and slightly more open to irreproducible behaviour than users providing their own annotations or using annotation packages or the AnnotationHub. I'm just desperately short on time to commit to this :/
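
As a hedged illustration of the annotation-package route, the sketch below derives the same three tables from org.Hs.eg.db and GO.db instead of BioMart. The function names are invented for this example, it assumes human Ensembl gene identifiers, and the output column names again follow my reading of `?GO_analyse` rather than anything stated in this thread.

```r
# Sketch only: assumes the Bioconductor packages AnnotationDbi,
# org.Hs.eg.db and GO.db are installed. Function names are hypothetical.

# Map the two-letter ontology codes used by the annotation packages to the
# long namespace names used in BioMart-style tables.
ontology_to_namespace <- function(x) {
  lookup <- c(BP = "biological_process",
              MF = "molecular_function",
              CC = "cellular_component")
  unname(lookup[x])
}

annotations_from_packages <- function(gene_ids) {
  orgdb <- org.Hs.eg.db::org.Hs.eg.db
  # Gene-to-GO mappings; genes without GO terms come back with NA
  go <- AnnotationDbi::select(orgdb, keys = gene_ids, keytype = "ENSEMBL",
                              columns = c("GO", "ONTOLOGY"))
  go <- go[!is.na(go$GO), , drop = FALSE]
  # Human-readable term names for each GO identifier
  terms <- AnnotationDbi::select(GO.db::GO.db, keys = unique(go$GO),
                                 keytype = "GOID", columns = "TERM")
  go_terms <- unique(go[, c("GO", "ONTOLOGY")])
  # Gene symbols and names; one Ensembl id may map to several symbols
  genes <- AnnotationDbi::select(orgdb, keys = gene_ids, keytype = "ENSEMBL",
                                 columns = c("SYMBOL", "GENENAME"))
  list(
    GO_genes = unique(data.frame(gene_id = go$ENSEMBL, go_id = go$GO,
                                 stringsAsFactors = FALSE)),
    all_GO = data.frame(
      go_id = go_terms$GO,
      name_1006 = terms$TERM[match(go_terms$GO, terms$GOID)],
      namespace_1003 = ontology_to_namespace(go_terms$ONTOLOGY),
      stringsAsFactors = FALSE
    ),
    all_genes = data.frame(gene_id = genes$ENSEMBL,
                           external_gene_name = genes$SYMBOL,
                           description = genes$GENENAME,
                           stringsAsFactors = FALSE)
  )
}

# Usage (ES as in the report above):
# ann <- annotations_from_packages(Biobase::featureNames(ES))
```

Because the annotation packages are versioned, recording their versions (for example with `sessionInfo()`) documents exactly which annotations went into the analysis.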

@kevinrue
Owner

I realise that with all that I haven't given an explanation: the biomaRt package has been timing out regularly in recent years, I believe due to the increasing size of the Ensembl BioMart database.
This is something that is not specific to GOexpress, but also affects users of the biomaRt package itself.
In interactive sessions, users can fetch data in a more controlled "paginated" way, i.e. in a few smaller batches rather than one big batch. I haven't personally revisited a way to do that reliably within a package. As I said, I am keener to use annotation packages or educate users to fetch annotations themselves. Both options have the advantage of being more reliably reproducible: annotation packages can be tracked by version, and users can store local copies of the annotations they supplied to GOexpress, for the record and future reference.
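
A minimal sketch of the batched approach, for an interactive session: split the identifiers into chunks, fetch each chunk separately, and stack the results. The helper names `chunk_ids` and `fetch_in_batches` and the batch size of 500 are arbitrary choices for this example, not part of biomaRt or GOexpress. The per-batch fetch function is passed in as an argument so the loop can be tried offline with a stand-in.

```r
# Sketch only: helper names and the default batch size are hypothetical.

# Split a vector of identifiers into batches of at most `size` elements.
chunk_ids <- function(ids, size = 500) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Call `fetch_one` on each batch and stack the resulting data frames,
# dropping rows that are exact duplicates.
fetch_in_batches <- function(ids, fetch_one, size = 500) {
  parts <- lapply(chunk_ids(ids, size), fetch_one)
  unique(do.call(rbind, parts))
}

# With biomaRt, the per-batch function could look like this (needs network):
# mart <- biomaRt::useMart("ENSEMBL_MART_ENSEMBL",
#                          dataset = "hsapiens_gene_ensembl")
# fetch_one <- function(batch) {
#   biomaRt::getBM(attributes = c("ensembl_gene_id", "go_id"),
#                  filters = "ensembl_gene_id", values = batch, mart = mart)
# }
# go_map <- fetch_in_batches(Biobase::featureNames(ES), fetch_one)
# saveRDS(go_map, "go_map.rds")  # keep a local copy for future reference
```

If one batch fails, only that batch needs to be retried, which is the main practical gain over a single large request.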

@AliciaHadingham
Author

Thanks for your help :) I will have a go at making local annotations then
