Unable to Download Data with Aspera #190

Open
roshankern opened this issue Dec 17, 2023 · 3 comments

@roshankern

Hello,

Thanks for making this repo and data public. For the past year I have been working in the Way Lab on IDR_stream, a tool for streaming the download and processing of IDR image data. The tool was originally developed in 2022 and therefore uses the Aspera high-speed transfer client to download data from IDR.

I attempted to run the command below within IDR_stream to download a video:
```shell
sudo /home/roshankern/.aspera/ascli/sdk/ascp -TQ -l500m -P 33001 -i example_files/asperaweb_id_dsa.openssh [email protected]:20150916-mitocheck-analysis/mitocheck/LT0001_02--ex2005_11_16--sp2005_02_17--tt17--c3/hdf5/00049_01.ch5 ../tmp/downloads/LT0001_02
```

This command has worked for me in the past, but this time I got the following output:

```
Session Stop (Error: Server aborted session: Permission denied)
```

The IDR download page and #189 both indicate that downloading IDR data with Aspera is no longer possible.

Thus, I have the following questions:

  • Is it still possible to download IDR data with Aspera?
  • If yes, what is the best way to download this data? Do I need to modify my command or redownload the Aspera public key (I couldn't find a new version online)?
  • If no, will FTP or IDR API be significantly slower for downloading image data? Is there any way to approach Aspera-like speeds with these other image downloading methods?

Thanks in advance!
Roshan

@sbesson
Member

sbesson commented Dec 19, 2023

Hi @roshankern, thanks for opening this issue. You are correct: the instructions have been updated recently, and this conversation is a good opportunity to provide a bit more context.

EBI has been consolidating its data services and engaged with us in July 2023 on how this would affect the way data is uploaded to and downloaded from IDR. Over the last few months, we have been working with them in the background to migrate the 500 TB of IDR raw data and make it available through the EBI Public Data Services (see here for more technical details). The old Aspera workflow using accession-based usernames was decommissioned by EBI in December 2023 and no longer works.

  • Is it still possible to download IDR data with Aspera?

In short, yes. The IDR website has been updated to document the download workflow using anonymous FTP, as we felt this was the easiest option for most end users. However, the EBI Public Data Services support multiple transfer protocols, including FTP, Aspera and Globus.

  • If yes, what is the best way to download this data? Do I need to modify my command or redownload the Aspera public key (I couldn't find a new version online)?

The public key is unchanged, but your command needs two modifications:

  • the user should be changed to fasp-public (regardless of the study)
  • the path to the source should be prefixed with /pub/databases/IDR/<study>/ (see https://ftp.ebi.ac.uk/pub/databases/IDR/ for a full list of the available studies)
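Both changes are a simple string transformation, so they can be captured in a small helper that builds the `ascp` source argument from a study name and a path relative to that study. This is a sketch, not part of any IDR or EBI tooling: the function name `idr_ascp_source` and the `ASPERA_HOST` variable are hypothetical, and you have to supply the host yourself because it is redacted in the commands in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the ascp source argument for the new EBI layout.
# ASPERA_HOST is an assumption; set it to the EBI Aspera host from the working
# command in this thread (the host is redacted here).
: "${ASPERA_HOST:=aspera-host.example}"

# Usage: idr_ascp_source <study> <path relative to the study directory>
idr_ascp_source() {
  local study="$1"
  local relpath="${2#/}"   # drop a leading slash, if any
  # The user is always fasp-public; the path is always under /pub/databases/IDR/<study>/
  printf 'fasp-public@%s:/pub/databases/IDR/%s/%s\n' "$ASPERA_HOST" "$study" "$relpath"
}

# Example (flags copied from the command in this thread):
# ascp -TQ -l500m -P 33001 -i asperaweb_id_dsa.openssh \
#   "$(idr_ascp_source idr0013-neumann-mitocheck 20150916-mitocheck-analysis/mitocheck/LT0001_02--ex2005_11_16--sp2005_02_17--tt17--c3/hdf5/00049_01.ch5)" \
#   LT0001_02/
```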

I modified the command you pasted above as follows:

```shell
/home/data/.aspera/connect/bin/ascp -TQ -l500m -P 33001 -i /home/data/.aspera/connect/etc/asperaweb_id_dsa.openssh [email protected]:/pub/databases/IDR/idr0013-neumann-mitocheck/20150916-mitocheck-analysis/mitocheck/LT0001_02--ex2005_11_16--sp2005_02_17--tt17--c3/hdf5/00049_01.ch5 LT0001_02/
```

and was able to download the HDF5 file on my end.

  • If no, will FTP or IDR API be significantly slower for downloading image data? Is there any way to approach Aspera-like speeds with these other image downloading methods?

The IDR API cannot be used to download the raw data associated with a submission. On FTP vs Aspera, my understanding is that the latter should in principle offer higher download speeds, but in practice this depends on many parameters, including firewalls, latency and network topology.
Since both FTP and Aspera are now supported, you should be in a good position to benchmark them and decide which one is most advantageous for your use case.
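One rough way to run that comparison is to time the same file over both protocols. The sketch below assumes `curl` for anonymous FTP against `ftp.ebi.ac.uk` (the host of the listing URL above), `ascp` on the `PATH`, and `ASPERA_HOST`/`ASPERA_KEY` variables that you set yourself; `bench` and `compare_protocols` are hypothetical names. It measures wall-clock time with one-second resolution only, so a large file or several runs will say more than a single small file.

```shell
#!/usr/bin/env bash
# Rough wall-clock comparison of FTP vs Aspera for one IDR file (a sketch).
# Assumptions: curl and ascp are installed; ASPERA_HOST and ASPERA_KEY are set
# by you (the Aspera host is redacted in this thread).

# bench LABEL CMD...: run CMD, print "LABEL <seconds>" or "LABEL FAILED".
# Uses bash's SECONDS counter, so the resolution is one second.
bench() {
  local label="$1"; shift
  local start=$SECONDS
  if "$@" >/dev/null 2>&1; then
    printf '%s %d\n' "$label" "$((SECONDS - start))"
  else
    printf '%s FAILED\n' "$label"
    return 1
  fi
}

STUDY=idr0013-neumann-mitocheck
REL=20150916-mitocheck-analysis/mitocheck/LT0001_02--ex2005_11_16--sp2005_02_17--tt17--c3/hdf5/00049_01.ch5

# Download the same file once per protocol into separate directories.
compare_protocols() {
  mkdir -p bench_ftp bench_aspera
  bench ftp curl -sS -o bench_ftp/00049_01.ch5 \
    "ftp://ftp.ebi.ac.uk/pub/databases/IDR/${STUDY}/${REL}"
  bench aspera ascp -TQ -l500m -P 33001 -i "$ASPERA_KEY" \
    "fasp-public@${ASPERA_HOST}:/pub/databases/IDR/${STUDY}/${REL}" bench_aspera/
}
```

Call `compare_protocols` from a machine on the network you actually use. The `-l500m` flag is copied from the commands above and limits the Aspera transfer rate, so adjust it if your link is faster.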

@dominikl
Member

We should probably add Aspera and Globus to the download instructions too. Aspera is tricky: it doesn't work for me with the latest version, but the aspera-client-docker image works. Globus access would be via the Globus Connect Personal app and the URL https://app.globus.org/file-manager?origin_id=47772002-3e5b-4fd3-b97c-18cee38d6df2&origin_path=%2Fpub%2Fdatabases%2FIDR%2F.
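The Globus link above is a file-manager URL whose `origin_path` parameter is the percent-encoded directory. A hypothetical helper that opens the same endpoint at a single study folder might look like the following; the function and variable names are illustrative, it is not a documented Globus or IDR tool, and it only encodes `/`, assuming study names use unreserved characters (letters, digits, `-`, `_`, `.`, `~`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Globus file-manager URL for the IDR endpoint
# quoted in this thread, optionally opened at one study's directory.
IDR_GLOBUS_ORIGIN_ID=47772002-3e5b-4fd3-b97c-18cee38d6df2

# Usage: idr_globus_url [study]
idr_globus_url() {
  local path="/pub/databases/IDR/${1:+$1/}"
  # Percent-encode "/" only; other characters are assumed to be unreserved.
  local encoded="${path//\//%2F}"
  printf 'https://app.globus.org/file-manager?origin_id=%s&origin_path=%s\n' \
    "$IDR_GLOBUS_ORIGIN_ID" "$encoded"
}
```

Called without arguments, it reproduces the top-level URL quoted above.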

@mkcor
Contributor

mkcor commented Apr 23, 2024

We should probably add Aspera and Globus to the download instructions too.

Yes! Sorry, I wrote #193 (comment) before noticing this thread.

Aspera is tricky: it doesn't work for me with the latest version, but the aspera-client-docker image works.

Good to know; I'll try on my end. Thank you.
