Importing files from user-defined AWS S3 bucket not working #18750

Open
Slushy-seg opened this issue Aug 30, 2024 · 7 comments
@Slushy-seg

I have set up a Galaxy instance to allow users to add their own remote storage locations (private and public AWS S3 buckets). For that I used the file_source_templates_config_file configuration option and added the templates for private and public AWS S3 buckets (from https://docs.galaxyproject.org/en/latest/admin/data.html#file-source-templates).
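For context, the admin-side wiring is roughly this; the file name file_source_templates.yml is illustrative, and the template contents themselves are copied from the documentation page linked above:

```yaml
# galaxy.yml (sketch; the template file name is an example)
galaxy:
  file_source_templates_config_file: file_source_templates.yml
```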

In Galaxy, the user preference to set up a (private or public) AWS bucket is shown correctly and the setup works as intended. The user can then select files from the bucket through "Upload" => "Choose remote files". However, when selecting a file and triggering the import, the import fails.

Galaxy Version: 24.1
(same behavior on usegalaxy.eu)

To Reproduce
Steps to reproduce the behavior:

  1. Go to user preferences and define a public S3 bucket under "Manage Your Remote File Sources".
    [screenshot: remote file source configuration]
  2. Go to "Upload" and select "Choose remote files".
  3. Select the previously defined remote storage location. The files on the bucket are shown.
    [screenshot: remote file listing]
  4. Select a file from the remote S3 bucket and import it into the current history.
  5. The import fails.
    [screenshot: failed import]
  6. A further log message is shown:
    [screenshot: log message]

Expected behavior
Files imported from a user-defined private S3 bucket are stored in the user's history.

Additional context

  • Exporting files through the "Send Data" => "Export datasets" tool does work as expected for private buckets set to "writeable".
  • Data import from public buckets defined in file_sources.yml does work as expected. However, using a public bucket that is also defined in file_sources.yml (e.g. s3://1000genomes/) as a user-defined bucket does not work with user templates (a sample file_sources.yml entry is sketched below).
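For comparison, a server-side file_sources.yml entry of the kind referred to in the second bullet might look like this (a sketch; id, label, and doc are illustrative):

```yaml
# file_sources.yml (sketch)
- type: s3fs
  id: 1000genomes
  label: 1000 Genomes
  doc: Public AWS bucket defined server-side
  bucket: 1000genomes
  anon: true
```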
@sanjaysrikakulam
Contributor

sanjaysrikakulam commented Aug 30, 2024

I briefly looked at the upload of data from a public S3 bucket, and it works fine.

I added the same bucket, 1000genomes, to user preferences through Manage Your Remote File Sources.

[screenshots: the bucket configured in user preferences, and its files listed in the remote file browser]

The bucket name should be plain, without the s3:// prefix. (I am not entirely sure whether this matters, since you are able to browse the data; maybe it matters when Galaxy tries to fetch the file. I did not look into the implementation details.)

Can you give this a try?

--EDIT--
Please ensure that these are set and defined in your galaxy.yml (see the sketch after this list):

  1. object_store_cache_path (a path for Galaxy to use for caching; this is optional and, I think, defaults to a directory under the mutable data dir)
  2. object_store_cache_size (in GB; mandatory, the default is -1)
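A minimal sketch of those two settings; the path and size here are example values, not recommendations:

```yaml
# galaxy.yml (sketch; values are examples)
galaxy:
  object_store_cache_path: /srv/galaxy/var/object_store_cache
  object_store_cache_size: 10  # in GB
```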

Tracebacks from the handler logs would help debug your problem.

@Slushy-seg
Author

Thanks a lot. Just specifying the bucket without s3:// does indeed work! I think I was confused because Galaxy tries to validate the bucket name against the AWS syntax (an error I got in earlier tries).

This is quite a nice feature for my use case. In my current setup, changes made to the S3 bucket (i.e. file deletions or creations) are not reflected in the Galaxy UI. Is there a way to force Galaxy to re-read the file directory each time a user browses the bucket?

@bgruening
Member

@Slushy-seg the files should be visible as soon as they are created. Can you look into your S3 bucket with a different viewer and check whether the files are really there?

@mvdbeek
Member

mvdbeek commented Sep 3, 2024

You probably want to set a listings_expiry_time if you're using s3fs; that gets passed on to the underlying library, whose directory-listing cache never expires by default (fsspec/s3fs#851). 60 seconds seems to work well.
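For illustration, this is how the setting behaves when using the s3fs library directly; the 1000genomes bucket is just a convenient public example:

```python
import s3fs

# listings_expiry_time (in seconds) bounds how long cached directory
# listings are reused before s3fs re-fetches them from the bucket.
fs = s3fs.S3FileSystem(anon=True, listings_expiry_time=60)
print(fs.ls("1000genomes"))  # re-read from S3 once the 60 s cache lapses
```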

@Slushy-seg
Author

listings_expiry_time is exactly what I need. Where can I set this parameter?

@Slushy-seg
Author

Any idea where to set this parameter, @mvdbeek?

@sanjaysrikakulam
Contributor

sanjaysrikakulam commented Sep 12, 2024

This would require a change in the s3fs file-source plugin, likely in this class, adding a property that can be passed to the open_fs function as a global default for everyone.
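A rough sketch of what that might look like; the class, method, and constant names below are hypothetical and do not match Galaxy's actual s3fs file-source code:

```python
import s3fs

# Hypothetical illustration only, not Galaxy's implementation.
DEFAULT_LISTINGS_EXPIRY = 60  # seconds, per the suggestion above

class S3FSFileSourceSketch:
    """Wraps an s3fs filesystem and forwards a listings-cache expiry."""

    def __init__(self, bucket, listings_expiry_time=DEFAULT_LISTINGS_EXPIRY, **s3fs_kwargs):
        self.bucket = bucket
        # Forwarding the expiry makes the underlying fsspec directory cache
        # re-fetch listings once they are older than listings_expiry_time.
        self.fs = s3fs.S3FileSystem(listings_expiry_time=listings_expiry_time, **s3fs_kwargs)

    def list_directory(self, path=""):
        return self.fs.ls(f"{self.bucket}/{path}".rstrip("/"))
```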
