workflow cannot access public reference data S3 bucket in a different region #1596
Comments
@dtenenba - if you omit the region from the config file the underlying …
I just tried that and I am still getting the same error. I also commented out the region in my …
Are you running from a local workstation or on an EC2 instance? If the latter, is your account part of an organization, and is there an SCP applied to the account?
@wleepang Local workstation.
The region is set when the client is created. @wleepang Can the AWS client only access buckets in that region?
I'm not sure if this translates to the Java SDK. I whipped up a quick test with the Python SDK, and explicitly setting the region on the client does not affect access to the bucket - I was able to list and get.
I had the same results with a Python test -- no problem accessing a bucket in eu-west-1 with a client created for us-west-2, and vice versa. Not sure if the Java SDK behaves differently or if it's an implementation issue. I do notice that the error message (…)
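A minimal sketch of the kind of boto3 test described above (assuming default credentials; the bucket name is borrowed from later in this thread, and any public bucket in another region would do):

```python
# Sketch: a client pinned to us-west-2 listing a public bucket that lives in eu-west-1.
import boto3

s3 = boto3.client("s3", region_name="us-west-2")

resp = s3.list_objects_v2(Bucket="ngi-igenomes", Prefix="igenomes/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])
```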
I wonder if this (explicitly setting the API endpoint) could be the root cause: … although, again, this isn't an issue with the Python SDK.
@dtenenba - I ran a more elaborate test today: installed Nextflow on an EC2 instance in us-east-2 with a "restricted" instance profile (very minimal permissions) and ran the following demo workflow: https://github.com/wleepang/demo-genomics-workflow-nextflow, which sources public input data from us-west-2 and public reference data from us-east-1. I was only able to replicate your error when I specifically added a region setting to my Nextflow config (a sketch of that kind of setting follows this comment). When I remove that config option, the workflow runs fine. It looks like …

@pditommaso - what is the suggested way to override this? Could one set …?
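For illustration, a hedged sketch of the kind of setting being discussed (the exact option added in the test above is not preserved in this thread; `aws.region` and `aws.client.endpoint` are the documented Nextflow options that pin the S3 client this way):

```groovy
// nextflow.config -- hypothetical reproduction sketch, not the exact config from the test.
// Hard-coding the client's region (or endpoint) like this is what the thread
// associates with the cross-region access error.
aws {
    region = 'us-east-2'
    // client {
    //     endpoint = 'https://s3.us-east-2.amazonaws.com'
    // }
}
```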
@wleepang Thanks for the testing! I am curious how this test would have gone if you had been on a local workstation and not on an EC2 instance. How would Nextflow know which region you are using for e.g. your AWS Batch queue, if it didn't have instance metadata to fall back on? I guess it would look in … I guess I already know what happens, because as I mentioned above I had tried commenting out the region from my …
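For reference, the file the AWS SDKs fall back to on a local workstation (when there is no instance metadata) is the shared config file; a typical `~/.aws/config` looks like:

```ini
# ~/.aws/config -- default region used by the CLI and the SDK default provider chain
[default]
region = us-west-2
```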
I just wanted to refresh this thread and see if there has been any progress. It sounds like Nextflow does not currently support data in multiple S3 regions, and if that is likely to continue then I can just start working around it by copying reference data into my own region (at a cost, unfortunately). Any update would be greatly appreciated!
We need some AWS guru who is deep into this issue. Also, you may want to upvote the following feature request, which would give NF better support for S3 storage.
Sorry, forgot to include the link: aws/aws-sdk-java-v2#1388
Hi @pditommaso, I am definitely not a guru (I have not touched Java for many years), but I was able to get a simple test working which can list buckets in two different regions with the same client. So it is definitely possible, and this does not seem to be an issue at the Java level. Here is what I did:
Then I made the following changes to `S3Sample.java`:

```diff
diff --git a/src/main/java/com/amazonaws/samples/S3Sample.java b/src/main/java/com/amazonaws/samples/S3Sample.java
index 39beedd..3d665f8 100644
--- a/src/main/java/com/amazonaws/samples/S3Sample.java
+++ b/src/main/java/com/amazonaws/samples/S3Sample.java
@@ -63,8 +63,8 @@ public class S3Sample {
*/
AmazonS3 s3 = new AmazonS3Client();
- Region usWest2 = Region.getRegion(Regions.US_WEST_2);
- s3.setRegion(usWest2);
+ // Region usWest2 = Region.getRegion(Regions.US_WEST_2);
+ // s3.setRegion(usWest2);
String bucketName = "my-first-s3-bucket-" + UUID.randomUUID();
String key = "MyObjectKey";
@@ -82,17 +82,17 @@ public class S3Sample {
* You can optionally specify a location for your bucket if you want to
* keep your data closer to your applications or users.
*/
- System.out.println("Creating bucket " + bucketName + "\n");
- s3.createBucket(bucketName);
+ // System.out.println("Creating bucket " + bucketName + "\n");
+ // s3.createBucket(bucketName);
/*
* List the buckets in your account
*/
- System.out.println("Listing buckets");
- for (Bucket bucket : s3.listBuckets()) {
- System.out.println(" - " + bucket.getName());
- }
- System.out.println();
+ // System.out.println("Listing buckets");
+ // for (Bucket bucket : s3.listBuckets()) {
+ // System.out.println(" - " + bucket.getName());
+ // }
+ // System.out.println();
/*
* Upload an object to your bucket - You can easily upload a file to
@@ -102,8 +102,8 @@ public class S3Sample {
* like content-type and content-encoding, plus additional metadata
* specific to your applications.
*/
- System.out.println("Uploading a new object to S3 from a file\n");
- s3.putObject(new PutObjectRequest(bucketName, key, createSampleFile()));
+ // System.out.println("Uploading a new object to S3 from a file\n");
+ // s3.putObject(new PutObjectRequest(bucketName, key, createSampleFile()));
/*
* Download an object - When you download an object, you get all of
@@ -117,10 +117,10 @@ public class S3Sample {
* conditional downloading of objects based on modification times,
* ETags, and selectively downloading a range of an object.
*/
- System.out.println("Downloading an object");
- S3Object object = s3.getObject(new GetObjectRequest(bucketName, key));
- System.out.println("Content-Type: " + object.getObjectMetadata().getContentType());
- displayTextInputStream(object.getObjectContent());
+ // System.out.println("Downloading an object");
+ // S3Object object = s3.getObject(new GetObjectRequest(bucketName, key));
+ // System.out.println("Content-Type: " + object.getObjectMetadata().getContentType());
+ // displayTextInputStream(object.getObjectContent());
/*
* List objects in your bucket by prefix - There are many options for
@@ -130,10 +130,19 @@ public class S3Sample {
* use the AmazonS3.listNextBatchOfObjects(...) operation to retrieve
* additional results.
*/
- System.out.println("Listing objects");
+ System.out.println("Listing objects in broad-references");
ObjectListing objectListing = s3.listObjects(new ListObjectsRequest()
- .withBucketName(bucketName)
- .withPrefix("My"));
+ .withBucketName("broad-references"));
+ for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
+ System.out.println(" - " + objectSummary.getKey() + " " +
+ "(size = " + objectSummary.getSize() + ")");
+ }
+ System.out.println();
+
+ System.out.println("Listing objects in ngi-igenomes/igenomes");
+ objectListing = s3.listObjects(new ListObjectsRequest()
+ .withBucketName("ngi-igenomes")
+ .withPrefix("igenomes"));
for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
System.out.println(" - " + objectSummary.getKey() + " " +
"(size = " + objectSummary.getSize() + ")");
@@ -144,16 +153,16 @@ public class S3Sample {
* Delete an object - Unless versioning has been turned on for your bucket,
* there is no way to undelete an object, so use caution when deleting objects.
*/
- System.out.println("Deleting an object\n");
- s3.deleteObject(bucketName, key);
+ // System.out.println("Deleting an object\n");
+ // s3.deleteObject(bucketName, key);
/*
* Delete a bucket - A bucket must be completely empty before it can be
* deleted, so remember to delete any objects from your buckets before
* you try to delete them.
*/
- System.out.println("Deleting bucket " + bucketName + "\n");
- s3.deleteBucket(bucketName);
+ // System.out.println("Deleting bucket " + bucketName + "\n");
+ // s3.deleteBucket(bucketName);
} catch (AmazonServiceException ase) {
System.out.println("Caught an AmazonServiceException, which means your request made it "
+ "to Amazon S3, but was rejected with an error response for some reason.");
```

Then build and run with:
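The exact command did not survive in this thread; for AWS's Java sample project the usual route is Maven's exec plugin, something like:

```sh
# Hypothetical reconstruction of the build-and-run step
mvn clean compile exec:java
```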
First it lists the `broad-references` bucket, then the `ngi-igenomes` bucket. It works successfully, so it is definitely possible to interact with buckets in different regions with the same client. Things to note: …
Hope this is helpful.
I happened to hit this issue the other day in Cloud9. The fix was to remove the region specification from both …
@pditommaso - will clients from the client factory fail if the region is left null? (See nextflow/modules/nf-amazon/src/main/nextflow/cloud/aws/AmazonClientFactory.groovy, lines 120 to 122, at 28e8c5a.)
I optionally set that when creating the client for CodeCommit. Speaking of which, I noticed that the code got refactored and I can't find the AwsCodeCommitRepository provider. Where did that get moved to?
The S3 file system uses its own client. Based on what you are saying, it would be enough to remove the region setting (or not specify it). (The CodeCommit feature was moved here.)
I confirm that this only happens when setting the … config options. As a quick workaround, it's enough to not specify them.
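As a concrete (hedged) illustration of the workaround, assuming the options in question are the region/endpoint settings discussed earlier in the thread:

```groovy
// nextflow.config -- workaround sketch: do not pin the S3 client to one region,
// so the SDK can resolve buckets wherever they live.
aws {
    // region = 'us-west-2'                                        // omit this
    // client { endpoint = 'https://s3.us-west-2.amazonaws.com' }  // and this
}
```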
When I tried this fix, I got this error:
Can you advise?
Maybe I was too optimistic about the workaround. This will be patched in the next release.
Included in release …
Bug report
This is on the edge between a bug report and a new feature request. I am honestly not sure which it is, or maybe it's both. After some debate I am filing it as a bug report.
Expected behavior and actual behavior
I am running a workflow in nf-core: https://github.com/nf-core/rnaseq

That workflow makes use of a reference data bucket:
https://github.com/nf-core/rnaseq/blob/master/conf/base.config#L71-L75
That bucket is in the `eu-west-1` region while all my infrastructure is in `us-west-2`.

When I run the workflow, I get:

…
This essentially means that nobody can run this nf-core workflow unless they happen to be in the `eu-west-1` region.

It seems to me that there are a couple of approaches that could fix this issue.
One would be to create an S3 client object that is not tied to a specific region. I am not an expert in the Java/Groovy SDK for AWS, but Python's boto3 seems to ignore the `region_name` argument when creating an S3 client object. At any rate, I can specify a region and then interact with buckets from multiple regions.

Another approach would be to catch the error that comes back and retry the operation using a client that was created in the appropriate region.
Otherwise there are only two workarounds, and they are both prohibitively cumbersome and/or expensive:

- Set up all of my infrastructure in `eu-west-1` just to get around this error.
- Copy the reference data into a bucket in my own region, at a cost.

Steps to reproduce the problem
Run `nextflow run nf-core/rnaseq` with a config that specifies your AWS region as one other than `eu-west-1`.

Program output
Environment
Additional context