Upload all my photos to a secure S3 bucket #4

Closed · 3 tasks done

simonw opened this issue Apr 18, 2020 · 14 comments

simonw commented Apr 18, 2020

  • Create a bucket with bucket credentials
  • Programmatically upload some recent photos to it (from a notebook)
  • Turn this into a script

simonw commented Apr 18, 2020

Research thread: https://twitter.com/simonw/status/1249049694984011776

I want to build some software that lets people store their own data in their own S3 bucket. If possible, I'd like not to have to teach people the incantations needed to get their bucket set up and minimum-permission credentials figured out.

https://testdriven.io/blog/storing-django-static-and-media-files-on-amazon-s3/ looks useful


simonw commented Apr 18, 2020

I'm going to call my bucket dogsheep-photos-simon.


simonw commented Apr 18, 2020

https://console.aws.amazon.com/s3/bucket/create?region=us-west-1

[screenshot: S3 Management Console, bucket creation]

I created it with no public read-write access. I plan to use signed URLs via a transforming proxy to access images for display on the web.


simonw commented Apr 18, 2020

Creating IAM groups called dogsheep-photos-simon-read-write and dogsheep-photos-simon-read: https://console.aws.amazon.com/iam/home#/groups - I created them with no attached policies.

Now I can attach an "inline policy" to each one. For the read-write group I go here:

https://console.aws.amazon.com/iam/home#/groups/dogsheep-photos-simon-read-write

[screenshot: IAM Management Console]

Example policies are here: https://docs.aws.amazon.com/AmazonS3/latest/dev/example-bucket-policies.html

For the read-write one I went with:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::dogsheep-photos-simon/*"
            ]
        }
    ]
}

For the read-only policy I'm going to guess that this is appropriate:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject*",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::dogsheep-photos-simon/*"
            ]
        }
    ]
}

I tried the policy simulator to test this out: https://policysim.aws.amazon.com/home/index.jsp?#groups/dogsheep-photos-simon-read - this worked:

[screenshot: IAM Policy Simulator]
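These console steps could equally be scripted. A minimal boto3 sketch of creating the two groups and attaching the read-write policy inline — the group names and policy document come from this comment, the policy name is hypothetical:

import json

import boto3

iam = boto3.client("iam")

read_write_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": ["arn:aws:s3:::dogsheep-photos-simon/*"],
        }
    ],
}

# Create both groups, then attach the read-write policy inline
for group in ("dogsheep-photos-simon-read-write", "dogsheep-photos-simon-read"):
    iam.create_group(GroupName=group)

iam.put_group_policy(
    GroupName="dogsheep-photos-simon-read-write",
    PolicyName="s3-read-write",  # hypothetical inline policy name
    PolicyDocument=json.dumps(read_write_policy),
)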


simonw commented Apr 18, 2020

Next step: create two IAM users, one for each of those groups.

https://console.aws.amazon.com/iam/home#/users$new?step=details

[screenshots: IAM Management Console, user creation]

I copied the keys into a secure note in 1password.

Couldn't get into Transmit with them though! https://library.panic.com/transmit/transmit5/iam-roles/ may help.
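For completeness, the same user setup can be done with boto3 — only the group names come from this thread, the user naming convention is hypothetical:

import boto3

iam = boto3.client("iam")

for group in ("dogsheep-photos-simon-read-write", "dogsheep-photos-simon-read"):
    username = group + "-user"  # hypothetical naming
    iam.create_user(UserName=username)
    iam.add_user_to_group(GroupName=group, UserName=username)
    # The secret key is only returned once, so capture it immediately
    access_key = iam.create_access_key(UserName=username)["AccessKey"]
    print(username, access_key["AccessKeyId"], access_key["SecretAccessKey"])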


simonw commented Apr 18, 2020

I'm going to create another user just for Transmit, with full S3 access.

name: dogsheep-photos-simon-s3-all-access

Rather than creating a group for that user, I'm trying the "Attach existing policies directly" option:

[screenshot: IAM Management Console]

That user DID work with Transmit. I uploaded a test HEIC image. I used Transmit to copy a signed URL for it.

~ $ curl -i 'https://dogsheep-photos-simon.s3.us-west-1.amazonaws.com/IMG_7195.HEIC?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAWXFXAI...' | head -n 100
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0HTTP/1.1 200 OK
x-amz-id-2: gBOCYqZfbNAnv0R/uJ++qm2NbW5SgD4TapgF9RQjzzeDIThcCz/BkKU+YoxlG4NJHlcmMgAHyh4=
x-amz-request-id: C2FE7FCC3BD53A84
Date: Sat, 18 Apr 2020 20:28:54 GMT
Last-Modified: Sat, 18 Apr 2020 20:13:49 GMT
ETag: "fe3e081239a123ef745517878c53b854"
Accept-Ranges: bytes
Content-Type: image/heic
Content-Length: 1913097
Server: AmazonS3


simonw commented Apr 18, 2020

Next step: attempt a programmatic upload using the dogsheep-photos-simon-read-write credentials from a Jupyter notebook.

Also attempt a programmatic bucket listing and read using dogsheep-photos-simon-read credentials.


simonw commented Apr 18, 2020

This worked!

[screenshot: Dogsheep_Photos_S3_access notebook]

And this worked:

[screenshot: Dogsheep_Photos_S3_access notebook]
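The actual notebook cells only survive in those screenshots, so here's a reconstruction of roughly what they must have contained — credentials elided, file path taken from a later comment in this thread:

import boto3

# Upload with the read-write credentials
write_client = boto3.client(
    "s3",
    aws_access_key_id="...",  # dogsheep-photos-simon-read-write key
    aws_secret_access_key="...",
)
write_client.upload_file(
    "/Users/simonw/Desktop/this_is_fine.jpg",
    "dogsheep-photos-simon",
    "this_is_fine.jpg",
)

# Read back with the read-only credentials
read_client = boto3.client(
    "s3",
    aws_access_key_id="...",  # dogsheep-photos-simon-read key
    aws_secret_access_key="...",
)
obj = read_client.get_object(Bucket="dogsheep-photos-simon", Key="this_is_fine.jpg")
print(obj["ContentLength"], obj["ContentType"])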


simonw commented Apr 18, 2020

But... list_objects failed for both of my keys (read and write):

[screenshot: Dogsheep_Photos_S3_access notebook, list_objects error]
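That failure makes sense in hindsight: s3:ListBucket is a bucket-level action, so it has to be granted on arn:aws:s3:::dogsheep-photos-simon itself — both policies above only grant actions on the /* object resources. A sketch of a corrected read-only policy, applied via boto3 (my inference, not something confirmed later in this thread; the policy name is hypothetical):

import json

import boto3

iam = boto3.client("iam")

# ListBucket needs the bucket ARN, not the /* object ARN
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject*"],
            "Resource": ["arn:aws:s3:::dogsheep-photos-simon/*"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::dogsheep-photos-simon"],
        },
    ],
}

iam.put_group_policy(
    GroupName="dogsheep-photos-simon-read",
    PolicyName="s3-read",  # hypothetical inline policy name
    PolicyDocument=json.dumps(read_only_policy),
)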


simonw commented Apr 18, 2020

How about generating a signed URL?

read_client.generate_presigned_url(
    "get_object",
    Params={
        "Bucket": "dogsheep-photos-simon",
        "Key": "this_is_fine.jpg",
    },
    ExpiresIn=600
)

Gave me https://dogsheep-photos-simon.s3.amazonaws.com/this_is_fine.jpg?AWSAccessKeyId=AKIAWXFXAIOZNZ3JFO7I&Signature=x1zrS4w4OTGAACd7yHp9mYqXvN8%3D&Expires=1587243398

Which does this:

~ $ curl -i 'https://dogsheep-photos-simon.s3.amazonaws.com/this_is_fine.jpg?AWSAccessKeyId=AKIAWXFXAIOZNZ3JFO7I&Signature=x1zrS4w4OTGAACd7yHp9mYqXvN8%3D&Expires=1587243398'
HTTP/1.1 307 Temporary Redirect
x-amz-bucket-region: us-west-1
x-amz-request-id: E78CD859AEE21D33
x-amz-id-2: 648mx+1+YSGga7NDOU7Q6isfsKnEPWOLC+DI4+x2o9FCc6pSCdIaoHJUbFMI8Vsuh1ADtx46ymU=
Location: https://dogsheep-photos-simon.s3-us-west-1.amazonaws.com/this_is_fine.jpg?AWSAccessKeyId=AKIAWXFXAIOZNZ3JFO7I&Signature=x1zrS4w4OTGAACd7yHp9mYqXvN8%3D&Expires=1587243398
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Sat, 18 Apr 2020 20:47:21 GMT
Server: AmazonS3

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>TemporaryRedirect</Code><Message>Please re-send this request to the specified temporary endpoint. Continue to use the original request endpoint for future requests.</Message><Endpoint>dogsheep-photos-simon.s3-us-west-1.amazonaws.com</Endpoint><Bucket>dogsheep-photos-simon</Bucket><RequestId>E78CD859AEE21D33</RequestId><HostId>648mx+1+YSGga7NDOU7Q6isfsKnEPWOLC+DI4+x2o9FCc6pSCdIaoHJUbFMI8Vsuh1ADtx46ymU=</HostId></Error>~ $ 

So it redirects to another URL... which returns this:

~ $ curl -i 'https://dogsheep-photos-simon.s3-us-west-1.amazonaws.com/this_is_fine.jpg?AWSAccessKeyId=AKIAWXFXAIOZNZ3JFO7I&Signature=x1zrS4w4OTGAACd7yHp9mYqXvN8%3D&Expires=1587243398'
HTTP/1.1 200 OK
x-amz-id-2: XafOl6mswj3yz0GJC9+Ptot1ll5sROVwqsMc10CUUfgpaUANTdIx2GhnONb5d1GVFJ6wlS2j3UY=
x-amz-request-id: 258387C180411AFE
Date: Sat, 18 Apr 2020 20:47:52 GMT
Last-Modified: Sat, 18 Apr 2020 20:37:35 GMT
ETag: "ee04081c3182a44a1c6944e94012e977"
Accept-Ranges: bytes
Content-Type: binary/octet-stream
Content-Length: 53072
Server: AmazonS3

????JFIF??C

So that worked! It did come back with Content-Type: binary/octet-stream though.
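That 307 is the classic symptom of presigning against the global S3 endpoint for a bucket that lives outside us-east-1. Constructing the client pinned to the bucket's region should produce a direct regional URL with no redirect — a sketch, with credentials elided:

import boto3
from botocore.client import Config

read_client = boto3.client(
    "s3",
    region_name="us-west-1",  # match the bucket's region to avoid the redirect
    config=Config(signature_version="s3v4"),
)
url = read_client.generate_presigned_url(
    "get_object",
    Params={"Bucket": "dogsheep-photos-simon", "Key": "this_is_fine.jpg"},
    ExpiresIn=600,
)
print(url)  # should point at the us-west-1 endpoint directly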


simonw commented Apr 18, 2020

Running the upload again like this resulted in the correct content-type:

client.upload_file(
    "/Users/simonw/Desktop/this_is_fine.jpg",
    "dogsheep-photos-simon",
    "this_is_fine.jpg",
    ExtraArgs={
        "ContentType": "image/jpeg"
    }
)
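For the eventual upload script it would presumably make sense to guess the content type from the file extension rather than hard-coding it; a standard-library sketch:

import mimetypes

import boto3

client = boto3.client("s3")
path = "/Users/simonw/Desktop/this_is_fine.jpg"
# Some systems don't know .heic out of the box
mimetypes.add_type("image/heic", ".heic")
content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
client.upload_file(
    path,
    "dogsheep-photos-simon",
    "this_is_fine.jpg",
    ExtraArgs={"ContentType": content_type},
)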


simonw commented Apr 18, 2020

This is great! I now have a key that can upload photos, and a separate key that can download photos OR generate signed URLs to access those photos.

Next step: a script that starts uploading my photos.


simonw commented Apr 18, 2020

I'm going to start with this:

photos-to-sqlite upload photos.db ~/path/to/directory

This will scan the provided directory (and all sub-directories) for image files. It will then (a sketch follows after this list):

  • Calculate a sha256 of the contents of that file
  • Upload the file to a key that's sha256.jpg or .heic
  • Upload a sha256.json file with the original path to the image
  • Add that image to an uploads table in photos.db

Stretch goal: grab the EXIF data and include that in the .json upload AND the uploads database table.
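None of the script exists in this comment yet, so the following is only a sketch of the plan above — the steps wired together with boto3, sqlite-utils and the standard library (the CLI argument handling is omitted, and the bucket name is the one from this thread):

import hashlib
import json
import mimetypes
from pathlib import Path

import boto3
import sqlite_utils


def upload_all(db_path, directory, bucket="dogsheep-photos-simon"):
    client = boto3.client("s3")
    db = sqlite_utils.Database(db_path)
    for path in Path(directory).rglob("*"):
        if path.suffix.lower() not in {".jpg", ".jpeg", ".heic", ".png", ".gif"}:
            continue
        # Calculate a sha256 of the contents of the file
        # (read_bytes is fine for a sketch; stream for very large files)
        sha256 = hashlib.sha256(path.read_bytes()).hexdigest()
        key = sha256 + path.suffix.lower()
        content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        # Upload the file to a key that's sha256.jpg / sha256.heic
        client.upload_file(
            str(path), bucket, key, ExtraArgs={"ContentType": content_type}
        )
        # Upload a sha256.json file recording the original path
        client.put_object(
            Bucket=bucket,
            Key=sha256 + ".json",
            Body=json.dumps({"path": str(path)}).encode("utf-8"),
            ContentType="application/json",
        )
        # Add that image to an uploads table in photos.db
        db["uploads"].insert(
            {"sha256": sha256, "filepath": str(path), "ext": path.suffix.lower()},
            pk="sha256",
            replace=True,
        )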


simonw commented Apr 18, 2020

Got this working! I'll do EXIF in a separate ticket #3.
