Upload recording feature #787

KIRA009 · 2024-06-21T13:51:48Z

What kind of change does this PR introduce?
This PR addresses #724

Summary
This PR adds a script that deploys an app to AWS Lambda, which the OpenAdapt app then uses to upload users' recordings as zip files.

Checklist

My code follows the style guidelines of OpenAdapt
I have performed a self-review of my code
If applicable, I have added tests to prove my fix is functional/effective
I have linted my code locally prior to submission
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (e.g. README.md, requirements.txt)
New and existing unit tests pass locally with my changes

How can your code be run and tested?
From the project root, run `python -m scripts.recording_uploader.deploy` (make sure the necessary AWS credentials are configured). When the command completes, copy the API URL from its output into the `RECORDING_UPLOAD_URL` variable in `config.py`. Then start the app, open a recording's detail page, and click the "Upload recording" button. Finally, check the S3 bucket to confirm that the recording was uploaded as a zip file.

Other information

abrichr

Thank you for putting this together @KIRA009 ! Just left a few small comments, happy to chat about any of it if you like! 🙏 😄

abrichr · 2024-06-22T15:55:46Z

openadapt/utils.py

+    with open(file_path, "rb") as file:
+        files = {"file": (filename, file)}
+        resp = requests.put(upload_url, files=files)
+        resp.raise_for_status()


What do you think about returning the response here?
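A minimal sketch of the helper with the response returned. It assumes the Lambda presigns a `put_object` request (the client method isn't visible in this diff). For a presigned PUT, S3 stores the request body as the object, so passing the file through `files=` would upload a multipart envelope around the zip rather than the zip itself; the sketch streams the file with `data=` instead. The name `upload_file_to_s3` and the timeout value are illustrative.

```python
import requests


def upload_file_to_s3(
    file_path: str, upload_url: str, timeout: float = 300.0
) -> requests.Response:
    """Upload a file to a presigned S3 PUT URL and return the response.

    Assumes the URL was presigned for ``put_object``, so the raw file bytes
    are sent as the request body (``data=``) instead of a multipart form.
    """
    with open(file_path, "rb") as file:
        # requests streams the open file and sets Content-Length from its size.
        resp = requests.put(upload_url, data=file, timeout=timeout)
    resp.raise_for_status()
    return resp
```

Returning the response lets callers log `resp.status_code` or headers such as `ETag` without issuing another request.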

abrichr · 2024-06-22T15:57:05Z

scripts/recording_uploader/.gitignore

@@ -0,0 +1,244 @@
+
+# Created by https://www.gitignore.io/api/osx,linux,python,windows,pycharm,visualstudiocode


Interesting, can you please clarify why this is necessary / preferable to a minimal .gitignore? What did you need to ignore here? Why not keep it in the root .gitignore?

This is autogenerated from the template. You are right though, this isn't needed

abrichr · 2024-06-22T15:58:12Z

scripts/recording_uploader/README.md

+
+## Deploy the application
+
+There is a `deploy` script that creates the s3 bucket and deploys the application using the SAM CLI (included as part of the dev dependencies of this project). The bucket name is hardcoded in the script. The SAM CLI is set up to run in `guided` mode, which will prompt the user every time befor deploying, in case the user wants to change the default values.


Typo: befor -> before

Can we make the bucket name configurable, along with the AWS region, etc.?
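One way to make these configurable is to read them from command-line flags whose defaults fall back to environment variables. A sketch with `argparse`: the flag names and the `RECORDING_UPLOAD_BUCKET` variable are hypothetical, while `AWS_DEFAULT_REGION` is the variable boto3 and the AWS CLI already read for the default region.

```python
import argparse
import os
from typing import Optional, Sequence

DEFAULT_BUCKET = "openadapt"
DEFAULT_REGION_NAME = "us-east-1"


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse deploy options; flags override env vars, which override defaults."""
    parser = argparse.ArgumentParser(description="Deploy the recording uploader.")
    parser.add_argument(
        "--bucket",
        # Hypothetical env var name for the bucket.
        default=os.environ.get("RECORDING_UPLOAD_BUCKET", DEFAULT_BUCKET),
        help="S3 bucket that receives uploaded recordings",
    )
    parser.add_argument(
        "--region-name",
        # AWS_DEFAULT_REGION is the variable boto3 and the AWS CLI also read.
        default=os.environ.get("AWS_DEFAULT_REGION", DEFAULT_REGION_NAME),
        help="AWS region for the bucket and the Lambda function",
    )
    parser.add_argument(
        "--no-guided",
        dest="guided",
        action="store_false",
        help="skip the interactive `sam deploy --guided` prompts",
    )
    return parser.parse_args(argv)
```

The parsed `bucket`, `region_name`, and `guided` values would then be passed into the deploy function, whose exact signature may differ from this sketch.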

abrichr · 2024-06-22T16:01:00Z

scripts/recording_uploader/__init__.py

@@ -0,0 +1 @@
+"""Init file for the recording_uploader package."""


If this is a package, should we move it outside of scripts?

It should be named as a module instead 😅

abrichr · 2024-06-22T16:11:58Z

scripts/recording_uploader/uploader/app.py

+def get_presigned_url() -> dict:
+    """Generate a presigned URL for uploading a recording to S3."""
+    bucket = "openadapt"
+    region_name = "us-east-1"


What do you think about putting these in NAMED_CONSTANTS at the top of this file, and passing them in as default keyword arguments to get_presigned_url?
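A sketch of the suggested refactor, which also folds in the `ONE_HOUR_IN_SECONDS` constant requested further down. It assumes `put_object` as the presigned client method and takes the S3 client as a parameter so the function can be exercised without AWS; the `PresigningClient` protocol and the `{"url": ...}` return shape are assumptions. The region constant is used where the client is built.

```python
from typing import Any, Dict, Protocol
from uuid import uuid4

# Named constants, with the values that were hardcoded in the diff.
DEFAULT_BUCKET = "openadapt"
DEFAULT_REGION_NAME = "us-east-1"
ONE_HOUR_IN_SECONDS = 60 * 60


class PresigningClient(Protocol):
    """The subset of a boto3 S3 client that this function needs."""

    def generate_presigned_url(
        self, ClientMethod: str, Params: Dict[str, Any], ExpiresIn: int
    ) -> str: ...


def get_presigned_url(
    client: PresigningClient,
    bucket: str = DEFAULT_BUCKET,
    expires_in: int = ONE_HOUR_IN_SECONDS,
) -> Dict[str, str]:
    """Generate a presigned PUT URL for uploading a recording to S3."""
    key = f"recordings/{uuid4()}.zip"
    url = client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
    # Assumed response shape.
    return {"url": url}
```

In the Lambda, the client would come from `boto3.client("s3", region_name=DEFAULT_REGION_NAME, endpoint_url=f"https://s3.{DEFAULT_REGION_NAME}.amazonaws.com")`, matching the diff, and callers can still override `bucket` or `expires_in` per call.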

abrichr · 2024-06-22T16:25:55Z

scripts/recording_uploader/deploy.py

+    if guided:
+        commands.append("--guided")
+    subprocess.run(commands, cwd=CURRENT_DIR, check=True)
+    print("Lambda function deployed successfully.")


What do you think about using logger here?
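A sketch of the deploy step using the standard library `logging` module (OpenAdapt may prefer its own logger setup; the function name `run_sam_deploy` is hypothetical). Returning the `CompletedProcess` also makes the step easy to test.

```python
import logging
import subprocess
from pathlib import Path
from typing import List, Sequence, Union

logger = logging.getLogger(__name__)


def run_sam_deploy(
    base_command: Sequence[str],
    cwd: Union[str, Path],
    guided: bool = True,
) -> subprocess.CompletedProcess:
    """Run the SAM deploy command, logging progress instead of printing."""
    # Copy so the caller's command list is never mutated.
    commands: List[str] = list(base_command)
    if guided:
        commands.append("--guided")
    logger.info("Running %s in %s", " ".join(commands), cwd)
    try:
        result = subprocess.run(commands, cwd=cwd, check=True)
    except subprocess.CalledProcessError:
        # Log with traceback, then let the caller see the failure.
        logger.exception("SAM deploy failed")
        raise
    logger.info("Lambda function deployed successfully.")
    return result
```

With the names from the diff, a call would look like `run_sam_deploy(["sam", "deploy"], cwd=CURRENT_DIR, guided=guided)`.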

abrichr · 2024-06-22T16:26:33Z

scripts/recording_uploader/deploy.py

+            Bucket=bucket,
+        )
+    except (s3.exceptions.BucketAlreadyExists, s3.exceptions.BucketAlreadyOwnedByYou):
+        proceed = input(f"Bucket '{bucket}' already exists. Proceed? [y/N] ")


Is this necessary? What happens if the user proceeds if the bucket has already been created?

I thought this would be good to have in case the user doesn't want to overwrite existing buckets, but I realise that is quite a niche case

Will this remove existing data?

It's unlikely, since we generate random filenames, so yes, we can probably remove this part.

I have removed this
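With the prompt gone, one way to keep re-running the deploy safe is to treat a bucket you already own as success and let every other error surface. A sketch assuming a boto3 S3 client is passed in; the helper names are hypothetical. It also handles a region detail of `create_bucket`: S3 rejects an explicit `LocationConstraint` of `us-east-1`, while other regions require one.

```python
from typing import Any, Dict


def create_bucket_kwargs(bucket: str, region_name: str) -> Dict[str, Any]:
    """Build keyword arguments for S3 ``create_bucket``.

    The location constraint is only added outside us-east-1.
    """
    kwargs: Dict[str, Any] = {"Bucket": bucket}
    if region_name != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region_name}
    return kwargs


def ensure_bucket(s3: Any, bucket: str, region_name: str) -> bool:
    """Create ``bucket``; return False if S3 reports we already own it.

    ``s3`` is expected to be a boto3 S3 client. In us-east-1, S3 may return
    success for a bucket you already own, so True means "create_bucket
    succeeded", not necessarily "newly created". BucketAlreadyExists (a
    bucket owned by another account) is deliberately not caught, since the
    deploy cannot use that bucket.
    """
    try:
        s3.create_bucket(**create_bucket_kwargs(bucket, region_name))
    except s3.exceptions.BucketAlreadyOwnedByYou:
        return False
    return True
```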

abrichr · 2024-06-22T16:27:53Z

scripts/recording_uploader/uploader/app.py

+            "Bucket": bucket,
+            "Key": key,
+        },
+        ExpiresIn=3600,


Can you please make this a named constant, e.g. ONE_HOUR_IN_SECONDS = 60 * 60?

abrichr · 2024-06-22T16:29:07Z

scripts/recording_uploader/uploader/app.py

+        region_name=region_name,
+        endpoint_url=f"https://s3.{region_name}.amazonaws.com",
+    )
+    key = f"recordings/{uuid4()}.zip"


What do you think about adding the user's unique id to the path, e.g. recordings/{user_id}/{upload_id}.zip?

Makes sense
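A sketch of per-user keys in the Lambda, assuming the app sends its user id as a `user_id` query parameter through an API Gateway proxy integration (the parameter name and the validation rule are both assumptions). Validating the id keeps every upload under a predictable `recordings/<user_id>/` prefix and rejects ids containing `/` or other characters that would add extra path segments.

```python
import json
import re
from typing import Any, Dict, Optional
from uuid import uuid4

# Assumed rule: short ids made of letters, digits, "_" or "-", so an id can
# never introduce extra "/" segments into the key.
_USER_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,128}")


def build_recording_key(user_id: str, upload_id: Optional[str] = None) -> str:
    """Return recordings/{user_id}/{upload_id}.zip, generating upload_id if absent."""
    if not _USER_ID_PATTERN.fullmatch(user_id):
        raise ValueError(f"invalid user_id: {user_id!r}")
    return f"recordings/{user_id}/{upload_id or uuid4()}.zip"


def lambda_handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
    """Read the (hypothetical) user_id query parameter and build the key."""
    # queryStringParameters may be None or missing when there is no query string.
    params = event.get("queryStringParameters") or {}
    try:
        key = build_recording_key(params.get("user_id", ""))
    except ValueError as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    # In the real function, the key would be passed on to the presigning call.
    return {"statusCode": 200, "body": json.dumps({"key": key})}
```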

abrichr · 2024-06-22T16:38:30Z

Once we start scaling, we should consider supporting B2, e.g. in app.py:

"""Lambda-like function for generating a presigned URL for uploading a recording to B2."""

from typing import Any
from uuid import uuid4
import json
from b2sdk.v2 import B2Api, InMemoryAccountInfo

def get_b2_client() -> B2Api:
    """Create and return a B2 client."""
    info = InMemoryAccountInfo()
    b2_api = B2Api(info)
    b2_api.authorize_account("production", "applicationKeyId", "applicationKey")
    return b2_api

def lambda_handler(*args: Any, **kwargs: Any) -> dict:
    """Main entry point for the function."""
    return {
        "statusCode": 200,
        "body": json.dumps(get_presigned_url()),
    }

def get_presigned_url() -> dict:
    """Generate a presigned URL for uploading a recording to B2."""
    bucket_name = "openadapt"
    b2_api = get_b2_client()
    bucket = b2_api.get_bucket_by_name(bucket_name)
    file_name = f"recordings/{uuid4()}.zip"
    file_info = {'how': 'good-file'}

    presigned_url = bucket.get_upload_url(file_name, file_info=file_info)
    
    return {"url": presigned_url['upload_url'], "upload_auth_token": presigned_url['authorization_token']}

For now let's stick with s3.

KIRA009 · 2024-07-01T12:45:23Z

Before this is merged, we need to set up the upload URL and update `RECORDING_UPLOAD_URL` in config.py.
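A small sketch of how the app could fail early if the URL was never filled in. Reading the value from a `RECORDING_UPLOAD_URL` environment variable and the helper name `get_recording_upload_url` are assumptions; the real `config.py` may define the setting differently.

```python
import os
from urllib.parse import urlparse

# Hypothetical: the API URL printed by the deploy script, supplied via env var.
RECORDING_UPLOAD_URL = os.environ.get("RECORDING_UPLOAD_URL", "")


def get_recording_upload_url(url: str = RECORDING_UPLOAD_URL) -> str:
    """Return the upload endpoint, failing early if it was never configured."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise RuntimeError(
            "RECORDING_UPLOAD_URL is not set to an https URL; deploy the "
            "recording uploader and paste the API URL it prints."
        )
    return url
```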

abrichr · 2024-07-05T17:05:05Z

scripts/recording_uploader/deploy.py

+        region_name (str): The AWS region to deploy the Lambda function to.
+        guided (bool): Whether to use the guided SAM deployment.
+    """
+    s3 = boto3.client(


@KIRA009 can you please modify this to use the credentials specified in openadapt.config? Specifically we should add AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY or similar.

Also, please add some documentation regarding the permissions required for this IAM user.

Do we want to add these keys to the config? They won't be used anywhere else in the project, and I think that when you run the deploy script, boto3 already notifies the user if it can't find credentials in the places where it looks.

For the access keys, I followed this guide without attaching any explicit permissions: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_CreateAccessKey. These are the keys you need to run the deploy script, which then creates the appropriate IAM resources on its own. Once the stack is deployed, you can delete the original access keys if you want. Should I add this to the README?
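If the keys do end up in `openadapt.config` (the `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` names are the reviewer's proposal, not existing settings), a helper like the hypothetical `boto3_session_kwargs` below could pass them to boto3 only when both are set, leaving boto3's default credential chain (environment variables, the shared credentials file, and so on) in charge otherwise.

```python
from typing import Dict, Optional


def boto3_session_kwargs(
    aws_access_key_id: Optional[str] = None,
    aws_secret_access_key: Optional[str] = None,
    region_name: Optional[str] = None,
) -> Dict[str, str]:
    """Build boto3.Session kwargs; an empty dict means "use the default chain"."""
    # A lone key id or secret is almost certainly a misconfiguration.
    if bool(aws_access_key_id) != bool(aws_secret_access_key):
        raise ValueError("Set both AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, or neither.")
    kwargs: Dict[str, str] = {}
    if aws_access_key_id and aws_secret_access_key:
        kwargs["aws_access_key_id"] = aws_access_key_id
        kwargs["aws_secret_access_key"] = aws_secret_access_key
    if region_name:
        kwargs["region_name"] = region_name
    return kwargs
```

The deploy script would then build its client with `boto3.Session(**boto3_session_kwargs(...)).client("s3")`.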

abrichr · 2024-07-05T17:06:49Z

scripts/recording_uploader/uploader/requirements.txt

@@ -0,0 +1 @@
+boto3==1.34.84


What do you think about moving all of this outside of /scripts, and into e.g. /admin or similar?

I think the uploader module has to remain inside the recording_uploader module. If you mean the recording_uploader module itself, we could move it, but I think scripts is a reasonable place for it too. Let me know.

Yes I meant to move recording_uploader into a new directory, /admin or similar.


abrichr · 2024-07-15T15:47:32Z

README.md

+If you want to self host the app, you should run the following scripts
+
+**recording_uploader**
+- Ensure that you have valid AWS credentials added in your environment


What do you think about loading the AWS credentials from config.py?

KIRA009 added 3 commits June 21, 2024 16:28

feat: Add script to deploy uploader code to a lambda

19d8aca

feat: Add upload recording button in dashboard

9c0bf11

chore: Fix flake8 lint errors

38d4d94

abrichr requested changes Jun 22, 2024


abrichr reviewed Jun 22, 2024


KIRA009 added 3 commits June 24, 2024 12:02

feat: Upload recording to user id specific folders

cd3614e

Merge branch 'main' into feature/upload-recording

ce7abb0

chore: Replace package with module and remove unwanted code

59c3018

abrichr reviewed Jul 5, 2024


KIRA009 added 3 commits July 15, 2024 15:10

Merge branch 'main' into feature/upload-recording

e96dd07

chore: Move recording uploader to separate admin folder

78f10de

docs: Update README.md with details on how to deploy the recording uploader stack

bfffb05

abrichr reviewed Jul 15, 2024
