Merge pull request #96 from uc-cdis/feat/s3-functionality
[HP-1539] Add AWS functionality to cirrus
mfshao authored Aug 26, 2024
2 parents 39a12ea + bde8e0f commit a0d311b
Showing 11 changed files with 714 additions and 239 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -13,6 +13,6 @@ repos:
- id: no-commit-to-branch
args: [--branch, develop, --branch, master, --branch, main, --pattern, release/.*]
- repo: https://github.com/psf/black
-rev: 20.8b1
+rev: 22.3.0
hooks:
- id: black
31 changes: 28 additions & 3 deletions README.md
@@ -24,6 +24,23 @@ using the library like above.

So... you should at least read how to set up your environment.

For AWS functionality, you can use an example like the following:

```
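import boto3
from gen3cirrus import AwsService

# boto3.client() requires a service name; presigned S3 URLs need an S3 client
client = boto3.client("s3")
aws = AwsService(client)

key = "test.txt"  # object key (avoids shadowing the built-in `object`)
bucket = "test-bucket"  # S3 bucket names must be lowercase
expiration = 3600  # URL lifetime in seconds
url = aws.requester_pays_download_presigned_url(bucket, key, expiration)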
```

## Setting up Environment for `cirrus`
`cirrus`'s wispy clouds must dwell in the great blue expanse with other Clouds.
Thus, you'll need to configure `cirrus` with necessary information about those Clouds
@@ -32,7 +49,7 @@ before being able to bask in its beauty.
You *should* only have to do this once so don't freak out.

By default, all the configurations needed by `cirrus` are assumed to be environmental
-variables. You can also provide the configuration programatically in Python (instructions are later in the README).
+variables. You can also provide the configuration programmatically in Python (instructions are later in the README).

**Note:** This guide should cover necessary configuration,
but in the effort of not having to maintain everything in two places,
@@ -106,7 +123,7 @@ a few guides on settings that up, as it requires you to enable access to the
Cloud Identity/GSuite API.

Follow directions [here](https://developers.google.com/identity/protocols/OAuth2ServiceAccount#delegatingauthority)
-to deletgate domain-wide authority for your service account that you're using
+to delegate domain-wide authority for your service account that you're using
for `GOOGLE_APPLICATION_CREDENTIALS`.

For the API scopes, authorize these:
@@ -141,7 +158,7 @@ GOOGLE_API_KEY="abcdefghijklmnopqrstuvwxyz"
```

### Setting Configuration Programatically
-`cirrus`, by default, reads in necessary configurations from environmental variables. You can, however, provide all these config vars programatically by calling the `update` function on the config object in `cirrus` and passing in a dictionary.
+`cirrus`, by default, reads in necessary configurations from environmental variables. You can, however, provide all these config vars programmatically by calling the `update` function on the config object in `cirrus` and passing in a dictionary.

For example:
```
@@ -185,6 +202,14 @@ cirrus_config.update(**settings)

*Still uses Google libraries for auth*

## AWS Specific Implementation Details

### Method for communication with AWS's API(s)

For AWS, you must bring your own configured Boto3 client.

You can then set up `AwsService` by passing that client to its constructor; the service uses it for every call it makes to the AWS API.
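Because the client is injected rather than created inside `cirrus`, code built on `AwsService` can be exercised without AWS credentials by passing any object with the same shape. The sketch below is a hypothetical test double (not part of `cirrus` or boto3): it mimics the `generate_presigned_url(ClientMethod, Params=..., ExpiresIn=...)` call shape of a boto3 S3 client plus the `meta.events.register` hook that `cirrus` uses, returns a fake URL, and records each call.

```python
from types import SimpleNamespace
from urllib.parse import urlencode


class StubEvents:
    """Hypothetical stand-in for botocore's event system: only records registrations."""

    def __init__(self):
        self.handlers = []

    def register(self, event_name, handler):
        self.handlers.append((event_name, handler))


class StubS3Client:
    """Hypothetical test double shaped like a boto3 S3 client; makes no AWS calls."""

    def __init__(self):
        self.meta = SimpleNamespace(events=StubEvents())
        self.calls = []

    def generate_presigned_url(self, ClientMethod, Params=None, ExpiresIn=3600):
        # Record the call as made, then build a fake (unsigned) URL
        params = Params or {}
        self.calls.append((ClientMethod, params, ExpiresIn))
        query = urlencode({"method": ClientMethod, "expires": ExpiresIn})
        return f"https://{params['Bucket']}.example.invalid/{params['Key']}?{query}"
```

With `cirrus` installed, `AwsService(StubS3Client()).download_presigned_url("test-bucket", "test.txt", 3600)` should return the fake URL and leave a single recorded `"get_object"` call, since registered handlers are never fired by the stub.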

## Building the Documentation
- `pipenv install --dev`
- `pipenv run python docs/create_docs.py`
1 change: 1 addition & 0 deletions gen3cirrus/__init__.py
@@ -1,2 +1,3 @@
# Expose public API from each cloud provider
from .google_cloud import GoogleCloudManager
from .aws import AwsService
1 change: 1 addition & 0 deletions gen3cirrus/aws/__init__.py
@@ -0,0 +1 @@
from .services import AwsService
57 changes: 57 additions & 0 deletions gen3cirrus/aws/services.py
@@ -0,0 +1,57 @@
"""
Amazon service for interacting with APIs
"""


from gen3cirrus.aws.utils import (
    generate_presigned_url,
    generate_presigned_url_requester_pays,
    generate_multipart_upload_url,
)

from cdislogging import get_logger

logger = get_logger(__name__, log_level="info")


class AwsService(object):
    """
    Generic Amazon services using Boto3
    """

    def __init__(self, boto3_client):
        self.client = boto3_client

    def download_presigned_url(self, bucket, key, expiration, additional_info=None):
        """
        Wrapper function for generating a presigned URL for downloading an object
        """
        return generate_presigned_url(
            self.client, "get", bucket, key, expiration, additional_info
        )

    def upload_presigned_url(self, bucket, key, expiration, additional_info=None):
        """
        Wrapper function for generating a presigned URL for uploading an object
        """
        return generate_presigned_url(
            self.client, "put", bucket, key, expiration, additional_info
        )

    def multipart_upload_presigned_url(self, bucket, key, expiration, upload_id, part):
        """
        Wrapper function for generating a presigned URL for uploading an object using multipart upload
        """
        return generate_multipart_upload_url(
            self.client, bucket, key, expiration, upload_id, part
        )

    def requester_pays_download_presigned_url(
        self, bucket, key, expiration, additional_info=None
    ):
        """
        Wrapper function for generating a presigned URL for downloading an object from a requester pays bucket
        """
        return generate_presigned_url_requester_pays(
            self.client, bucket, key, expiration, additional_info
        )
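Each wrapper delegates to a helper in `utils.py`, and each helper asks boto3 to presign exactly one S3 operation. The sketch below restates that mapping as plain data read off this commit's code; `PRESIGN_TARGETS` and `describe_presign` are hypothetical names for illustration only, and the sketch is looser than the real multipart helper, which accepts no extra parameters.

```python
# Hypothetical summary: which boto3 client method each AwsService wrapper presigns,
# and which Params it adds beyond Bucket and Key (None marks a caller-supplied value).
PRESIGN_TARGETS = {
    "download_presigned_url": ("get_object", {}),
    "upload_presigned_url": ("put_object", {}),
    "multipart_upload_presigned_url": (
        "upload_part",
        {"UploadId": None, "PartNumber": None},
    ),
    "requester_pays_download_presigned_url": (
        "get_object",
        {"RequestPayer": "requester"},
    ),
}


def describe_presign(wrapper, bucket, key, **extra):
    """Return the (ClientMethod, Params) pair the wrapper would hand to boto3."""
    client_method, fixed = PRESIGN_TARGETS[wrapper]
    params = {"Bucket": bucket, "Key": key}
    for name, value in fixed.items():
        if value is None and name not in extra:
            raise ValueError(f"{wrapper} needs {name}")
        params[name] = value
    # Like additional_info in utils.py, extra values are merged last and win
    params.update(extra)
    return client_method, params
```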
191 changes: 191 additions & 0 deletions gen3cirrus/aws/utils.py
@@ -0,0 +1,191 @@
from urllib.parse import urlencode
from botocore.exceptions import ClientError

from cdislogging import get_logger

logger = get_logger(__name__, log_level="info")

custom_params = ["user_id", "username", "client_id", "x-amz-request-payer"]


def is_custom_params(param_key):
    """
    Little helper function for checking if a param key should be skipped during validation
    Args:
        param_key (string): a key of a param
    """
    if param_key in custom_params:
        return True
    else:
        return False


def client_param_handler(*, params, context, **_kw):
    """
    Little helper function for removing customized params before validating
    Args:
        params (dict): a dict of parameters
        context (context): for temporarily storing those removed parameters
    """
    # Store custom parameters in context for later event handlers
    context["custom_params"] = {k: v for k, v in params.items() if is_custom_params(k)}
    # Remove custom parameters from client parameters,
    # because validation would fail on them
    return {k: v for k, v in params.items() if not is_custom_params(k)}


def request_param_injector(*, request, **_kw):
    """
    Little helper function for adding customized params back into url before signing
    Args:
        request (request): request for presigned url
    """
    if request.context["custom_params"]:
        request.url += "&" if "?" in request.url else "?"
        request.url += urlencode(request.context["custom_params"])


def customize_s3_client_param_events(s3_client):
    """
    Function for modifying the params that need to be included when signing
    This is needed because we need to include some customized params in the signed url, but boto3 won't allow them to exist out of the box
    See https://stackoverflow.com/a/59057975
    Args:
        s3_client (S3.Client): boto3 S3 client
    """
    s3_client.meta.events.register(
        "provide-client-params.s3.GetObject", client_param_handler
    )
    s3_client.meta.events.register("before-sign.s3.GetObject", request_param_injector)
    s3_client.meta.events.register(
        "provide-client-params.s3.PutObject", client_param_handler
    )
    s3_client.meta.events.register("before-sign.s3.PutObject", request_param_injector)
    return s3_client
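The two handlers registered above pull custom query parameters out of `Params` before boto3 validates them, park them in the request `context`, and append them to the URL just before signing, so they end up in the signed query string. The sketch below replays that flow by hand so it runs without boto3: the handler bodies are copies of `client_param_handler` and `request_param_injector` (with `is_custom_params` inlined), while the `SimpleNamespace` request standing in for botocore's request object and the sample values are assumptions for illustration.

```python
from types import SimpleNamespace
from urllib.parse import urlencode

custom_params = ["user_id", "username", "client_id", "x-amz-request-payer"]


def client_param_handler(*, params, context, **_kw):
    # Stash custom params in the shared context; return only what boto3 should validate
    context["custom_params"] = {k: v for k, v in params.items() if k in custom_params}
    return {k: v for k, v in params.items() if k not in custom_params}


def request_param_injector(*, request, **_kw):
    # Append the stashed params to the URL so they are present when signing happens
    if request.context["custom_params"]:
        request.url += "&" if "?" in request.url else "?"
        request.url += urlencode(request.context["custom_params"])


# Manual replay of the two events; botocore would emit them itself during presigning
context = {}
incoming = {"Bucket": "test-bucket", "Key": "test.txt", "user_id": "42"}
validated = client_param_handler(params=incoming, context=context)

request = SimpleNamespace(
    url="https://test-bucket.s3.amazonaws.com/test.txt", context=context
)
request_param_injector(request=request)
```

After the replay, `validated` holds only `Bucket` and `Key`, and `request.url` ends in `?user_id=42`.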


def generate_presigned_url(
    client, method, bucket_name, object_name, expires, additional_info=None
):
    """
    Function for generating a presigned URL for upload or download
    Args:
        client (S3.Client): boto3 S3 client
        method (string): ["get", "put"] "get" for download and "put" for upload
        bucket_name (string): s3 bucket name
        object_name (string): s3 bucket object key
        expires (int): time for presigned URL to exist (in seconds)
        additional_info (dict): dict of additional parameters to pass to s3 for signing
    """

    params = {}
    params["Bucket"] = bucket_name
    params["Key"] = object_name

    additional_info = additional_info or {}
    for key in additional_info:
        params[key] = additional_info[key]

    s3_client = customize_s3_client_param_events(client)

    if method == "get":
        client_method = "get_object"
    elif method == "put":
        client_method = "put_object"
    else:
        logger.error(
            "method for generating presigned URL must be 'get' for download or 'put' for upload"
        )
        return None

    try:
        response = s3_client.generate_presigned_url(
            client_method,
            Params=params,
            ExpiresIn=expires,
        )

    except ClientError as e:
        logger.error(e)
        return None

    return response
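`expires` is forwarded unchanged as `ExpiresIn`. With Signature Version 4, a presigned URL can be valid for at most seven days (604,800 seconds), and a URL signed with temporary credentials stops working once those credentials expire. A caller might therefore check the value before presigning; `check_expiration` below is a hypothetical helper, not part of `cirrus`, and it assumes SigV4 signing.

```python
# SigV4 ceiling for presigned URL lifetime: 7 days expressed in seconds
MAX_SIGV4_EXPIRES = 7 * 24 * 60 * 60


def check_expiration(expires):
    """Hypothetical pre-check for the `expires` argument (in seconds)."""
    # bool is a subclass of int, so reject it explicitly
    if isinstance(expires, bool) or not isinstance(expires, int):
        raise TypeError("expires must be an integer number of seconds")
    if not 1 <= expires <= MAX_SIGV4_EXPIRES:
        raise ValueError(f"expires must be between 1 and {MAX_SIGV4_EXPIRES} seconds")
    return expires
```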


def generate_multipart_upload_url(
    client, bucket_name, object_name, expires, upload_id, part_no
):
    """
    Function for generating a presigned URL only for one part of multipart upload
    Args:
        client (S3.Client): boto3 S3 client
        bucket_name (string): s3 bucket name
        object_name (string): s3 bucket object key
        expires (int): time for presigned URL to exist (in seconds)
        upload_id (string): ID for upload to s3
        part_no (int): part number of multipart upload
    """
    s3_client = client
    try:
        response = s3_client.generate_presigned_url(
            ClientMethod="upload_part",
            Params={
                "Bucket": bucket_name,
                "Key": object_name,
                "UploadId": upload_id,
                "PartNumber": part_no,
            },
            ExpiresIn=expires,
        )

    except ClientError as e:
        logger.error(e)
        return None

    return response
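`generate_multipart_upload_url` presigns a single `upload_part` request, so uploading a large object takes one URL per part, all sharing the `UploadId` from a separate `CreateMultipartUpload` call (not part of this commit). S3 numbers parts from 1 to 10,000, and every part except the last must be at least 5 MiB. The sketch below plans part numbers and byte ranges under those limits and asks a caller-supplied `presign` callable for each URL; `plan_parts`, `presign_all_parts`, and the 8 MiB default part size are hypothetical choices.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
MAX_PARTS = 10_000  # S3 part numbers run from 1 to 10,000
DEFAULT_PART_SIZE = 8 * 1024 * 1024  # hypothetical default


def plan_parts(object_size, part_size=DEFAULT_PART_SIZE):
    """Return (part_number, start, end_exclusive) tuples covering object_size bytes."""
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB S3 minimum")
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    count = -(-object_size // part_size)  # ceiling division
    if count > MAX_PARTS:
        raise ValueError("too many parts; increase part_size")
    return [
        (n + 1, n * part_size, min((n + 1) * part_size, object_size))
        for n in range(count)
    ]


def presign_all_parts(presign, object_size, part_size=DEFAULT_PART_SIZE):
    """Map each part number to the URL returned by presign(part_number)."""
    return {n: presign(n) for n, _, _ in plan_parts(object_size, part_size)}
```

With `cirrus`, `presign` could be `lambda n: aws.multipart_upload_presigned_url(bucket, key, 3600, upload_id, n)`, matching the wrapper's `(bucket, key, expiration, upload_id, part)` signature.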


def generate_presigned_url_requester_pays(
    client, bucket_name, object_name, expires, additional_info=None
):
    """
    Function for generating a presigned URL only for requester pays buckets
    Args:
        client (S3.Client): boto3 S3 client
        bucket_name (string): s3 bucket name
        object_name (string): s3 bucket object key
        expires (int): time for presigned URL to exist (in seconds)
        additional_info (dict): dict of additional parameters to pass to s3 for signing
    """
    params = {}
    params["Bucket"] = bucket_name
    params["Key"] = object_name
    params["RequestPayer"] = "requester"

    additional_info = additional_info or {}
    for key in additional_info:
        params[key] = additional_info[key]

    s3_client = customize_s3_client_param_events(client)

    try:
        response = s3_client.generate_presigned_url(
            "get_object",
            Params=params,
            ExpiresIn=expires,
        )

    except ClientError as e:
        logger.error(e)
        return None

    return response
1 change: 1 addition & 0 deletions gen3cirrus/core.py
@@ -4,6 +4,7 @@
Current Capabilities:
- Manage Google resources, policies, and access (specific Google APIs
are abstracted through a Management class that exposes needed behavior)
- Manage AWS resources and access S3 APIs
"""


4 changes: 3 additions & 1 deletion gen3cirrus/errors.py
@@ -1,5 +1,7 @@
 class CirrusError(Exception):
-    def __init__(self, message="There was an error within the gen3cirrus library.", *args):
+    def __init__(
+        self, message="There was an error within the gen3cirrus library.", *args
+    ):
         super(CirrusError, self).__init__(message)
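The AWS helpers added in this commit log failures and return `None` instead of raising, while `gen3cirrus/errors.py` defines `CirrusError`. A caller who prefers exceptions could wrap the result as sketched below; `require_url` is a hypothetical helper, and `CirrusError` is redefined locally with the constructor from this commit only so the sketch runs without the package installed.

```python
class CirrusError(Exception):
    # Local copy of gen3cirrus.errors.CirrusError, so this sketch is standalone
    def __init__(
        self, message="There was an error within the gen3cirrus library.", *args
    ):
        super(CirrusError, self).__init__(message)


def require_url(url, what="presigned URL"):
    """Hypothetical guard: raise CirrusError when a cirrus AWS helper returned None."""
    if url is None:
        raise CirrusError(f"Failed to generate {what}; see the gen3cirrus logs")
    return url
```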

