Add docs on log retention and allowed origins, fix cdkproxy docs (data-dot-all#1554)

### Feature or Bugfix
<!-- please choose -->
- Documentation



### Detail
- Add docs on configurable CloudWatch log retention
- Add docs on allowed origin configuration
- Update docs on cdkproxy code walkthrough


### Relates
- data-dot-all#1527
- data-dot-all#1486

### Security
Please answer the questions below briefly where applicable, or write
`N/A`. Based on
[OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this
includes
fetching data from storage outside the application (e.g. a database, an
S3 bucket)?
  - Is the input sanitized?
- What precautions are you taking before deserializing the data you
consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires
authorization?
- How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features?
  - Do you use standard, proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?


By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
noah-paige authored Sep 17, 2024
1 parent 1a994c0 commit 3bdfa9f
Showing 2 changed files with 11 additions and 13 deletions.
17 changes: 4 additions & 13 deletions pages/code.md
@@ -171,21 +171,12 @@ The data.all `base.api` package contains the `gql` sub-package to support GraphQ

#### cdkproxy
This package contains the code associated with the deployment of CDK stacks that correspond to data.all resources.
`cdkproxy` is a package that exposes a REST API to run registered cloudformation stacks using AWS CDK. It is deployed as a docker container running on AWS ECS.
`cdkproxy` is a package that runs registered CloudFormation stacks using AWS CDK. It is bundled as a Docker image and run as an AWS ECS task, which is triggered by infrastructure as code (IaC) operations on data.all (e.g. CRUD of data.all resources).

When a data.all resource is created, the API sends an HTTP request
to the docker service and the code runs the appropriate stack using `cdk` cli.
When an API request is made to create a data.all resource, such as a new dataset, the data.all backend sends a message to an SQS queue. The message is read off the queue asynchronously and starts a new cdkproxy ECS task.
The code uses a `cdk` cli wrapper to register infrastructure and manage cdk commands, and runs the appropriate stack using the `cdk` cli to deploy the IaC of the respective data.all resource.

These stacks are deployed with the `cdk` cli wrapper
The API itself consists of 4 actions/paths:

- GET / : checks if the server is running
- POST /stack/{stackid} : creates or updates the stack
- DELETE /stack/{stackid} : deletes the stack
- GET /stack/{stackid} : returns stack status

The webserver is running on docker, using Python's [FASTAPI](https://fastapi.tiangolo.com/)
web framework and running using [uvicorn](https://www.uvicorn.org/) ASGI server.
For local data.all deployments, a webserver runs on Docker using Python's [FastAPI](https://fastapi.tiangolo.com/) web framework and the [uvicorn](https://www.uvicorn.org/) ASGI server. Subsequently, data.all sends POST API requests to the `cdkproxy` web server to start the data.all infrastructure task.
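
The queue-based flow described above can be sketched roughly as follows. This is a minimal illustration, not data.all's implementation: an in-memory `queue.Queue` stands in for the SQS queue, and the function names, message shape, and `stack-<id>` naming are all hypothetical. Only `cdk deploy` and its `--require-approval` option come from the real `cdk` cli.

```python
import json
import queue
from typing import List

# In-memory stand-in for the SQS queue so the sketch runs without AWS access.
stack_queue: "queue.Queue[str]" = queue.Queue()


def request_stack_deploy(stack_id: str) -> None:
    """Backend side: enqueue a deploy request instead of deploying inline."""
    stack_queue.put(json.dumps({"action": "deploy", "stack_id": stack_id}))


def build_cdk_command(message_body: str) -> List[str]:
    """Task side: turn one queue message into a cdk cli invocation."""
    message = json.loads(message_body)
    # The "stack-<id>" naming is hypothetical, not data.all's convention.
    return ["cdk", "deploy", f"stack-{message['stack_id']}", "--require-approval", "never"]


def drain_queue() -> List[List[str]]:
    """Read every pending message and return the commands a task would run."""
    commands = []
    while not stack_queue.empty():
        commands.append(build_cdk_command(stack_queue.get()))
    return commands
```

Decoupling the request from the deployment this way lets the API call return immediately while the long-running `cdk` command executes in a separate task.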

### core/ <a name="core"></a>
Core contains those functionalities that are indispensable to run data.all. Customization of the core should be limited
7 changes: 7 additions & 0 deletions pages/deploy/deploy_aws.md
@@ -170,6 +170,7 @@ of our repository. Open it, you should be seen something like:
"repository_source": "string_VERSION_CONTROL_SERVICE|(codecommit, codestar_connection) DEFAULT=codecommit",
"repo_string": "string_REPOSITORY_IN_GITHUB_OWNER/REPOSITORY|DEFAULT=awslabs/aws-dataall, REQUIRED if repository_source=codestar_connection",
"repo_connection_arn": "string_CODESTAR_SOURCE_CONNECTION_ARN_FOR_GITHUB_arn:aws:codestar-connections:region:account-id:connection/connection-id|DEFAULT=None, REQUIRED if repository_source=codestar_connection",
"log_retention_duration": "string_LOG_RETENTION_DURATION|DEFAULT=TWO_YEARS",
"DeploymentEnvironments": [
{
"envname": "string_ENVIRONMENT_NAME|REQUIRED",
@@ -193,6 +194,7 @@ of our repository. Open it, you should be seen something like:
"enable_cw_canaries": "boolean_SET_CLOUDWATCH_CANARIES_FOR_FRONTEND_TESTING|DEFAULT=false",
"shared_dashboards_sessions": "string_TYPE_SESSION_SHARED_DASHBOARDS|(reader, anonymous) DEFAULT=anonymous",
"enable_pivot_role_auto_create": "boolean_ENABLE_PIVOT_ROLE_AUTO_CREATE_IN_ENVIRONMENT|DEFAULT=false",
"allowed_origins": "string_TYPE_DOMAIN_ORIGIN|DEFAULT=*",
"enable_update_dataall_stacks_in_cicd_pipeline": "boolean_ENABLE_UPDATE_DATAALL_STACKS_IN_CICD_PIPELINE|DEFAULT=false",
"enable_opensearch_serverless": "boolean_USE_OPENSEARCH_SERVERLESS|DEFAULT=false",
"cognito_user_session_timeout_inmins": "integer_COGNITO_USER_SESSION_TIMEOUT_INMINS|DEFAULT=43200",
@@ -235,6 +237,8 @@ and find 2 examples of cdk.json files.
| source | Optional | The version control source for the repository. It can take 2 values 'codecommit' or 'codestar_connection'. (default: 'codecommit') |
| repo_string | Optional | The repository path as string. Required if source='codestar_connection' (default: 'awslabs/aws-dataall') |
| repo_connection_arn | Optional | The ARN of the CodeStar connection connecting with the source repository. Required if source='codestar_connection' (default: None) |
| log_retention_duration | Optional | The CloudWatch log retention days for all data.all compute log groups (e.g. Lambda and ECS Tasks), VPC flow logs, and API Activity logs - this parameter is specified as a string value of one of the AWS CDK enum RetentionDays members (default: `TWO_YEARS`) |
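
As an illustration of how a `log_retention_duration` string could be checked before it is used, here is a minimal sketch. It assumes the parameters have already been loaded into a plain dictionary; the function name and the `RETENTION_MEMBERS` set are hypothetical, and the set lists only a subset of the `RetentionDays` member names so the example runs without the CDK library installed.

```python
# Subset of aws_cdk.aws_logs.RetentionDays member names, copied here so the
# sketch runs without aws-cdk-lib installed. The real enum has more members.
RETENTION_MEMBERS = frozenset(
    {"ONE_DAY", "ONE_WEEK", "ONE_MONTH", "ONE_YEAR", "TWO_YEARS", "SIX_YEARS", "INFINITE"}
)
DEFAULT_RETENTION = "TWO_YEARS"  # default stated in the parameter table


def resolve_log_retention(params: dict) -> str:
    """Return the configured retention member name, or the default if unset."""
    value = params.get("log_retention_duration", DEFAULT_RETENTION)
    if value not in RETENTION_MEMBERS:
        raise ValueError(f"Unknown log_retention_duration: {value!r}")
    return value
```

With the CDK library available, the returned name could presumably be mapped to the enum member with `getattr(RetentionDays, value)`; how data.all performs this lookup is not shown in this commit.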

| **Deployment environments Parameters** | **Optional/Required** | **Definition** |
| ---------------------------- | --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| envname | REQUIRED | The name of the deployment environment (e.g dev, qa, prod,...). It must be in lower case without any special character. |
@@ -258,6 +262,7 @@ and find 2 examples of cdk.json files.
| cognito_user_session_timeout_inmins | Optional | The number of minutes to set the refresh token validity time for user sessions in Cognito before a user must log in again to the data.all UI (default: 43200 - i.e. 30 days) |
| reauth_config | Optional | A dictionary containing a list of API operations that require a user to re-authenticate before proceeding (`reauth_apis`) and a time to live (`ttl`) for how long a user's re-auth session is valid to perform re-auth APIs before having to re-authenticate again |
| custom_auth | Optional | A dictionary containing a set of parameters to set up an external IDP (Authentication and Authorization) in data.all. Custom Auth Configuration: `provider`, `url`, `redirect_url`, `client_id`, `response_types`, `scopes`, `jwks_url`, `claims_mapping` (nested dictionary containing configuration: `user_id`, `email`). All the configurations are required if setting up data.all with an external OIDC-supported IDP |
| allowed_origins | Optional | A string origin to be specified as the `Access-Control-Allow-Origin` response header when returning responses from the backend (default: `'*'`) |
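
To make the effect of `allowed_origins` concrete, the sketch below builds response headers from the configured value. The function name and the `Content-Type` entry are assumptions for illustration; only the `Access-Control-Allow-Origin` header and the `*` default come from the table above.

```python
def build_response_headers(allowed_origins: str = "*") -> dict:
    """Build response headers that carry the configured CORS origin."""
    return {
        "Content-Type": "application/json",
        # Browsers compare this value with the origin of the requesting page.
        "Access-Control-Allow-Origin": allowed_origins,
    }
```

Passing a single origin such as `https://example.com` restricts browser access to pages served from that origin, while the default `*` allows any origin.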

**Example 1**: Basic deployment: this is an example of a minimum configured cdk.json file.

@@ -300,6 +305,7 @@ deploy to 2 deployments accounts.
"git_release": true,
"quality_gate": false,
"resource_prefix": "da",
"log_retention_duration": "SIX_YEARS",
"DeploymentEnvironments": [
{
"envname": "dev",
Expand Down Expand Up @@ -332,6 +338,7 @@ deploy to 2 deployments accounts.
"enable_update_dataall_stacks_in_cicd_pipeline": true,
"enable_opensearch_serverless": true,
"cognito_user_session_timeout_inmins": 240,
"allowed_origins": "https://example.com",
"reauth_config": {
"reauth_apis": ["CreateDataset", "ImportDataset", "deleteDataset"],
"ttl": 10