
docs: Add website documentation for using S3 Tables with Spark Operator #722

Merged — 6 commits, Jan 11, 2025
2 changes: 1 addition & 1 deletion analytics/terraform/spark-k8s-operator/README.md
Original file line number Diff line number Diff line change
@@ -80,7 +80,7 @@ Checkout the [documentation website](https://awslabs.github.io/data-on-eks/docs/
| <a name="input_enable_vpc_endpoints"></a> [enable\_vpc\_endpoints](#input\_enable\_vpc\_endpoints) | Enable VPC Endpoints | `bool` | `false` | no |
| <a name="input_enable_yunikorn"></a> [enable\_yunikorn](#input\_enable\_yunikorn) | Enable Apache YuniKorn Scheduler | `bool` | `false` | no |
| <a name="input_kms_key_admin_roles"></a> [kms\_key\_admin\_roles](#input\_kms\_key\_admin\_roles) | list of role ARNs to add to the KMS policy | `list(string)` | `[]` | no |
| <a name="input_name"></a> [name](#input\_name) | Name of the VPC and EKS Cluster | `string` | `"spark-eks-s3tables"` | no |
| <a name="input_name"></a> [name](#input\_name) | Name of the VPC and EKS Cluster | `string` | `"spark-operator-doeks"` | no |
| <a name="input_private_subnets"></a> [private\_subnets](#input\_private\_subnets) | Private Subnets CIDRs. 254 IPs per Subnet/AZ for Private NAT + NLB + Airflow + EC2 Jumphost etc. | `list(string)` | <pre>[<br> "10.1.1.0/24",<br> "10.1.2.0/24"<br>]</pre> | no |
| <a name="input_public_subnets"></a> [public\_subnets](#input\_public\_subnets) | Public Subnets CIDRs. 62 IPs per Subnet/AZ | `list(string)` | <pre>[<br> "10.1.0.0/26",<br> "10.1.0.64/26"<br>]</pre> | no |
| <a name="input_region"></a> [region](#input\_region) | Region | `string` | `"us-west-2"` | no |
@@ -53,7 +53,7 @@ In the table below we have taken the Median times from the output for each insta

To calculate the performance increase we are calculating a ratio of the query times. For example, to determine how much faster the r8g instances were compared to the r6g instances:
- Find the times corresponding to each query, using `q20-v2.4` as an example the r6g.12xlarge took `2.81s` and the r8g.12xlarge took `1.69s`.
- We then divide r5g.12xlarge/r8g.12xlarge, for q20-v2.4 thats `2.81s/1.69s = 1.66`. So for this query the r8g.12xlarge was able to complete the queries 1.66 times faster (or a ~66% percent improvement)
- We then divide r6g.12xlarge/r8g.12xlarge; for q20-v2.4 that's `2.81s/1.69s = 1.66`. So for this query the r8g.12xlarge completed the queries 1.66 times faster (a ~66% improvement)
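The ratio calculation above can be sketched as follows (the variable names are mine; the times are the q20-v2.4 medians quoted in the text):

```python
# Speedup ratio as described above, using the q20-v2.4 median query times.
r6g_time_s = 2.81  # r6g.12xlarge median time for q20-v2.4, seconds
r8g_time_s = 1.69  # r8g.12xlarge median time for q20-v2.4, seconds

# Dividing the slower time by the faster one gives the speedup factor.
speedup = r6g_time_s / r8g_time_s
print(f"q20-v2.4 speedup: {speedup:.2f}x")  # ~1.66x, i.e. ~66% faster
```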

The data has been sorted by the last column, showing the performance increase r8g.12xlarge has over the r6g.12xlarge.
<div class="benchmark-results">
@@ -146,7 +146,7 @@ When you enter the results directory you will see a list of folders which corres

![S3 bucket showing timestamp directories for results](./img/results-s3-timestamps.png)

You can find the latest result by selecting the timestamp thats largest, or find the folder that corresponds to the time of your test.
You can find the latest result by selecting the timestamp that's largest, or find the folder that corresponds to the time of your test.
Inside this folder you will see a file with a name like `part-00000-000000000-0000-0000-0000-000000000-0000.json`, this file includes the full spark configuration used for the job.
![S3 bucket showing results files](./img/results-s3-result-folder.png)
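Because the result folders are named by timestamp, the latest run is simply the lexicographically largest prefix. A minimal sketch (the prefix names below are made up for illustration; this works because zero-padded timestamps sort naturally as strings):

```python
# Hypothetical result prefixes, named by run timestamp as described above.
prefixes = [
    "results/2024-11-02-14-05-33/",
    "results/2024-11-03-09-12-01/",
    "results/2024-11-01-21-47-10/",
]

# Zero-padded timestamps sort lexicographically in time order,
# so the latest run is just the maximum string.
latest = max(prefixes)
print(latest)  # results/2024-11-03-09-12-01/
```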

2 changes: 1 addition & 1 deletion website/docs/bestpractices/preload-container-images.md
@@ -84,7 +84,7 @@ spec:
ebs:
volumeSize: 150Gi
volumeType: gp3
kmsKeyID: "arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab" # Specify KMS ID if you use custom KMS key
kmsKeyID: "arn:aws:kms:<REGION>:<ACCOUNT_ID>:key/1234abcd-12ab-34cd-56ef-1234567890ab" # Specify KMS ID if you use custom KMS key
snapshotID: snap-0123456789 # Specify your snapshot ID here
```
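The `<REGION>` and `<ACCOUNT_ID>` placeholders in `kmsKeyID` follow the standard KMS key ARN shape. A small sketch of filling them in (the region, account ID, and key ID below are documentation examples, not real resources):

```python
# Build the KMS key ARN expected by the kmsKeyID field above.
# All three values are placeholders to substitute with your own.
region = "us-west-2"
account_id = "111122223333"  # AWS documentation example account ID
key_id = "1234abcd-12ab-34cd-56ef-1234567890ab"

kms_key_arn = f"arn:aws:kms:{region}:{account_id}:key/{key_id}"
print(kms_key_arn)
```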

2 changes: 1 addition & 1 deletion website/docs/blueprints/data-analytics/datahub-on-eks.md
@@ -1,5 +1,5 @@
---
sidebar_position: 4
sidebar_position: 6
sidebar_label: DataHub on EKS
---
# DataHub on EKS
@@ -1,6 +1,6 @@
---
sidebar_position: 3
sidebar_label: Observability Spark on EKS
sidebar_position: 5
sidebar_label: Spark Observability on EKS
---

import TaxiTripExec from './_taxi_trip_exec.md';
4 changes: 2 additions & 2 deletions website/docs/blueprints/data-analytics/spark-eks-ipv6.md
@@ -1,6 +1,6 @@
---
title: Spark Operator running on Amazon EKS IPv6
sidebar_position: 6
title: Spark Operator on EKS with IPv6
sidebar_position: 3
---

This example showcases Spark Operator running on Amazon EKS in IPv6 mode. The idea is to demonstrate running Spark workloads on an EKS IPv6 cluster.
436 changes: 436 additions & 0 deletions website/docs/blueprints/data-analytics/spark-operator-s3tables.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion website/docs/blueprints/data-analytics/superset-on-eks.md
@@ -1,5 +1,5 @@
---
sidebar_position: 5
sidebar_position: 7
sidebar_label: Superset on EKS
---
# Superset on EKS