Commit
Updated readme to use aws CLI and fixed paths. Updated spark_sql version (#233)
cabeaulac authored Dec 9, 2023
1 parent 0b81d2e commit 524c1a3
Showing 2 changed files with 27 additions and 9 deletions.
32 changes: 25 additions & 7 deletions emr-serverless-spark/README.md
@@ -5,12 +5,28 @@ We will run a Java Spark job on EMR Serverless using a simple Java "Hello World"
 ## Prerequisites
 
 * LocalStack
-* `aws` CLI & `awslocal` script
+* `aws` CLI
 * Docker
 * Java and Maven
 
 ## Installation
 
+### Configuring a custom profile
+Configure a custom profile to use with LocalStack. Add the following profile to your AWS configuration file (by default, this file is at ~/.aws/config):
+```shell
+[profile localstack]
+region=us-east-1
+output=json
+endpoint_url = http://localhost:4566
+```
+
+Add the following profile to your AWS credentials file (by default, this file is at ~/.aws/credentials):
+```shell
+[localstack]
+aws_access_key_id=test
+aws_secret_access_key=test
+```
+
 Before creating the EMR Serverless job, we need to create a JAR file containing the Java code. We have the `java-demo-1.0.jar` file in the current directory. Alternatively, you can create the JAR file yourself by following the steps below.
 
 ```bash
@@ -21,14 +37,15 @@ mvn package
 Next, we need to create an S3 bucket to store the JAR file. To do this, run the following command:
 
 ```bash
+cd ..
 export S3_BUCKET=test
-awslocal s3 mb s3://$S3_BUCKET
+aws s3 mb s3://$S3_BUCKET
 ```
 
 You can now copy the JAR file from your current directory to the S3 bucket:
 
 ```bash
-awslocal s3 cp java-demo-1.0.jar s3://${S3_BUCKET}/code/java-spark/
+aws s3 cp hello-world/target/java-demo-1.0.jar s3://${S3_BUCKET}/code/java-spark/java-demo-1.0.jar
 ```
 
 ## Creating the EMR Serverless Job
@@ -42,7 +59,7 @@ export JOB_ROLE_ARN=arn:aws:iam::000000000000:role/emr-serverless-job-role
 We can now create an EMR Serverless application, which will run Spark 3.3.0. Run the following command:
 
 ```bash
-awslocal emr-serverless create-application \
+aws emr-serverless create-application \
 --type SPARK \
 --name serverless-java-demo \
 --release-label "emr-6.9.0" \
@@ -73,7 +90,7 @@ export APPLICATION_ID='<application-id>'
 Start the EMR Serverless application:
 
 ```shell
-awslocal emr-serverless start-application \
+aws emr-serverless start-application \
 --application-id $APPLICATION_ID
 ```
 
@@ -82,7 +99,7 @@ awslocal emr-serverless start-application \
 You can now run the EMR Serverless job:
 
 ```bash
-awslocal emr-serverless start-job-run \
+aws emr-serverless start-job-run \
 --application-id $APPLICATION_ID \
 --execution-role-arn $JOB_ROLE_ARN \
 --job-driver '{
@@ -103,6 +120,7 @@ awslocal emr-serverless start-job-run \
 The Spark logs will be written to the S3 bucket specified in the `logUri` parameter. You can stop the EMR Serverless application with the following command:
 
 ```bash
-awslocal emr-serverless stop-application \
+aws emr-serverless stop-application \
 --application-id $APPLICATION_ID
+
 ```
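With this change the README relies on a `localstack` profile instead of the `awslocal` wrapper. The profile setup can be scripted. This is a minimal sketch, assuming Bash. `add_localstack_profile` is a hypothetical helper, not part of the repository. It appends the two sections from the README only when their header lines are missing, so it is safe to re-run. It writes to the files named by the AWS CLI's `AWS_CONFIG_FILE` and `AWS_SHARED_CREDENTIALS_FILE` variables when they are set, and otherwise to the defaults under `~/.aws`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): writes the LocalStack
# profile shown in the README into the AWS CLI config and credentials files.
# A section is appended only if its exact header line is not already present,
# so running the function more than once does not duplicate it.
add_localstack_profile() {
  # The AWS CLI honors these variables as overrides for the default file paths.
  local config="${AWS_CONFIG_FILE:-$HOME/.aws/config}"
  local creds="${AWS_SHARED_CREDENTIALS_FILE:-$HOME/.aws/credentials}"
  mkdir -p "$(dirname "$config")" "$(dirname "$creds")"

  # Config file: named profiles use the "[profile <name>]" header form.
  if ! grep -qxF '[profile localstack]' "$config" 2>/dev/null; then
    printf '\n[profile localstack]\nregion=us-east-1\noutput=json\nendpoint_url = http://localhost:4566\n' >> "$config"
  fi

  # Credentials file: profiles use the bare "[<name>]" header form.
  if ! grep -qxF '[localstack]' "$creds" 2>/dev/null; then
    printf '\n[localstack]\naws_access_key_id=test\naws_secret_access_key=test\n' >> "$creds"
  fi
}
```

After running it, select the profile before the README's `aws` commands. You can run `export AWS_PROFILE=localstack` or add `--profile localstack` to each command. The commands as written do not select a profile, so without this the CLI uses the `default` profile. Reading `endpoint_url` from the config file also requires a reasonably recent AWS CLI release.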
4 changes: 2 additions & 2 deletions emr-serverless-spark/hello-world/pom.xml
@@ -8,8 +8,8 @@
 <dependencies>
 <dependency> <!-- Spark dependency -->
 <groupId>org.apache.spark</groupId>
-<artifactId>spark-sql_2.12</artifactId>
-<version>3.3.0</version>
+<artifactId>spark-sql_2.13</artifactId>
+<version>3.5.0</version>
 <scope>provided</scope>
 </dependency>
 </dependencies>
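The `_2.12` to `_2.13` change in the artifactId changes the Scala binary version, not only the Spark version. Spark artifacts are published separately per Scala binary version. A job compiled against one Scala binary version generally needs a Spark runtime built for the same one. Because the dependency is `provided`, the runtime classes come from the EMR release at execution time. The README creates the application with `--release-label "emr-6.9.0"` and describes it as running Spark 3.3.0.

The following sketch, assuming Bash, lists what the pom declares. `spark_scala_suffixes` is a hypothetical helper that uses a plain regex rather than an XML parser, so treat it only as a rough check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "<artifact> <scala-binary-version>" for each
# Spark artifactId in a pom, e.g. "spark-sql 2.13".
# Uses grep/sed pattern matching, not real XML parsing.
spark_scala_suffixes() {
  local pom="${1:-hello-world/pom.xml}"
  # Pull out <artifactId>spark-..._X.Y</artifactId> elements, then split
  # the artifact name from its Scala binary-version suffix.
  grep -o '<artifactId>spark-[a-z-]*_[0-9.]*</artifactId>' "$pom" \
    | sed -E 's#<artifactId>(spark-[a-z-]*)_([0-9.]+)</artifactId>#\1 \2#'
}
```

Compare its output against the Spark and Scala versions listed in the release notes for the EMR release label you pass to `create-application`.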