Adding structure for Bulk Export #3543

Merged: 9 commits, Oct 24, 2024
5 changes: 3 additions & 2 deletions .github/config/chunks.yaml
Original file line number Diff line number Diff line change
@@ -46,6 +46,9 @@ chunks:
- splitter/splitter-core
- splitter/splitter-lambda
- garbage-collector
- bulk-export/bulk-export-core
- bulk-export/bulk-export-runner
- bulk-export/bulk-export-starter
rust:
name: Rust
workflow: chunk-rust.yaml
@@ -75,5 +78,3 @@ chunks:
- query/query-lambda
- athena
- trino


4 changes: 4 additions & 0 deletions .github/workflows/chunk-compaction.yaml
Original file line number Diff line number Diff line change
@@ -33,6 +33,10 @@ on:
- 'java/common/dynamodb-tools/**'
- 'java/core/**'
- 'java/common/dynamodb-test/**'
- 'java/bulk-export/pom.xml'
- 'java/bulk-export/bulk-export-core/**'
- 'java/bulk-export/bulk-export-runner/**'
- 'java/bulk-export/bulk-export-starter/**'

jobs:
chunk-workflow:
4 changes: 2 additions & 2 deletions docs/01-getting-started.md
Original file line number Diff line number Diff line change
@@ -9,7 +9,7 @@ allow you to deploy an instance, ingest some files, and run reports and scripts

The quickest way to get an instance of Sleeper is to deploy to LocalStack in Docker on your local machine. Note that the
LocalStack version has very limited functionality in comparison to the AWS version, and can only handle small volumes of
data. See the documentation on [deploying to localstack](10-deploy-to-localstack.md) for more information.
data. See the documentation on [deploying to localstack](11-deploy-to-localstack.md) for more information.

## Deploy to AWS

@@ -72,7 +72,7 @@ Next, you'll need a VPC that is suitable for deploying Sleeper. You'll also want
avoid lengthy uploads of large jar files and Docker images. You can use the Sleeper CLI to create both of these.

If you prefer to use your own EC2, you'll need to build Sleeper there as described in
the [developer guide](11-dev-guide.md). The EC2 should run on an x86_64 architecture. If you prefer to use your own VPC,
the [developer guide](12-dev-guide.md). The EC2 should run on an x86_64 architecture. If you prefer to use your own VPC,
you'll need to ensure it meets Sleeper's requirements. Deployment of an EC2 to an existing VPC is documented in
the [deployment guide](02-deployment-guide.md#managing-environments).

2 changes: 1 addition & 1 deletion docs/02-deployment-guide.md
Original file line number Diff line number Diff line change
@@ -23,7 +23,7 @@ cd sleeper # Change directory to the root of the Git repository
If you used the system test deployment described in the getting started guide, you will have already built Sleeper.

To build Sleeper locally to interact with an instance from elsewhere, you can follow the instructions in
the [developer guide](11-dev-guide.md#install-prerequisite-software).
the [developer guide](12-dev-guide.md#install-prerequisite-software).

### Configure AWS

2 changes: 1 addition & 1 deletion docs/04-tables.md
Original file line number Diff line number Diff line change
@@ -4,7 +4,7 @@ Tables
A Sleeper instance contains one or more tables. Each table has four important
properties: a name, a schema for storing data for that table, a state store for
storing metadata about the table, and a flag to denote whether the table is
online or not (see [here](12-design.md#Tables) for more information about
online or not (see [here](14-design.md#Tables) for more information about
online tables). All resources for the instance, such as the S3 bucket used for
storing data in a table, ECS clusters and lambda functions are shared across
all the tables.
2 changes: 1 addition & 1 deletion docs/05-ingest.md
Original file line number Diff line number Diff line change
@@ -83,7 +83,7 @@ with columns matching the fields in your schema (note that the fields in the sch
all need to be non-optional).

Note that the descriptions below describe how data in Parquet files can be ingested by sending ingest job
definitions in JSON form to SQS queues. In practice it may be easier to use the [Python API](08-python-api.md).
definitions in JSON form to SQS queues. In practice it may be easier to use the [Python API](09-python-api.md).

When you have the data you want to ingest stored in Parquet files, a message should be sent
to Sleeper's ingest queue telling it that the data should be ingested. This message should have the following form:
4 changes: 2 additions & 2 deletions docs/07-data-retrieval.md
Original file line number Diff line number Diff line change
@@ -7,11 +7,11 @@ should either specify all of the row key fields, or the first one or more fields
fields, key1, key2 and key3, then a query should specify either ranges for key1, key2 and key3, or ranges for key1 and
key2, or ranges for key1.
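
The rule described here (a query must name a leading run of the row key fields, in order) can be sketched as a small check. This is an illustration only and not Sleeper's own validation code: the class `RowKeyPrefixCheck` and the method `isValidKeyPrefix` are hypothetical names, and the sketch assumes both the schema's row key fields and the fields named in a query are available as ordered lists of field names.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Sleeper) illustrating the row key prefix rule.
 */
public class RowKeyPrefixCheck {

    private RowKeyPrefixCheck() {
    }

    /**
     * Returns true if queriedFields is a non-empty leading prefix of rowKeyFields, in the same order.
     */
    public static boolean isValidKeyPrefix(List<String> rowKeyFields, List<String> queriedFields) {
        if (queriedFields.isEmpty() || queriedFields.size() > rowKeyFields.size()) {
            return false;
        }
        // Compare the first N row key fields against the N fields named in the query
        return rowKeyFields.subList(0, queriedFields.size()).equals(queriedFields);
    }

    public static void main(String[] args) {
        List<String> rowKeys = List.of("key1", "key2", "key3");
        System.out.println(isValidKeyPrefix(rowKeys, List.of("key1", "key2", "key3"))); // true
        System.out.println(isValidKeyPrefix(rowKeys, List.of("key1", "key2")));         // true
        System.out.println(isValidKeyPrefix(rowKeys, List.of("key1")));                 // true
        System.out.println(isValidKeyPrefix(rowKeys, List.of("key2", "key3")));         // false
    }
}
```

Under these assumptions, ranges on key2 and key3 alone would be rejected, because they skip key1.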

The methods below describe how queries can be executed using scripts. See the docs on the [Python API](08-python-api.md)
The methods below describe how queries can be executed using scripts. See the docs on the [Python API](09-python-api.md)
for details of how to execute them from Python.

These instructions will assume you start in the project root directory and Sleeper has been built
(see [the developer guide](11-dev-guide.md) for how to set that up).
(see [the developer guide](12-dev-guide.md) for how to set that up).

## Running queries directly using the Java client

5 changes: 5 additions & 0 deletions docs/08-export.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
Exporting data
==============

## Introduction
Feature coming soon.
File renamed without changes.
File renamed without changes.
Original file line number Diff line number Diff line change
@@ -6,7 +6,7 @@ functionality and will only work with small volumes of data, but will allow you
ingest, and run reports and scripts against the instance.

These instructions will assume you start in the project root directory and Sleeper has been built
(see [the developer guide](11-dev-guide.md) for how to set that up).
(see [the developer guide](12-dev-guide.md) for how to set that up).

## Launch LocalStack container

10 changes: 5 additions & 5 deletions docs/11-dev-guide.md → docs/12-dev-guide.md
Original file line number Diff line number Diff line change
@@ -135,7 +135,7 @@ mvn clean install -Pquick -DskipRust=true

## Using the codebase

The codebase is structured around the components explained in the [design document](12-design.md). The elements of the
The codebase is structured around the components explained in the [design document](14-design.md). The elements of the
design largely correspond to Maven modules. Core or common modules contain shared model code. Other modules contain
integrations with libraries which are not needed by all components of the system, eg. AWS API clients.

@@ -210,7 +210,7 @@ public void process(String foo, String bar) {

The Maven project includes unit tests, integration tests and system tests. We use JUnit 5, with AssertJ for assertions.
We also have a setup for manual testing against a deployed instance of Sleeper, documented in
the [system tests guide](13-system-tests.md#manual-testing).
the [system tests guide](15-system-tests.md#manual-testing).

A unit test is any test that runs entirely in-memory without any I/O operations (eg. file system or network calls).
If you configure your IDE to run all unit tests at once, they should finish in less than a minute. The unit of a test
@@ -219,7 +219,7 @@ should be a particular behaviour or scenario, rather than eg. a specific method.
A system test is a test that works with a deployed instance of Sleeper. These can be found in the
module `system-test/system-test-suite`. They use the class `SleeperSystemTest` as the entry point to work with an
instance of Sleeper. This is the acceptance test suite we use to define releasability of the system. This is documented
in the [system tests guide](13-system-tests.md#acceptance-tests). If you add a new feature, please add one or two simple
in the [system tests guide](15-system-tests.md#acceptance-tests). If you add a new feature, please add one or two simple
cases to this test suite, as a complement to more detailed unit testing.

An integration test is any test which does not meet the definition of a unit test or a system test. Usually it uses
@@ -229,7 +229,7 @@ Unit tests should be in a class ending with Test, like MyFeatureTest. Integratio
IT, like MyFeatureIT. Classes named this way will be picked up by Maven's Surefire plugin for unit tests, and Failsafe
for integration tests. System tests should be in a class ending with ST, like CompactionPerformanceST, and must be
tagged with the annotation `SystemTest`. This means they will only be run as part of a system test suite, or directly.
See the [system tests guide](13-system-tests.md#acceptance-tests).
See the [system tests guide](15-system-tests.md#acceptance-tests).

We avoid mocking wherever possible, and prefer to use test fakes, eg. implement an interface to a database with a
wrapper around a HashMap. Use test helper methods to make tests as readable as possible, and as close as possible to a
@@ -275,4 +275,4 @@ See the [deployment guide](02-deployment-guide.md) for notes on how to deploy Sl

## Release Process

See the [release process guide](14-release-process.md) for instructions on how to publish a release of Sleeper.
See the [release process guide](16-release-process.md) for instructions on how to publish a release of Sleeper.
File renamed without changes.
File renamed without changes.
File renamed without changes.
6 changes: 3 additions & 3 deletions docs/15-release-process.md → docs/16-release-process.md
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@ The following steps explain how to prepare and publish a release for Sleeper, by

1. Update CHANGELOG.md with a summary of the issues fixed and improvements made in this version.

2. Update the [roadmap](16-roadmap.md) and remove any planned features that have been implemented in this release.
2. Update the [roadmap](18-roadmap.md) and remove any planned features that have been implemented in this release.

3. Make sure the [NOTICES](../NOTICES) file is up to date, particularly from any version changes made by Dependabot.

@@ -23,9 +23,9 @@ VERSION=0.12.0
7. Get the performance figures from the nightly system tests.

There should be a cron job configured to run these nightly. Running it manually and retrieving the results is documented
in the [system tests guide](13-system-tests.md#nightly-test-scripts).
in the [system tests guide](15-system-tests.md#nightly-test-scripts).

Update the performance figures in the [system tests guide](13-system-tests.md#performance-benchmarks).
Update the performance figures in the [system tests guide](15-system-tests.md#performance-benchmarks).

8. Run a deployment of the deployAll system test to test the functionality of the system. Note that it is best to
provide a fresh instance ID that has not been used before:
Original file line number Diff line number Diff line change
@@ -2,7 +2,7 @@ Common problems and their solutions
===================================

These instructions will assume you start in the project root directory and Sleeper has been built
(see [the developer guide](11-dev-guide.md) for how to set that up).
(see [the developer guide](12-dev-guide.md) for how to set that up).

## EOFException when using client classes

File renamed without changes.
26 changes: 26 additions & 0 deletions java/bulk-export/bulk-export-core/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Copyright 2022-2024 Crown Copyright
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<groupId>sleeper</groupId>
<artifactId>bulk-export</artifactId>
<version>0.26.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>bulk-export-core</artifactId>
</project>
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
/*
* Copyright 2022-2024 Crown Copyright
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package sleeper.bulkexport.configuration;

/**
* The configuration for a bulk export.
*/
public class BulkExportPlatformSpec {

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
/*
* Copyright 2022-2024 Crown Copyright
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package sleeper.bulkexport.job;

/**
* The Bulk export job.
*/
public class BulkExportJob {

}
35 changes: 35 additions & 0 deletions java/bulk-export/bulk-export-runner/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Copyright 2022-2024 Crown Copyright
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<groupId>sleeper</groupId>
<artifactId>bulk-export</artifactId>
<version>0.26.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>bulk-export-runner</artifactId>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
/*
* Copyright 2022-2024 Crown Copyright
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package sleeper.bulkexport;

/**
* Bulk export job runner.
*/
public class BulkExportJobRunner {

}
26 changes: 26 additions & 0 deletions java/bulk-export/bulk-export-starter/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Copyright 2022-2024 Crown Copyright
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<groupId>sleeper</groupId>
<artifactId>bulk-export</artifactId>
<version>0.26.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>bulk-export-starter</artifactId>
</project>
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
/*
* Copyright 2022-2024 Crown Copyright
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package sleeper.bulkexport;

/**
* Lambda to start the bulk export job.
*/
public class BulkExportStarterLambda {

}
34 changes: 34 additions & 0 deletions java/bulk-export/pom.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
~ Copyright 2022-2024 Crown Copyright
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>aws</artifactId>
<groupId>sleeper</groupId>
<version>0.26.0-SNAPSHOT</version>
</parent>

<packaging>pom</packaging>
<modelVersion>4.0.0</modelVersion>
<artifactId>bulk-export</artifactId>

<modules>
<module>bulk-export-core</module>
<module>bulk-export-runner</module>
<module>bulk-export-starter</module>
</modules>
</project>