diff --git a/.github/config/chunks.yaml b/.github/config/chunks.yaml
index ac00e824d9..2d6797ac94 100644
--- a/.github/config/chunks.yaml
+++ b/.github/config/chunks.yaml
@@ -46,6 +46,9 @@ chunks:
- splitter/splitter-core
- splitter/splitter-lambda
- garbage-collector
+ - bulk-export/bulk-export-core
+ - bulk-export/bulk-export-runner
+ - bulk-export/bulk-export-starter
rust:
name: Rust
workflow: chunk-rust.yaml
@@ -75,5 +78,3 @@ chunks:
- query/query-lambda
- athena
- trino
-
-
diff --git a/.github/workflows/chunk-compaction.yaml b/.github/workflows/chunk-compaction.yaml
index 2a248f6b98..b9eedb9617 100644
--- a/.github/workflows/chunk-compaction.yaml
+++ b/.github/workflows/chunk-compaction.yaml
@@ -33,6 +33,10 @@ on:
- 'java/common/dynamodb-tools/**'
- 'java/core/**'
- 'java/common/dynamodb-test/**'
+ - 'java/bulk-export/pom.xml'
+ - 'java/bulk-export/bulk-export-core/**'
+ - 'java/bulk-export/bulk-export-runner/**'
+ - 'java/bulk-export/bulk-export-starter/**'
jobs:
chunk-workflow:
diff --git a/docs/01-getting-started.md b/docs/01-getting-started.md
index b08df6a8f9..6fdec1a782 100644
--- a/docs/01-getting-started.md
+++ b/docs/01-getting-started.md
@@ -9,7 +9,7 @@ allow you to deploy an instance, ingest some files, and run reports and scripts
The quickest way to get an instance of Sleeper is to deploy to LocalStack in Docker on your local machine. Note that the
LocalStack version has very limited functionality in comparison to the AWS version, and can only handle small volumes of
-data. See the documentation on [deploying to localstack](10-deploy-to-localstack.md) for more information.
+data. See the documentation on [deploying to localstack](11-deploy-to-localstack.md) for more information.
## Deploy to AWS
@@ -72,7 +72,7 @@ Next, you'll need a VPC that is suitable for deploying Sleeper. You'll also want
avoid lengthy uploads of large jar files and Docker images. You can use the Sleeper CLI to create both of these.
If you prefer to use your own EC2, you'll need to build Sleeper there as described in
-the [developer guide](11-dev-guide.md). The EC2 should run on an x86_64 architecture. If you prefer to use your own VPC,
+the [developer guide](12-dev-guide.md). The EC2 should run on an x86_64 architecture. If you prefer to use your own VPC,
you'll need to ensure it meets Sleeper's requirements. Deployment of an EC2 to an existing VPC is documented in
the [deployment guide](02-deployment-guide.md#managing-environments).
diff --git a/docs/02-deployment-guide.md b/docs/02-deployment-guide.md
index c724b5389d..3fd3a7e446 100644
--- a/docs/02-deployment-guide.md
+++ b/docs/02-deployment-guide.md
@@ -23,7 +23,7 @@ cd sleeper # Change directory to the root of the Git repository
If you used the system test deployment described in the getting started guide, you will have already built Sleeper.
To build Sleeper locally to interact with an instance from elsewhere, you can follow the instructions in
-the [developer guide](11-dev-guide.md#install-prerequisite-software).
+the [developer guide](12-dev-guide.md#install-prerequisite-software).
### Configure AWS
diff --git a/docs/04-tables.md b/docs/04-tables.md
index 605cb1b287..9b9a14e362 100644
--- a/docs/04-tables.md
+++ b/docs/04-tables.md
@@ -4,7 +4,7 @@ Tables
A Sleeper instance contains one or more tables. Each table has four important
properties: a name, a schema for storing data for that table, a state store for
storing metadata about the table, and a flag to denote whether the table is
-online or not (see [here](12-design.md#Tables) for more information about
+online or not (see [here](14-design.md#Tables) for more information about
online tables). All resources for the instance, such as the S3 bucket used for
storing data in a table, ECS clusters and lambda functions are shared across
all the tables.
diff --git a/docs/05-ingest.md b/docs/05-ingest.md
index af968e6a38..b965ff2a4f 100644
--- a/docs/05-ingest.md
+++ b/docs/05-ingest.md
@@ -83,7 +83,7 @@ with columns matching the fields in your schema (note that the fields in the sch
all need to be non-optional).
Note that the descriptions below describe how data in Parquet files can be ingested by sending ingest job
-definitions in JSON form to SQS queues. In practice it may be easier to use the [Python API](08-python-api.md).
+definitions in JSON form to SQS queues. In practice it may be easier to use the [Python API](09-python-api.md).
When you have the data you want to ingest stored in Parquet files, a message should be sent
to Sleeper's ingest queue telling it that the data should be ingested. This message should have the following form:
diff --git a/docs/07-data-retrieval.md b/docs/07-data-retrieval.md
index f5cd71f83c..ab9ddd156f 100644
--- a/docs/07-data-retrieval.md
+++ b/docs/07-data-retrieval.md
@@ -7,11 +7,11 @@ should either specify all of the row key fields, or the first one or more fields
fields, key1, key2 and key3, then a query should specify either ranges for key1, key2 and key3, or ranges for key1 and
key2, or ranges for key1.
-The methods below describe how queries can be executed using scripts. See the docs on the [Python API](08-python-api.md)
+The methods below describe how queries can be executed using scripts. See the docs on the [Python API](09-python-api.md)
for details of how to execute them from Python.
These instructions will assume you start in the project root directory and Sleeper has been built
-(see [the developer guide](11-dev-guide.md) for how to set that up).
+(see [the developer guide](12-dev-guide.md) for how to set that up).
## Running queries directly using the Java client
diff --git a/docs/08-export.md b/docs/08-export.md
new file mode 100644
index 0000000000..33b4e7a3a6
--- /dev/null
+++ b/docs/08-export.md
@@ -0,0 +1,5 @@
+Exporting data
+==============
+
+## Introduction
+Feature coming soon.
\ No newline at end of file
diff --git a/docs/08-python-api.md b/docs/09-python-api.md
similarity index 100%
rename from docs/08-python-api.md
rename to docs/09-python-api.md
diff --git a/docs/09-trino.md b/docs/10-trino.md
similarity index 100%
rename from docs/09-trino.md
rename to docs/10-trino.md
diff --git a/docs/10-deploy-to-localstack.md b/docs/11-deploy-to-localstack.md
similarity index 98%
rename from docs/10-deploy-to-localstack.md
rename to docs/11-deploy-to-localstack.md
index e9c5542f5c..79fd522500 100644
--- a/docs/10-deploy-to-localstack.md
+++ b/docs/11-deploy-to-localstack.md
@@ -6,7 +6,7 @@ functionality and will only work with small volumes of data, but will allow you
ingest, and run reports and scripts against the instance.
These instructions will assume you start in the project root directory and Sleeper has been built
-(see [the developer guide](11-dev-guide.md) for how to set that up).
+(see [the developer guide](12-dev-guide.md) for how to set that up).
## Launch LocalStack container
diff --git a/docs/11-dev-guide.md b/docs/12-dev-guide.md
similarity index 97%
rename from docs/11-dev-guide.md
rename to docs/12-dev-guide.md
index 2a35943a23..a49c280dc6 100644
--- a/docs/11-dev-guide.md
+++ b/docs/12-dev-guide.md
@@ -135,7 +135,7 @@ mvn clean install -Pquick -DskipRust=true
## Using the codebase
-The codebase is structured around the components explained in the [design document](12-design.md). The elements of the
+The codebase is structured around the components explained in the [design document](14-design.md). The elements of the
design largely correspond to Maven modules. Core or common modules contain shared model code. Other modules contain
integrations with libraries which are not needed by all components of the system, eg. AWS API clients.
@@ -210,7 +210,7 @@ public void process(String foo, String bar) {
The Maven project includes unit tests, integration tests and system tests. We use JUnit 5, with AssertJ for assertions.
We also have a setup for manual testing against a deployed instance of Sleeper, documented in
-the [system tests guide](13-system-tests.md#manual-testing).
+the [system tests guide](15-system-tests.md#manual-testing).
A unit test is any test that runs entirely in-memory without any I/O operations (eg. file system or network calls).
If you configure your IDE to run all unit tests at once, they should finish in less than a minute. The unit of a test
@@ -219,7 +219,7 @@ should be a particular behaviour or scenario, rather than eg. a specific method.
A system test is a test that works with a deployed instance of Sleeper. These can be found in the
module `system-test/system-test-suite`. They use the class `SleeperSystemTest` as the entry point to work with an
instance of Sleeper. This is the acceptance test suite we use to define releasability of the system. This is documented
-in the [system tests guide](13-system-tests.md#acceptance-tests). If you add a new feature, please add one or two simple
+in the [system tests guide](15-system-tests.md#acceptance-tests). If you add a new feature, please add one or two simple
cases to this test suite, as a complement to more detailed unit testing.
An integration test is any test which does not meet the definition of a unit test or a system test. Usually it uses
@@ -229,7 +229,7 @@ Unit tests should be in a class ending with Test, like MyFeatureTest. Integratio
IT, like MyFeatureIT. Classes named this way will be picked up by Maven's Surefire plugin for unit tests, and Failsafe
for integration tests. System tests should be in a class ending with ST, like CompactionPerformanceST, and must be
tagged with the annotation `SystemTest`. This means they will only be run as part of a system test suite, or directly.
-See the [system tests guide](13-system-tests.md#acceptance-tests).
+See the [system tests guide](15-system-tests.md#acceptance-tests).
We avoid mocking wherever possible, and prefer to use test fakes, eg. implement an interface to a database with a
wrapper around a HashMap. Use test helper methods to make tests as readable as possible, and as close as possible to a
@@ -275,4 +275,4 @@ See the [deployment guide](02-deployment-guide.md) for notes on how to deploy Sl
## Release Process
-See the [release process guide](14-release-process.md) for instructions on how to publish a release of Sleeper.
+See the [release process guide](16-release-process.md) for instructions on how to publish a release of Sleeper.
diff --git a/docs/12-dependency-conflicts.md b/docs/13-dependency-conflicts.md
similarity index 100%
rename from docs/12-dependency-conflicts.md
rename to docs/13-dependency-conflicts.md
diff --git a/docs/13-design.md b/docs/14-design.md
similarity index 100%
rename from docs/13-design.md
rename to docs/14-design.md
diff --git a/docs/14-system-tests.md b/docs/15-system-tests.md
similarity index 100%
rename from docs/14-system-tests.md
rename to docs/15-system-tests.md
diff --git a/docs/15-release-process.md b/docs/16-release-process.md
similarity index 93%
rename from docs/15-release-process.md
rename to docs/16-release-process.md
index f9b9ce6682..52ace783eb 100644
--- a/docs/15-release-process.md
+++ b/docs/16-release-process.md
@@ -5,7 +5,7 @@ The following steps explain how to prepare and publish a release for Sleeper, by
1. Update CHANGELOG.md with a summary of the issues fixed and improvements made in this version.
-2. Update the [roadmap](16-roadmap.md) and remove any planned features that have been implemented in this release.
+2. Update the [roadmap](18-roadmap.md) and remove any planned features that have been implemented in this release.
3. Make sure the [NOTICES](../NOTICES) file is up to date, particularly from any version changes made by Dependabot.
@@ -23,9 +23,9 @@ VERSION=0.12.0
7. Get the performance figures from the nightly system tests.
There should be a cron job configured to run these nightly. Running it manually and retrieving the results is documented
-in the [system tests guide](13-system-tests.md#nightly-test-scripts).
+in the [system tests guide](15-system-tests.md#nightly-test-scripts).
-Update the performance figures in the [system tests guide](13-system-tests.md#performance-benchmarks).
+Update the performance figures in the [system tests guide](15-system-tests.md#performance-benchmarks).
8. Run a deployment of the deployAll system test to test the functionality of the system. Note that it is best to
provide a fresh instance ID that has not been used before:
diff --git a/docs/16-common-problems-and-their-solutions.md b/docs/17-common-problems-and-their-solutions.md
similarity index 99%
rename from docs/16-common-problems-and-their-solutions.md
rename to docs/17-common-problems-and-their-solutions.md
index 122cb0f21c..8f6256ce93 100644
--- a/docs/16-common-problems-and-their-solutions.md
+++ b/docs/17-common-problems-and-their-solutions.md
@@ -2,7 +2,7 @@ Common problems and their solutions
===================================
These instructions will assume you start in the project root directory and Sleeper has been built
-(see [the developer guide](11-dev-guide.md) for how to set that up).
+(see [the developer guide](12-dev-guide.md) for how to set that up).
## EOFException when using client classes
diff --git a/docs/17-roadmap.md b/docs/18-roadmap.md
similarity index 100%
rename from docs/17-roadmap.md
rename to docs/18-roadmap.md
diff --git a/java/bulk-export/bulk-export-core/pom.xml b/java/bulk-export/bulk-export-core/pom.xml
new file mode 100644
index 0000000000..80a5e140eb
--- /dev/null
+++ b/java/bulk-export/bulk-export-core/pom.xml
@@ -0,0 +1,26 @@
+
+
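+<project>
+    <parent>
+        <groupId>sleeper</groupId>
+        <artifactId>bulk-export</artifactId>
+        <version>0.26.0-SNAPSHOT</version>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+    <artifactId>bulk-export-core</artifactId>
+</project>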
\ No newline at end of file
diff --git a/java/bulk-export/bulk-export-core/src/main/java/sleeper/bulkexport/configuration/BulkExportPlatformSpec.java b/java/bulk-export/bulk-export-core/src/main/java/sleeper/bulkexport/configuration/BulkExportPlatformSpec.java
new file mode 100644
index 0000000000..72201260e2
--- /dev/null
+++ b/java/bulk-export/bulk-export-core/src/main/java/sleeper/bulkexport/configuration/BulkExportPlatformSpec.java
@@ -0,0 +1,23 @@
+/*
+ * Copyright 2022-2024 Crown Copyright
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package sleeper.bulkexport.configuration;
+
+/**
+ * The configuration for a bulk export.
+ */
+public class BulkExportPlatformSpec {
+
+}
diff --git a/java/bulk-export/bulk-export-core/src/main/java/sleeper/bulkexport/job/BulkExportJob.java b/java/bulk-export/bulk-export-core/src/main/java/sleeper/bulkexport/job/BulkExportJob.java
new file mode 100644
index 0000000000..75d63b43f6
--- /dev/null
+++ b/java/bulk-export/bulk-export-core/src/main/java/sleeper/bulkexport/job/BulkExportJob.java
@@ -0,0 +1,23 @@
+/*
+ * Copyright 2022-2024 Crown Copyright
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package sleeper.bulkexport.job;
+
+/**
+ * A bulk export job.
+ */
+public class BulkExportJob {
+
+}
diff --git a/java/bulk-export/bulk-export-runner/pom.xml b/java/bulk-export/bulk-export-runner/pom.xml
new file mode 100644
index 0000000000..eb55c7d2fb
--- /dev/null
+++ b/java/bulk-export/bulk-export-runner/pom.xml
@@ -0,0 +1,35 @@
+
+
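+<project>
+    <parent>
+        <groupId>sleeper</groupId>
+        <artifactId>bulk-export</artifactId>
+        <version>0.26.0-SNAPSHOT</version>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+    <artifactId>bulk-export-runner</artifactId>
+
+    <build>
+        <plugins>
+            <plugin>
+                <groupId>org.apache.maven.plugins</groupId>
+                <artifactId>maven-shade-plugin</artifactId>
+            </plugin>
+        </plugins>
+    </build>
+</project>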
\ No newline at end of file
diff --git a/java/bulk-export/bulk-export-runner/src/main/java/sleeper/bulkexport/BulkExportJobRunner.java b/java/bulk-export/bulk-export-runner/src/main/java/sleeper/bulkexport/BulkExportJobRunner.java
new file mode 100644
index 0000000000..7052c74952
--- /dev/null
+++ b/java/bulk-export/bulk-export-runner/src/main/java/sleeper/bulkexport/BulkExportJobRunner.java
@@ -0,0 +1,23 @@
+/*
+ * Copyright 2022-2024 Crown Copyright
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package sleeper.bulkexport;
+
+/**
+ * Bulk export job runner.
+ */
+public class BulkExportJobRunner {
+
+}
diff --git a/java/bulk-export/bulk-export-starter/pom.xml b/java/bulk-export/bulk-export-starter/pom.xml
new file mode 100644
index 0000000000..0dd4bcb40c
--- /dev/null
+++ b/java/bulk-export/bulk-export-starter/pom.xml
@@ -0,0 +1,26 @@
+
+
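+<project>
+    <parent>
+        <groupId>sleeper</groupId>
+        <artifactId>bulk-export</artifactId>
+        <version>0.26.0-SNAPSHOT</version>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+    <artifactId>bulk-export-starter</artifactId>
+</project>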
\ No newline at end of file
diff --git a/java/bulk-export/bulk-export-starter/src/main/java/sleeper/bulkexport/BulkExportStarterLambda.java b/java/bulk-export/bulk-export-starter/src/main/java/sleeper/bulkexport/BulkExportStarterLambda.java
new file mode 100644
index 0000000000..335ba2f2b9
--- /dev/null
+++ b/java/bulk-export/bulk-export-starter/src/main/java/sleeper/bulkexport/BulkExportStarterLambda.java
@@ -0,0 +1,23 @@
+/*
+ * Copyright 2022-2024 Crown Copyright
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package sleeper.bulkexport;
+
+/**
+ * Lambda to start the bulk export job.
+ */
+public class BulkExportStarterLambda {
+
+}
diff --git a/java/bulk-export/pom.xml b/java/bulk-export/pom.xml
new file mode 100644
index 0000000000..aa8076671d
--- /dev/null
+++ b/java/bulk-export/pom.xml
@@ -0,0 +1,34 @@
+
+
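+<project>
+    <parent>
+        <artifactId>aws</artifactId>
+        <groupId>sleeper</groupId>
+        <version>0.26.0-SNAPSHOT</version>
+    </parent>
+
+    <packaging>pom</packaging>
+    <modelVersion>4.0.0</modelVersion>
+    <artifactId>bulk-export</artifactId>
+
+    <modules>
+        <module>bulk-export-core</module>
+        <module>bulk-export-runner</module>
+        <module>bulk-export-starter</module>
+    </modules>
+</project>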
diff --git a/java/pom.xml b/java/pom.xml
index 5fcf24701c..a13c7220a0 100644
--- a/java/pom.xml
+++ b/java/pom.xml
@@ -42,6 +42,7 @@
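         <module>athena</module>
         <module>common</module>
         <module>bulk-import</module>
+        <module>bulk-export</module>
         <module>cdk-custom-resources</module>
         <module>cdk-environment</module>
         <module>metrics</module>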