Background
This is split off from #1393. This issue is to specify what's needed for a minimum viable version of bulk export.
Description
We want a user to be able to submit a request for an entire Sleeper table to be written out to Parquet files. There should be one output file per leaf partition. Each file contains all the data for its leaf partition, in sorted order.
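As a rough illustration of the intended output shape, the sketch below produces one sorted output per leaf partition. It assumes, purely for illustration, that each leaf partition's data is available as several runs of records that are each already sorted by key. The names `export_leaf_partition`, `export_table` and `runs_by_leaf` are hypothetical and not part of Sleeper, and plain Python lists stand in for Parquet files.

```python
import heapq


def export_leaf_partition(sorted_runs, key):
    """Merge runs that are each already sorted by `key` into one sorted list.

    Assumption: every run in `sorted_runs` is individually sorted.
    """
    return list(heapq.merge(*sorted_runs, key=key))


def export_table(runs_by_leaf, key):
    """Return one sorted output per leaf partition, keyed by leaf partition id."""
    return {
        leaf_id: export_leaf_partition(runs, key)
        for leaf_id, runs in runs_by_leaf.items()
    }
```

A streaming merge like this never needs to hold more than one record per run in memory at a time, which is one reason it may suit exporting a whole partition; a real implementation would write to a Parquet file rather than build a list.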
Analysis
There will need to be sub issues for the different components of this. The following list describes some of the things that will need to be done:
A new optional stack called the BulkExportStack. This will need to contain a queue for export requests. Each request will be picked up by a lambda, which will act similarly to the query planner, i.e. break the request up into sub-export requests, one for each leaf partition. We will then need an ECS cluster to run tasks to process these sub-export requests. Tasks can be scaled up in the same way as in other situations, e.g. compactions.
A container that receives messages from the queue and executes the job, i.e. performs a query over the whole leaf partition that exports all of its data.
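The two components above can be sketched roughly as follows. This is a minimal in-memory model under stated assumptions, not the Sleeper implementation: the partition tree is represented as a dict from partition id to its child ids, a `deque` stands in for the queue, and all class and function names (`BulkExportRequest`, `LeafExportRequest`, `split_request`, `run_tasks`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BulkExportRequest:
    """Hypothetical request to export a whole table."""
    table_name: str


@dataclass(frozen=True)
class LeafExportRequest:
    """Hypothetical sub-export request for a single leaf partition."""
    table_name: str
    leaf_partition_id: str


def leaf_partition_ids(children_by_partition):
    """Return ids of partitions with no children, in a stable sorted order."""
    return sorted(
        pid for pid, children in children_by_partition.items() if not children
    )


def split_request(request, children_by_partition):
    """Planner step: one sub-export request per leaf partition."""
    return [
        LeafExportRequest(request.table_name, pid)
        for pid in leaf_partition_ids(children_by_partition)
    ]


def run_tasks(queue, export_leaf):
    """Worker step: drain the queue, calling `export_leaf` on each sub-request."""
    outputs = []
    while queue:
        outputs.append(export_leaf(queue.popleft()))
    return outputs
```

For example, with a tree where `root` splits into `A` and `B`, and `B` splits into `C` and `D`, `split_request` yields sub-requests for `A`, `C` and `D`, and `run_tasks(deque(sub_requests), export_leaf)` calls `export_leaf` once for each of them. In a deployed system the planner and the workers would run separately, connected by the real queue.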
There will be future improvements to this capability, such as the ability to specify additional filters to restrict the data that is returned, and execution of the output using DataFusion, but those will be added once the basic functionality exists.
Sub tasks