User stories
As a user, I want to upload batches of data to create/update in Medusa
As a user, I want to extract batches of data from Medusa for further usage in other systems
Requirements
User should be able to start an import/export without having to wait for the entire operation to finish
The import/export should have minimal impact on client-side processing
Exported data should be “long-lived”, i.e. an export file should be available for download multiple times/across multiple sessions. Potentially with a config option that defines how long a file is available for download (default: 48h?)
Export/Import should initially be done in a CSV/Excel file format, but let’s investigate if we can make this generic enough to handle other file formats in the future: e.g. JSON.
Download files should be protected.
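As a rough sketch, the retention and protection requirements could surface as configuration along these lines; the option names below are assumptions for illustration, not part of the proposal.

```ts
// Hypothetical configuration sketch - option names are assumptions, not an agreed API.
const batchJobOptions = {
  // How long an export file stays available for download (suggested default: 48h).
  export_file_ttl_ms: 48 * 60 * 60 * 1000,
  // Downloads should be protected, e.g. served via expiring, pre-signed URLs.
  protected_downloads: true,
}

export default batchJobOptions
```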
Proposed flow
Import (consider Products for the purpose of the example)
User uploads a CSV file containing the products that the user wishes to create/update in the backend.
The server stores the CSV file in a temporary location for further processing and records that a processing job has to be initiated.
The processing job streams data from the temporary location and incrementally parses it to a format that can be used for the final creation.
The processing job can be polled continuously to track the progress of the upload/parse.
Once the job has completed the poll will respond with some summary data: e.g. the ids of the products that were created.
(Potentially make it possible to do this as a two-step process: e.g. POST /admin/batch { dry_run: true } and then POST /admin/batch/:id/complete. The dry run would be able to provide some feedback about what will eventually be uploaded, so that we can seek final confirmation from the user.)
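A rough client-side sketch of the import flow above, assuming the proposed POST /admin/batch, GET /admin/batch/:id and POST /admin/batch/:id/complete endpoints plus a separate, hypothetical /admin/uploads step for getting the CSV into a temporary location; payload shapes, status values and the file_key field are illustrative assumptions.

```ts
// Illustrative only: the upload endpoint, payload shapes and status values are
// assumptions layered on top of the proposed batch endpoints.
async function importProducts(csv: File): Promise<void> {
  // 1. Upload the CSV to a temporary location (hypothetical upload endpoint).
  const form = new FormData()
  form.append("files", csv)
  const { uploads } = await fetch("/admin/uploads", { method: "POST", body: form }).then(
    (res) => res.json()
  )

  // 2. Create the batch job as a dry run, pointing at the uploaded file.
  const { batch_job } = await fetch("/admin/batch", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      type: "product_import",
      dry_run: true,
      context: { file_key: uploads[0].key },
    }),
  }).then((res) => res.json())

  // 3. Poll the job until the dry run has been processed.
  let job = batch_job
  while (!["processed", "completed", "failed"].includes(job.status)) {
    await new Promise((resolve) => setTimeout(resolve, 2000))
    job = (await fetch(`/admin/batch/${batch_job.id}`).then((res) => res.json())).batch_job
  }

  // 4. After the user reviews the dry-run summary, confirm the import.
  await fetch(`/admin/batch/${batch_job.id}/complete`, { method: "POST" })
}
```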
Export
User applies filters in the overview for the data they wish to export
User asks for an export of the data to start. This records that a processing job will have to be initiated.
The processing job will start parsing data as per the filters and stream the parsed data to a file storage location. E.g. local file system/S3/etc.
The processing job can be polled continuously to track the progress of the export.
Once the processing job is complete the poll data will respond with a link to where the data can be downloaded.
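On the server side, the export step could look roughly like the sketch below: a Node stream pipeline that writes parsed rows to a file and reports progress along the way. The fetchFilteredProducts helper, the page size and the progress callback are hypothetical; the local file path stands in for whatever the file storage (local file system/S3/etc.) resolves to.

```ts
import { createWriteStream } from "fs"
import { once } from "events"

// Hypothetical page-wise reader over the filtered product set.
declare function fetchFilteredProducts(offset: number, limit: number): Promise<string[][]>

// Streams filtered rows into a CSV file and reports progress as it goes.
async function exportProducts(
  filePath: string,
  onProgress: (rowsWritten: number) => Promise<void>
): Promise<void> {
  const out = createWriteStream(filePath)
  let offset = 0
  let written = 0

  while (true) {
    const rows = await fetchFilteredProducts(offset, 100)
    if (!rows.length) {
      break
    }

    for (const row of rows) {
      // Respect backpressure: wait for "drain" when the internal buffer is full.
      if (!out.write(row.join(",") + "\n")) {
        await once(out, "drain")
      }
    }

    written += rows.length
    offset += rows.length
    await onProgress(written)
  }

  out.end()
  await once(out, "finish")
}
```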
DB Model
🚨 Considerations
The result field may be handled better. If we introduce a File entity we could for example create a relation between a result_file and the BatchJob.
result could contain the dry_run status (GET /admin/batch/:id).
API
Look into requirements for a WebSocket/SSE approach
PostgreSQL - NOTIFY?
Emit batch.completed
Create a batch operation
Creates a batch operation. The type of the batch operation determines what should be included in the context. If the batch job is created with dry_run: true, final confirmation through /batch/:id/complete will be required before the final data is uploaded to the DB.
POST /admin/batch
Body
- type
- context
- dry_run - true/false
- (potentially a location for an import file)
Response
- batch_job - { id, status ... }
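A sketch of the request/response shapes implied by the spec above; the status values and the file_key example are assumptions made for illustration, not a settled contract.

```ts
// Shapes implied by the proposed endpoint - status values and file_key are assumptions.
type CreateBatchJobRequest = {
  type: string // e.g. "product_import", "product_export"
  context: Record<string, unknown> // e.g. { file_key: "..." } for imports, filters for exports
  dry_run?: boolean
}

type BatchJob = {
  id: string
  type: string
  status: "created" | "processing" | "processed" | "completed" | "failed"
  context: Record<string, unknown>
  result?: Record<string, unknown> // e.g. summary data such as the ids of created products
  dry_run: boolean
  created_by: string
}

type CreateBatchJobResponse = {
  batch_job: BatchJob
}
```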
🚨 Considerations
Check if it is possible to upload a file AND give JSON data - we don’t think it is possible, so in the case of an import we might need to first upload the file and then give the file location as part of the batch job creation.
Get a batch operation
Gets the BatchJob. This endpoint may be used for polling the status of a batch operation. To retrieve the BatchJob with id, the authenticated user must be the user identified by created_by.
List batch operations
Lists the BatchJobs created by a user.
Cancel a batch operation
Cancels an operation that is in progress.
Complete a batch operation
Completes a previously dry_run'ed job.
Business Logic
Rough sketch of the architecture
https://www.figma.com/file/HnGt26GuxefVkqcq1yUEoo/Import%2FExport-architecture?node-id=0%3A1
Product Import/Export Format
Here is an example of what the default product import template will look like: https://medusajs.notion.site/Default-Product-Import-template-40ceb2271c8d47d1ae1aee7e3ec3debb
When importing, we merge the “product columns” to form the data needed for the product part of the import and we then create multiple variants.
Some fields are required and some are optional; note that the number of columns is dynamic as there may be one or more product options/images/prices. We should potentially match the product columns by name. When exporting we should dynamically figure out how many columns are needed to cover all the data.
It should be possible to use the above format for updating existing data, matching product data by “Product Handle” (or maybe “External ID”), and product variant data by SKU. If only updating variant data it should be allowed to just upload a file with the columns that should be updated, e.g. SKU, Inventory Quantity (see the grouping sketch after the field lists below).
The required fields for a new product are:
handle, title
Other fields that have defaults if not provided:
Status - draft
Product Option 1 - Default Option
Profile - Default Profile
Discountable - true
The required fields for a new product variant are:
If the product has a product option, then a Variant Option Value must be defined
Other fields that have defaults if not provided:
Title - concat of option values
Allow Backorder - false
Manage Inventory - true
Variant Option Value 1 (if not defined on product) - Default Variant
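A minimal sketch of the merge described above, assuming a parsed CSV where product-level columns repeat on every variant row. The column names are placeholders derived from the field lists; the canonical headers are the ones in the Notion template.

```ts
// Sketch of merging flat CSV rows into products with multiple variants.
// Column names are placeholders - the canonical headers live in the Notion template.
type CsvRow = Record<string, string>

type ImportedProduct = {
  handle: string
  title: string
  status: string
  discountable: boolean
  variants: { sku: string; title?: string; inventory_quantity?: number }[]
}

function groupRowsIntoProducts(rows: CsvRow[]): ImportedProduct[] {
  const byHandle = new Map<string, ImportedProduct>()

  for (const row of rows) {
    const handle = row["Product Handle"]
    // Product-level columns repeat on every variant row; the first occurrence wins.
    let product = byHandle.get(handle)
    if (!product) {
      product = {
        handle,
        title: row["Product Title"],
        status: row["Status"] || "draft", // default: draft
        discountable: row["Discountable"] !== "false", // default: true
        variants: [],
      }
      byHandle.set(handle, product)
    }

    // Each row contributes one variant; on re-import variants are matched by SKU.
    product.variants.push({
      sku: row["Variant SKU"],
      title: row["Variant Title"] || undefined, // default: concat of option values
      inventory_quantity: row["Inventory Quantity"]
        ? Number(row["Inventory Quantity"])
        : undefined,
    })
  }

  return [...byHandle.values()]
}
```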
BatchJobService
Events
batch.created
batch.canceled
batch.completed
Functions
create
cancel
list
retrieve
update
complete
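A sketch of what the service surface above could look like; the signatures are inferred from the listed functions and events and are assumptions rather than the eventual implementation.

```ts
// Sketch only: signatures are inferred from the listed functions and events.
interface BatchJob {
  id: string
  type: string
  status: string
  dry_run: boolean
  created_by: string
  context: Record<string, unknown>
  result?: Record<string, unknown>
}

interface BatchJobService {
  // Emits "batch.created" and kicks off processing via the event bus.
  create(data: {
    type: string
    context: Record<string, unknown>
    dry_run?: boolean
  }): Promise<BatchJob>
  // Emits "batch.canceled".
  cancel(batchJobId: string): Promise<BatchJob>
  list(selector: Partial<BatchJob>): Promise<BatchJob[]>
  retrieve(batchJobId: string): Promise<BatchJob>
  update(batchJobId: string, data: Partial<BatchJob>): Promise<BatchJob>
  // Emits "batch.completed"; used to confirm dry_run'ed jobs.
  complete(batchJobId: string): Promise<BatchJob>
}
```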
Ideas
Create ProductImportHandler, ProductExportHandler, OrderExportHandler, etc. The Awilix container is allowed to have exactly one of each *Handler installed, and this is the one that will be used when a BatchJob is created. The service is identified by an identifier that corresponds to the BatchJob type - this will also allow custom batch job types/handlers from plugins etc.
The handler interface contains the following methods (see the sketch after this list):
validateContext - this is used in the API controller to verify that the context param is valid.
processJob - this does the actual processing of the job. Should report back on progress of the operation.
completeJob - this performs the completion of the job. Will not be run if processJob has already moved the BatchJob to a complete status.
Use EventBusService logic to process jobs - e.g. eventBusService.emit("batch.product_import", { id: [batch job id] }).
Expand the current FileService API to include protectedUpload, protectedDownload, and maybe also some methods that can handle streaming.
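A sketch of the handler interface and the FileService additions described in this list. Only the method names validateContext, processJob, completeJob, protectedUpload and protectedDownload come from the proposal; parameter types, return types and the streaming helpers are assumptions.

```ts
import { Readable, Writable } from "stream"

// Method names come from the proposal; parameter and return types are assumed.
interface BatchJobHandler {
  // Matches the BatchJob type (e.g. "product_import") and is used to resolve
  // exactly one handler per type from the Awilix container.
  identifier: string

  // Used in the API controller to verify that the context param is valid.
  validateContext(context: Record<string, unknown>): Promise<void>

  // Does the actual processing and reports progress along the way, typically
  // triggered through the event bus, e.g. "batch.product_import".
  processJob(batchJobId: string): Promise<void>

  // Finalizes the job; skipped if processJob already completed it.
  completeJob(batchJobId: string): Promise<void>
}

// Possible FileService additions for protected, streamable files (names beyond
// protectedUpload/protectedDownload are hypothetical).
interface ProtectedFileService {
  protectedUpload(file: { name: string; stream: Readable }): Promise<{ key: string }>
  protectedDownload(key: string): Promise<{ url: string; expires_at: Date }>
  getUploadStream(key: string): Promise<Writable>
  getDownloadStream(key: string): Promise<Readable>
}
```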
Other ideas - not necessary, but worth keeping in mind
It would be nice if a background job worker could be deployed independently of the main server. E.g. running medusa serve --worker would only spin up the parts necessary for processing event bus jobs. This would allow auto-scaling infra to handle big loads.