User stories
As a user, I want to upload batches of data to create/update in Medusa
As a user, I want to extract batches of data from Medusa for further usage in other systems
Requirements
User should be able to start an import/export without having to wait for the entire operation to finish
The import/export should have minimal impact on client-side processing
Exported data should be “long-lived”, i.e. an export file should be available for download multiple times/across multiple sessions. Potentially with a config option that defines how long a file is available for download (default: 48h?)
Export/Import should initially be done in a CSV/Excel file format, but let’s investigate if we can make this generic enough to handle other file formats in the future: e.g. JSON.
Download files should be protected.
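As a rough sketch, the retention and protection requirements could surface as configuration along these lines; the option names below are assumptions for illustration, not part of the proposal.

```ts
// Hypothetical configuration sketch - option names are assumptions, not an agreed API.
const batchJobOptions = {
  // How long an export file stays available for download (suggested default: 48h).
  export_file_ttl_ms: 48 * 60 * 60 * 1000,
  // Downloads should be protected, e.g. served via expiring, pre-signed URLs.
  protected_downloads: true,
}

export default batchJobOptions
```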
Proposed flow
Import (consider Products for the purpose of the example)
User uploads a CSV file containing the products that the user wishes to create/update in the backend.
The server stores the CSV file in a temporary location for further processing and records that a processing job has to be initiated.
The processing job streams data from the temporary location and incrementally parses it to a format that can be used for the final creation.
The processing job can be polled continuously to track the progress of the upload/parse.
Once the job has completed the poll will respond with some summary data: e.g. the ids of the products that were created.
(Potentially make it possible to do this as a two-step process: e.g. POST /admin/batch { dry_run: true } and then POST /admin/batch/:id/complete. The dry run would be able to provide some feedback about what will eventually be uploaded, so that we can seek final confirmation from the user.)
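A rough client-side sketch of the import flow above, assuming the proposed POST /admin/batch, GET /admin/batch/:id and POST /admin/batch/:id/complete endpoints plus a separate, hypothetical /admin/uploads step for getting the CSV into a temporary location; payload shapes, status values and the file_key field are illustrative assumptions.

```ts
// Illustrative only: the upload endpoint, payload shapes and status values are
// assumptions layered on top of the proposed batch endpoints.
async function importProducts(csv: File): Promise<void> {
  // 1. Upload the CSV to a temporary location (hypothetical upload endpoint).
  const form = new FormData()
  form.append("files", csv)
  const { uploads } = await fetch("/admin/uploads", { method: "POST", body: form }).then(
    (res) => res.json()
  )

  // 2. Create the batch job as a dry run, pointing at the uploaded file.
  const { batch_job } = await fetch("/admin/batch", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      type: "product_import",
      dry_run: true,
      context: { file_key: uploads[0].key },
    }),
  }).then((res) => res.json())

  // 3. Poll the job until the dry run has been processed.
  let job = batch_job
  while (!["processed", "completed", "failed"].includes(job.status)) {
    await new Promise((resolve) => setTimeout(resolve, 2000))
    job = (await fetch(`/admin/batch/${batch_job.id}`).then((res) => res.json())).batch_job
  }

  // 4. After the user reviews the dry-run summary, confirm the import.
  await fetch(`/admin/batch/${batch_job.id}/complete`, { method: "POST" })
}
```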
Export
User applies filters in the overview for the data they wish to export
User asks for an export of the data to start. This records that a processing job will have to be initiated.
The processing job will start parsing data as per the filters and stream the parsed data to a file storage location. E.g. local file system/S3/etc.
The processing job can be polled continuously to track the progress of the export.
Once the processing job is complete the poll data will respond with a link to where the data can be downloaded.
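On the server side, the export step could look roughly like the sketch below: a Node stream pipeline that writes parsed rows to a file and reports progress along the way. The fetchFilteredProducts helper, the page size and the progress callback are hypothetical; the local file path stands in for whatever the file storage (local file system/S3/etc.) resolves to.

```ts
import { createWriteStream } from "fs"
import { once } from "events"

// Hypothetical page-wise reader over the filtered product set.
declare function fetchFilteredProducts(offset: number, limit: number): Promise<string[][]>

// Streams filtered rows into a CSV file and reports progress as it goes.
async function exportProducts(
  filePath: string,
  onProgress: (rowsWritten: number) => Promise<void>
): Promise<void> {
  const out = createWriteStream(filePath)
  let offset = 0
  let written = 0

  while (true) {
    const rows = await fetchFilteredProducts(offset, 100)
    if (!rows.length) {
      break
    }

    for (const row of rows) {
      // Respect backpressure: wait for "drain" when the internal buffer is full.
      if (!out.write(row.join(",") + "\n")) {
        await once(out, "drain")
      }
    }

    written += rows.length
    offset += rows.length
    await onProgress(written)
  }

  out.end()
  await once(out, "finish")
}
```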
DB Model
🚨 Considerations
The result field may be handled better. If we introduce a File entity we could for example create a relation between a result_file and the BatchJob.
result could contain the dry_run status (GET /admin/batch/:id).
API
Look into requirements for a WebSocket/SSE approach
PostgreSQL - NOTIFY?
Emit batch.completed
Create a batch operation
Creates a batch operation. The type of the batch operation determines what should be included in the context. If the batch job is created with dry_run: true, final confirmation through /batch/:id/complete will be required before the final data is uploaded to the DB.
POST /admin/batch
Body
- type
- context
- dry_run - true/false
- (potentially a location for an import file)
Response
- batch_job - { id, status ... }
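A sketch of the request/response shapes implied by the spec above; the status values and the file_key example are assumptions made for illustration, not a settled contract.

```ts
// Shapes implied by the proposed endpoint - status values and file_key are assumptions.
type CreateBatchJobRequest = {
  type: string // e.g. "product_import", "product_export"
  context: Record<string, unknown> // e.g. { file_key: "..." } for imports, filters for exports
  dry_run?: boolean
}

type BatchJob = {
  id: string
  type: string
  status: "created" | "processing" | "processed" | "completed" | "failed"
  context: Record<string, unknown>
  result?: Record<string, unknown> // e.g. summary data such as the ids of created products
  dry_run: boolean
  created_by: string
}

type CreateBatchJobResponse = {
  batch_job: BatchJob
}
```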
🚨 Considerations
Check if it is possible to upload a file AND give JSON data - we don’t think it is possible, so in the case of an import we might need to first upload the file and then give the file location as part of the batch job creation.
Get a batch operation
Gets the BatchJob. This endpoint may be used for polling the status of a batch operation. To retrieve the BatchJob with id, the authenticated user must be the user identified by created_by.
List batch operations
Lists the BatchJobs created by a user.
Cancel a batch operation
Cancels an operation that is in progress.
Complete a batch operation
Completes a previously dry_run'ed job.
Business Logic
Rough sketch of the architecture
https://www.figma.com/file/HnGt26GuxefVkqcq1yUEoo/Import%2FExport-architecture?node-id=0%3A1
Product Import/Export Format
Here is an example of what the default product import template will look like: https://medusajs.notion.site/Default-Product-Import-template-40ceb2271c8d47d1ae1aee7e3ec3debb
When importing, we merge the “product columns” to form the data needed for the product part of the import and we then create multiple variants.
Some fields are required and some are optional; note that the number of columns is dynamic as there may be one or more product options/images/prices. We should potentially match the product columns by name. When exporting we should dynamically figure out how many columns are needed to cover all the data.
It should be possible to use the above format for updating existing data, matching product data by “Product Handle” (or maybe “External ID”), and product variant data by SKU. If only updating variant data it should be allowed to just upload a file with the columns that should be updated, e.g. SKU, Inventory Quantity (see the grouping sketch after the field lists below).
The required fields for a new product are:
handle, title
Other fields that have defaults if not provided:
Status - draft
Product Option 1 - Default Option
Profile - Default Profile
Discountable - true
The required fields for a new product variant are:
If the product has a product option, then a Variant Option Value must be defined
Other fields that have defaults if not provided:
Title - concat of option values
Allow Backorder - false
Manage Inventory - true
Variant Option Value 1 (if not defined on product) - Default Variant
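A minimal sketch of the merge described above, assuming a parsed CSV where product-level columns repeat on every variant row. The column names are placeholders derived from the field lists; the canonical headers are the ones in the Notion template.

```ts
// Sketch of merging flat CSV rows into products with multiple variants.
// Column names are placeholders - the canonical headers live in the Notion template.
type CsvRow = Record<string, string>

type ImportedProduct = {
  handle: string
  title: string
  status: string
  discountable: boolean
  variants: { sku: string; title?: string; inventory_quantity?: number }[]
}

function groupRowsIntoProducts(rows: CsvRow[]): ImportedProduct[] {
  const byHandle = new Map<string, ImportedProduct>()

  for (const row of rows) {
    const handle = row["Product Handle"]
    // Product-level columns repeat on every variant row; the first occurrence wins.
    let product = byHandle.get(handle)
    if (!product) {
      product = {
        handle,
        title: row["Product Title"],
        status: row["Status"] || "draft", // default: draft
        discountable: row["Discountable"] !== "false", // default: true
        variants: [],
      }
      byHandle.set(handle, product)
    }

    // Each row contributes one variant; on re-import variants are matched by SKU.
    product.variants.push({
      sku: row["Variant SKU"],
      title: row["Variant Title"] || undefined, // default: concat of option values
      inventory_quantity: row["Inventory Quantity"]
        ? Number(row["Inventory Quantity"])
        : undefined,
    })
  }

  return [...byHandle.values()]
}
```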
BatchJobService
Events
batch.created
batch.canceled
batch.completed
Functions
create
cancel
list
retrieve
update
complete
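A sketch of what the service surface above could look like; the signatures are inferred from the listed functions and events and are assumptions rather than the eventual implementation.

```ts
// Sketch only: signatures are inferred from the listed functions and events.
interface BatchJob {
  id: string
  type: string
  status: string
  dry_run: boolean
  created_by: string
  context: Record<string, unknown>
  result?: Record<string, unknown>
}

interface BatchJobService {
  // Emits "batch.created" and kicks off processing via the event bus.
  create(data: {
    type: string
    context: Record<string, unknown>
    dry_run?: boolean
  }): Promise<BatchJob>
  // Emits "batch.canceled".
  cancel(batchJobId: string): Promise<BatchJob>
  list(selector: Partial<BatchJob>): Promise<BatchJob[]>
  retrieve(batchJobId: string): Promise<BatchJob>
  update(batchJobId: string, data: Partial<BatchJob>): Promise<BatchJob>
  // Emits "batch.completed"; used to confirm dry_run'ed jobs.
  complete(batchJobId: string): Promise<BatchJob>
}
```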
Ideas
Create ProductImportHandler, ProductExportHandler, OrderExportHandler, etc. The Awilix container is allowed to have exactly one of each *Handler installed, and this is the one that will be used when a BatchJob is created. The service is identified by an identifier that corresponds to the BatchJob type - this will also allow custom batch job types/handlers from plugins etc.
The handler interface contains the following methods (see the sketch after this list):
validateContext - this is used in the API controller to verify that the context param is valid.
processJob - this does the actual processing of the job. Should report back on progress of the operation.
completeJob - this performs the completion of the job. Will not be run if processJob has already moved the BatchJob to a complete status.
Use EventBusService logic to process jobs - e.g. eventBusService.emit("batch.product_import", { id: [batch job id] }).
Expand the current FileService API to include protectedUpload, protectedDownload, and maybe also some methods that can handle streaming.
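A sketch of the handler interface and the FileService additions described in this list. Only the method names validateContext, processJob, completeJob, protectedUpload and protectedDownload come from the proposal; parameter types, return types and the streaming helpers are assumptions.

```ts
import { Readable, Writable } from "stream"

// Method names come from the proposal; parameter and return types are assumed.
interface BatchJobHandler {
  // Matches the BatchJob type (e.g. "product_import") and is used to resolve
  // exactly one handler per type from the Awilix container.
  identifier: string

  // Used in the API controller to verify that the context param is valid.
  validateContext(context: Record<string, unknown>): Promise<void>

  // Does the actual processing and reports progress along the way, typically
  // triggered through the event bus, e.g. "batch.product_import".
  processJob(batchJobId: string): Promise<void>

  // Finalizes the job; skipped if processJob already completed it.
  completeJob(batchJobId: string): Promise<void>
}

// Possible FileService additions for protected, streamable files (names beyond
// protectedUpload/protectedDownload are hypothetical).
interface ProtectedFileService {
  protectedUpload(file: { name: string; stream: Readable }): Promise<{ key: string }>
  protectedDownload(key: string): Promise<{ url: string; expires_at: Date }>
  getUploadStream(key: string): Promise<Writable>
  getDownloadStream(key: string): Promise<Readable>
}
```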
Other ideas - not necessary, but worth keeping in mind
It would be nice if a background job worker could be deployed independently of the main server. E.g. running medusa serve --worker would only spin up the parts necessary for processing event bus jobs. This would allow auto-scaling infra to handle big loads.