A major use case of mine for this workflow is indexing files in non-Tower buckets that are attached to Synapse. Using a workflow like this is much more time-efficient and requires far less babysitting than indexing on a single EC2 instance, especially as datasets grow large (>1 TB).
However, I've occasionally had to re-run this workflow on the same bucket after additional data was added, which means the entire bucket gets re-downloaded and re-indexed on every run. It would be helpful to have one or both of the following features to make repeat runs more time- and cost-efficient:
- add a modified/created date parameter so that any files last modified (or created) before the given date are skipped during re-indexing
- add an option to skip any S3 keys that already exist in the target Synapse project/folder (a rough filtering sketch follows this list)
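To make the request concrete, here is a minimal sketch (in Python, outside the workflow itself) of the filtering the two options imply. It uses boto3 and synapseclient/synapseutils, and it assumes the bucket's key layout mirrors the Synapse folder hierarchy under the target folder; the bucket name, folder ID, and cutoff date below are placeholders, not values from this issue.

```python
"""Illustrative sketch only (not the workflow's actual code): pre-filter
S3 keys before indexing, based on the two proposed options."""
from datetime import datetime, timezone

import boto3
import synapseclient
import synapseutils


def existing_synapse_paths(syn: synapseclient.Synapse, folder_id: str) -> set[str]:
    """Collect relative paths of files already indexed under a Synapse folder."""
    paths = set()
    # synapseutils.walk behaves like os.walk; dirpath is a (path, id) tuple
    for (dirname, _dir_id), _dirs, files in synapseutils.walk(syn, folder_id):
        # strip the root folder name so paths are relative to the start folder
        prefix = dirname.split("/", 1)[1] + "/" if "/" in dirname else ""
        for file_name, _file_id in files:
            paths.add(prefix + file_name)
    return paths


def keys_to_index(bucket: str, folder_id: str, modified_after: datetime | None) -> list[str]:
    """Return S3 keys that still need indexing, applying both proposed filters."""
    syn = synapseclient.login()  # assumes cached Synapse credentials
    already_indexed = existing_synapse_paths(syn, folder_id)

    s3 = boto3.client("s3")
    keep = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if modified_after and obj["LastModified"] < modified_after:
                continue  # proposed feature 1: skip objects older than the cutoff
            if obj["Key"] in already_indexed:
                continue  # proposed feature 2: skip keys already in the Synapse folder
            keep.append(obj["Key"])
    return keep


if __name__ == "__main__":
    cutoff = datetime(2023, 1, 1, tzinfo=timezone.utc)  # hypothetical cutoff date
    for key in keys_to_index("my-synapse-bucket", "syn12345678", cutoff):
        print(key)
```

In the workflow itself, this would presumably translate into a params entry (e.g. a hypothetical modified-after date and a skip-existing flag) applied as a filter over the listed keys before the download/indexing step.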