I am using AWS Glue to process data coming from Kinesis.
We currently use the library for batch jobs, which have a defined end point at which the library deletes the shuffle files. Streaming jobs, however, never end, so the library never cleans up their shuffle files, which have come to occupy 1 TB of S3 storage.
Is there any way to force shuffle-file cleanup at a given point in the script, or to clean up the shuffles automatically after each batch?
I have no experience with Spark streaming; however, this plugin cleans up shuffle data automatically. It relies on Spark's unregister-shuffle API to remove shuffle files, so I believe it should work in your case.
There's no way to force a cleanup, since this shuffle plugin has no knowledge of the Spark DAG.
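One detail worth noting: Spark's unregister-shuffle calls are driven by the driver-side ContextCleaner, which only fires when shuffle objects become garbage-collected on the driver. For long-running streaming jobs, a possible workaround (my own suggestion, not something this plugin documents) is to lower `spark.cleaner.periodicGC.interval` so the driver triggers GC, and therefore shuffle unregistration, more often. A sketch, with the job script name as a placeholder:

```shell
# Hypothetical spark-submit invocation; your_streaming_job.py is a placeholder.
# spark.cleaner.periodicGC.interval (default 30min) controls how often the
# driver forces a GC pass so the ContextCleaner can unregister shuffles that
# are no longer reachable. spark.cleaner.referenceTracking (default true)
# must stay enabled for this cleanup path to work at all.
spark-submit \
  --conf spark.cleaner.periodicGC.interval=5min \
  --conf spark.cleaner.referenceTracking=true \
  your_streaming_job.py
```

On AWS Glue you would pass these as Spark configuration job parameters rather than via spark-submit directly, but the settings are the same. This does not guarantee cleanup after every batch; it only makes the existing automatic mechanism run more frequently.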