Hi, I have an issue that I am hoping you can help me with.
I am trying to archive a rather large site. Because the resulting archives quickly filled up my VPS' storage, I mounted an S3 space with s3fs. But when I run the scraper I get this error.
I tried it many times and it always causes the same error. The s3fs mount is stable during that time, so it isn't just a network disconnect. Do you have any ideas what could cause this issue?
We haven't had a chance to test with s3fs, so we can't really help with that specifically. However, Browsertrix Crawler has native support for uploading to S3-compatible storage. For security, the S3 settings are provided only via environment variables.
You can set --sizeLimit on the crawl so that the crawler uploads to S3 and exits once the limit is reached, and then run it from a script that restarts the crawler each time. (This is how we use it in the Browsertrix app with Kubernetes.)
At this time, only uploading WACZs is supported, and the WACZ upload streams directly to S3 without requiring any additional local disk space. Hope this helps!
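For illustration, here is a minimal wrapper sketch of that setup. The `--sizeLimit` and `--generateWACZ` flags come from the crawler's documented options, but the storage environment variable names (`STORE_ENDPOINT_URL`, `STORE_ACCESS_KEY`, `STORE_SECRET_KEY`, `STORE_PATH`), the bucket/URL values, and the exit-code handling are assumptions; check the Browsertrix Crawler docs for the exact names and behavior of your version.

```bash
#!/usr/bin/env bash
# Hypothetical restart loop: crawl until --sizeLimit is hit, let the crawler
# upload the WACZ to S3 and exit, then start it again to continue.
# The env var names below are assumptions; verify them against the crawler docs.

export STORE_ENDPOINT_URL="https://s3.example.com/my-bucket/"  # assumed endpoint + bucket
export STORE_ACCESS_KEY="..."                                  # assumed access key variable
export STORE_SECRET_KEY="..."                                  # assumed secret key variable
export STORE_PATH="crawls/"                                    # assumed prefix inside the bucket

while true; do
  docker run --rm \
    -e STORE_ENDPOINT_URL -e STORE_ACCESS_KEY -e STORE_SECRET_KEY -e STORE_PATH \
    -v "$PWD/crawls:/crawls" \
    webrecorder/browsertrix-crawler crawl \
      --url https://example.com/ \
      --generateWACZ \
      --sizeLimit 5000000000   # ~5 GB per segment before upload + exit
  status=$?
  # Assumption: exit code 0 means the crawl fully finished, while a size-limit
  # exit returns non-zero and should trigger a restart to continue. Check the
  # actual exit-code convention for your crawler version before relying on this.
  [ "$status" -eq 0 ] && break
done
```

The mounted `crawls` volume keeps crawl state between restarts so each new run can pick up where the previous one stopped, while the finished WACZs themselves go straight to S3 rather than filling local disk.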
ikreymer changed the title from "Error received during crawl" to "s3fs IO errors (was: Error received during crawl)" on Feb 20, 2025.