Consider multi part exports #383

Open
tokee opened this issue Jun 28, 2023 · 1 comment
Labels
backend complex Complex issue that requires focus and/or in-depth knowledge enhancement

Comments

@tokee
Contributor

tokee commented Jun 28, 2023

Exporting a WARC that takes up hundreds of gigabytes is infeasible: tool support is dubious, and there is a real risk of the transfer being aborted due to timeouts.

As the export size of the individual parts of a WARC is approximately known, it should be possible to generate a list of download links, each resulting in a WARC of a given size, e.g. 1 gigabyte. This would require backend support for exporting subsets of a result set, as well as GUI support for presenting such lists of download links. The situation where the user manually starts all the downloads at the same time should also be handled: if downloads are queued, some of them are likely to time out due to a long period with no activity. Possibly subsequent links could stay inactive until the previous part has been fully downloaded?
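
A rough sketch of the splitting idea, in Python for illustration only. The `Record` type, the `approx_bytes` field and the `?part=N&of=M` URL layout are all assumptions made up for this example and are not part of the existing export code. It greedily groups consecutive records so that each part stays near a size limit, then builds one link per part.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Record:
    """Hypothetical stand-in for one entry in the export result set."""
    record_id: str
    approx_bytes: int  # approximate size of the record in the exported WARC


def split_into_parts(records: Iterable[Record], max_part_bytes: int) -> Iterator[List[Record]]:
    """Greedily group consecutive records so each part stays near max_part_bytes.

    A single record larger than the limit ends up in a part of its own.
    """
    part: List[Record] = []
    part_bytes = 0
    for record in records:
        # Close the current part if adding this record would exceed the limit.
        if part and part_bytes + record.approx_bytes > max_part_bytes:
            yield part
            part, part_bytes = [], 0
        part.append(record)
        part_bytes += record.approx_bytes
    if part:
        yield part


def part_links(base_url: str, parts: List[List[Record]]) -> List[str]:
    """Build one download link per part (the query parameters are assumed)."""
    total = len(parts)
    return [f"{base_url}?part={i}&of={total}" for i in range(1, total + 1)]
```

Since the sizes are only approximate, the resulting parts would be close to the requested size rather than exactly that size. Keeping records in result-set order means each part maps to a contiguous subset of the result set.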

@tokee
Contributor Author

tokee commented Jun 28, 2023

Addendum: Maybe the timeouts are not an issue, as throttling takes place at the inner paging stage of the export. Starting 20 concurrent downloads would then simply mean 20 slowly trickling downloads instead of x active downloads and 20-x waiting downloads.
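
To illustrate the difference, here is a toy model in Python. It assumes the backend serves exactly one page per tick and either rotates between downloads or finishes one download before starting the next; the real throttling may well behave differently. The function name and the tick model are invented for this sketch. It reports the longest stretch each download goes without receiving a page.

```python
from typing import List


def longest_idle_gaps(pages_per_download: List[int], round_robin: bool) -> List[int]:
    """Longest wait, in ticks, that each download sees between two pages.

    One page is served per tick. With round_robin=True the server rotates
    between downloads; otherwise each download is completed before the next
    one starts (the queued case).
    """
    remaining = list(pages_per_download)
    last_served = [0] * len(remaining)
    longest = [0] * len(remaining)
    tick = 0
    current = 0
    while any(remaining):
        # Skip downloads that have already received all their pages.
        while remaining[current] == 0:
            current = (current + 1) % len(remaining)
        tick += 1
        longest[current] = max(longest[current], tick - last_served[current])
        last_served[current] = tick
        remaining[current] -= 1
        if round_robin:
            current = (current + 1) % len(remaining)
    return longest
```

In this model, the queued case makes the last download wait for all earlier ones to finish before its first page arrives, while the rotating case bounds every wait by the number of concurrent downloads.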

@tokee tokee added the complex Complex issue that requires focus and/or in-depth knowledge label Aug 9, 2023