-
-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use multithreaded YouTube downloader from zimscraperlib #301
Use multithreaded YouTube downloader from zimscraperlib #301
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #301 +/- ##
========================================
+ Coverage 1.52% 1.54% +0.01%
========================================
Files 11 11
Lines 1112 1102 -10
Branches 166 158 -8
========================================
Hits 17 17
+ Misses 1095 1085 -10 ☔ View full report in Codecov by Sentry. |
Changed reviewer as @benoit74 is coming back |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not working exactly as intended, when a video fails to download, it is not counted as failed and final ZIM is hence created with no video at all (in my test where I not-so-intentionally had an issue with youtube-dlp).
download_video
and download_thumbnail
are now returning a future, so one has to check its status after completion (not an easy feat tbh).
9d3e4e4
to
b1304aa
Compare
b1304aa
to
962b6b3
Compare
@benoit74 I've updated the code to handle errors correctly. It now properly returns both the
However, I realized that most ZIMs in zimfarm are downloaded using the S3 cache. With this approach, only the videos that are not in the S3 cache and are downloaded directly using Given that most downloads rely on the S3 cache, would it be better to keep the current approach of downloading in batches and using the WDYT? |
This is indeed a very important point! Current implementation allow concurrent download of both S3 cached videos and Youtube videos, and also reencoding. I'm sorry to not have noticed this before. I feel a bit ashamed to have wasted your time on this. Thank you for speaking up, it would have been a bad move indeed! I propose to close the issue and the PR as "non-relevant", WDYT? Or at least we should enhance the scraperlib, but this is not a small change to implement. |
No worries at all! I should have considered this earlier before moving forward with the implementation. I'm fine with closing this issue as "non-relevant." However, I agree that using the YouTube downloader from |
Close #118
Use newer multithreaded YouTube downloader from
zimscraperlib
to download videos, thumbnails and subtitles.Changes:
download_video_files_batch
method.download_video
,download_thumbnail
anddownload_subtitle
methods to use new multithreaded downloader from scraperlib.