Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: youtube: Polymer UI and JSON endpoints for playlists #151

Merged
merged 5 commits into from
Nov 10, 2020
Merged

RFC: youtube: Polymer UI and JSON endpoints for playlists #151

merged 5 commits into from
Nov 10, 2020

Conversation

wlritchi
Copy link
Contributor

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

We already had a few copies of Polymer-style pagination handling logic for certain circumstances, but now we're forced into using it for all playlists since we can no longer disable Polymer. Refactor the logic to move it to the parent class for all entry lists (including e.g. search results, feeds, and list of playlists), and generify a bit to cover the child classes' use cases.

Fixes #148.

We already had a few copies of Polymer-style pagination handling logic
for certain circumstances, but now we're forced into using it for all
playlists since we can no longer disable Polymer. Refactor the logic to
move it to the parent class for all entry lists (including e.g. search
results, feeds, and list of playlists), and generify a bit to cover the
child classes' use cases.
@wlritchi
Copy link
Contributor Author

So far this is not tested for many use cases, including ones that we know will be affected (search, feeds, list of playlists). I've only tested it for downloading all of a channel's uploaded videos.

@someziggyman someziggyman mentioned this pull request Nov 10, 2020
8 tasks
@someziggyman
Copy link

So far this is not tested for many use cases, including ones that we know will be affected (search, feeds, list of playlists). I've only tested it for downloading all of a channel's uploaded videos.

Will test this fix with these IDs:
Regular playlist ID:
PLszW2az_oxFd7dFeCb1FFhk7c_eEer5n1
Mix playlist ID:
RDKR9wGi7gVLQ
Search playlist:
https://www.youtube.com/results?search_query=linkin+park+numb
And channel playlist:
https://www.youtube.com/user/TheLinuxFoundation/playlists

We now have two fixes released almost at the same time
#150

@wlritchi
Copy link
Contributor Author

wlritchi commented Nov 10, 2020

See also #150 (edit: some already have while I was typing this, apparently) which appears to be doing essentially the same thing. There's a few things it does better than this one, including noticing that the base extractor already has a _get_yt_initial_data method and using that, rather than borrowing the copy from the search implementation. But I'm out of cycles for the night. Edits welcome!

@blackjack4494
Copy link
Owner

@wlritchi doesn't work with
python3 -m youtube_dlc "https://www.youtube.com/results?search_query=linkin+park+numb" -F -v
it will throw out error

  File "D:\gitkraken\yt-dlc\youtube_dlc\__main__.py", line 19, in <module>
    youtube_dlc.main()
  File "D:\gitkraken\yt-dlc\youtube_dlc\__init__.py", line 487, in main
    _real_main(argv)
  File "D:\gitkraken\yt-dlc\youtube_dlc\__init__.py", line 477, in _real_main
    retcode = ydl.download(all_urls)
  File "D:\gitkraken\yt-dlc\youtube_dlc\YoutubeDL.py", line 2104, in download
    url, force_generic_extractor=self.params.get('force_generic_extractor', False))
  File "D:\gitkraken\yt-dlc\youtube_dlc\YoutubeDL.py", line 830, in extract_info
    ie_result = ie.extract(url)
  File "D:\gitkraken\yt-dlc\youtube_dlc\extractor\common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "D:\gitkraken\yt-dlc\youtube_dlc\extractor\youtube.py", line 3450, in _real_extract
    data_json = self._process_initial_data(webpage)
AttributeError: 'YoutubeSearchURLIE' object has no attribute '_process_initial_data'

@blackjack4494 blackjack4494 marked this pull request as ready for review November 10, 2020 22:21
@blackjack4494 blackjack4494 self-requested a review November 10, 2020 22:21
@wlritchi
Copy link
Contributor Author

Ah, yup, that's definitely an artifact left over from refactoring. I had intended to let search just process the initial data without requesting any further pages (which I think matches what it did before), but specifying the number of pages to fetch is probably more useful. I think your change as presented above will cap all playlist downloads to the first page, though.

@blackjack4494
Copy link
Owner

blackjack4494 commented Nov 10, 2020

Ah, yup, that's definitely an artifact left over from refactoring. I had intended to let search just process the initial data without requesting any further pages (which I think matches what it did before), but specifying the number of pages to fetch is probably more useful. I think your change as presented above will cap all playlist downloads to the first page, though.

yup you are right. Haven't thought about it lol.
Any idea @wlritchi ? D:
this seems to work but well it's rather hacky
for page_num in (range(n) if n>0 else itertools.count(1)):

Also drop a few leftover methods in search that are no longer used.
@wlritchi
Copy link
Contributor Author

Ah, yup, that's definitely an artifact left over from refactoring. I had intended to let search just process the initial data without requesting any further pages (which I think matches what it did before), but specifying the number of pages to fetch is probably more useful. I think your change as presented above will cap all playlist downloads to the first page, though.

yup you are right. Haven't thought about it lol.
Any idea @wlritchi ? D:
this seems to work but well it's rather hacky
for page_num in (range(n) if n>0 else itertools.count(1)):

Basically doing this except defaulting to None, and adjusting the counting so it always starts from 1 (convention: page 0 is the initial page and continuations start from 1). This way 0 can be passed to say "only download the initial page with no continuations". Also open to tweaking the counting so the initial page counts as page 1.

@wlritchi
Copy link
Contributor Author

Oh, hmm, search with 0 continuations returns nothing. I thought it would be the first page of results.

@blackjack4494
Copy link
Owner

blackjack4494 commented Nov 10, 2020

Oh, hmm, search with 0 continuations returns nothing. I thought it would be the first page of results.

that's why max_pages=5 is needed in class YoutubeSearchURLIE
if you set it to 1 there will be 30 results. But I thought it would be good to get ~100 so I put 5 pages 😆

@wlritchi
Copy link
Contributor Author

Oh I see, YoutubeSearchIE has its own implementation of _entries still, which interprets the parameter differently. That should also be merged into the superclass - will do.

@blackjack4494
Copy link
Owner

Oh I see, YoutubeSearchIE has its own implementation of _entries still, which interprets the parameter differently. That should also be merged into the superclass - will do.

i set it to 5 now. If you plan any more changes hold on for now. Will merge this into master and release it. really have to sleep, can barely keep my eyes open :o

@wlritchi
Copy link
Contributor Author

Oh I see, YoutubeSearchIE has its own implementation of _entries still, which interprets the parameter differently. That should also be merged into the superclass - will do.

i set it to 5 now. If you plan any more changes hold on for now. Will merge this into master and release it. really have to sleep, can barely keep my eyes open :o

👍 I'll put anything else on a separate branch since it's more maintenance/refactoring than bugfixing anyway. While I'm at it - thoughts on whether the initial page should be page 0, or page 1 (i.e. do most users prefer 0-based or 1-based indexing)?

@blackjack4494 blackjack4494 merged commit 00c38ef into blackjack4494:master Nov 10, 2020
@Fabian42
Copy link

I prefer 0-based index at least, but I'm a programmer, so maybe not very representative. I don't think that there is anything paginated with page numbers left on YouTube, but if there was, it would likely start at 1. Things like "--playlist-start" are already 1-based.

@ultrasound1372
Copy link

If things like --playlist-start are already 1 based then it should start at 1.

@tygore587
Copy link

@blackjack4494 There is a bug in here. Playlist title for the output format is not set correct and is always NA

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

youtube channel page downloads broken
6 participants