Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Vlive] Fix extrator for handling playlist #223

Closed
wants to merge 1 commit into from
Closed

[Vlive] Fix extrator for handling playlist #223

wants to merge 1 commit into from

Conversation

kyuyeunk
Copy link
Contributor

Please follow the guide below

  • You will be asked some questions, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your pull request (like that [x])
  • Use Preview tab to see how your pull request will actually look like

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This addresses the issue #221 .

Although substantional work has been done to modify code for the revamped website, code regarding playlists has not seen an update.

This fixe uses similar code as recently modified VLiveIE to parse playlist info and download it.

Also, this fix modifies _TESTS to contain valid examples as the previous examples no longer belongs to a playlist.
I assume this is related to the website being revamped and giving lower focus to the playlist feature.

@kyuyeunk kyuyeunk changed the title [Vlive] fix extrator for handling playlist [Vlive] Fix extrator for handling playlist Nov 16, 2020
@SeonjaeHyeon
Copy link
Contributor

SeonjaeHyeon commented Nov 16, 2020

I'm just wondering where you found playlist_id. I searched the video page and tried to find it, but I couldn't.
I think there is no way to find it after the site was revamped, or I might have missed it.

I tested the example you wrote and it seems to work with any values like https://www.vlive.tv/video/113764/playlist/00000 or https://www.vlive.tv/video/113764/playlist/999999.
So, could you tell me what playlist_id actually does?

@exwm
Copy link
Contributor

exwm commented Nov 16, 2020

From what I can tell the playlist_id in the url does nothing, and the playlist is loaded whether /playlist/<playlist_id> is in the url or not. Additionally, the playlist_id is not used in this updated playlist extractor to actually extract any data. Adding /playlist/<playlist_id> is perhaps then just acting as a flag telling youtube_dl(c) to download the whole playlist associated with a video. This behavior is probably not expected though by users. And personally I think, just using the site, I never saw any links to playlists or ended up on urls with playlist in the path.

I also don't think this code accounts for video posts connected to a playlist, eg https://www.vlive.tv/post/1-18363798 which matches https://www.vlive.tv/video/111110, like the plain video extractor does.

Edit: I also want to add that as I noted in #101, I don't think it is possible to detect whether a video or video post is connected to a playlist from the text of the url alone. https://www.vlive.tv/video/111110 is connected with a playlist, but just looking at the url you cannot possibly tell.

Edit 2: I think the best way forward would be to check for the --yes-playlist flag, though I'm not sure if the usual semantics of that flag correspond fully to this specific case. In the readme, the description of this flag reads Download the playlist, if the URL refers to a video and a playlist.

@kyuyeunk
Copy link
Contributor Author

I'm just wondering where you found playlist_id

@SeonjaeHyeon I found playlist_id when addressing the issue #222 .
The fix to the issue required slightly different method from this pull request, thus creating a separate pull request #224 .

When the program goes through entries in a channel, it checks videoSeq to obtain video_id and use http://www.vlive.tv/video/<video_id> to download a video.

However, for some entries, the link points to an invalid address.

I found that for entries with invalid addresses, videoType are set to PLAYLIST.
This led me to believe that for videoSeq might indicate playlist_id for these cases.

I assumed that only http://www.vlive.tv/video/<video_id>/playlist/<playlist_id> points to a valid address, but I wasn't aware that http://www.vlive.tv/video/<video_id>/playlist/<random_number> also points to a valid address.

@kyuyeunk
Copy link
Contributor Author

kyuyeunk commented Nov 17, 2020

From what I can tell the playlist_id in the url does nothing, and the playlist is loaded whether /playlist/<playlist_id> is in the url or not.

@exwm This is true. I have compared webpage data from _download_webpage when using http://www.vlive.tv/video/<video_id>/ and http://www.vlive.tv/video/<video_id>/playlist/<playlist_id> and concluded that both results in same data.

Adding /playlist/<playlist_id> is perhaps then just acting as a flag telling youtube_dl(c) to download the whole playlist associated with a video. This behavior is probably not expected though by users.

As you have mentioned, adding /playlist/<playlist_id> directs the program to use VLivePlaylistIE, instead of VLiveIE.
And in current implementation, http://www.vlive.tv/video/<video_id>/ type of url uses VLiveIE.

I found that if params['postDetail']['post']['playlist']['data'] exists (as seen in this line), the video is in a playlist.
So ideally, when extracting info from VLiveIE._real_extract, the program should be redirected to VLivePlaylistIE if the above value is found.
However, I am not aware of method to do so.

Some workarounds might include modifying VLiveIE code to also handle a playlist.
However, this solution requires a considerable rewrite and I think discussion is required before implementing the feature.

Edit 2: I think the best way forward would be to check for the --yes-playlist flag, though I'm not sure if the usual semantics of that flag correspond fully to this specific case.

If this flag redirects the program to VLivePlaylistIE instead of VLiveIE, I think this is a valid solution.
However, looking through the code, this doesn't appear to be the case.

--yes-playlist sets noplaylist to false and that value is checked only after the program reaches _real_extract (e.g., VLivePlaylistIE._real_extract).

VLiveIE._rearl_extract doesn't check the value, and even if it is modified to check it, it still has above mentioned problems (i.e., unware of method of doing so or requiring considerable rewrite).

@exwm
Copy link
Contributor

exwm commented Nov 17, 2020

So ideally, when extracting info from VLiveIE._real_extract, the program should be redirected to VLivePlaylistIE if the above value is found.

I don't think it should redirect to the playlist extractor when a user provides the url for a single video. Since playlists don't seem to have a dedicated url, youtube_dl(c) should take a flag somehow to download a playlist if it finds one connected with a video. I didn't look too deeply into --yes-playlist, but it sounds like it may not be appropriate for various reasons. Perhaps a new flag is warranted, or if nothing else some documentation that to download a playlist one must append /playlist or similar to the url.

It does seem like merging VLivePlaylistIE into VLiveIE may be warranted since vlive now works in a way that playlists are pointed at by videos and posts, but not by any other means.

@SeonjaeHyeon
Copy link
Contributor

As @kyuyeunk said, it seems that the plugin cannot redirect to VLivePlaylistIE before VLiveIE._real_extract is called.
Maybe we can use _real_initialize method or something like this, but I don't think it is a fundamental solution.
VLivePlaylistIE returns self.playlist_result(), which have to be checked with _VALID_URL one by one again. So each element will call VLiveIE and because of noplaylist value checking, the plugin redirects to VLivePlaylistIE again. This will be repeated.

It does seem like merging VLivePlaylistIE into VLiveIE may be warranted since vlive now works in a way that playlists are pointed at by videos and posts, but not by any other means.

I agree with @exwm . Like I said above, we may get some problems if we separate them and add redirection way.
Even though merging them requires considerable rewriting, I think it would be better.

@kyuyeunk
Copy link
Contributor Author

I agree with comments of @exwm and @SeonjaeHyeon that VLivePlaylistIE and VLiveIE should be merged.
Therefore, I made commit 9b539c3 that merges those two.

One of the problem I encountered during implementation is that there is no way to tell if a url is intended for video only or a playlist.

Specifically, I tried modifying VLiveIE._real_extract to return playlist containing video links if it detects that a link is in a playlist, and return video info otherwise.
When the playlist is returned, the program iterates urls within the playlist and parse the info of url using VLiveIE._real_extract.
However, since videos from a playlist also contains information about playlist, VLiveIE._real_extract returns a playlist causing an infinite loop.
I could not find a way to pass a flag inside playlist_result and tell VLiveIE._real_extract to ignore playlist feature.

To solve the issue, I implemented VLiveBaseIE which doesn’t look for a playlist and only downloads a video, which is the same behavior of VLiveIE before it had been modified.

Meanwhile, VLiveIE has been modified to make a playlist that utilizes VLiveBaseIE using ie=VLiveBaseIE.ie_key()) if a link contains playlist.
And if the link doesn’t contain a playlist, it will only download the video.
By setting ie value, the program is forced to use VLiveBaseIE, not VLiveIE, to download the link.

The only concern is the possibility of VLiveBaseIE.suitable(cls, url) being called before VLiveIE.suitable(cls, url) when it initially looks for a suitable extractor.
If this happens, url will be bound to VLiveBaseIE and no playlist will be downloaded.

However, order of .suitable(cls, url) being called seems to be based on the order of import in youtube_dlc/extractor/extractors.py. Therefore, I assume it won’t cause any problems.

I have tested the commit using both types of url (i.e., https://www.vlive.tv/video/111110 and https://www.vlive.tv/post/1-18363798) and verified that it works correctly. Also, flags such as --no-playlist works correctly.

@kyuyeunk
Copy link
Contributor Author

It seems like the method i used in commit 9b539c3 doesnt pass the following test

Traceback (most recent call last):'
File "/home/travis/build/blackjack4494/yt-dlc/test/test_all_urls.py", line 95, in test_no_duplicates
self.assertFalse(
AssertionError: True is not false : VLiveBaseIE should not match URL 'https://www.vlive.tv/video/1326' . That URL belongs to VLive.

Therefore, a different method should be used.

@kyuyeunk
Copy link
Contributor Author

It is better to ignore comment #223 (comment) and only read this to avoid confusion.
Lengthy above comment was cause by my lack of understanding of the program and trying to create a workaround that worked within my limited knowledge.

The latest commit merges VLivePlaylistIE and VLiveIE into VLiveIE as @SeonjaeHyeon and @exwm suggested.

If VLiveIE. _real_extract detects that a link contains a playlist, it will parse video infos in the playlist and return list of video infos using playlist_result.
Otherwise, it returns a single video info.

I have tested the commit using both types of url (i.e., https://www.vlive.tv/video/111110 and https://www.vlive.tv/post/1-18363798) and verified that it works correctly.
Also, flags such as --no-playlist works correctly.

Copy link
Contributor

@exwm exwm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore this comment.

Copy link
Contributor

@exwm exwm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the suppertedsites.md (https://github.com/blackjack4494/yt-dlc/blob/master/docs/supportedsites.md) should be regenerated/updated since the explicit playlist extractor is removed. Playlists are still supported in their new form though and I'm not sure where/how that should be documented.

youtube_dlc/extractor/vlive.py Outdated Show resolved Hide resolved
youtube_dlc/extractor/vlive.py Outdated Show resolved Hide resolved
@kyuyeunk
Copy link
Contributor Author

I think the suppertedsites.md (https://github.com/blackjack4494/yt-dlc/blob/master/docs/supportedsites.md) should be regenerated/updated since the explicit playlist extractor is removed. Playlists are still supported in their new form though and I'm not sure where/how that should be documented.

Looking through other entries in supportedsites.md, I don't think there are strick convention for commenting what is supported.
I have updated supportedsites.md to remove vlive:playlist and comment that vlive supports both videos and playlists.

@SeonjaeHyeon
Copy link
Contributor

Everything looks fine, but it seems that the plugin doesn't collect all videos of playlist.
For example, https://www.vlive.tv/post/1-19612839, the plugin downloads only 33 videos.
But as you can see, there are more than 33 videos in the playlist. (totalCount value in JSON Data says there are 119 videos in total.)

That's because there is not all the video information in JSON Data. The front part of the playlist was omitted. (maybe due to length limit or some reasons)
So if we want to get all of them, I think the following endpoint should be used.
https://www.vlive.tv/globalv-web/vam-web/playlist/v1.0/playlist-<playlistSeq>/posts?limit=<int>&before=<int>

@kyuyeunk
Copy link
Contributor Author

So if we want to get all of them, I think the following endpoint should be used.
https://www.vlive.tv/globalv-web/vam-web/playlist/v1.0/playlist-<playlistSeq>/posts?limit=<int>&before=<int>

I've looked into https://www.vlive.tv/globalv-web/vam-web/playlist/v1.0/playlist-<playlistSeq>/posts?limit=<int>&before=<int> and it seems like appId is required to fetch a valid data.
Otherwise, you get {"errorCode":"common_403","message":"You are not authorized."} error.

Comment from previous thread (#101 (comment)) mentioned that appId isn't required for LIVE videos if Host, User-Agent, and Referer is specified. But for playlists, this doesn't seems to be the case.

Also, I'm not certain if appId is a fixed value, or varies depending on a situation. But just to be sure, I implemented a code that fetches appId from the website.

Using appId and https://www.vlive.tv/globalv-web/vam-web/playlist/v1.0/playlist-<playlistSeq>/posts?appId=<appId>&limit=<int>, the newest commit downloads all 119 videos from the link https://www.vlive.tv/post/1-19612839. I've also added the link to _TESTS.

@SeonjaeHyeon
Copy link
Contributor

Ohh.. I forgot to mention appId. Sorry. It is also required.
And appId is a fixed value so far: 8c6cc7b45d2568fb668be6e05b6e5a3b
But the value can be changed any time. So, it would be better to parse appId as you did.

Now, it looks perfect for me. Downloading playlist works correctly. 👍

@blackjack4494
Copy link
Owner

@kyuyeunk you may want to have a look at this again. some conflicts needs to be resolved since there were changes made to vlive in #245

@kyuyeunk kyuyeunk marked this pull request as ready for review December 4, 2020 14:30
@kyuyeunk
Copy link
Contributor Author

kyuyeunk commented Dec 4, 2020

@kyuyeunk you may want to have a look at this again. some conflicts needs to be resolved since there were changes made to vlive in #245

@blackjack4494 The pull request has been modified to resolve conflicts. And I have verified that it's working as intended.

Also, VLiveChannelIE has been be modified, as it caused unintended behavior when VLiveIE was modified to support playlists.

pukkandan referenced this pull request in yt-dlp/yt-dlp Jan 7, 2021
@kyuyeunk kyuyeunk closed this by deleting the head repository May 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants