

Refactor test_stream to a separate function #434

Open
wants to merge 237 commits into master


@soraxas commented Mar 14, 2021

This simple PR refactors HostedMediaFile.__test_stream into a separate function.

First, it doesn't use any state within HostedMediaFile (i.e. it never uses the self argument), so it should be a standalone utility function. Moreover, this lets a resolver call the utility function to verify that the stream it returns is actually playable. For example, I'm writing a resolver that finds multiple mirror links that are potentially playable, but some of them may have expired. HostedMediaFile checks the returned stream_url and discards the link (and the resolver) if the link is unplayable. It would be more robust, however, if the resolver could check each link with this function before returning a stream_url.

Usage:

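from resolveurl import hmf
from resolveurl.resolver import ResolveUrl, ResolverError

...

# in some custom resolver
class MyResolver(ResolveUrl):

    def get_media_url(self, host, media_id):
        ...
        potential_urls = [url1, url2, ...]
        for url in potential_urls:
            # skip mirrors that no longer play, using the proposed standalone function
            if not hmf.test_stream(url):
                continue
            ...
            return url
        # no candidate passed the check
        raise ResolverError('No playable stream found')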

Gujal00 and others added 30 commits March 13, 2020 12:01
* Added the web_url as Referer to explicitly get final media url
Streamz.cc In order to properly result in ResolverError in case the scrape sources function does not end up in a result.
* Update waaw

* upd waaw

* Update waaw.py

* Update waaw.py
remove unnecessary imports
defunct hoster
defunct hoster
Domain Seems to be dead
@Gujal00 (Contributor) commented Mar 15, 2021

This doesn't seem right, as ResolveURL only ever receives one URL to resolve, and it does its job correctly. The scenario you are describing is that of scraper addons, which can find multiple links and have to pick a working one. Most addons already handle this by sending the URLs one by one to ResolveURL until they get a resolved (and checked) URL back.

@soraxas (Author) commented Mar 15, 2021

Yes, what I'm referring to is ResolveURL receiving one URL but finding multiple potential sources while resolving it.

Sorry, my description may have caused the confusion. The potential_urls in the example code come from something like

potential_urls = helpers.scrape_sources(html)  # or other internal scraping method

rather than being multiple input URLs passed to ResolveURL.


Consider the scenario where you have found multiple url sources and have to use something like

url = helpers.pick_source(...)

which requires the user to pick a source (e.g. lib/resolveurl/plugins/hdvid.py, lib/resolveurl/plugins/mp4upload.py, etc. in this repo). In practice, though, the multiple sources are usually just mirror URLs and/or different resolutions (1080p/720p/...).

Most of the time the user doesn't really care which link is used, only that it is still valid. So my rationale is that the ResolveURL plugin should loop through the potential links and return the first valid one, rather than making the user pick sources manually by trial and error to find a working mirror.
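The loop described above can be sketched as a small, stateless helper. `first_playable` is a hypothetical name, the sources are assumed to be `(label, url)` pairs, and `is_playable` is whatever check the plugin uses, such as the proposed `hmf.test_stream`.

```python
from typing import Callable, Iterable, Optional, Tuple

Source = Tuple[str, str]  # (label, url), e.g. ('720p', 'https://...')


def first_playable(sources: Iterable[Source],
                   is_playable: Callable[[str], bool]) -> Optional[str]:
    """Return the URL of the first source that passes is_playable, else None.

    Sources are tried in order and the loop stops at the first success, so
    candidates further down the list are never contacted.
    """
    for _label, url in sources:
        if is_playable(url):
            return url
    return None
```

If the list is ordered best quality first, this returns the best quality that still plays; the caller would still raise a ResolverError when it gets None back.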

@Gujal00 (Contributor) commented Mar 15, 2021

No, that's still not a valid scenario. A single URL given for resolving never has any mirrors. In all cases it is the same host with multiple qualities of the file on the same server. helpers.pick_source only lets you pick qualities, not mirrors. So if the user has enabled auto-pick best quality (the default setting), it picks the highest quality from the found sources; if not, it lists the qualities found when more than one is available.

As mentioned before, what you are trying to do belongs in the addon, not in ResolveURL.
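The quality-picking behaviour described above can be illustrated with a minimal sketch. This is not the real helpers.pick_source: `pick_quality`, the `choose` callback (a stand-in for the selection dialog, returning an index or None when cancelled) and the assumption that sources arrive sorted best quality first are all hypothetical. Note that it only chooses between variants and never checks whether a URL plays, which is the distinction being argued in this thread.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Source = Tuple[str, str]  # (quality label, url)


def pick_quality(sources: Sequence[Source], auto_pick: bool,
                 choose: Callable[[List[str]], Optional[int]]) -> Optional[str]:
    """Pick one URL from same-host quality variants (hypothetical sketch).

    `sources` is assumed to be sorted best quality first. With auto_pick on,
    the first entry wins; otherwise the user is asked via `choose`.
    """
    if not sources:
        return None
    if len(sources) == 1 or auto_pick:
        # a single quality, or auto-pick enabled: no dialog needed
        return sources[0][1]
    index = choose([label for label, _ in sources])
    return None if index is None else sources[index][1]
```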

@soraxas (Author) commented Mar 16, 2021

I think, regardless of the philosophical question of what counts as a resolver script and what counts as a scraper addon (under the hood the resolver plugins are still scraping the source), my real question is: what is the main concern with separating __test_stream from the HostedMediaFile class? This PR is only about that, nothing more. __test_stream looks more like a utility function to me, since it's stateless and should be callable from anywhere.


The ResolveURL plugin I'm working on handles a URL whose page source looks something like

<div id="vb_player_area" align="center">
  <div
    id="vb_player"
    class="jwplayer jw-reset jw-state-playing jw-stretch-uniform jw-breakpoint-6 jw-orientation-portrait"
    tabindex="0"
    aria-label="Video Player"
    role="application"
    style="width: 1003px; height: 1017px;"  >

    ...

  </div>
</div>

<div id="vb_server_list" align="center">
  <span
    id="vb_sv_4"
    class="vb_btnt vb_btnt-primary"
    onclick="vb_load_player('y76362VwiTyKlAtjkoD21iBQoy6znz92xej5EfEhgAiRv1F5cZZzTc2yz28Dv0IYkjR22fCxjkPOTWPh3Pp8DuJcV8B864xNf7eXjBTqKtPqDSJhZ4kWkOh2q4CIImvwOsj1aUkTKSAFidCoJ5NpR8ngkb3BC4g8hZWL4HoyZAOJNWXNtezkr5mJM6t0m7l7mz0Ur0HmQpFsJqkQQXryQatvKeyGJymwO7LCOK9ZbSWQI1JEfYpKjre2zYBDSOFLtIPBq6EljRP5sGxJV5hqKzrksUHQ2ovPjjR3MiO1gVSPBi0pEto1S4ytqtbqIUH43gDRQte4TnUh3bNjh2mJHPfgpVGwoU6W6FyqFPBCRmsOwZQ1a5OD879ddJyRFHP2YhMPgmGsVkz0GSWNtHQ0eohZ283lpnxYoiT3fiKjY085puGdmaObvh35haw45Bk8CJfu3ipxYSfUDUfTjTeMyw3gMXhi9Wd8xe1U7sLNCvCRmefES2u9arkgE9l2y2LfrXfREhzjzB7BL4dnETYDAQQyF2bHtItS',4,'mp4');"
    >360p</span
  ><span
    id="vb_sv_5"
    class="vb_btnt vb_btnt-primary active"
    onclick="vb_load_player('Ce7jetTAOPIXqv84i0P3bOAPvE20OoSpWNvxcuv0VkvhFfbpbqRvYsSQ7oPGaF9sA21zrtEfYQurZzXuBAlbCiBA86rRHZ3l8wdRwOQa5qXf3HdVNyNFkmvkbFydU2MpdPnZd7Y6Nw0JW4eZrTswgsndrzKgUNVohRXsd8i5WhaanWPkNhuZ0joZonGZvzte35vLtBkVWfR6b4eyKbHCr3xNOgfjZ42Z1atQ9ZhwzNOS8TLqJUGYk1MuASx9nBws8mZlWklRskSb8f8GPKKns00akmfKAbkovsQq6i8sNZyV55Cd2y6PSHza2JwNbFv4rBOxYKH1BwNt8KLyVRjUeBk3TvIDNXrcVHL5XwHpWhUwTZLl705c4I952meThz7SpaXHnl9Sm1BSmIHXMOGK7OTsOH9g2rqDq0YGwfdMGmZf5HWy8yFYqMKE2Me6ITWbaNwVGK6HgfDQtzWLBo8D6XTCqEvjcSnQqm',5,'mp4');"
    >720p</span
  ><span
    id="vb_sv_6"
    class="vb_btnt vb_btnt-primary"
    onclick="vb_load_player('PzLChL0eRFYFHN9bzj2U8D2g56NR8r8h8E4k62ngSKtyU1xhVHo1eYX9z72zuHqmkKfOPPgEfhjm8cpoxBj1YYuRIZ47EV643ud8pv5a9utyjfbuVkRBuz6VYO2R6MobbOCcQVHuxil8MldXmSYXI90036NOoB9eZHPJi9NvA1fUTbAWwjchBC7R1lu9spvH24oCMh78om4VVak9769ttAr8He7cIaGPLCeQ2yS3z5Hv4Z7Fbp7twPAHiQudFWcAugBmtP1R00X0n3rv2T82WNy85TPWdiNa93zGFqlVMw6do3uV6e6VbYKeibh8gZugjGjQEXmaX0MjBm1aQfr8EhylgvB4MZb9vEfCKyfzDnu66vpNKMel7LqvbsBIIhO1XIc1o9MTI0y446GUHRlnsmSFq628ROcjMzqH7oNy0Ad9FmKwyqiCurvSTRq1lcsORg2qo3ypxn3HXswtAnkWk27n9XYKGnb50g2dkIe17q0q6zNoEYUa9DQ8RtWUTO1P5v1p4xHaAwBPmNTvmhR16Y09e1IOi5c6A9jd5FfPXzVVz0hSBO4zWEqbTo0YyUFY8wjJ8whBcGILod8pk474p5UB8CaUIzesgGByodImd5lvHGh3PDAmcUuUDYL1QJsk4kG6oIBKLe8pGiqX1jxsFPPNYP3xRDF0n8KvZ4s97R717vLHuFa1gUSV2JDlDbCdPnpzHI9Rb8FtRBqrOcap5PyHsGRdnmK86QXKFbcv0Nu5NYC3MkinZbZfU357RsJqx4zNtgEakdcZIe9xGOyZac0bkehEzx7tE1oKI8wzDmy2bcpMBQSyjMCB68lBrj6un7cNwKeVOwZU4psNizu0aihwtZhBYd0vPKky7PnkNt2ZRiRwjvBQTAY1EZaIM7ljQyf==',6,'mp4');"
    >360p #2</span
  ><span
    id="vb_sv_7"
    class="vb_btnt vb_btnt-primary"
    onclick="vb_load_player('NCGUDGM0mtxkRSwaEZtis6Zw7U2qxxctATWIpKmJJmcmoLpIJcurwu1sIQYbzBkzxGfaLfJQu0kNmpMXnQeldRmP8BmMXuneNPmPI7gG5L60KmcFwUBjBS5VOIaVbohra2eBiBfJswiDgKlsWQv4jOfTX97mkpjs0qdrj5gG1YRKHuNPl8ysJVzH92QxaqvFMHGxBQobySXoFBr1iurq9P5ARNUvvDtTWkw9RUPIrzgyDf2CbcVG8Y0GTyNeD6b6OxeZ4c4e5svuS9H4lxIqvkTQrqOMj88Qp5kJhXM7QZkmorQnUfsJAixA1jk9CnkTXCzj7j3uSlFbleX7ZMjGJCvBoVdmGcL2u2Qx31ewzw4tl3g0jVhpXBN3V6jTqUdD0ZT80IuQOmmrJ95BEzLJDIDIQ1VSEfAZ4CIuEX8zFS0YZP5Araj6sugQedNvUmM8xy0ZnSAYRS5mNky5iFf4jyRUeWN7uMDNYnznSQzz1PcJfA2MlRzVp7APaVZx2ZjiLs5X5nVO3d9KoY1QsiF4aZ9fGJh7A6KgfIjmE7nK5cSg1yXYtes4XdTxlwdBG2WiprO9iCUCw5Esd8D8sGRaKeDq2VfnD6RmeQYi92aYXtN6Lmf4VOhtKdHvBiCuRKVPZK5yu6tGwfZc6LwRsBENpjBaKSGNFMyTUUGobJ9rDw2Brq27VeIqlzWcfjTDa1hBHE3D9SK5guVkV6TpPcGx6gMWoqg6ZZ2kFYtdGTqCUV4gmg1MAaN6C6H3uCWvPSkp2yzpytjZOQf47HadUoTVT3dcL5FFFhAF3s9Hli8M0RJ9AOeS9TJbsbSLOxDitM84gveKiw4IkJAsUrkD8UkZ5hjB0kLrZTm88yRbblhgFeXtjS7WCQnFzRRQyi1HWSFz3x=',7,'mp4');"
    >720p #2</span
  ><span
    id="vb_sv_8"
    class="vb_btnt vb_btnt-primary"
    onclick="vb_load_player('220Z6GwpjgnPVYc7o5WjTVGEZIZBZMkTszRWdNal32G1OR2Kz6Jjy38DBvjUWgvERW',8,'embed');"
    >HLS</span
  ><span
    id="vb_sv_9"
    class="vb_btnt vb_btnt-primary"
    onclick="vb_load_player('aH38HiasJ520aA9B9xTlE=',9,'embed');"
    >Hydrax</span
  ><span
    id="vb_sv_10"
    class="vb_btnt vb_btnt-primary"
    onclick="vb_load_player('aHR0cHM6Ly91cHRv&9FOWmNvbS90NXk5NHBlN2M2cXo=',10,'embed');"
    >Uptobox</span
  >
</div>

It is a JW Player page whose JavaScript onclick handlers switch between resolutions and mirrors, all from the same toolbar. Regardless of the details of this particular ResolveURL plugin, I think __test_stream would still be useful elsewhere for testing whether a URL is valid.
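To make the page above concrete, here is a minimal sketch of how a plugin might enumerate those toolbar buttons. The regex and the `ServerOption` / `parse_server_list` names are hypothetical; the first argument of vb_load_player is treated as an opaque token, since how the site turns it into a stream URL is not shown here.

```python
import re
from typing import List, NamedTuple

# Matches e.g. onclick="vb_load_player('TOKEN',5,'mp4');" followed by >720p</span
_BUTTON_RE = re.compile(
    r"""vb_load_player\('([^']*)',(\d+),'(\w+)'\);"\s*>([^<]+)</span""")


class ServerOption(NamedTuple):
    label: str      # button text, e.g. '720p', '360p #2', 'Uptobox'
    server_id: int  # second argument of vb_load_player
    kind: str       # third argument: 'mp4' or 'embed' in the page above
    token: str      # opaque first argument; not decoded here


def parse_server_list(html: str) -> List[ServerOption]:
    """Collect every vb_load_player(...) button from a page like the one above."""
    return [ServerOption(label.strip(), int(sid), kind, token)
            for token, sid, kind, label in _BUTTON_RE.findall(html)]
```

On the sample page this yields two 360p and two 720p `mp4` entries plus three `embed` entries (HLS, Hydrax, Uptobox), which is the mix of resolutions and mirrors being discussed.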

@jsergio123 (Owner) commented

It's been quite some time since I looked through the SMR code, and I might be mistaken, but I feel like this would make your plugin extremely slow. If @Gujal00 were to allow this, I'd suggest moving test_stream to helpers.

@soraxas (Author) commented Mar 22, 2021

Should it be the one located at lib/resolveurl/plugins/lib/helpers.py?
