RegieLive provider is regularily returning BadZipFile #2165

Open
jungla0 opened this issue Jun 6, 2023 · 10 comments

Comments

@jungla0

jungla0 commented Jun 6, 2023

Describe the bug
RegieLive sometimes returns BadZipFile, even though a manual search works and the provider link downloads a proper zip file.

To Reproduce
Steps to reproduce the behavior:

  1. Add RegieLive as a provider
  2. Add movie/series from Sonarr/Radarr
  3. Manually search for a subtitle.
  4. Click on the provider name link and see the zip being downloaded
  5. Click on the download button in manual search window and notice that the subtitle is not downloaded and the provider returns BadZipFile

Expected behavior
BadZipFile should not appear for actual zip files

Screenshots
[screenshot: badzipfile]

Software (please complete the following information):

  • Bazarr: 1.2.1
  • Radarr version: 4.6.0.7439
  • Sonarr version: 4.0.0.535
  • OS: DSM 7.2
@morpheus65535
Owner

Is that an issue only with RegieLive?

@jungla0
Author

jungla0 commented Jun 6, 2023

Based on the providers I've used so far, it seems so, yes. I'm no expert, but looking at the log, it seems to me that it cached that error some time ago and now displays it every time. Log:

Unexpected error in provider 'regielive', Traceback:
Traceback (most recent call last):
  File "/volume2/@appstore/bazarr/share/bazarr/bazarr/../libs/subliminal_patch/core.py", line 398, in download_subtitle
    self[subtitle.provider_name].download_subtitle(subtitle)
  File "/volume2/@appstore/bazarr/share/bazarr/bazarr/../libs/subliminal_patch/providers/regielive.py", line 129, in download_subtitle
    archive = zipfile.ZipFile(io.BytesIO(_zipped.content))
  File "/var/packages/python310/target/lib/python3.10/zipfile.py", line 1269, in __init__
    self._RealGetContents()
  File "/var/packages/python310/target/lib/python3.10/zipfile.py", line 1336, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file

Also, one thing I didn't mention: RegieLive used to work and was one of the best providers for me, but for the past couple of days I've been constantly receiving that error. I even removed it for 1-2 days in case it had a timeout or something, but once I added it back, it returned the above on the first search.
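For anyone trying to confirm what the traceback means: `zipfile.ZipFile` raises exactly this error whenever the bytes it receives are not an archive, for example an HTML page returned in place of the download. The snippet below is only a minimal sketch of that behaviour; the HTML payload and the helper names `looks_like_zip` and `make_zip` are made up for illustration and are not part of Bazarr.

```python
import io
import zipfile


def looks_like_zip(payload: bytes) -> bool:
    """Return True if the payload has a readable zip structure."""
    return zipfile.is_zipfile(io.BytesIO(payload))


def make_zip(name: str, text: str) -> bytes:
    """Build a small in-memory zip archive for comparison."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr(name, text)
    return buf.getvalue()


# Hypothetical payloads: an HTML page served instead of the archive,
# and a genuine archive holding one subtitle file.
html_page = b"<html><body>Please solve the captcha</body></html>"
real_zip = make_zip("episode.srt", "1\n00:00:01,000 --> 00:00:02,000\nHello\n")

try:
    zipfile.ZipFile(io.BytesIO(html_page))
    html_error = None
except zipfile.BadZipFile as exc:
    # Same message as in the log above.
    html_error = str(exc)

print(looks_like_zip(real_zip), looks_like_zip(html_page), html_error)
```

Checking the payload with `zipfile.is_zipfile` before opening it makes it possible to tell "the site sent something else" apart from a genuinely corrupt archive.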

@morpheus65535
Owner

morpheus65535 commented Jun 8, 2023

It seems that regielive implemented reCaptcha on their download pages.

@alexandrucatalinene are you in the mood to look into this one? You've done a great job when we had issues with this provider before.

Thanks!

@alexandrucatalinene
Contributor

Ok, I'll try and tackle this one.

@morpheus65535 morpheus65535 changed the title Some providers are regularily returning BadZipFile, RegieLive provider are regularily returning BadZipFile Jun 8, 2023
@morpheus65535 morpheus65535 changed the title RegieLive provider are regularily returning BadZipFile RegieLive provider is regularily returning BadZipFile Jun 8, 2023
@alexandrucatalinene
Contributor

Just an update: I didn't have time to properly look into the issue, but I did try to reproduce it (both in Bazarr and in the browser) and I couldn't.

Everything downloaded fine and unzipped, all zipped files were valid, never got a recaptcha challenge (even though I changed UAs, reset all cookies, private browsing etc).

So, my guess right now is that it's something that happens only for certain devices (IPs maybe) that triggered some sort of rule on RL.

@morpheus65535
Owner

@alexandrucatalinene What I can tell is that I first used Bazarr to search for subtitles in batch and then got throttled. When accessing the URL that got throttled (the zip file download), I get redirected to an HTML page to solve the reCAPTCHA.

@IonutNeagu
Contributor

There is a limit of subtitles that can be downloaded in a certain period of time. Problems can occur with series with many episodes or if someone downloads many subtitles. Normally, that limit is not reached.

@alexandrucatalinene
Contributor

alexandrucatalinene commented Jul 24, 2023

> There is a limit of subtitles that can be downloaded in a certain period of time. Problems can occur with series with many episodes or if someone downloads many subtitles. Normally, that limit is not reached.

So, my take on this is that we should intercept this error and return a Throttled exception. Any clue on a definite way of detecting this limit (HTTP return code, page data, etc.)?

I want to avoid trying to add a bypass just for some edge cases.
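To make the idea concrete, here is a minimal sketch of intercepting the error and re-raising it as a throttling signal. `ProviderThrottled` and `open_subtitle_archive` are hypothetical names used only for illustration; the exception class Bazarr actually uses for throttling may be named and handled differently.

```python
import io
import zipfile


class ProviderThrottled(Exception):
    """Hypothetical exception meaning 'back off from this provider for a while'."""


def open_subtitle_archive(payload: bytes) -> zipfile.ZipFile:
    """Open downloaded bytes as a zip, treating a non-zip payload as throttling.

    Assumption: when the provider is working normally it always returns a
    valid archive, so anything else is taken as a sign of a limit being hit.
    """
    try:
        return zipfile.ZipFile(io.BytesIO(payload))
    except zipfile.BadZipFile as exc:
        raise ProviderThrottled(
            "download did not return a zip archive; provider may be limiting requests"
        ) from exc
```

This does not bypass anything. It only turns an unexpected error into a signal that the caller can use to pause the provider.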

@TomSawyer8006

Hi, there are 2 problems:

  1. Downloaded subtitles per IP:
    The limit is around 50/day before the reCAPTCHA appears.
    If you get the "301 redirect" header when downloading the zip file, then you have hit the limit.
    At this point you should pause the use of the RL provider for a few hours.
    Even if the reCAPTCHA is solved, you'll see it again for almost every other download.
    And if you continue, after a while you'll get a "txt" file telling you that you hit the limit, and in this case you'll get a 24-hour restriction.

  2. Requests per minute:
    For searches and downloads altogether.
    The limit is something in the range of 15 requests / minute, but no more than 3-4 in a few seconds (it's rate-limited per minute, not per second). I think there's also a limit per hour.
    If you get the "429 Too Many Requests" header, then you need to slow down the request rate immediately.
    If you continue and ignore the 429, after 1-2 minutes you'll get firewalled completely, for somewhere between a few hours and some days, depending on the number of requests made after you got the 429 header.
    Keep in mind that this 429 also applies to downloads.

Both problems occur only with big series because many requests are made in bulk.
For normal use, there's no problem.
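The two signals described above (a 301 on the zip download, and a 429 on any request) could be mapped to provider states roughly like this. It is a sketch only: `LimitState` and `classify_response` are hypothetical names, and the mapping relies on the status codes reported in this comment rather than on any documented RegieLive behaviour. With the Requests library, redirects are followed by default, so the download would need `allow_redirects=False` for the 301 to be visible.

```python
from enum import Enum


class LimitState(Enum):
    OK = "ok"
    DAILY_LIMIT = "daily_limit"    # 301 on the zip download: pause for a few hours
    RATE_LIMITED = "rate_limited"  # 429: slow down immediately


def classify_response(status_code: int) -> LimitState:
    """Map an HTTP status code from the provider to a limit state.

    Assumption: 301 on a download means the per-IP download limit was hit
    and 429 means the per-minute request limit was hit, as reported above.
    """
    if status_code == 429:
        return LimitState.RATE_LIMITED
    if status_code == 301:
        return LimitState.DAILY_LIMIT
    return LimitState.OK
```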

I hope it helps.
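As a rough illustration of staying under the request limits mentioned above, a client-side sliding-window limiter could look like the sketch below. The numbers (15 requests per 60 seconds, 3 requests per 5 seconds) are assumptions derived from the estimates in this comment, `SlidingWindowLimiter` is a hypothetical name, and none of this is the code that was later committed to Bazarr.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Track request timestamps and report how long to wait before the next one."""

    def __init__(self, limits, clock=time.monotonic):
        # limits: iterable of (max_requests, window_seconds) pairs
        self.limits = list(limits)
        self.clock = clock
        self.stamps = deque()

    def wait_time(self) -> float:
        """Seconds to wait so that one more request respects every limit."""
        now = self.clock()
        longest = max(window for _, window in self.limits)
        # Drop timestamps that are older than the longest window.
        while self.stamps and now - self.stamps[0] >= longest:
            self.stamps.popleft()
        delay = 0.0
        for max_requests, window in self.limits:
            recent = [t for t in self.stamps if now - t < window]
            if len(recent) >= max_requests:
                # Wait until enough old requests leave this window.
                blocking = recent[len(recent) - max_requests]
                delay = max(delay, blocking + window - now)
        return delay

    def record(self) -> None:
        """Register that a request was just sent."""
        self.stamps.append(self.clock())


# Assumed limits: 15 requests per minute, at most 3 within 5 seconds.
limiter = SlidingWindowLimiter([(15, 60.0), (3, 5.0)])
```

Before each search or download, the caller would sleep for `limiter.wait_time()` seconds and then call `limiter.record()`.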

morpheus65535 added a commit that referenced this issue Jan 16, 2025
…edirected to captcha validation or being completely blocked for a while. #2165
@morpheus65535
Owner

@TomSawyer8006 thanks for taking the time to do some investigation. I've implemented your suggestion and we may fine-tune delays further down the road. Let me know if the upcoming beta improves results.
