500px now rips non-water marked images #492
base: master
Looks good overall. Still managed to get an adult content placeholder image (although I got 67 actual photos).
Also missing out on obvious titles which could be taken from the image file names if available.
Also it looks like after one rip of the example link I exceeded the rate limit, so I can't test again.
As usual, I think we can definitely start ripping before all of the URLs are parsed out. It's a common pattern in various rippers anyway, so it seems worth making that change. But I suppose it could wait.
My log btw: https://pastebin.com/ZpCgqdFC
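The suggestion above about starting downloads before every URL has been parsed could look roughly like the sketch below. It is not taken from the ripme codebase: `StreamingRip`, `rip`, `pages` and `downloader` are hypothetical names, and the page parsing is assumed to be exposed as a lazy `Iterable` that yields the image URLs of one page at a time.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class StreamingRip {
    /**
     * Hands each page's URLs to the download pool as soon as that page is
     * parsed, instead of collecting every URL first and downloading afterwards.
     */
    static void rip(Iterable<List<String>> pages, Consumer<String> downloader, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (List<String> urlsOnPage : pages) {   // next page is fetched/parsed here
                for (String url : urlsOnPage) {
                    // Queue the download immediately; parsing of later pages continues.
                    pool.execute(() -> downloader.accept(url));
                }
            }
        } finally {
            pool.shutdown();                           // no new tasks after this point
            try {
                pool.awaitTermination(1, TimeUnit.HOURS); // wait for queued downloads
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // preserve the interrupt flag
            }
        }
    }
}
```

With this shape the first image can be saved while later pages are still being fetched, which would address the slow start mentioned in this PR.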
It looks like there have been some changes to the site since I wrote the ripper. I'll get on fixing these.
Maybe it's best to avoid using Then this check can be discarded: Because this placeholder URL could be different, or could change any time. Instead, always extract the target URL(s) from here:
Because that script element with
@cyian-1756 any update on this one?
They implemented some insane rate limiting (I was still getting IP banned after waiting 10 secs between requests), so I haven't really been able to do much testing (as I get pretty much insta-banned).
Maybe we need to make the wait interval long and slightly randomized to get around bot detection?
^ 10 seconds and getting insta-banned is already a lot, so the base waiting time would have to be something like 15 or 20 seconds at minimum, with at least a 5-10 second range of randomization ... And those might not even be enough. Tbh I'm very surprised at how strict the limiting they suddenly implemented is.
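For reference, a long, randomized wait between requests could be sketched as below. The 15 second base and 10 second jitter range are only the figures floated in this thread, and `JitteredDelay`, `nextDelayMs` and `sleepBeforeNextRequest` are hypothetical names, not existing ripme methods. Whether any such delay is enough to avoid the ban is untested.

```java
import java.util.concurrent.ThreadLocalRandom;

class JitteredDelay {
    // Assumed values, taken from the numbers suggested in this discussion.
    static final long BASE_DELAY_MS = 15_000;
    static final long MAX_JITTER_MS = 10_000;

    /** Returns a delay in the inclusive range [baseMs, baseMs + maxJitterMs]. */
    static long nextDelayMs(long baseMs, long maxJitterMs) {
        // nextLong(bound) excludes the bound itself, hence the + 1.
        return baseMs + ThreadLocalRandom.current().nextLong(maxJitterMs + 1);
    }

    /** Sleeps for a freshly randomized interval; call before each request. */
    static void sleepBeforeNextRequest() throws InterruptedException {
        Thread.sleep(nextDelayMs(BASE_DELAY_MS, MAX_JITTER_MS));
    }
}
```

Drawing a new value before every request keeps the interval from being perfectly regular, which is the point of the randomization proposed above.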
That might work, I'll look into it.
I wouldn't be shocked if they did it to combat ripme, considering it went into effect pretty much right after I fixed this ripper and added watermark-free ripping.
The 500px ripper now rips images without a watermark on them, closing issue #491. There are still some issues with the ripper (it takes a long while to start ripping and doesn't save the image titles), but those can be fixed later.
Test link: http://500px.com/david-foto