Added login cookie support and paging fixes for FlickrRipper.java and some... #173

ghost · 2015-02-18T20:21:44Z

Just get the cookies from your browser after logging in to Flickr and put one line in the config. After the first run, the cookies are stored Base64-encoded in the config as a very simple security measure.

-- How to get cookies easily (for end users) --

Chrome browser:

- Log in to Flickr
- Press F12 to open Developer Tools in the browser
- Go to the Resources tab
- Expand Cookies on the left
- Select www.flickr.com
- Get the values of these 3 cookies: cookie_accid, cookie_epass and current_identity
- Put them in the "rip.properties" file like this, replacing ### with the values. That's it.

flickr.cookies2encode = current_identity=###; cookie_accid=###; cookie_epass=###;
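A minimal sketch of the encode-once flow described above, using `java.util.Properties` as a stand-in for RipMe's config object. The class and method names and the `flickr.cookies` key for the encoded value are hypothetical (the PR does not name them), and removing the plain key only approximates what the PR's `clearConfigProperty(...)` does. Note that Base64 is an encoding, not encryption: anyone who can read the config can decode it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class FlickrCookieConfig {
    // Key the user fills in by hand (from the PR description).
    static final String PLAIN_KEY = "flickr.cookies2encode";
    // Hypothetical key for the encoded copy; the PR does not say what it is called.
    static final String ENCODED_KEY = "flickr.cookies";

    /** On first run, replaces the plain-text cookie line with a Base64-encoded property. */
    static void encodeIfNeeded(Properties config) {
        String plain = config.getProperty(PLAIN_KEY);
        if (plain != null && !plain.trim().isEmpty()) {
            String encoded = Base64.getEncoder()
                    .encodeToString(plain.trim().getBytes(StandardCharsets.UTF_8));
            config.setProperty(ENCODED_KEY, encoded);
            // Drop the plain text (roughly what a clearConfigProperty-style helper would do).
            config.remove(PLAIN_KEY);
        }
    }

    /** Decodes the stored value and splits "name=value; name=value;" into a map. */
    static Map<String, String> loadCookies(Properties config) {
        Map<String, String> cookies = new LinkedHashMap<>();
        String encoded = config.getProperty(ENCODED_KEY);
        if (encoded == null) {
            return cookies; // nothing configured yet
        }
        String header = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        for (String pair : header.split(";")) {
            int eq = pair.indexOf('='); // split on the first '=' only; values may contain '='
            if (eq > 0) {
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(PLAIN_KEY, "current_identity=aaa; cookie_accid=bbb; cookie_epass=ccc;");
        encodeIfNeeded(config);
        System.out.println(config.getProperty(PLAIN_KEY)); // prints null
        System.out.println(loadCookies(config)); // prints {current_identity=aaa, cookie_accid=bbb, cookie_epass=ccc}
    }
}
```

The resulting `Map<String, String>` is the shape Jsoup's `Connection.cookies(Map)` accepts, so a ripper could attach it to each page request.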

* Added clearConfigProperty(...) to Utils.java
* Modified AbstractHTMLRipper.java so that the "no images found" IOException is thrown only for the first page; later pages just log and break out of the while loop.
* Added UsenetHub ripper. (http://adult.usenethub.com)
* Added Picasa Web Albums ripper. (http://picasaweb.google.com)
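The AbstractHTMLRipper paging change can be illustrated with a small, self-contained loop. This is a sketch, not the PR's code: the generic page list and the `extract` function are hypothetical stand-ins for however the ripper fetches pages and pulls image URLs out of them.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;
import java.util.logging.Logger;

public class PagingSketch {
    private static final Logger LOGGER = Logger.getLogger(PagingSketch.class.getName());

    /**
     * Walks pages in order. An empty FIRST page is an error; an empty later page
     * is treated as the end of the album.
     */
    static <P> List<String> ripAll(List<P> pages, Function<P, List<String>> extract) throws IOException {
        List<String> collected = new ArrayList<>();
        Iterator<P> it = pages.iterator();
        int pageNumber = 0;
        while (it.hasNext()) {
            List<String> imageURLs = extract.apply(it.next());
            pageNumber++;
            if (imageURLs.isEmpty()) {
                if (pageNumber == 1) {
                    // Nothing on the first page: the URL is probably wrong, so fail loudly.
                    throw new IOException("No images found on the first page");
                }
                // Later pages: log and stop paging instead of failing the whole rip.
                LOGGER.info("No images found on page " + pageNumber + ", stopping");
                break;
            }
            collected.addAll(imageURLs);
        }
        return collected;
    }
}
```

With pages `[a, b]`, `[c]`, `[]`, `[d]` this returns `[a, b, c]`: the empty third page ends the loop, so the fourth page is never visited, while a single empty first page raises the IOException.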


metaprime · 2016-12-19T10:45:33Z

Hi @SilhouetteTR, could I ask you to split this PR into a PR each for:

- Fixes to the existing Ripper (definitely want to merge these in)
- New Rippers (will probably review and then merge)
- Cookie support (not sure about this, see comments below).

If I don't hear back from you for a week or so, I'll split this into pieces and merge it in a little at a time.

Cookie support:

I think for most users the process of having to manually add cookies is going to be a non-starter. I wonder if there's some way of automatically collecting the necessary information from the cookies that have already been stored on the computer, or something like that.

General comments:

Could you please clean up the commented-out lines and make sure you're consistently indenting with 4 spaces instead of tab characters?

rautamiekka · 2016-12-20T12:32:46Z

EDIT: I figured out a better way to see what to use ...

Regarding cookies, the best way to decide which browser's cookies to use is probably to look at the last-modified timestamp and use the newest.

Also, where and how cookies are stored differs by browser and platform (though I don't expect the same browser to use different formats on different platforms). Chrome and Firefox store everything in SQLite3 databases; IE, I think, stores them as text files; for the others I have no clue, but I'd imagine Opera uses SQLite3 too.

Opening the SQLite databases while the corresponding browser is running might work with read-only, non-locking access. Then there's the question of how to remember all that data: I think you could keep it in RAM until RipMe shuts down and read it again when starting to rip such a site, but re-reading it for every link from the same site would be costly.
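The "use the newest timestamp" heuristic from this comment could start as simply as comparing the modification times of candidate cookie-store files. This is a sketch under stated assumptions: the caller supplies the per-browser cookie database paths (the class name and inputs are hypothetical), and it only picks a file; actually opening the SQLite database would need a JDBC driver and is out of scope here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.Optional;

public class CookieStoreLocator {
    /** Returns the existing candidate file with the newest last-modified time, if any. */
    static Optional<Path> newestStore(List<Path> candidates) throws IOException {
        Path best = null;
        FileTime bestTime = null;
        for (Path p : candidates) {
            if (!Files.isRegularFile(p)) {
                continue; // browser not installed, or no profile at this path
            }
            FileTime t = Files.getLastModifiedTime(p);
            if (bestTime == null || t.compareTo(bestTime) > 0) {
                best = p;
                bestTime = t;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

A file's modification time is only a proxy for "most recently used browser"; a browser that merely updated a tracking cookie would also count as newest, which fits the best-effort approach discussed in this thread.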

metaprime · 2016-12-20T12:51:32Z

@rautamiekka I see your points. However, I think the need to manually add cookies to the config will be a non-starter for a lot of users. It would be better if it were automatic to some extent, with users specifying something in the UI such as which web browser they use (we could support one or several web browsers); this project is full of the pattern of explicitly supporting lots of little things.

The data could be stored in the rip.properties file like you suggested. I don't think we're going to have meaningful size constraints for cookies as text data since this program downloads large media archives.

Even if you just kept it in RAM, you'd only need to read the cookies once per domain per session -- subsequent rips on the same site would not need to read the cookies again if you don't restart the program.
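A per-session, per-domain cache along these lines can be built on `ConcurrentHashMap.computeIfAbsent`, which applies the mapping function at most once per key. The loader function and class name are hypothetical; the point is that the expensive cookie read runs once per domain and is then served from memory until the program exits.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class SessionCookieCache {
    // Lives only as long as the running program; nothing is written to disk.
    private final ConcurrentHashMap<String, Map<String, String>> byDomain = new ConcurrentHashMap<>();
    // Hypothetical loader, e.g. something that reads a browser's cookie store for one domain.
    private final Function<String, Map<String, String>> loader;

    public SessionCookieCache(Function<String, Map<String, String>> loader) {
        this.loader = loader;
    }

    /** Reads the cookies for a domain on first use, then returns the cached copy. */
    public Map<String, String> cookiesFor(String domain) {
        return byDomain.computeIfAbsent(domain.toLowerCase(Locale.ROOT), loader);
    }

    public static void main(String[] args) {
        AtomicInteger reads = new AtomicInteger();
        SessionCookieCache cache = new SessionCookieCache(domain -> {
            reads.incrementAndGet(); // count how often the expensive read happens
            return Collections.singletonMap("session", "demo-" + domain);
        });
        cache.cookiesFor("www.flickr.com");
        cache.cookiesFor("WWW.Flickr.com");
        System.out.println(reads.get()); // prints 1
    }
}
```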

metaprime · 2016-12-20T12:52:20Z

First things first, let's get this PR split up into these three separate PRs:

- Fixes to the existing Ripper (definitely want to merge these in)
- New Rippers (will probably review and then merge)
- Cookie support (not sure about this, see comments above).

I would definitely like to merge the first two. The cookies support will need more thought.

rautamiekka · 2016-12-20T15:01:33Z

You're right that RAM usage would be a very distant issue; it would take tens or hundreds of thousands of domains before memory use became noticeable, and even then it wouldn't matter much. What matters is getting that data, which leaves two options:

1. Automatically scanning the browsers' cookie stores by timestamp (using the newest data per domain; there might be cases where newer data is worse or unusable altogether, but I don't know about that part).
2. A very primitive Java-based web browser integrated into RipMe, used to log in and capture the data on the fly (the user agent would have to be faked so the site doesn't serve features based on browser version, assuming the embedded browser can understand data sent specifically for the browser whose user agent it claims).
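Option 1 could be sketched as below, assuming the cookies have already been read out of each browser's store into simple records. The `BrowserCookie` type is hypothetical, and domain matching is simplified to exact string equality rather than real cookie-domain rules (leading dots, subdomains). For a given domain, it finds the browser holding the most recently modified cookie and returns only that browser's cookies.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NewestCookiePicker {
    /** One cookie as already read from some browser's store (hypothetical type). */
    public static final class BrowserCookie {
        final String browser;
        final String domain;
        final String name;
        final String value;
        final long lastModifiedMillis;

        public BrowserCookie(String browser, String domain, String name, String value, long lastModifiedMillis) {
            this.browser = browser;
            this.domain = domain;
            this.name = name;
            this.value = value;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    public static Map<String, String> pick(List<BrowserCookie> all, String domain) {
        // Pass 1: which browser touched this domain's cookies most recently?
        String newestBrowser = null;
        long newest = Long.MIN_VALUE;
        for (BrowserCookie c : all) {
            if (c.domain.equals(domain) && c.lastModifiedMillis > newest) {
                newest = c.lastModifiedMillis;
                newestBrowser = c.browser;
            }
        }
        // Pass 2: return only that browser's cookies for the domain (empty if none matched).
        Map<String, String> result = new LinkedHashMap<>();
        for (BrowserCookie c : all) {
            if (c.domain.equals(domain) && c.browser.equals(newestBrowser)) {
                result.put(c.name, c.value);
            }
        }
        return result;
    }
}
```

Taking every cookie from a single browser, rather than the newest value of each cookie independently, avoids combining session cookies from two different logins into one request.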

metaprime · 2016-12-20T19:51:28Z

I don't want to get into the business of option 2.

Number 1 sounds fine. If the newer data is sometimes worse, I'm not that worried about it. Best-effort is a reasonable approach.

metaprime · 2016-12-22T15:12:05Z

#354 also adds support for cookies. I'm inclined to go with that implementation for cookies since it seems well-thought-out, but feel free to debate :)

metaprime · 2016-12-22T15:15:25Z

src/main/java/com/rarchives/ripme/ripper/AbstractHTMLRipper.java

            // Remove all but 1 image
            if (isThisATest()) {
                while (imageURLs.size() > 1) {
                    imageURLs.remove(1);
                }
            }

+            //if (imageURLs.size() == 0) {


Please remove the redundant commented-out code.

metaprime · 2017-05-10T01:22:13Z

At this point I'm pretty sure @SilhouetteTR is not likely to return. I'll carve up this PR into smaller bits so that I can merge in the worthwhile stuff.

I think cookies should be treated holistically as a new issue. I like the start in #354: even though it requires manual effort from the user, it's better than nothing (which is the current state for any site that needs cookies).


* Modified AbstractHTMLRipper.java so that "no images found" IOException is thrown only for the 1st page. The rest will just log and break out of the while loop.
* Added clearConfigProperty(...) to Utils.java

metaprime · 2017-05-10T01:54:25Z

Carved up this PR into 3 new PRs: #532 #533 #534 which include the changes that we can definitely take from this.

Those 3 PRs include all changes in this PR except for the FlickrRipper change. The FlickrRipper change should be reimplemented based on the cookies implementation in #354 once that is merged.

metaprime force-pushed the master branch from 0aa1df7 to f13de34 on December 20, 2016 09:55

metaprime requested changes Dec 22, 2016


metaprime added the waiting-author label Apr 25, 2017

metaprime self-assigned this May 10, 2017

metaprime added this to the On-deck for 1.4.x milestone May 10, 2017

metaprime pushed a commit to metaprime/ripme-old that referenced this pull request May 10, 2017

@SilhouetteTR 4pr0n#173: Added UsenetHub ripper. (http://adult.usenet…

5b651ff

…hub.com)

metaprime pushed a commit to metaprime/ripme-old that referenced this pull request May 10, 2017

@SilhouetteTR 4pr0n#173: Added Picasa Web Albums ripper. (http://pica…

3cd23cf

…saweb.google.com)

This was referenced May 10, 2017

@SilhouetteTR #173: Added UsenetHub ripper. (http://adult.usenethub.com) #532

Open

@SilhouetteTR #173: Added Picasa Web Albums ripper. (http://picasaweb.google.com) #533

Open

metaprime mentioned this pull request May 10, 2017

@SilhouetteTR #173: AbstractHTMLRipper and Utils improvements #534

Open

metaprime modified the milestones: On-deck for 1.4.x, On-deck for 1.5.x Jun 25, 2017
