Added login cookie support and paging fixes for FlickrRipper.java and some... #173

Open
wants to merge 1 commit into
base: master

@ghost ghost commented Feb 18, 2015

Just copy the cookies from your browser after logging in to Flickr and add one line to the config. After the first run, the cookies are stored Base64-encoded in the config as a very simple obfuscation measure (Base64 is an encoding, not encryption).

-- How to get cookies easily (for end users) --

Chrome Browser:

  • Log in to Flickr
  • Hit F12 to open Developer Tools in the browser
  • Go to the Resources tab
  • Expand Cookies on the left
  • Select www.flickr.com
  • Get the values of these 3 cookies: cookie_accid, cookie_epass and current_identity
  • Put them in the "rip.properties" file like this (replace ### with the values). That's it.

flickr.cookies2encode = current_identity=###; cookie_accid=###; cookie_epass=###;
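
For readers who want to see what handling that config line could look like, here is a minimal sketch. It is not the code from this PR: the class name `CookieConfigSketch` and its methods are hypothetical, and it only assumes the `name=value; name=value;` format shown above plus the standard `java.util.Base64` API (Java 8+).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the implementation in this PR.
final class CookieConfigSketch {

    // Parses "name=value; name2=value2;" into an insertion-ordered map.
    static Map<String, String> parseCookies(String raw) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : raw.split(";")) {
            String trimmed = pair.trim();
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                continue; // skip empty or malformed segments
            }
            // Split on the first '=' only, since cookie values may contain '='.
            cookies.put(trimmed.substring(0, eq).trim(),
                        trimmed.substring(eq + 1).trim());
        }
        return cookies;
    }

    // Base64 is reversible encoding, so this only hides the value from a casual glance.
    static String encode(String raw) {
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

Calling `parseCookies` on the config value yields one map entry per cookie, and `decode(encode(s))` returns the original string, which is why the encoded form should not be treated as a secret.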

  • Added clearConfigProperty(...) to Utils.java
  • Modified AbstractHTMLRipper.java so that "no images found" IOException is thrown only for the 1st page. The rest will just log and break out of the while loop.
  • Added UsenetHub ripper. (http://adult.usenethub.com)
  • Added Picasa Web Albums ripper. (http://picasaweb.google.com)
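
The paging change described above can be sketched roughly as follows. This is a simplified stand-in rather than the actual `AbstractHTMLRipper` code: `PagingSketch` and `collect` are hypothetical names, and each page is modeled as a plain list of image URLs instead of a fetched document.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical, simplified model of the paging loop; not the real ripper code.
final class PagingSketch {
    private static final Logger LOGGER = Logger.getLogger(PagingSketch.class.getName());

    // Each inner list stands in for the image URLs found on one page.
    static List<String> collect(List<List<String>> pages) throws IOException {
        List<String> all = new ArrayList<>();
        int pageIndex = 0;
        while (pageIndex < pages.size()) {
            List<String> imageURLs = pages.get(pageIndex);
            if (imageURLs.isEmpty()) {
                if (pageIndex == 0) {
                    // An empty first page means the rip cannot start at all.
                    throw new IOException("No images found on the first page");
                }
                // An empty later page is treated as the end of the album.
                LOGGER.info("No images found on page " + (pageIndex + 1) + ", stopping");
                break;
            }
            all.addAll(imageURLs);
            pageIndex++;
        }
        return all;
    }
}
```

With this shape, an album whose third page comes back empty still keeps the images from the first two pages instead of failing the whole rip.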

@metaprime
Collaborator

Hi @SilhouetteTR, could I ask you to split this PR into a PR each for:

  • Fixes to the existing Ripper (definitely want to merge these in)
  • New Rippers (will probably review and then merge)
  • Cookie support (not sure about this, see comments below).

If I don't hear back from you for a week or so, I'll split this into pieces and merge it in a little at a time.

Cookie support:

I think for most users the process of having to manually add cookies is going to be a non-starter. I wonder if there's some way of automatically collecting the necessary information from the cookies that have already been stored on the computer, or something like that.

General comments:

Could you please clean up the commented-out lines and make sure you consistently use 4 spaces for indentation instead of tab characters?

@rautamiekka

rautamiekka commented Dec 20, 2016

EDIT: I figured out a better way to see what to use ...

Regarding cookies, potentially the best way to know which browser's cookies to use is to look at the last-changed timestamp and use the newest.

Also, where and how cookies are stored differs by browser and platform (though I don't expect the same browser to use different formats on different platforms). Chrome and Firefox store everything in SQLite3 databases; IE, I think, stores cookies as text files; I have no clue about the others, but I'd imagine Opera uses SQLite3 too.

Opening the SQLite databases while the corresponding browser is running might work with read-only, non-locking access. Then there's the question of how to remember all that data: I think you could keep it in RAM until RipMe shuts down and read it again when starting to rip such a site, but re-reading it for every link on the same site would be costly.

@metaprime
Collaborator

metaprime commented Dec 20, 2016

@rautamiekka I see your points. However, I think having to manually add cookies to the config will be a non-starter for a lot of users. It would be better if it were automatic to some extent, with users specifying in the UI which web browser they use (we could support one or several web browsers). This project is already full of the pattern of explicitly supporting lots of little things.

The data could be stored in the rip.properties file like you suggested. I don't think we're going to have meaningful size constraints for cookies as text data since this program downloads large media archives.

Even if you just kept it in RAM, you'd only need to read the cookies once per domain per session -- subsequent rips on the same site would not need to read the cookies again if you don't restart the program.
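
A minimal sketch of that "read once per domain per session" idea, assuming a hypothetical `DomainCookieCache` class and a caller-supplied loader function (how the loader actually obtains the cookies is left open here):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical in-memory cache; the loader is whatever reads cookies for a domain.
final class DomainCookieCache {
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();
    private final Function<String, Map<String, String>> loader;

    DomainCookieCache(Function<String, Map<String, String>> loader) {
        this.loader = loader;
    }

    // Invokes the loader only on the first request for a domain;
    // later requests in the same session are served from memory.
    // If the loader returns null, nothing is cached and null is returned.
    Map<String, String> cookiesFor(String domain) {
        return cache.computeIfAbsent(domain, loader);
    }
}
```

Because the map lives only as long as the object, restarting the program discards the cache and the next rip reads the cookies again.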

@metaprime
Collaborator

First things first, let's get this PR split up into these three separate PRs:

  • Fixes to the existing Ripper (definitely want to merge these in)
  • New Rippers (will probably review and then merge)
  • Cookie support (not sure about this, see comments below).

I would definitely like to merge the first two. The cookies support will need more thought.

@rautamiekka

rautamiekka commented Dec 20, 2016

You're right, RAM usage would be a very distant issue: it would take tens or hundreds of thousands of domains before it became considerable, and even then it wouldn't matter much. What matters is getting that data, which leaves 2 options:

  1. scan the browsers automatically and choose by timestamp (using the newest data per domain; there might be cases where newer data is less good or unusable altogether, I don't know about that part)
  2. integrate a very primitive Java-based web browser into RipMe to log in and catch the data in flight (the user agent should be faked so the site doesn't apply browser-version-specific behavior, assuming the embedded browser can handle the content served for the browser the user agent belongs to).
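
As a rough illustration of option 1, the sketch below picks the most recently modified file from a list of candidate cookie stores. It makes several assumptions: `NewestCookieStore` is a hypothetical name, the caller supplies the candidate paths (no real browser profile locations are hard-coded), and the comparison is per file rather than per domain, since a per-domain timestamp would require reading each store's contents.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: chooses the candidate cookie store that was modified last.
final class NewestCookieStore {

    static Optional<Path> newest(List<Path> candidates) throws IOException {
        Path best = null;
        FileTime bestTime = null;
        for (Path candidate : candidates) {
            if (!Files.isRegularFile(candidate)) {
                continue; // browser not installed, or profile path does not exist
            }
            FileTime modified = Files.getLastModifiedTime(candidate);
            if (bestTime == null || modified.compareTo(bestTime) > 0) {
                best = candidate;
                bestTime = modified;
            }
        }
        // Empty when none of the candidates exist.
        return Optional.ofNullable(best);
    }
}
```

This is best-effort by design: the newest file is only a guess at which browser the user last logged in with.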

@metaprime
Collaborator

I don't want to get into the business of option 2.

Number 1 sounds fine. If newer data is less good -- I'm not that worried about it. Best-effort is a reasonable approach.

@metaprime
Collaborator

#354 also adds support for cookies. I'm inclined to go with that implementation for cookies since it seems well-thought-out, but feel free to debate :)

// Remove all but 1 image
if (isThisATest()) {
    while (imageURLs.size() > 1) {
        imageURLs.remove(1);
    }
}

//if (imageURLs.size() == 0) {

Please remove redundant code comment.

@metaprime metaprime self-assigned this May 10, 2017
@metaprime metaprime added this to the On-deck for 1.4.x milestone May 10, 2017
@metaprime
Collaborator

metaprime commented May 10, 2017

At this point I'm pretty sure @SilhouetteTR is not likely to return. I'll carve up this PR into smaller bits so that I can merge in the worthwhile stuff.

I think cookies should be viewed holistically as a new issue. I like the start in #354: even though it requires manual effort from the user, it's better than nothing (which is the current state for any site that needs cookies).

metaprime pushed a commit to metaprime/ripme-old that referenced this pull request May 10, 2017
metaprime pushed a commit to metaprime/ripme-old that referenced this pull request May 10, 2017
metaprime pushed a commit to metaprime/ripme-old that referenced this pull request May 10, 2017
* Modified AbstractHTMLRipper.java so that "no images found" IOException is thrown only for the 1st page. The rest will just log and break out of the while loop.
* Added clearConfigProperty(...) to Utils.java
@metaprime
Collaborator

metaprime commented May 10, 2017

Carved up this PR into 3 new PRs: #532 #533 #534 which include the changes that we can definitely take from this.

Those 3 PRs include all changes in this PR except for the FlickrRipper change. The FlickrRipper change should be reimplemented based on the cookies implementation in #354 once that is merged.
