Added login cookie support and paging fixes for FlickrRipper.java and some... #173
Conversation
Just get cookies from the browser after logging in to Flickr and put one line in the config. After the 1st time, cookies will be Base64 encoded in the config as a very simple security measure.

-- How to get cookies easily (for end users) --

Chrome Browser:
- Log in to Flickr
- Hit F12 for Developer Tools in the browser
- Go to the Resources tab
- Expand Cookies on the left
- Select www.flickr.com
- Get the values for these 3 cookies: <cookie_accid>, <cookie_epass> and <current_identity>
- Put these in the "rip.properties" file like this (replace ### with the values). That's it.

flickr.cookies2encode = current_identity=###; cookie_accid=###; cookie_epass=###;

Other changes:
* Added clearConfigProperty(...) to Utils.java
* Modified AbstractHTMLRipper.java so that the "no images found" IOException is thrown only for the 1st page. The rest will just log and break out of the while loop.
* Added UsenetHub ripper. (http://adult.usenethub.com)
* Added Picasa Web Albums ripper. (http://picasaweb.google.com)
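The "Base64 encoded in the config" step above could be sketched roughly as follows. This is a minimal illustration using only the JDK's `java.util.Base64`; the class and method names are hypothetical stand-ins, not the actual RipMe code, and note that Base64 is obfuscation, not encryption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Illustrative sketch of the config obfuscation described above: on first
 * run, the plain cookie string from rip.properties is Base64-encoded before
 * being written back. Names here are assumptions, not the real RipMe API.
 */
public class CookieConfig {
    // Encode the raw "name=value; ..." cookie string for storage.
    static String encodeCookies(String rawCookies) {
        return Base64.getEncoder()
                     .encodeToString(rawCookies.getBytes(StandardCharsets.UTF_8));
    }

    // Decode the stored value back into a Cookie header string.
    static String decodeCookies(String encoded) {
        return new String(Base64.getDecoder().decode(encoded),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "current_identity=###; cookie_accid=###; cookie_epass=###;";
        String stored = encodeCookies(raw);
        // Round-trips: what the ripper reads equals what the user entered.
        System.out.println(decodeCookies(stored).equals(raw)); // prints "true"
    }
}
```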
Hi @SilhouetteTR, could I ask you to split this PR into a PR each for:
If I don't hear back from you for a week or so, I'll split this into pieces and merge it in a little at a time.

Cookie support: I think for most users the process of having to manually add cookies is going to be a non-starter. I wonder if there's some way of automatically collecting the necessary information from the cookies that have already been stored on the computer, or something like that.

General comments: Could you please clean up the commented-out lines and ensure that you're consistently using 4 spaces for indentation instead of tab characters?
EDIT: I figured out a better way to see what to use.

Regarding cookies, potentially the best way to know which browser's cookies to use is to look at the last-changed timestamp and use the newest. Also, where and how the cookies are stored differs by browser and platform (I don't see the same browser storing in different formats on different platforms, though). Chrome and Firefox store everything in SQLite3 databases; IE, I think, stores them as a text file; others, no clue, but I'd imagine Opera uses SQLite3 too. Opening the SQLite databases while the corresponding browser is running might work by using read-only, non-locking access.

Then there's how you plan to remember all that data. I think you could keep the corresponding data in RAM until RipMe shuts down and read it again when starting to rip such a site, but re-reading it for every link on the same site would be costly.
@rautamiekka I see your points. However, I think the necessity to manually add cookies to the config will be a non-starter for a lot of users. If it was automatic to some extent, and users could specify something in the UI like which web browser they use (we could support one or several web browsers) -- this project is full of the pattern of explicitly supporting lots of little things. The data could be stored in the

Even if you just kept it in RAM, you'd only need to read the cookies once per domain per session -- subsequent rips on the same site would not need to read the cookies again if you don't restart the program.
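The "once per domain per session" idea above could look something like this. A hedged sketch: the cache itself is plain JDK code, while `loadFromBrowser` is a hypothetical placeholder for whatever mechanism would actually query the browser's cookie store.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the per-domain caching discussed above: cookies are read from
 * the browser store at most once per domain per session and kept in RAM
 * until the program exits. The loader below is a stand-in assumption.
 */
public class CookieCache {
    private final Map<String, String> byDomain = new HashMap<>();

    // Returns cached cookies for the domain, loading them on first request only.
    public synchronized String cookiesFor(String domain) {
        return byDomain.computeIfAbsent(domain, CookieCache::loadFromBrowser);
    }

    // Hypothetical placeholder for reading the browser's cookie database.
    private static String loadFromBrowser(String domain) {
        return "session=stub-for-" + domain;
    }

    public static void main(String[] args) {
        CookieCache cache = new CookieCache();
        String first = cache.cookiesFor("www.flickr.com");
        String second = cache.cookiesFor("www.flickr.com");
        // Identical reference both times: the store was consulted only once.
        System.out.println(first == second); // prints "true"
    }
}
```

`computeIfAbsent` gives the once-per-domain behavior for free: subsequent calls return the stored value without invoking the loader again.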
First things first, let's get this PR split up into these three separate PRs:

I would definitely like to merge the first two. The cookies support will need more thought.
You're right, RAM usage would be a very distant issue: it'd take tens or hundreds of thousands of domains to start using a considerable amount, and even then it wouldn't matter much. What matters is getting that data, which means 2 options:
|
I don't want to get into the business of option 2. Number 1 sounds fine. If newer data is less good -- I'm not that worried about it. Best-effort is a reasonable approach.
#354 also adds support for cookies. I'm inclined to go with that implementation for cookies since it seems well-thought-out, but feel free to debate :)
// Remove all but 1 image
if (isThisATest()) {
    while (imageURLs.size() > 1) {
        imageURLs.remove(1);
    }
}

//if (imageURLs.size() == 0) {
Please remove redundant code comment.
At this point I'm pretty sure @SilhouetteTR is not likely to return. I'll carve up this PR into smaller bits so that I can merge in the worthwhile stuff. I think cookies should be viewed holistically as a new issue. I like the start in #354; even though it requires manual effort from the user, it's better than nothing (which is the current state when it comes to any site that needs cookies).
* Modified AbstractHTMLRipper.java so that the "no images found" IOException is thrown only for the 1st page. The rest will just log and break out of the while loop.
* Added clearConfigProperty(...) to Utils.java
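The AbstractHTMLRipper change described above (throw only when the very first page is empty, otherwise log and stop paging) could be sketched like this. Pages are modeled as a plain list of lists for illustration; this is an assumption, not the actual AbstractHTMLRipper interface.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of the paging behaviour described above: an empty result
 * raises an IOException only on the very first page; on any later page the
 * ripper just logs and stops. The List-of-Lists model is a simplification,
 * not the real RipMe class.
 */
public class PagingSketch {
    static List<String> ripAll(List<List<String>> pages) throws IOException {
        List<String> all = new ArrayList<>();
        for (int page = 1; page <= pages.size(); page++) {
            List<String> images = pages.get(page - 1);
            if (images.isEmpty()) {
                if (page == 1) {
                    // Truly nothing found: surface an error to the caller.
                    throw new IOException("No images found on the first page");
                }
                // Past page 1, an empty page just means we've run out.
                System.err.println("No images on page " + page + "; stopping.");
                break;
            }
            all.addAll(images);
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        // Two real pages followed by an empty one: the rip ends quietly.
        List<String> got = ripAll(List.of(List.of("a.jpg"), List.of("b.jpg"), List.of()));
        System.out.println(got); // prints "[a.jpg, b.jpg]"
    }
}
```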
Carved up this PR into 3 new PRs: #532 #533 #534, which include the changes that we can definitely take from this. Those 3 PRs include all changes in this PR except for the