Feature: Download notes (replies, likes, etc.) #169
Comments
If you add
For every post downloaded. It'll only do anything useful with The following patch will add this functionality:
@cebtenzzre This patch does not download all the notes, because the Tumblr API only returns 50 of them. https://stackoverflow.com/a/14428010 should help, but you'll need to scrape the /notes/ URL out of the rendered post HTML, and then scrape the paginated URLs out of the /notes/ pages to get each following page of notes.
import dryscrape
import re
from bs4 import BeautifulSoup


def get_more_link(sess, base, url):
    # Load the page and look for the "more notes" link.
    sess.visit(url)
    soup = BeautifulSoup(sess.body(), 'lxml')
    element = soup.find('a', class_='more_notes_link')
    if not element:
        return None
    # The path of the next notes page is embedded in the onclick handler.
    onclick = element.get_attribute_list('onclick')[0]
    return base + re.search(r";tumblrReq\.open\('GET','([^']+)'", onclick).groups()[0]


base = 'https://uri-hyukkie.tumblr.com'
url = base + '/post/61181809095'
session = dryscrape.Session()

while True:
    url = get_more_link(session, base, url)
    if not url:
        break
    print(url)
    session.visit(url)
    soup = BeautifulSoup(session.body(), 'lxml')
    notes = soup.find('ol', class_='notes').find_all('li')[:-1]
    for n in notes:
        print(n.prettify())

There's a proof-of-concept script to scrape the notes from a post that was linked in another StackOverflow answer by unor. Any remarks before I try to integrate it into tumblr-utils? (I'm technically still learning this language...)

EDIT: Yes, I realize that there are minor issues here, and that I'm doing duplicate work. I'm fixing that in the version I'm working on.
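For reference, the duplicate work mentioned in the EDIT is that each notes page is loaded twice: once to print its notes and once more to find the next link. Below is a minimal sketch of one way to avoid that, not the version that ended up in tumblr-utils. It assumes Python 3, and `iter_note_pages` and its `fetch` parameter are hypothetical names: `fetch` is any callable that takes a URL and returns the page HTML (for example, a thin wrapper around a dryscrape session). It also assumes the onclick pattern from the script above can be matched against the raw HTML once entities are unescaped.

```python
import html
import re

# Same pattern as in the proof-of-concept script: the "more notes" link
# carries the path of the next notes page inside its onclick handler.
MORE_NOTES_RE = re.compile(r";tumblrReq\.open\('GET','([^']+)'")


def iter_note_pages(fetch, base, first_url):
    """Yield the HTML of each notes page, fetching every URL only once.

    `fetch` is a hypothetical callable (url -> HTML string) supplied by
    the caller; it is an assumption of this sketch, not a library API.
    """
    page = fetch(first_url)
    seen = {first_url}
    while True:
        # Unescape entities so quotes written as &#39; still match.
        match = MORE_NOTES_RE.search(html.unescape(page))
        if match is None:
            return  # no "more notes" link: this was the last page
        url = base + match.group(1)
        if url in seen:
            return  # guard against a link that points back at itself
        seen.add(url)
        page = fetch(url)
        yield page
```

Each yielded page can then be passed to BeautifulSoup to pull the `<li>` entries out of the `ol.notes` list, as in the script above.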
I've made a PR for this (#189).
Included revisions: - Remove log_queue, better status and account logic - Better tracking and synchronization on ThreadPool.queue.qsize - Remove remaining_posts Fixes bbolli#169
Included revisions: - Remove log_queue, better status and account logic - Better tracking and synchronization on ThreadPool.queue.qsize - Remove remaining_posts - Remove getting_tup - Put back the account parameter - Make typing optional Fixes bbolli#169
Included revisions: - Remove log_queue, better status and account logic - Better tracking and synchronization on ThreadPool.queue.qsize - Remove remaining_posts - Remove getting_tup - Put back the account parameter Fixes bbolli#169
I refer to post #98. What is the status of this? Are notes/comments now downloadable by the backup script? I tried, but I don't see any notes/comments downloaded as of now.
Is this what the --likes command is for?