change scraper failure logic to use old emotes on any scrape failure of a sub #43

Open
jamesmanning opened this issue Mar 6, 2015 · 2 comments

@jamesmanning

Currently we have CSS caching, which handles the "can't download CSS file" failure case, but many other failure cases still end up with the subreddit being 'lost' from that scrape.

The proposed change would be something like this (I don't know Python at the moment, so bear with me):

try:
    subreddit_emotes = scrape(subreddit_name)
except Exception:
    # Any failure while scraping this sub: fall back to its emotes from the last scrape
    old_emotes = read_json_file(existing_json_file_path)
    subreddit_emotes = [emote for emote in old_emotes if emote['sr'] == subreddit_name]

With this change, no failure while scraping a sub (other download problems, bad CSS, bugs in the CSS-parsing/emote-extraction code, etc.) would make us lose the sub completely, since we'd just reuse the emotes from the last scrape.

This might also let us remove any lower-level code written to handle failure scenarios by falling back to the last successful run (like the CSS file cache, AFAICT), but that isn't necessary.
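A self-contained version of the fallback above, as a sketch only. It assumes each emote in the previous run's JSON output is a dict with an `"sr"` key naming its subreddit (the `emote.sr` from the pseudocode), and that `scrape` is a hypothetical callable returning a list of such dicts. If the previous output is missing or unreadable too, it returns an empty list, since there is nothing to fall back to.

```python
import json
import logging
import os
import tempfile


def scrape_with_fallback(subreddit_name, scrape, existing_json_file_path):
    """Scrape one subreddit; on any failure, reuse its emotes from the last run.

    `scrape` is a hypothetical callable returning a list of emote dicts.
    """
    try:
        return scrape(subreddit_name)
    except Exception:
        # Download errors, bad CSS, parser bugs, ...: all take the same path.
        logging.exception("Scrape of %s failed; reusing previous emotes", subreddit_name)
        try:
            with open(existing_json_file_path, encoding="utf-8") as f:
                old_emotes = json.load(f)
        except (OSError, ValueError):
            return []  # no usable previous output either
        # Assumption: each stored emote is a dict whose "sr" key is its subreddit.
        return [e for e in old_emotes if e.get("sr") == subreddit_name]


if __name__ == "__main__":
    # Demo: previous output with emotes from two subs, then a scrape that fails.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        json.dump([{"sr": "a", "name": "x"}, {"sr": "b", "name": "y"}], tmp)

    def broken_scrape(name):
        raise ValueError("bad CSS")

    print(scrape_with_fallback("a", broken_scrape, tmp.name))  # prints [{'sr': 'a', 'name': 'x'}]
    os.remove(tmp.name)
```

Catching `Exception` rather than using a bare `except:` still covers download errors, bad CSS and parser bugs, but lets `KeyboardInterrupt` through, so a manual abort isn't silently turned into a fallback.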

@Trellmor
Contributor

Trellmor commented Mar 6, 2015

After checking the logs [1] and the code, I see the following:
One emote has an invalid image. Its download fails and an error is logged, but the other images download fine.
Since that download failed, the image file can't be loaded later on, so another error is logged for this emote group.
All the other emotes are processed OK.
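The behavior described above (one failed download is logged while the rest of the batch carries on) can be sketched like this. `download` is a hypothetical stand-in for the scraper's real downloader, and emotes are assumed to be dicts with an `"image_url"` key. The second function shows one possible way to skip emotes whose download already failed, so the later image-loading step doesn't log a second error for the same problem; that part is a suggestion, not what the scraper currently does.

```python
import logging


def download_all(image_urls, download):
    """Download every image, logging failures instead of aborting the batch.

    Returns (downloaded, failed): a dict of url -> bytes and a set of failed urls.
    `download` is a hypothetical callable that returns bytes or raises.
    """
    downloaded, failed = {}, set()
    for url in image_urls:
        try:
            downloaded[url] = download(url)
        except Exception as exc:
            # One bad image is logged and skipped; the loop continues.
            logging.error("Download failed for %s: %s", url, exc)
            failed.add(url)
    return downloaded, failed


def emotes_to_process(emotes, failed):
    """Drop emotes whose image already failed to download, so the
    image-loading step doesn't report the same problem again."""
    return [e for e in emotes if e["image_url"] not in failed]
```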

Apparently, some emotes are missing from marm's scrapes, but without seeing his logs, I have no idea why.

[1] https://gist.github.com/Trellmor/1250959a4e1969180a9e

@Trellmor
Contributor

The errors in my scrape log are caused by an invalid background-image rule for \test in the BTPatron subreddit: https://www.reddit.com/r/BTPatron/about/stylesheet
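The comment doesn't say exactly what makes the `\test` rule invalid, so this is only a sketch of a defensive check, assuming "invalid" means the `background-image` value contains no usable http(s) URL. The function name is hypothetical; it pulls out the `url(...)` value and returns `None` instead of raising, so the scraper could skip that one emote and keep going.

```python
import re
from urllib.parse import urlparse

# Matches url(...) with optional single or double quotes around the value.
_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def background_image_url(declaration):
    """Return the http(s) URL in a background-image value, or None if unusable."""
    match = _URL_RE.search(declaration)
    if not match:
        return None
    url = match.group(2).strip()
    if url.startswith("//"):
        url = "https:" + url  # protocol-relative URL
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None
    return url
```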
