Describe the bug
I get the following error when I try to scrape Reddit:
snscrape.base.ScraperException: 4 requests to https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000 failed, giving up.
I also tried the Python package and a subreddit search, but neither works.
I tried from another device, with the same result.
Any ideas?
How to reproduce
Run snscrape -n 100 -vv reddit-search toto
Expected behaviour
Get data?
Screenshots and recordings
No response
Operating system
Kubuntu 22.04
Python version: output of python3 --version
3.8.8
snscrape version: output of snscrape --version
snscrape 0.7.0.20230622
Scraper
reddit-search
How are you using snscrape?
CLI (snscrape ... as a command, e.g. in a terminal)
Backtrace
snscrape.base.ScraperException: 4 requests to https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000 failed, giving up.
Log output
2023-07-05 09:56:59.215 INFO snscrape.base Retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000
2023-07-05 09:56:59.216 DEBUG snscrape.base ... with headers: {'User-Agent': 'snscrape/0.7.0.20230622'}
2023-07-05 09:56:59.216 DEBUG snscrape.base ... with environmentSettings: {'verify': True, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
2023-07-05 09:56:59.217 DEBUG urllib3.connectionpool Starting new HTTPS connection (1): api.pushshift.io:443
2023-07-05 09:56:59.285 DEBUG snscrape.base Connected to: ('172.67.219.85', 443)
2023-07-05 09:56:59.285 DEBUG snscrape.base Connection cipher: ('TLS_AES_256_GCM_SHA384', 'TLSv1.3', 256)
2023-07-05 09:56:59.682 DEBUG urllib3.connectionpool https://api.pushshift.io:443 "GET /reddit/search/submission?q=toto&limit=1000 HTTP/1.1" 403 30
2023-07-05 09:56:59.684 INFO snscrape.base Retrieved https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: 403
2023-07-05 09:56:59.684 DEBUG snscrape.base ... with response headers: {'Date': 'Wed, 05 Jul 2023 07:56:59 GMT', 'Content-Type': 'application/json', 'Content-Length': '30', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'CF-Cache-Status': 'BYPASS', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=y4%2BHEpMTSBJTXPGm4t7j95SB7FGvFVoPOkhN7%2BoPzIMt8rFnrbVatyYC2TKIviyCyOuaYt%2B%2FtN02NPN3AZa%2BCtunP7oatjwYM8k51iOBRkXNrBcTndwFIxVJTfEqILZlwQTp"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '7e1e0df6bb3ad4e5-CDG', 'alt-svc': 'h3=":443"; ma=86400'}
2023-07-05 09:56:59.684 INFO snscrape.base Error retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: non-200 status code, retrying
2023-07-05 09:56:59.684 INFO snscrape.base Waiting 1 seconds
2023-07-05 09:57:00.687 INFO snscrape.base Retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000
2023-07-05 09:57:00.687 DEBUG snscrape.base ... with headers: {'User-Agent': 'snscrape/0.7.0.20230622'}
2023-07-05 09:57:00.688 DEBUG snscrape.base ... with environmentSettings: {'verify': True, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
2023-07-05 09:57:00.809 DEBUG urllib3.connectionpool https://api.pushshift.io:443 "GET /reddit/search/submission?q=toto&limit=1000 HTTP/1.1" 403 30
2023-07-05 09:57:00.810 INFO snscrape.base Retrieved https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: 403
2023-07-05 09:57:00.811 DEBUG snscrape.base ... with response headers: {'Date': 'Wed, 05 Jul 2023 07:57:00 GMT', 'Content-Type': 'application/json', 'Content-Length': '30', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'CF-Cache-Status': 'BYPASS', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=RRI7V4%2FKORopA2%2FQFWbrrUnkFlm%2Ftd5O9SismrizB9mCRFBeF2tTFM0L%2FhbJTzPPwHYyQiOZ6ZzhjUyUc%2BkSPQla5B1BqN%2BTV3LcE2%2Fv3y9Q%2FYeQHPp6gIGrjqjfaDO8dRC3"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '7e1e0dff78f0d4e5-CDG', 'alt-svc': 'h3=":443"; ma=86400'}
2023-07-05 09:57:00.811 INFO snscrape.base Error retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: non-200 status code, retrying
2023-07-05 09:57:00.811 INFO snscrape.base Waiting 2 seconds
2023-07-05 09:57:02.815 INFO snscrape.base Retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000
2023-07-05 09:57:02.815 DEBUG snscrape.base ... with headers: {'User-Agent': 'snscrape/0.7.0.20230622'}
2023-07-05 09:57:02.815 DEBUG snscrape.base ... with environmentSettings: {'verify': True, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
2023-07-05 09:57:02.938 DEBUG urllib3.connectionpool https://api.pushshift.io:443 "GET /reddit/search/submission?q=toto&limit=1000 HTTP/1.1" 403 30
2023-07-05 09:57:02.938 INFO snscrape.base Retrieved https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: 403
2023-07-05 09:57:02.939 DEBUG snscrape.base ... with response headers: {'Date': 'Wed, 05 Jul 2023 07:57:02 GMT', 'Content-Type': 'application/json', 'Content-Length': '30', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'CF-Cache-Status': 'BYPASS', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=iwHMuR85a9T6e4AsOzZ3nlYUMI4G2ke71fL7PEhrcNRyy%2BUhlTw9OhJgogU4NAWUKAY1gXhPNQgoSAZSct65B2fLZviQvfVhJwWAS7EWe%2BG0jcjKm4ot9p11cAMDQQQLmJ3P"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '7e1e0e0cc998d4e5-CDG', 'alt-svc': 'h3=":443"; ma=86400'}
2023-07-05 09:57:02.939 INFO snscrape.base Error retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: non-200 status code, retrying
2023-07-05 09:57:02.939 INFO snscrape.base Waiting 4 seconds
2023-07-05 09:57:06.945 INFO snscrape.base Retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000
2023-07-05 09:57:06.945 DEBUG snscrape.base ... with headers: {'User-Agent': 'snscrape/0.7.0.20230622'}
2023-07-05 09:57:06.945 DEBUG snscrape.base ... with environmentSettings: {'verify': True, 'proxies': OrderedDict(), 'stream': False, 'cert': None}
2023-07-05 09:57:07.066 DEBUG urllib3.connectionpool https://api.pushshift.io:443 "GET /reddit/search/submission?q=toto&limit=1000 HTTP/1.1" 403 30
2023-07-05 09:57:07.067 INFO snscrape.base Retrieved https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: 403
2023-07-05 09:57:07.067 DEBUG snscrape.base ... with response headers: {'Date': 'Wed, 05 Jul 2023 07:57:07 GMT', 'Content-Type': 'application/json', 'Content-Length': '30', 'Connection': 'keep-alive', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains', 'CF-Cache-Status': 'BYPASS', 'Report-To': '{"endpoints":[{"url":"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=sbdEsUwu7UnHrollCV0oSOt0FUSXPBvUgjqRiWXSV0A%2BNpdcPvsXdaETxaF8GYBdD0k02i5vWa8sK%2FnZnSCNU5T0VPs3FMTx5yhC7E9LkDFzczUz5ZkXmrzoHoN4%2FcQEJYqI"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Vary': 'Accept-Encoding', 'Server': 'cloudflare', 'CF-RAY': '7e1e0e2699d8d4e5-CDG', 'alt-svc': 'h3=":443"; ma=86400'}
2023-07-05 09:57:07.067 ERROR snscrape.base Error retrieving https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000: non-200 status code
2023-07-05 09:57:07.067 CRITICAL snscrape.base 4 requests to https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000 failed, giving up.
2023-07-05 09:57:07.067 CRITICAL snscrape.base Errors: non-200 status code, non-200 status code, non-200 status code, non-200 status code
2023-07-05 09:57:07.118 CRITICAL snscrape._cli Dumped stack and locals to /tmp/snscrape_locals_j8mi7h4g
Traceback (most recent call last):
File "/home/matthieu-inspiron/anaconda3/bin/snscrape", line 8, in <module>
sys.exit(main())
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/_cli.py", line 323, in main
for i, item in enumerate(scraper.get_items(), start = 1):
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/modules/reddit.py", line 219, in get_items
yield from self._iter_api_submissions_and_comments({type(self)._apiField: self._name})
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/modules/reddit.py", line 185, in _iter_api_submissions_and_comments
tipSubmission = next(submissionsIter)
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/modules/reddit.py", line 143, in _iter_api
obj = self._get_api(url, params = params)
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/modules/reddit.py", line 94, in _get_api
r = self._get(url, params = params, headers = self._headers, responseOkCallback = self._handle_rate_limiting)
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/base.py", line 275, in _get
return self._request('GET', *args, **kwargs)
File "/home/matthieu-inspiron/anaconda3/lib/python3.8/site-packages/snscrape/base.py", line 271, in _request
raise ScraperException(msg)
snscrape.base.ScraperException: 4 requests to https://api.pushshift.io/reddit/search/submission?q=toto&limit=1000 failed, giving up.
Dump of locals
I prefer to send it in private
Additional context
No response
Pushshift is effectively dead, so yeah, this is expected and can't work anymore. Pushshift was the only way to (a) retrieve useful search results, since Reddit's own search is awful; (b) get all submissions in a subreddit, since Reddit limits that to 1000 results; and (c) get all submissions/comments by a user, due to the same limitation on Reddit.
Potentially, PullPush could serve as a replacement, but since Reddit's API changes are rolling out this month, I'll wait for that to happen before making any changes.
(If the Reddit API itself is sufficient for your purposes, I recommend using PRAW rather than snscrape.)
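For anyone landing here, the PRAW route suggested above can be sketched roughly as follows. This is a minimal illustration, not snscrape code: the credential values are placeholders you must replace with your own from Reddit's app settings, and `search_submissions` is a hypothetical helper name chosen for this example.

```python
# Sketch: querying Reddit's official API via PRAW instead of the dead
# Pushshift endpoint. Requires `pip install praw` and real API credentials.
try:
    import praw  # third-party; guarded so the helper below works without it
except ImportError:
    praw = None


def search_submissions(reddit, query, limit=100):
    """Yield (id, title) pairs for submissions matching `query`.

    `reddit` is any object exposing PRAW's `subreddit(...).search(...)`
    interface, normally a `praw.Reddit` instance.
    """
    for submission in reddit.subreddit("all").search(query, limit=limit):
        yield submission.id, submission.title


def main():
    # Placeholder credentials -- create an app at reddit.com/prefs/apps
    # and substitute your own values before running.
    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="reddit-search-example/0.1 by u/yourname",
    )
    for submission_id, title in search_submissions(reddit, "toto", limit=100):
        print(submission_id, title)
```

Note that the official API only returns the most recent/relevant results (the ~1000-item cap mentioned above still applies), so this is not a full Pushshift replacement.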