Likelihood of rate limits/IP ban from Twitter? #385
Comments
Well, JustAnotherArchivist recently added an amazing feature: snscrape now reuses guest tokens across sessions. This prevents rate-limiting from burning through too many guest tokens. Personally, I have not been banned, and I've been downloading thousands of tweets recursively... I recommend creating an "Ignore File" and writing the tweet data to disk every 5 tweets or so, so that you don't have to redo all your progress if/when you get banned (2.2 million tweets is a lot). I did something similar in my recursive tweet downloading script. |
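For anyone wanting to try the "Ignore File" idea, here is a minimal sketch of how it could look. It is not the script mentioned above and does not call snscrape itself: `save_with_checkpoints`, `load_seen_ids`, the file layout, and the assumption that each tweet is a dict with an `"id"` key are all hypothetical choices made for illustration.

```python
import json
from pathlib import Path


def load_seen_ids(ignore_path):
    """Return the set of tweet IDs already recorded in the ignore file."""
    path = Path(ignore_path)
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def save_with_checkpoints(tweets, data_path, ignore_path, batch_size=5):
    """Append unseen tweets to data_path, flushing every batch_size tweets.

    `tweets` is any iterable of dicts with an "id" key (an assumed shape).
    Returns the number of tweets newly written.
    """
    seen = load_seen_ids(ignore_path)
    batch = []
    written = 0

    def flush():
        nonlocal written
        if not batch:
            return
        # Write the data first, then record the IDs as done.
        with open(data_path, "a", encoding="utf-8") as data_file:
            for tweet in batch:
                data_file.write(json.dumps(tweet) + "\n")
        with open(ignore_path, "a", encoding="utf-8") as ignore_file:
            for tweet in batch:
                ignore_file.write(str(tweet["id"]) + "\n")
        written += len(batch)
        batch.clear()

    for tweet in tweets:
        tweet_id = str(tweet["id"])
        if tweet_id in seen:
            continue  # already saved in an earlier run
        seen.add(tweet_id)
        batch.append(tweet)
        if len(batch) >= batch_size:
            flush()
    flush()  # write whatever is left at the end
    return written
```

If the run dies partway through, at most `batch_size - 1` tweets are lost, and restarting with the same two files skips everything already saved.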
Based on my past experience, that should be fine. I've scraped many millions of tweets before in parallel without problems. I usually split such big runs up into monthly scrapes using a search query like See also #307 |
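The example query in the comment above did not survive, so the following is only a guess at the general shape of a monthly split, not the query that was actually posted. It assumes Twitter's `since:` and `until:` search operators with `YYYY-MM-DD` dates; `monthly_queries` and the `from:example` base query are hypothetical names used for illustration.

```python
from datetime import date


def monthly_queries(base_query, start, end):
    """Yield one search query per calendar month covering [start, end)."""
    current = date(start.year, start.month, 1)
    while current < end:
        # First day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # Clip the first and last chunk to the requested range.
        since = max(current, start)
        until = min(next_month, end)
        yield f"{base_query} since:{since.isoformat()} until:{until.isoformat()}"
        current = next_month


for query in monthly_queries("from:example", date(2020, 11, 15), date(2021, 2, 1)):
    print(query)
```

Each resulting query can then be run as its own scrape, which keeps individual runs small and makes it easy to redo a single month if one fails.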
Thank you for the info! Perhaps I'm in the minority on this, but it might be helpful to others to include this sort of anecdotal information somewhere, so that people have a better idea of how much they can expect to use snscrape before being in danger of getting banned/rate limited? (Apologies if it's already available somewhere and I just didn't see it!) |
It's mentioned in some issues but not prominently. Documentation is WIP, and I agree it may be worth including some vague notes about it there. |
I have been rate-limited by IP with a cooldown of half an hour to an hour; AFAICT it is not possible to get banned. |
I've never been rate-limited before, at least that I've noticed. |
my memory is hazy but it took... maybe 10-100 concurrent threads |
Ok then that explains it, lol. I only have a few at a time |
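To make the "only a few at a time" point concrete, here is a small sketch of capping concurrency with the standard library. The numbers in this thread are anecdotal, so the default of 4 workers is an arbitrary assumption rather than a known safe limit, and `run_bounded` and `task` are hypothetical names standing in for whatever performs a single scrape.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bounded(task, items, max_workers=4):
    """Run task(item) for every item using at most max_workers threads.

    Results are returned in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))
```

Keeping `max_workers` low trades speed for a smaller chance of hitting the IP-level rate limit described above.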
https://business.twitter.com/en/blog/update-on-twitters-limited-usage.html So even if it wasn't a problem before, it could be a problem now. |
I'm currently trying to use snscrape to download Tweets from Twitter. According to my calculations, I should be getting around 2,200,000 Tweets in total by the time it finishes. I'm concerned about the possibility of getting IP banned from Twitter as a result of this. Is this something worth being concerned about, or should I not worry?
More generally:
This tool seems like a godsend, compared to the limits of the official Twitter API. Having a more solid understanding of the "safe zone" would make me feel more comfortable with using it. I know that maintainers can't guarantee anything about rate limits or IP bans, but if anyone has experience with where they begin to set in, knowing that would help a lot!