Likelihood of rate limits/IP ban from Twitter? #385

Open
Meorge opened this issue Feb 2, 2022 · 10 comments
Labels
module:twitter, question (Further information is requested)

Comments

@Meorge

Meorge commented Feb 2, 2022

I'm currently trying to use snscrape to download Tweets from Twitter. According to my calculations, I should be getting around 2,200,000 Tweets in total by the time it finishes. I'm concerned about the possibility of getting IP banned from Twitter as a result of this. Is this something worth being concerned about, or should I not worry?

More generally:

  • How much scraping should be safe to do before worrying about getting rate-limited or IP banned?
  • If the concern of getting IP banned is valid, do these bans typically go away after a period of time, or are they permanent, etc?

This tool seems like a godsend, compared to the limits of the official Twitter API. Having a more solid understanding of the "safe zone" would make me feel more comfortable with using it. I know that maintainers can't guarantee anything about rate limits or IP bans, but if anyone has experience with where they begin to set in, knowing that would help a lot!

Meorge changed the title from "Likelihood of IP ban from Twitter?" to "Likelihood of rate limits/IP ban from Twitter?" on Feb 2, 2022
@TheTechRobo
Contributor

TheTechRobo commented Feb 2, 2022

Well, recently JustAnotherArchivist has added an amazing feature. snscrape now reuses guest tokens across sessions. This prevents rate-limiting from burning through too many guest tokens.

Personally I have not been banned, and I've been downloading thousands of tweets recursively...

I recommend creating an "ignore file" that records which tweets you have already saved, and writing the tweet data to disk every 5 tweets or so. That way you don't have to redo all your progress if/when you get banned (2.2 million tweets is a lot). I did something similar in my recursive tweet downloading script.
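For anyone wanting a starting point, here is a minimal sketch of that checkpointing idea. It is not the script mentioned above; the function names, the file layout, and the assumption that each tweet is a dict with an "id" key are all hypothetical and only serve the illustration.

```python
import json
from pathlib import Path


def load_done(ignore_path):
    """Return the set of tweet IDs already recorded in the ignore file."""
    path = Path(ignore_path)
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def save_tweets(tweets, out_path, ignore_path, batch_size=5):
    """Append tweets to out_path as JSON lines, flushing every batch_size tweets.

    `tweets` is any iterable of dicts with an "id" key (a hypothetical shape).
    IDs already listed in the ignore file are skipped, so a restarted run
    does not write the same tweet twice.
    """
    done = load_done(ignore_path)
    written = 0
    with open(out_path, "a") as out, open(ignore_path, "a") as ignore:
        for tweet in tweets:
            tweet_id = str(tweet["id"])
            if tweet_id in done:
                continue
            out.write(json.dumps(tweet) + "\n")
            ignore.write(tweet_id + "\n")
            done.add(tweet_id)
            written += 1
            # Push buffered data to disk every few tweets to limit what a crash can lose.
            if written % batch_size == 0:
                out.flush()
                ignore.flush()
    return written
```

Running `save_tweets` a second time with the same input writes nothing new, which is the property that makes an interrupted run cheap to resume.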

@JustAnotherArchivist
Owner

Based on my past experience, that should be fine. I've scraped many millions of tweets before in parallel without problems. I usually split such big runs up into monthly scrapes using a search query like keyword since:2022-01-01 until:2022-02-01 to fetch tweets from this January. Then I iterate over the months and finally check whether each monthly output file contains the expected results (e.g. whether the last result is close to midnight on the 1st).
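The monthly split described above could be sketched roughly as follows. This is an illustration rather than the maintainer's actual workflow: `monthly_queries` and `looks_complete` are hypothetical helper names, and the completeness check assumes that results arrive newest first, that timestamps are timezone-aware UTC, and that a six-hour tolerance suits the keyword's volume.

```python
from datetime import date, datetime, timedelta, timezone


def monthly_queries(keyword, start, end):
    """Yield (month_start, month_end, query) for each month in [start, end).

    `start` and `end` should fall on the 1st of a month; `end` is exclusive.
    """
    current = start
    while current < end:
        if current.month == 12:
            nxt = date(current.year + 1, 1, 1)
        else:
            nxt = date(current.year, current.month + 1, 1)
        query = f"{keyword} since:{current.isoformat()} until:{nxt.isoformat()}"
        yield current, nxt, query
        current = nxt


def looks_complete(last_result_time, month_start, tolerance=timedelta(hours=6)):
    """Return True if the last (oldest) result lies shortly after midnight on the 1st.

    Assumes `last_result_time` is a timezone-aware UTC datetime; the default
    tolerance is an arbitrary choice and should be widened for rare keywords.
    """
    boundary = datetime(month_start.year, month_start.month, month_start.day,
                        tzinfo=timezone.utc)
    return boundary <= last_result_time <= boundary + tolerance


# Example: build the queries for January and February 2022.
for month_start, month_end, query in monthly_queries("keyword", date(2022, 1, 1), date(2022, 3, 1)):
    print(query)
```

Each generated query string can then be passed to a separate scrape, writing one output file per month, and `looks_complete` applied to the timestamp of the last line of each file.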

See also #307

JustAnotherArchivist added the module:twitter and question (Further information is requested) labels on Feb 2, 2022
@Meorge
Author

Meorge commented Feb 2, 2022

Thank you for the info! Perhaps I'm in the minority on this, but it might be helpful to others to include this sort of anecdotal information somewhere, so that people have a better idea on how much they can expect to use snscrape before being in danger of getting banned/rate limited? (Apologies if it's already available somewhere and I just didn't see it!)

@JustAnotherArchivist
Owner

It's mentioned in some issues but not prominently. Documentation is WIP, and I agree it may be worth including some vague notes about it there.

@cosmicoptima

I have been rate-limited by IP with a cooldown of half an hour to an hour; AFAICT it is not possible to get banned.
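If the cooldown really is on the order of half an hour, one way to cope is to pause and retry instead of aborting the whole run. The sketch below is only an illustration under that assumption: `RateLimited` is a hypothetical exception standing in for whatever error your scraping code raises when it is rate-limited, and the 1800-second default simply mirrors the lower end of the cooldown reported here.

```python
import time


class RateLimited(Exception):
    """Hypothetical exception standing in for whatever error signals a rate limit."""


def run_with_cooldown(task, cooldown_seconds=1800, max_attempts=3, sleep=time.sleep):
    """Call task(); on RateLimited, wait cooldown_seconds and try again.

    Re-raises RateLimited once max_attempts calls have all failed.
    `sleep` is injectable so the waiting can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except RateLimited:
            if attempt == max_attempts:
                raise
            sleep(cooldown_seconds)
```

Wrapping each monthly or per-user scrape in `run_with_cooldown` keeps a long job alive across a temporary limit, at the cost of the job taking longer.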

@TheTechRobo
Contributor

I've never been rate-limited before, at least that I've noticed.

@cosmicoptima

My memory is hazy, but it took maybe 10-100 concurrent threads to trigger it.

@TheTechRobo
Contributor

Ok then that explains it, lol. I only have a few at a time

@nandanVasistaBH29

This comment was marked as off-topic.

@hyzhak

hyzhak commented Jul 5, 2023

"At a high level, we are working to prevent these accounts from 1) scraping people’s public Twitter data to build AI models and 2) manipulating people and conversation on the platform in various ways."

https://business.twitter.com/en/blog/update-on-twitters-limited-usage.html

So even if it wasn't a problem before, it could be a problem now.

6 participants