Where is the text of the tweets? #10
Hello, I am sharing a reply on the same topic that the creator of the dataset gave me some months ago: "... about the *.json.gz files, these data files need to be retrieved from Twitter via their IDs. As such, I can't share the tweet files directly (against Twitter's terms of service). At UMD, we maintain an archive of tweets extracted from Twitter's 1% public sample, so I was able to pull a sample of tweets from that archive (you can find a similar one at the Internet Archive's Twitter stream grab). You could also use Twitter's API to rehydrate tweets slowly, or you could partner with Twitter/Gnip to rehydrate all the tweets in one go. From there, the *.json.gz files are comprised of all relevant tweets, with each tweet occupying one line in the file." I hope you find this information useful :)
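Since each *.json.gz file holds one tweet per line, it can be read as gzipped JSON Lines. The sketch below builds an ID-to-text map for one file. It assumes each line is a tweet object using Twitter's v1.1 field names `id_str` and `text`; lines without those fields (for example, delete notices in stream data) are skipped. The function name `load_tweet_texts` is hypothetical.

```python
import gzip
import json


def load_tweet_texts(path):
    """Map tweet ID (string) to tweet text for one gzipped JSON-lines file.

    Assumption: one JSON-encoded tweet per line, with v1.1-style
    `id_str` and `text` fields. Blank lines and records missing
    either field are skipped.
    """
    texts = {}
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            tweet = json.loads(line)
            if "id_str" in tweet and "text" in tweet:
                texts[tweet["id_str"]] = tweet["text"]
    return texts
```

Using the string ID rather than the numeric `id` avoids precision problems in tools that parse JSON numbers as floating point.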
Many thanks @marcodegra, this is exactly what I wanted to know. I saw several datasets in the Internet Archive's Twitter stream grab, so it seems fairly easy to merge them all together. It will surely take a while, but it seems faster than using Twitter's API. Thanks again for the info! 🎉
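Merging archive data with CREDBANK amounts to scanning the archive files and keeping only tweets whose IDs appear in the dataset. A minimal sketch follows, assuming the archive files have already been converted to gzipped JSON Lines with a v1.1-style `id_str` field (the raw stream grab packaging may differ). The function `filter_archive` and its parameters are hypothetical. It returns the set of IDs it found, so you can see which tweets are still missing: a 1% sample will not contain every tweet.

```python
import gzip
import json


def filter_archive(archive_paths, wanted_ids, out_path):
    """Copy tweets whose ID is in `wanted_ids` from several archive files
    into one gzipped JSON-lines file, writing each ID at most once.

    Assumption: every archive file is gzipped JSON lines with an
    `id_str` field on tweet records; other records are ignored.
    Returns the set of IDs that were found.
    """
    found = set()
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        for path in archive_paths:
            with gzip.open(path, "rt", encoding="utf-8") as fh:
                for line in fh:
                    if not line.strip():
                        continue
                    tweet = json.loads(line)
                    tweet_id = tweet.get("id_str")  # None for non-tweet records
                    if tweet_id in wanted_ids and tweet_id not in found:
                        out.write(json.dumps(tweet) + "\n")
                        found.add(tweet_id)
    return found
```

Passing `wanted_ids` as a `set` keeps each membership check constant-time, which matters when scanning many large archive files.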
Where is the archive? Could you share the link, please?
Hello hansd410, have you already tried following the instructions to download it?
Hello marcodegra.
Well, then please refer to the comments above.
Oh, I see. I just assumed @VictorSuarezL had obtained the archive that matches the streaming data.
Thank you @VictorSuarezL for your help. If you were able to get the tweet text, could you please let us know how you got it? Thank you again.
I recently accessed the CREDBANK data and merged all the different databases. So far I have found the main topics, scores, and so on. I would love to use this corpus of tweets in a paper, but unfortunately I can't find the original text of the tweets. Where is it? Is it available in another resource? Did I miss anything?
One way to get the original text of the tweets would be to look up each tweet by its ID using Twitter's REST API. But given the number of tweets and the time since they were posted, I am afraid this will not be possible, or will take a very long time. So I was wondering whether it would be possible to get the text directly?
By the way, thanks for sharing, and congratulations on the great work!
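Rehydrating by ID usually means splitting the ID list into fixed-size batches, one per lookup request. The sketch below only does the batching; the HTTP client and authentication are left out. It assumes a lookup endpoint that accepts up to 100 IDs per call, as Twitter's v1.1 `statuses/lookup` did; the function name `batch_ids` is hypothetical.

```python
from typing import Iterable, Iterator, List


def batch_ids(tweet_ids: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `batch_size` tweet IDs.

    Assumption: the lookup endpoint accepts up to 100 IDs per request.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for tweet_id in tweet_ids:
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch
```

With 100 IDs per request, N tweet IDs need `math.ceil(N / 100)` requests, so the total time is set by the endpoint's rate limit. Tweets that have since been deleted or made private are not returned by a lookup, so some text will be unrecoverable this way.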