Where is the text of the tweets? #10

Open
VictorSuarezL opened this issue Apr 12, 2019 · 9 comments
Comments

@VictorSuarezL

VictorSuarezL commented Apr 12, 2019

I recently got access to the CREDBANK data and merged all the different databases. So far I have found the main topics, scores, and so on. I would love to use this corpus of tweets in a paper, but unfortunately I can't find the original text of the tweets. Where is it? Is it available in another resource? Did I miss anything?

One way to get the original text of the tweets would be to use the tweet IDs with Twitter's REST API. But given the number of tweets and the time since they were posted, I am afraid that will not be possible or will take a lot of time. So I was wondering whether it would be possible to get the text directly?

BTW, thanks for sharing, and congrats on the great job!
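
For readers weighing the API route, here is a minimal sketch of looking up tweet IDs in batches. It is only an illustration under assumptions: `fetch_batch` is a hypothetical callable that you would implement against whatever Twitter lookup endpoint you have access to, and the batch size of 100 is an assumed default, not a documented guarantee. Deleted or protected tweets are simply absent from the result.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(ids: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` tweet IDs."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def rehydrate(
    ids: Iterable[str],
    fetch_batch: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 100,  # assumed batch size, adjust to your endpoint
) -> Dict[str, str]:
    """Collect {tweet_id: text} by calling `fetch_batch` once per batch.

    `fetch_batch` is hypothetical: it takes a list of IDs and returns the
    text of whichever tweets are still available.
    """
    texts: Dict[str, str] = {}
    for batch in batched(ids, batch_size):
        texts.update(fetch_batch(batch))
    return texts
```

The number of requests grows with the number of IDs divided by the batch size, which is why this route can be slow for a corpus of this size.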

@marcodegra

marcodegra commented Apr 13, 2019

Hello,

Let me share a reply on the same topic that the creator of the dataset gave me some months ago.

"... about the *.json.gz files, these data files need to be retrieved from Twitter via their IDs. As such, I can’t share the tweet files directly (against Twitter’s terms of service).

At UMD, we maintain an archive of tweets extracted from Twitter’s 1% public sample, so I was able to pull a sample of tweets from that archive (you can find a similar one at the Internet Archive’s Twitter stream grab). You could also use Twitter’s API to rehydrate tweets slowly, or you could partner with Twitter/Gnip to rehydrate all the tweets in one go.

From there, the *.json.gz files are comprised of all relevant tweets, with each tweet occupying one line in the file."

Hope you find this information useful :)
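
Once such a `*.json.gz` file has been obtained, reading it could look roughly like the sketch below. It relies on the one-tweet-per-line layout described in the quoted reply and assumes each line is a standard tweet JSON object with `id_str` (or `id`) and `text` fields; the function names are made up for illustration.

```python
import gzip
import json


def iter_tweets(path):
    """Yield one parsed tweet (a dict) per non-empty line of a .json.gz file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def id_and_text(tweet):
    """Return (tweet_id, text); assumes the tweet has `id_str` or `id`."""
    tweet_id = tweet.get("id_str", tweet.get("id"))
    return str(tweet_id), tweet.get("text", "")
```

Because the file is read line by line, it never has to fit in memory all at once.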

@VictorSuarezL
Author

Many thanks @marcodegra, this is exactly what I wanted to know. I saw several datasets in the Internet Archive's Twitter stream grab, so it seems pretty easy to merge them all together. Surely this will take a while, but it seems faster than using Twitter's API.

Thanks again for the info! 🎉
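
For anyone trying the same merge, a minimal sketch of matching the CREDBANK tweet IDs against tweets taken from an archive might look like this. It assumes the archive tweets are already parsed into dicts with `id_str` (or `id`) and `text` fields; `match_by_id` is a hypothetical helper name. Since the archive mentioned above comes from a 1% sample, many IDs will probably not be found, so the sketch also returns the IDs that are still missing.

```python
def match_by_id(wanted_ids, archive_tweets):
    """Return ({tweet_id: text}, missing_ids) for the wanted tweet IDs.

    `archive_tweets` is any iterable of tweet dicts. The first occurrence
    of each ID wins; later duplicates are ignored.
    """
    wanted = {str(i) for i in wanted_ids}
    found = {}
    for tweet in archive_tweets:
        tweet_id = str(tweet.get("id_str", tweet.get("id")))
        if tweet_id in wanted and tweet_id not in found:
            found[tweet_id] = tweet.get("text", "")
    missing = wanted - set(found)
    return found, missing
```

The missing IDs could then be retried through the API, which keeps the slow route limited to what the archive does not cover.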

@hansd410

Where is the archive? Would you share the link please?

@marcodegra

Hello hansd410, did you already try to follow the instructions to download it?

@hansd410

Hello marcodegra.
Yes, I downloaded the streaming data, but I can find only the Twitter IDs.
I wonder whether I could get the Twitter 'text' data through the archive.

@marcodegra

Well, then please refer to the above comments.
Unfortunately, I cannot help you more than that :(

@hansd410

Oh, I see. I just guessed that @VictorSuarezL had gotten the archive that matches the streaming data.
Thank you for the reply, @marcodegra!

@myrainbowandsky

"So many thanks @marcodegra, this is exactly what I want to know. I saw in Internet Archive's Twitter stream grab several datasets so seems pretty easy to merge the whole together. Surely this takes a while but seems faster than using Twitter's API.

Thanks again for the info! 🎉"

Could you please show where the archive is?

@AfrouzHojati

Thank you @VictorSuarezL for your help. If you were able to get the tweet text, would you please let us know, and tell us how you got it?

Thank you again
