Data interpretation #701

somowennie · 2023-08-19T12:35:00Z


Hello there.
I am new to twarc and would really appreciate it if someone could guide me here. I used the twarc tool to get a CSV file, but I didn't quite understand some of the columns, such as the hashtags. Can someone help explain what all the columns mean?

Answered by igorbrigadir

igorbrigadir · 2023-08-19T14:42:58Z

igorbrigadir
Aug 19, 2023
Collaborator

twarc-csv "flattens" the nested JSON structure of a tweet, adding some extra columns that should make it easier to work with. The full list is in https://github.com/DocNow/twarc-csv/blob/main/dataframe_converter.py#L13

The best place to see where it all comes from is the Data Dictionary here: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet

It also does some pre-processing, such as extracting the list of hashtags without their character indexes in the text. For example, in the CSV, entities.hashtags is a list of hashtags extracted from the hashtags part of the entities object:

{"hashtags": [{"start": 70, "end": 75, "tag": "Lega"}, {"start": 176, "end": 192, "tag": "maratonamentana"}]}

The csv values for hashtags are a JSON list:

["#Lega", "#maratonamentana"]
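To make the relationship between the two forms concrete, here is a minimal Python sketch. It is not twarc-csv's actual code: `flatten_hashtags` is a hypothetical helper written for illustration, and it only assumes the input and output shapes shown above.

```python
import json

# Fragment of the tweet JSON quoted above (the "entities" object).
entities = {
    "hashtags": [
        {"start": 70, "end": 75, "tag": "Lega"},
        {"start": 176, "end": 192, "tag": "maratonamentana"},
    ]
}

def flatten_hashtags(entities):
    """Hypothetical helper: drop the start/end indexes, keep '#'-prefixed tags."""
    return ["#" + h["tag"] for h in entities.get("hashtags", [])]

# What the CSV cell would hold: the list serialised as a JSON string.
cell = json.dumps(flatten_hashtags(entities))

# Reading it back: parse the cell to get a real Python list again.
hashtags = json.loads(cell)
print(hashtags)
```

So when you load the CSV, the hashtags cell arrives as a string and needs `json.loads` (or an equivalent in R) before you can treat it as a list.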

Also, note that the CSV produced is designed to be read programmatically, for example into a dataframe in R or Pandas. Excel struggles to display the data: it does not support 64-bit integers in IDs, and it tries to "guess" the format of the data and makes mistakes. If you specify all columns as TEXT, it can be coerced into displaying them correctly. Be aware, though, that re-saving in Excel can corrupt the data, so it's best to use something else (Google Sheets generally does a better job).
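As a rough illustration of the ID problem, the sketch below reads a tiny CSV with Python's standard `csv` module, which keeps every value as text, and then shows what happens when the same ID is pushed through a floating-point number. The sample row and the `id` column name are made up for the example. In Pandas, the analogous precaution would be something like `pd.read_csv(path, dtype=str)`.

```python
import csv
import io

# Made-up sample standing in for a twarc-csv output file.
sample = "id,text\n1692880000000000001,hello\n"

# csv.DictReader returns every field as a string, so the ID is preserved exactly.
rows = list(csv.DictReader(io.StringIO(sample)))
tweet_id = rows[0]["id"]

# Treating the ID as a floating-point number silently changes the trailing digits.
roundtripped = int(float(tweet_id))

print(tweet_id)
print(roundtripped == int(tweet_id))
```

Keeping IDs as strings end to end avoids this class of corruption regardless of which tool reads the file.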

What format is best, original JSON or CSV, depends on what exactly you're doing.

As for the hashtags themselves, there is generally no metadata about them at all. Their meaning depends entirely on the context of the tweets.

Hope that helps!

5 replies

somowennie Aug 19, 2023
Author

Thank you very much. You've been a great help to me!

somowennie Aug 21, 2023
Author

Oh, yeah, I have one more question. My API has academic access, but I cannot crawl historical tweets. How do I solve this problem?

SamHames Aug 21, 2023
Collaborator

My API has academic access

You probably had academic access - it's almost certainly been revoked. I don't think anyone has that kind of access without paying for it now, sorry.

somowennie Aug 21, 2023
Author

Is it impossible to access historical tweets through the Twitter API? Is there a better way?

igorbrigadir Aug 28, 2023
Collaborator

Unfortunately, the new API restrictions don't allow you to do anything with historical tweets.
