Data interpretation #701
-
Hello, there.
Replies: 1 comment 5 replies
-
twarc-csv "flattens" the nested JSON structure of a tweet, adding some extra columns that should make it easier to work with. The full list is in https://github.com/DocNow/twarc-csv/blob/main/dataframe_converter.py#L13
The best place to see where it all comes from is the Data Dictionary here: https://developer.twitter.com/en/docs/twitter-api/data-dictionary/object-model/tweet
It also does some pre-processing, like extracting a list of hashtags without their character indexes in the text. For example, in the csv, entities.hashtags is a list of hashtags extracted from the hashtags part of the entities object, and the csv values for hashtags are stored as a JSON list.
Also, note that the csv produced is designed to be read programmatically, into a dataframe in R or Pandas for example. Excel struggles to display the data: it does not support 64-bit integers in IDs, and it tries to "guess" the format of the data and makes mistakes, but if you can specify all columns as …
Which format is best, original JSON or CSV, depends on exactly what you're doing. As for the hashtags themselves, there is generally no metadata about them at all; it depends entirely on the context of the tweets.
Hope that helps!
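To make the flattening and the hashtag pre-processing a bit more concrete, here is a minimal sketch in Python. It is not twarc-csv's actual code; dataframe_converter.py, linked above, is the real thing. The helpers flatten and simplify_hashtags are hypothetical. The tweet is a made-up object shaped like the v2 tweet in the Data Dictionary, where each hashtag entity carries start, end and tag fields. The sketch drops the character indexes and keeps only the tag text in a single cell:

```python
import json


def flatten(obj, prefix=""):
    """Hypothetical helper: turn nested dicts into 'parent.child' column names."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def simplify_hashtags(tweet):
    """Hypothetical helper: flatten a tweet and reduce hashtags to their tag text."""
    row = flatten(tweet)
    # Keep only the tag text; the start/end character indexes are discarded.
    tags = [h["tag"] for h in tweet.get("entities", {}).get("hashtags", [])]
    # Encode the list as a JSON string so it fits in one CSV cell.
    row["entities.hashtags"] = json.dumps(tags)
    return row


# Made-up tweet in the shape of a Twitter API v2 tweet object (not real data).
tweet = {
    "id": "1234567890123456789",
    "text": "Archiving #twarc output for #research",
    "entities": {
        "hashtags": [
            {"start": 10, "end": 16, "tag": "twarc"},
            {"start": 28, "end": 37, "tag": "research"},
        ]
    },
}

row = simplify_hashtags(tweet)
print(row["entities.hashtags"])  # ["twarc", "research"]
```

The real converter produces many more columns. This only illustrates where dotted column names such as entities.hashtags come from and why the hashtag cell holds a JSON list rather than the full entity objects.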
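Reading the csv back programmatically, with every column kept as text, avoids the ID problem Excel runs into. Below is a minimal sketch using only Python's standard library. It assumes a made-up excerpt with just id, text and entities.hashtags columns; real twarc-csv output has many more. It also assumes the hashtag cells hold a JSON list:

```python
import csv
import io
import json

# Made-up csv excerpt (invented values, only three of the real columns).
SAMPLE_CSV = '''id,text,entities.hashtags
1234567890123456789,Archiving #twarc output,"[""twarc""]"
1234567890123456790,No tags here,[]
'''


def load_rows(handle):
    """Read every column as a string, then decode the JSON-list hashtag column."""
    rows = []
    for record in csv.DictReader(handle):
        # csv.DictReader never converts types, so 19-digit IDs stay exact.
        record["entities.hashtags"] = json.loads(record["entities.hashtags"] or "[]")
        rows.append(record)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["id"], row["entities.hashtags"])

# A 64-bit float cannot represent this 19-digit ID exactly, which is why
# guessing numeric types for ID columns corrupts them.
tweet_id = rows[0]["id"]
print(int(float(tweet_id)) == int(tweet_id))  # False
```

In pandas, passing dtype=str to pandas.read_csv similarly keeps IDs as strings. The hashtag column can then be decoded with json.loads.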