An example ETL process that determines the happiest hour on Twitter. The happiest hour of the day is the one whose tweets contain the most ":)" smileys.
Note: all time-related calculations are performed in the UTC time zone.
Download Spark 2.2.0 from https://spark.apache.org/downloads.html (pre-built for Hadoop 2.6, `spark-2.2.0-bin-hadoop2.6.tgz`), untar it, and create an environment variable `SPARK_HOME` pointing to the unpacked location.
Configuration is placed in the `src/main/resources/application.conf` file and must contain valid Twitter API keys.
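The exact keys depend on how the job loads its configuration; below is a minimal sketch assuming twitter4j-style OAuth settings (the key names are illustrative and may differ from the project's real ones):

```
# src/main/resources/application.conf (key names are illustrative only)
twitter4j.oauth {
  consumerKey       = "<your-consumer-key>"
  consumerSecret    = "<your-consumer-secret>"
  accessToken       = "<your-access-token>"
  accessTokenSecret = "<your-access-token-secret>"
}
```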
Build:

```bash
sbt clean assembly
```
`HappyTweetsJob` reads the Twitter Streaming API, filters incoming tweets containing ":)", and saves them as JSON files into `HappyTweetsJob.happyTweetsDir`, in folders partitioned by hour: `hour=<hour-timestamp>`. The batch size is determined by the `HappyTweetsJob.batchSize` setting. Each JSON file is named `tweets-<first-tweet-id>.json`.
Run:

```bash
java \
  -cp target/scala-2.11/twitter-happiest-hour-assembly-1.0.jar \
  -Dconfig.file=src/main/resources/application.conf \
  com.example.etl.twitter.HappyTweetsJob
```
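Roughly, the ingestion boils down to: read the stream, keep the smiling tweets, and append their raw JSON under an hourly partition directory. Below is a minimal sketch of that idea, not the project's actual implementation: it assumes twitter4j is used against the Streaming API, uses epoch seconds as the `hour=<hour-timestamp>` value, and writes one tweet per file instead of batching `HappyTweetsJob.batchSize` tweets.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Paths, StandardOpenOption}
import java.time.temporal.ChronoUnit

import twitter4j.{Status, StatusAdapter, TwitterObjectFactory, TwitterStreamFactory}

object HappyTweetsSketch {
  // Stand-in for the HappyTweetsJob.happyTweetsDir setting
  val happyTweetsDir = "happy-tweets"

  def main(args: Array[String]): Unit = {
    // Credentials come from the twitter4j configuration (e.g. system properties);
    // getRawJSON below also needs twitter4j's jsonStoreEnabled option to be on.
    val stream = new TwitterStreamFactory().getInstance()

    stream.addListener(new StatusAdapter {
      override def onStatus(status: Status): Unit =
        if (status.getText.contains(":)")) {
          // Truncating the Instant to the hour is a UTC-based operation, matching the
          // "all calculations in UTC" note; epoch seconds as the partition value are an
          // assumption, the real hour=<hour-timestamp> format may differ.
          val hourTs = status.getCreatedAt.toInstant.truncatedTo(ChronoUnit.HOURS).getEpochSecond
          val dir = Paths.get(happyTweetsDir, s"hour=$hourTs")
          Files.createDirectories(dir)
          // The real job groups HappyTweetsJob.batchSize tweets per
          // tweets-<first-tweet-id>.json file; for brevity this writes one tweet per file.
          Files.write(
            dir.resolve(s"tweets-${status.getId}.json"),
            (TwitterObjectFactory.getRawJSON(status) + "\n").getBytes(UTF_8),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND)
        }
    })

    stream.sample() // read the sampled stream; ":)" is filtered client-side above
  }
}
```

Batching `HappyTweetsJob.batchSize` tweets and naming each file after the first tweet id in the batch would replace the per-tweet write in `onStatus`.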
`HappiestHourJob` finds the happiest hour during an arbitrary time period `[from, to)`.
Run via sbt:

```bash
sbt \
  -Dconfig.file=src/main/resources/application.conf \
  "runMain com.example.etl.twitter.HappiestHourJob \
  --from 2017-09-25T14:00Z \
  --to 2017-09-25T16:00Z"
```
or directly via `java`, with the Spark jars on the classpath:

```bash
java \
  -classpath "target/scala-2.11/twitter-happiest-hour-assembly-1.0.jar:$SPARK_HOME/jars/*" \
  -Dconfig.file=src/main/resources/application.conf \
  com.example.etl.twitter.HappiestHourJob \
  --from 2017-09-25T14:00Z \
  --to 2017-09-25T16:00Z
```
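Conceptually, finding the happiest hour is a group-by over the hourly partitions restricted to `[from, to)`. Here is a minimal Spark SQL sketch of that aggregation, assuming the `hour` partition value is an epoch-second timestamp and that counting stored tweets per hour is a fair proxy for counting ":)" (every saved tweet contains it); `happy-tweets` is a stand-in for `HappyTweetsJob.happyTweetsDir`:

```scala
import java.time.Instant

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, desc, lit}

object HappiestHourSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("HappiestHourSketch").getOrCreate()
    import spark.implicits._

    // [from, to) boundaries in epoch seconds (UTC)
    val from = Instant.parse("2017-09-25T14:00:00Z").getEpochSecond
    val to   = Instant.parse("2017-09-25T16:00:00Z").getEpochSecond

    // "hour" is picked up by Spark's partition discovery from the
    // hour=<hour-timestamp> directories written by the ingestion job.
    val happiest = spark.read.json("happy-tweets")
      .where($"hour" >= from && $"hour" < to)   // half-open interval [from, to)
      .groupBy($"hour")
      .agg(count(lit(1)).as("smiles"))          // every saved tweet contains ":)"
      .orderBy(desc("smiles"))
      .limit(1)

    happiest.show()
    spark.stop()
  }
}
```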
- Fix conflicting versions of transitive dependencies (json4s, jackson, netty, etc.); see the sketch below.
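A common way to tackle such conflicts (a sketch only, not what the build currently does) is to pin the disputed modules with `dependencyOverrides` and give sbt-assembly an explicit merge strategy; the module versions below are placeholders rather than vetted choices:

```scala
// build.sbt (sketch; assumes the sbt-assembly plugin is enabled)
// On sbt 0.13 dependencyOverrides is a Set; on sbt 1.x use a Seq instead.
dependencyOverrides ++= Set(
  "org.json4s"                 %% "json4s-jackson"   % "3.2.11",
  "com.fasterxml.jackson.core"  % "jackson-databind" % "2.6.7",
  "io.netty"                    % "netty-all"        % "4.0.43.Final"
)

// Resolve duplicate files when building the fat jar: drop META-INF entries,
// keep the first copy of everything else.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```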