This is an automation script for tweepy's streaming class with some additional features included.

codeck313/twitterCapture

twitterCapture

This is an automation script for tweepy's streaming class, with additional features that let it run on an external server and capture data right off the bat. It has built-in email notification and can update its filter terms from the trend list of a particular area. By default it stores everything in a sqlite3 database.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for testing purposes. See the deployment notes on how to deploy the project to run 24x7.

Prerequisites

First, clone the Git repository from GitHub and install the required libraries:

git clone https://github.com/codeck313/twitterCapture.git
cd twitterCapture
pip install -r requirements.txt

By default it uses sqlite3, though you can switch to any other database management system you like.

To use sqlite3:

On Windows, download the pre-compiled binaries.

On Linux:

sudo apt-get install sqlite3

Setup

First, create a Twitter developer account here: Developer Twitter

Then create a new app. If you are unsure how to do that, follow this guide: App Twitter

After creating the app, get your keys and tokens from it by following this guide: Access Token

You should now have keys for these fields in settings.py:

TWITTER_KEY = ""  # "Consumer API key"
TWITTER_SECRET = ""  # "Consumer API Secret key"
TWITTER_APP_KEY = ""  # "Access token"
TWITTER_APP_SECRET = ""  # "Access token secret"

Now add the terms you would like to track in tweets to the TRACK_TERMS list in settings.py, like below:

TRACK_TERMS = ["#India", "python"]

This is the most basic setup required to start using this script. You can customize it further to suit your needs.
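Before starting a long capture run, it can help to confirm that the basic setup is complete. The following is a minimal sketch, not part of twitterCapture itself: `find_setup_problems` is a hypothetical helper that takes the settings as a plain mapping and reports empty keys or an empty TRACK_TERMS list.

```python
# Hypothetical helper for illustration; not part of the project.
REQUIRED_KEYS = (
    "TWITTER_KEY",
    "TWITTER_SECRET",
    "TWITTER_APP_KEY",
    "TWITTER_APP_SECRET",
)


def find_setup_problems(settings):
    """Return a list of human-readable problems found in a settings mapping."""
    problems = []
    # Every credential must be a non-blank string.
    for name in REQUIRED_KEYS:
        value = settings.get(name, "")
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{name} is empty")
    # TRACK_TERMS must be a non-empty list of non-blank strings.
    terms = settings.get("TRACK_TERMS")
    if not isinstance(terms, list) or not terms:
        problems.append("TRACK_TERMS must be a non-empty list")
    elif not all(isinstance(t, str) and t.strip() for t in terms):
        problems.append("TRACK_TERMS must contain only non-empty strings")
    return problems
```

Assuming settings.py is importable from the working directory, calling `find_setup_problems(vars(settings))` after `import settings` would return an empty list when everything above has been filled in.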

Customization

You can customize these things to further utilize the script:

Enabling Mail Alert

You can get mail alerts when an error occurs in the code, when the trend list is updated, or when you have captured a certain number of tweets.

By default it uses Gmail as the email provider. If you use a different email service, change the SMTP_SERVER and PORT settings to match its configuration.

Add your email ID and password, and the email ID you want the alerts sent to:

SENDER_EMAIL = "[email protected]"
PASWD = "123456"  # we don't recommend it though ;)
RECEVIER_EMAIL = "[email protected]"

Add what you would like to be displayed as the subject of the mail that you will receive.

EMAIL_SUBJECT = "Tutorial"

Now you can set the number of tweets after which you would like an update from the script telling you it has reached that count. You can set up to 2 such tweet counts.

# Sends an update each time the tweet count reaches a multiple of 20 or 200
ALERT_DURATION = [20, 200]

In the script, change [0,0] to the tweet counts at which you want a mail alert.
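The "count or its multiple" rule can be sketched as a small check. This is an illustration only: `should_alert` is a hypothetical name, and treating a threshold of 0 as "disabled" is an assumption based on the [0,0] default, not something confirmed from the script.

```python
def should_alert(tweet_count, alert_duration):
    """Return True when tweet_count is a positive multiple of any threshold.

    Thresholds of 0 are skipped (assumed here to mean "no alert").
    """
    if tweet_count <= 0:
        return False
    return any(t > 0 and tweet_count % t == 0 for t in alert_duration)
```

With ALERT_DURATION = [20, 200], this check fires at 20, 40, 60 and so on, which also covers every multiple of 200.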

To stop bug reports from being mailed to you, set BUG_ALERT to False.

After that, set MAIL_ALERT to True to start using this feature.

For Gmail users only: you need to allow less secure third-party apps to log in to your Gmail ID.
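For readers who want to see what such a mail alert involves, here is a minimal sketch using only Python's standard library. It is not the project's own mail code: `build_alert` and `send_alert` are hypothetical names, and the Gmail defaults (smtp.gmail.com over implicit SSL on port 465) are assumptions that may differ from the SMTP_SERVER and PORT values in settings.py.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_alert(sender, receiver, subject, body):
    """Build a plain-text alert message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = receiver
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_alert(msg, password, smtp_server="smtp.gmail.com", port=465):
    """Send the message over an SSL-wrapped SMTP connection."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(smtp_server, port, context=context) as server:
        server.login(str(msg["From"]), password)
        server.send_message(msg)
```

Reading the password from an environment variable instead of hard-coding it is one way to act on the warning next to PASWD above.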

Database customization

To use another database management system:

# Using MySQL database with username and password
CONNECTION_STRING = 'mysql://user:password@localhost/mydatabase'

# Using PostgreSQL database
CONNECTION_STRING = 'postgresql://scott:tiger@localhost:5432/mydatabase'
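These connection strings follow the usual URL layout: backend, user, password, host, optional port, then database name. As a quick way to double-check a string before using it, the sketch below splits one into its parts with the standard library. `describe_connection` is a hypothetical helper, and it deliberately leaves the password out of its result.

```python
from urllib.parse import urlsplit


def describe_connection(connection_string):
    """Split a database URL into its parts (password intentionally omitted)."""
    parts = urlsplit(connection_string)
    return {
        "backend": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not specify a port
        "database": parts.path.lstrip("/"),
    }
```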

To change the name of the table in which the data is stored, use:

TABLE_NAME = 'YOUR_NAME'

Enabling Trendlist update functionality

If you want the script to fetch the trend list of a particular place and use it to update your tweet tracking (or filtering) list, follow these steps.

To start using this feature, set TRENDDATA_UPDATE to True.

Then specify how often the trend list should be fetched through the Twitter API:

# The time is in seconds
REFRESH_TIME = 600  # 10 min
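One way to picture how a refresh interval like this can be honored inside a long-running loop is a small timer object. This is a sketch under assumptions: `RefreshTimer` is a hypothetical class, and the script's actual scheduling logic may work differently.

```python
import time


class RefreshTimer:
    """Report when at least refresh_time seconds have passed since the last refresh."""

    def __init__(self, refresh_time, now=None):
        self.refresh_time = refresh_time
        # A monotonic clock is unaffected by system clock adjustments.
        self.last_refresh = time.monotonic() if now is None else now

    def due(self, now=None):
        """Return True (and restart the interval) once the interval has elapsed."""
        now = time.monotonic() if now is None else now
        if now - self.last_refresh >= self.refresh_time:
            self.last_refresh = now
            return True
        return False
```

A capture loop could create `RefreshTimer(REFRESH_TIME)` once and call `due()` on each iteration, fetching the trend list only when it returns True.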

Specify the place code (WOEID) of the place whose trends you want to use:

# For Mumbai, India it's 2295411
PLACE_CODE = 2295411

For the complete list of cities and their WOEID codes, refer to this list.

You can track (or filter on) additional terms by adding them to the TRACK_TERMS list.
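Combining your own terms with the fetched trends can be sketched as follows. `merge_track_terms` is a hypothetical helper, and the case-insensitive de-duplication is a design choice of this sketch rather than the script's confirmed behavior.

```python
def merge_track_terms(track_terms, trend_names):
    """Combine fixed terms with trend names, dropping blanks and duplicates.

    Fixed terms come first; comparison is case-insensitive and the first
    spelling seen is the one kept.
    """
    merged = []
    seen = set()
    for term in list(track_terms) + list(trend_names):
        cleaned = term.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            merged.append(cleaned)
    return merged
```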

METADATA Being Collected

Tweet's text is collected in extended form.

About 20 different types of metadata are stored in the database, so the user can keep or discard whatever she/he doesn't need afterwards rather than end up with too little data from the start. The metadata fields other than the tweet's text are:

  • id_str : The unique identifier for this tweet.
  • hashtags : Different Hashtags used in a tweet are segregated into this field.
  • tweet_created : As the name suggests, it holds the tweet's date and time stamp.
  • user_name : The user name of the person who tweeted the text.
  • user_handle : Her/His handle.
  • verified_status : Indicates whether the user has a verified account.
  • origin_source : The device or service from which the person tweeted, like the good ol' Twitter for iPhone.
  • isRT : Boolean value of whether it's a retweet or not.
  • coordinates : Location of the tweet as reported by the user or the client application.
  • user_location : Location of the user as reported by her/him.
  • place_name : Indicate the place name that the tweet is associated with (but not necessarily originating from).
  • user_description : User's twitter account description.
  • user_followers : Number of followers the user's account currently has.
  • friends_count : Number of people the user is following.
  • no_tweet_user : Number of tweets this user has posted.
  • user_created : Date when the user's account was created.
  • user_bg_color : The hexadecimal color chosen by the user for their background.
  • entities : This provides arrays of common things included in tweets, like hashtags, user mentions, links, attached media and more. It is saved as text, so you would need regex to separate its different parts.
  • polarity : Where the tweet lies on the polarity scale.
  • objectivity : Whether a tweet presents facts (higher objectivity) or provides a person's analysis or opinion.
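Once tweets are stored, the fields above can be queried with plain SQL. The sketch below counts original (non-retweet) tweets per origin_source using Python's built-in sqlite3 module. It rests on assumptions: `top_sources` is a hypothetical helper, isRT is assumed to be stored as 0/1, and the table name is whatever you set in TABLE_NAME.

```python
import sqlite3


def top_sources(conn, table_name, limit=5):
    """Return (origin_source, count) pairs for non-retweets, most common first."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")
    query = (
        f'SELECT origin_source, COUNT(*) AS n FROM "{table_name}" '
        "WHERE isRT = 0 "
        "GROUP BY origin_source "
        "ORDER BY n DESC, origin_source "
        "LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()
```

To try it, open the capture database with `sqlite3.connect(...)` and pass the connection together with your table name.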

Deployment on a Linux Server (Raspberry Pi)

To run the script on an external server and keep it operational 24x7, follow these steps.

All the .sh scripts are included in the project, along with a copy of the crontab -e log.

First open launcher.sh in the cloned project directory. Edit the cd command in the script so that it opens your project directory.

cd /home/pi/tweetCapture

Next, edit line 5 of connectivity.sh to point to the place where launcher.sh is saved:

[23]) echo "HTTP connectivity is up"
      /path/to/launcher.sh;;  # <<<< THIS LINE
   5) echo "The web proxy won't let us through";;

Now use chmod +x connectivity.sh and chmod +x launcher.sh to make them executable.

Make a folder named logs in your home directory. Now type the following command into the terminal:

crontab -e

and add the following lines to it:

@reboot sh /path/to/connectivity.sh > ~/logs/cronlogRE 2>&1
*/5 * * * * sh /path/to/connectivity.sh > ~/logs/cronlog 2>&1

Remember to change /path/to/connectivity.sh to the path of your connectivity.sh file.

  • The first line runs the script on every reboot of the system.
  • The second line runs the script every 5 minutes, though the Python script will start if and only if it is not already running.

Save the file and exit.

After a reboot, your script should start running. If it stops for some reason, such as a network disconnection or an API error, it will start again within 5 minutes or whenever the network is re-established. If you opted for email notification, you will be notified accordingly.
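The "start only if not already running" behavior matters because cron fires every 5 minutes regardless. How the bundled scripts perform that check is not shown here; as an alternative illustration, the sketch below guards against a second instance from inside Python using an exclusive file lock. `acquire_single_instance_lock` and the lock file path are hypothetical, and `fcntl` is available on Linux and other Unix-like systems only.

```python
import fcntl


def acquire_single_instance_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if it is already held.

    Keep the returned file object alive for as long as the script runs;
    the lock is released when the file is closed or the process exits.
    """
    lock_file = open(lock_path, "w")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file
```

A script using this guard would call it once at startup and exit quietly when it returns None, so overlapping cron runs never produce two capture processes.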

Built With

Acknowledgement
