This is an automation script for tweepy's streaming class, bundled with additional features that allow it to run on an external server and capture data right off the bat. It has built-in e-mail notifications and the ability to update your filter terms based on the trends list of a particular area. By default it stores everything in a sqlite3 database.
- Getting Started
- Customization
- METADATA Being Collected
- Deployment on a Linux Server (Raspberry Pi)
- Built With
- Acknowledgements
These instructions will get you a copy of the project up and running on your local machine for testing purposes. See the deployment notes on how to deploy the project so it runs 24x7.
First clone the Git repository from GitHub and install the required libraries:

```sh
git clone https://github.com/codeck313/twitterCapture.git
cd twitterCapture
pip install -r requirements.txt
```
By default the script uses sqlite3, though you can change it to any other database management system you like.
To use sqlite3:
On Windows, download the pre-compiled binaries.
On Linux:

```sh
sudo apt-get install sqlite3
```
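To confirm sqlite3 is usable from Python, a quick sanity check using only the standard library looks like this:

```python
import sqlite3

# sqlite3 ships with CPython; this prints the bundled SQLite version
print(sqlite3.sqlite_version)
```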
First create a Twitter developer account from here: Developer Twitter
Then create a new app; if you are unsure how to do that, follow this guide: App Twitter
To get your keys and tokens from that app, follow this: Guide: Access Token
You should now have keys for these fields in settings.py:

```python
TWITTER_KEY = ""        # "Consumer API key"
TWITTER_SECRET = ""     # "Consumer API Secret key"
TWITTER_APP_KEY = ""    # "Access token"
TWITTER_APP_SECRET = "" # "Access token secret"
```
Now add the terms that you would like to track in tweets to the `TRACK_TERMS` list in settings.py, like below:

```python
TRACK_TERMS = ["#India", "python"]
```
This is the most basic setup required to start using this script. You can customize it further to suit your personal needs.
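For context, here is a minimal sketch of the kind of streaming loop the script automates, assuming tweepy 3.x and the keys from settings.py; the real script adds storage, alerts and trend updates on top of this:

```python
import tweepy
from settings import (TWITTER_KEY, TWITTER_SECRET, TWITTER_APP_KEY,
                      TWITTER_APP_SECRET, TRACK_TERMS)

class PrintListener(tweepy.StreamListener):
    def on_status(self, status):
        # the real script stores the tweet and its metadata instead
        print(status.text)

auth = tweepy.OAuthHandler(TWITTER_KEY, TWITTER_SECRET)
auth.set_access_token(TWITTER_APP_KEY, TWITTER_APP_SECRET)
stream = tweepy.Stream(auth=auth, listener=PrintListener())
stream.filter(track=TRACK_TERMS)  # blocks and streams matching tweets
```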
You can customize the following things to get more out of the script:
You can get mail alerts when an error occurs in the code, when the trend list is updated, or when you have captured a certain number of tweets.
By default it uses Gmail as the email provider. If you aren't using Gmail as your email service, change the `SMTP_SERVER` and `PORT` settings to match your email service's configuration.
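For example (the host name and port below are placeholders; substitute whatever your provider documents):

```python
SMTP_SERVER = "smtp.example.com"  # your provider's SMTP host
PORT = 465                        # typically 465 for SSL, 587 for STARTTLS
```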
Add your email ID and password, along with the email ID you want the alerts sent to:

```python
SENDER_EMAIL = "[email protected]"
PASWD = "123456" # we don't recommend a plaintext password though ;)
RECEVIER_EMAIL = "[email protected]"
```
Add what you would like displayed as the subject of the mail you will receive:

```python
EMAIL_SUBJECT = "Tutorial"
```
Now you can specify the tweet counts after which you would like an update from the script, so you know it has reached that count. You can specify up to two tweet counts:

```python
# Will update you once the script has recorded 20 or 200 tweets, or a multiple thereof
ALERT_DURATION = [20, 200]
```

In the script, change `[0,0]` to the tweet count(s) at which you want a mail alert.
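As a rough sketch of what this check amounts to (the `send_update` helper is hypothetical, named here only for illustration):

```python
# inside the tweet-handling loop; tweet_count is the running total
for n in ALERT_DURATION:
    if n and tweet_count % n == 0:  # entries left at 0 are skipped
        send_update(tweet_count)    # hypothetical mail helper
```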
To silence the bug reports being mailed to you, set the value of `BUG_ALERT` to `False`.
After that, just change the boolean value of `MAIL_ALERT` to `True` to start using this feature.
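Under the hood, sending such an alert through Gmail can be done with Python's standard smtplib; here is a minimal sketch using the settings above (an illustration, not necessarily how this script structures it):

```python
import smtplib
import ssl
from email.message import EmailMessage

from settings import (SENDER_EMAIL, PASWD, RECEVIER_EMAIL,
                      EMAIL_SUBJECT, SMTP_SERVER, PORT)

msg = EmailMessage()
msg["From"] = SENDER_EMAIL
msg["To"] = RECEVIER_EMAIL
msg["Subject"] = EMAIL_SUBJECT
msg.set_content("Captured 200 tweets so far.")

# SMTP over SSL (port 465); use starttls() instead for port 587
with smtplib.SMTP_SSL(SMTP_SERVER, PORT, context=ssl.create_default_context()) as server:
    server.login(SENDER_EMAIL, PASWD)
    server.send_message(msg)
```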
For Gmail users only: you would need to enable 3rd-party "less secure apps" to log in to your Gmail ID. For more details.
To use another database management system:

```python
# Using a MySQL database with username and password
CONNECTION_STRING = 'mysql://user:password@localhost/mydatabase'

# Using a PostgreSQL database
CONNECTION_STRING = 'postgresql://scott:tiger@localhost:5432/mydatabase'
```
To change the name of the table in which the data is stored, use:

```python
TABLE_NAME = 'YOUR_NAME'
```
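These URL-style connection strings are the format accepted by SQLAlchemy-backed layers such as the `dataset` library; assuming `dataset` (an assumption for illustration, not a confirmed detail of this script), storing a row would look like:

```python
import dataset  # assumed storage layer; accepts SQLAlchemy-style URLs

from settings import CONNECTION_STRING, TABLE_NAME

db = dataset.connect(CONNECTION_STRING)  # sqlite, MySQL or PostgreSQL URL
table = db[TABLE_NAME]                   # table is created on first insert
table.insert({"id_str": "1", "user_name": "example", "hashtags": "#India"})
```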
If you want your system to fetch the trends list of a particular place and use it to update your tweet-tracking (or filtering) list, follow these steps.
To start using this feature, set the boolean value of `TRENDDATA_UPDATE` to `True`.
Specify the amount of time after which you want to grab the list using the Twitter API:

```python
# the time is in seconds
REFRESH_TIME = 600  # 10 min
```

Specify the place code of the place you want the trends to be picked up from:

```python
# For Mumbai, India it's 2295411
PLACE_CODE = 2295411
```

For the complete list of cities and their WOEID codes, refer to this.
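For reference, a rough sketch of how such a trend refresh can be done with tweepy 3.x (the function below is illustrative and assumes an authenticated `tweepy.API` object, not the script's actual code):

```python
import tweepy

def fetch_trend_terms(api, woeid):
    """Return the trending topic names for the given WOEID."""
    trends = api.trends_place(woeid)  # one REST call per refresh
    return [t["name"] for t in trends[0]["trends"]]

# e.g. merged with the static list every REFRESH_TIME seconds:
# track_list = TRACK_TERMS + fetch_trend_terms(api, PLACE_CODE)
```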
You can add extra tracking (or filtering) terms by listing them in `TRACK_TERMS`.
Tweets' text is collected in its extended (full-length) form.
20 different types of metadata are stored in the database, leaving it up to the user to keep or discard the data they don't need afterwards, rather than starting out with too little. Besides the tweet's text, the metadata fields are:
- `id_str`: The unique identifier for this tweet.
- `hashtags`: The different hashtags used in a tweet, segregated into this field.
- `tweet_created`: As the name suggests, the tweet's date and time stamp.
- `user_name`: The user name of the person who tweeted the text.
- `user_handle`: Their handle.
- `verified_status`: Indicates whether the user has a verified account.
- `origin_source`: The device or service the person tweeted from, like the good ol' Twitter for iPhone.
- `isRT`: Boolean value of whether it's a retweet or not.
- `coordinates`: Location of the tweet as reported by the user or the client application.
- `user_location`: Location of the user as reported by them.
- `place_name`: The place name that the tweet is associated with (but not necessarily originating from).
- `user_description`: The user's Twitter account description.
- `user_followers`: Number of followers the user's account currently has.
- `friends_count`: Number of people the user is following.
- `no_tweet_user`: Number of tweets this user has posted.
- `user_created`: Date when the user's account was created.
- `user_bg_color`: The hexadecimal color chosen by the user for their background.
- `entities`: Arrays of common things included in tweets, like hashtags, user mentions, links, attached media and more. This is saved as text, so you would need regex to separate the different parts.
- `polarity`: Where the tweet lies on the sentiment polarity scale.
- `objectivity`: Whether the tweet presents facts (higher objectivity) or a person's analysis or opinion. For more details.
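Polarity and objectivity scores like these are typically produced by a sentiment library such as TextBlob; whether this script uses TextBlob is an assumption, but the idea looks like this:

```python
from textblob import TextBlob  # assumed sentiment library

blob = TextBlob("I love this phone, it's fantastic!")
polarity = blob.sentiment.polarity             # -1.0 (negative) to 1.0 (positive)
objectivity = 1 - blob.sentiment.subjectivity  # TextBlob reports subjectivity in [0, 1]
```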
To run the script on an external server and make it operational 24x7, follow these steps.
All the `.sh` scripts are in the project folder, along with a copy of the `crontab -e` log.
First open `launcher.sh` in the cloned project directory. Edit the `cd` command in the script so that it opens your project directory:

```sh
cd /home/pi/tweetCapture
```
Next, edit line 5 of `connectivity.sh` so it points to the place where `launcher.sh` is saved:

```sh
[23]) echo "HTTP connectivity is up"
/path/to/launcher.sh;; # <<<< THIS LINE
5) echo "The web proxy won't let us through";;
```
Now run `chmod +x connectivity.sh` and `chmod +x launcher.sh` to make both scripts executable.
Make a `logs` folder in your home directory. Now type the following command into the terminal:

```sh
crontab -e
```

and add the following lines to it:

```sh
@reboot sh /path/to/connectivity.sh > ~/logs/cronlogRE 2>&1
*/5 * * * * sh /path/to/connectivity.sh > ~/logs/cronlog 2>&1
```
Remember to change `/path/to/connectivity.sh` to the path of your `connectivity.sh` file.
- The first line makes the script run on every reboot of the system.
- The second line runs the script every 5 minutes, though the Python script will only be started if it is not already running.
Save the file and exit.
Now, after a reboot, your script should start running. If for some reason it is closed, for example by a network disconnection or an error in the API, it will start again within 5 minutes or whenever the network is re-established, and if you opted for email notifications you will be notified accordingly.
- Thanks to Jinal and Aditya for their unwavering support.
- WeWorkPlay article on crontab
- Gilles for the web connectivity testing tool