Docs: http://twarc-cloud.readthedocs.io/
An AWS-friendly wrapper for Twarc for collecting Twitter data from Twitter's API.
Twarc-Cloud is a CLI that manages AWS Fargate Elastic Container Service (ECS) tasks that retrieve Twitter data from Twitter's API and store it in AWS S3.
Collecting from the following Twitter API methods is supported:
The design goals of Twarc-Cloud are:
- Serverless: No server to maintain or pay for when not in use.
- Use as few AWS services as possible: To reduce complexity and cost.
Thus, there is no web server, database, message queue, etc.
As of March 2019, the costs for the used AWS services are:
- Fargate ECS: (1/4 CPU x $0.04048 per CPU per hour) + (1/2 GB x $0.004445 per GB per hour) = $0.0123425, or about $0.0123, per harvest per hour
- S3: $0.023 per GB per month
Costs will vary depending on how much data you collect. However, overall it is ridiculously cheap.
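As a rough way to estimate a bill from the figures above, the arithmetic can be sketched in Python. The function names, the 30-day month, and the example of 10 GB stored are illustrative assumptions, not part of Twarc-Cloud; the prices are the March 2019 values quoted above and should be checked against current AWS pricing.

```python
from decimal import Decimal

# Prices quoted in this README (March 2019). They may be out of date.
CPU_PRICE_PER_CPU_HOUR = Decimal("0.04048")
MEMORY_PRICE_PER_GB_HOUR = Decimal("0.004445")
S3_PRICE_PER_GB_MONTH = Decimal("0.023")


def fargate_cost_per_hour(cpu=Decimal("0.25"), memory_gb=Decimal("0.5")):
    """Hourly Fargate cost of one harvest (1/4 CPU and 1/2 GB by default)."""
    return cpu * CPU_PRICE_PER_CPU_HOUR + memory_gb * MEMORY_PRICE_PER_GB_HOUR


def monthly_cost(harvests, stored_gb, hours_per_month=720):
    """Estimated monthly cost for harvests running continuously.

    Assumes a 30-day month (720 hours). Pass ints or Decimals, not floats,
    because Decimal does not mix with float in arithmetic.
    """
    compute = harvests * hours_per_month * fargate_cost_per_hour()
    storage = stored_gb * S3_PRICE_PER_GB_MONTH
    return compute + storage


if __name__ == "__main__":
    print("Per harvest per hour:", fargate_cost_per_hour())
    print("One harvest, 10 GB stored, per month:", monthly_cost(1, 10))
```

The estimate covers only the Fargate and S3 storage prices listed above; any other AWS charges, such as requests or data transfer, are not included.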
To run the tests and the linter, first install pylint:
pip install pylint
Then run the unit tests and pylint:
python -m unittest discover
pylint *.py twarccloud tests
Twarc-Cloud is currently under-tested; writing new tests is a priority.
The Docker image justinlittman/twarc-cloud is set to autobuild on commit:
- latest is master.
- version-<major> is the most recent tagged major version.
- version-<major.minor.patch> is the most recent tagged version.
A Twarc-Cloud harvester running as an ECS task is tied to a major version; each time a new ECS task is started, the most recent Docker image for that major version is pulled. Any breaking change will result in a new major version.
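To illustrate the tagging scheme, here is a minimal sketch of how a release version could map to Docker tags. The helper names `docker_tags` and `harvester_image` are hypothetical, and the exact tag spelling (for example `version-1` and `version-1.0.0`) is an assumption based on the pattern described above, not something taken from the build configuration.

```python
import re

IMAGE = "justinlittman/twarc-cloud"


def docker_tags(version):
    """Return the version tags a release such as '1.0.0' would carry.

    Assumes tags are spelled 'version-<major>' and
    'version-<major.minor.patch>'.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"expected <major.minor.patch>, got {version!r}")
    major = match.group(1)
    return [f"version-{major}", f"version-{version}"]


def harvester_image(version):
    """Image reference a harvester tied to this version's major line would pull."""
    return f"{IMAGE}:{docker_tags(version)[0]}"


if __name__ == "__main__":
    print(docker_tags("1.2.3"))
    print(harvester_image("1.2.3"))
```

Under this reading, releasing 1.2.4 would move the major tag forward, so new tasks pick up the release without reconfiguration, while a 2.0.0 release would leave harvesters on the 1.x line untouched.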
To manually build and push a Docker image:
docker build . -t 'justinlittman/twarc-cloud:latest'
docker push justinlittman/twarc-cloud:latest
To install the requirements:
pip install sphinx recommonmark sphinx-autobuild sphinx_rtd_theme
To run a live version of the docs:
cd docs
make livehtml
A live version of the docs will be available on http://localhost:8000.
- Update version in twarccloud/__init__.py.
- Commit and push.
- Tag a release in GitHub named with the version, e.g., 1.0.0.
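Before tagging, it can help to confirm that the tag name matches the version in the package. The sketch below assumes the version is stored as a `__version__ = "..."` assignment in twarccloud/__init__.py; that variable name is an assumption, and the helper names are hypothetical.

```python
import re


def read_version(init_source):
    """Extract the version string from the text of an __init__.py file.

    Assumes a line of the form: __version__ = "1.0.0"
    """
    match = re.search(
        r"""^__version__\s*=\s*['"]([^'"]+)['"]""", init_source, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_matches_version(init_source, tag):
    """True if the release tag equals the packaged version."""
    return read_version(init_source) == tag


if __name__ == "__main__":
    # Inline sample text; in practice, read twarccloud/__init__.py instead.
    sample = '__version__ = "1.0.0"\n'
    print(tag_matches_version(sample, "1.0.0"))
```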
Twarc-Cloud is inspired by and borrows heavily from DocNow's Twarc and George Washington University Libraries' Social Feed Manager.