A subreddit archiver.
This code is currently broken, updates will be pushed soon.
Wark is a Flask application, compatible with Python 2.7+ and Python 3.2+.
Running the app in a virtualenv setup is highly recommended: https://virtualenv.readthedocs.org/en/latest/virtualenv.html
Flask has several production setup options. One recommended one is uWSGI: http://flask.pocoo.org/docs/deploying/uwsgi/
To install all dependencies, run this command from within the virtualenv:
pip install -r requirements.txt
On first run, in order to initialize the database, run the following command:
python models.py --setup
To start the scanner daemon, run python daemon.py
.
To start the Flask development server, run python app.py
. Do not use this in
production! Instead, set up uWSGI as described above.
The settings.py
file contains some development defaults for all settings.
A list of Flask settings and their defaults is available in the Flask docs: https://flask.pocoo.org/docs/config/
A list of available SQLAlchemy settings is also available: https://pythonhosted.org/Flask-SQLAlchemy/config.html
Wark will automatically look for a local_settings.py
file to override all
settings with. Modifying settings in such a file is recommended.
In addition to all these, the following application settings are available:
WARK_USER_AGENT
: The user agent of the scanner.WARK_SUBREDDITS
: List of subreddits to scan.WARK_MAX_CACHED_ITEMS
: List of IDs to keep in cache. Increasing this value reduces database load, but can increase memory usage.WARK_REQUEST_THROTTLE
: Seconds to wait between each scan of the new queue. Note that Reddit keeps the new queue in cache for a few minutes, so setting this too low is counter-productive.WARK_POSTS_PER_REQUEST
: Amount of new posts to query for every time.