MC RSS Fetcher

This is the Media Cloud "RSS Fetcher". It maintains a database of approximately 180K RSS and Google News sitemap feeds to fetch, shadowed from the web-search Sources database.

Throughout the day it attempts to fetch those feeds, and every night it generates a synthetic RSS feed containing all of the URLs found that day.

Files are available afterwards at http://my.server/rss/mc-YYYY-MM-dd.rss.gz.
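
For example, one day's file can be fetched and unpacked like this (the hostname and date are placeholders):

    # fetch and unpack one day's synthetic feed (hostname and date are placeholders)
    curl -O http://my.server/rss/mc-2024-06-01.rss.gz
    gunzip mc-2024-06-01.rss.gz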

See documentation in doc/ for more details.

Install for Test/Development under Dokku

See doc/deployment.md

Install for Stand-Alone Development

For development directly on your local machine (see the consolidated example after these steps):

  1. Install PostgreSQL and Redis
  2. Create and populate a virtual environment: make install
  3. Activate the venv: source venv/bin/activate
  4. Create a postgres user: sudo -u postgres createuser -s MYUSERNAME
  5. Create a database called "rss-fetcher" in Postgres: createdb rss-fetcher
  6. Run alembic upgrade head to initialize the database.
  7. cp .env.template .env (little or no editing should be needed)
  • mypy.sh will install mypy (and the necessary types libraries & autopep8) and run type checking.
  • autopep.sh will normalize code formatting.

BOTH should be run before merging to main (or submitting a pull request).
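
A condensed sketch of the setup steps above (the package-manager line assumes a Debian/Ubuntu host; substitute your own username for MYUSERNAME):

    # steps 1-7 in one pass; package names are assumptions for Debian/Ubuntu
    sudo apt install postgresql redis-server
    make install                          # create and populate the venv
    source venv/bin/activate
    sudo -u postgres createuser -s MYUSERNAME
    createdb rss-fetcher
    alembic upgrade head                  # initialize the database
    cp .env.template .env                 # little or no editing needed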

All config parameters should be fetched via fetcher/config.py and added to .env.template.

Running

Various scripts run each separate component (see the example after this list):

  • python -m scripts.import_feeds my-feeds.csv: Import feeds from a CSV dump (a one-time operation)
  • run-fetch-rss-feeds.sh: Start the fetcher (leader and worker processes) (run from Procfile)
  • run-server.sh: Run the API server (run from Procfile)
  • run-gen-daily-story-rss.sh: Generate the daily files of URLs found each day, as needed (run hourly)
  • python -m scripts.update_feeds: Incrementally sync feeds from the web-search server (run every five minutes for most of the day)
  • python -m scripts.update_feeds --full-sync: Sync all feeds from the web-search server (run nightly)
  • python -m scripts.db_archive: Archive and trim the fetch_events and stories tables (run nightly)
  • run-stats.sh: Report feed and source stats to statsd/graphite/grafana for the vitals page (run from Procfile)
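
For example, after a one-time import, a local run might look like this (invocation from the repo root is an assumption; in production the long-running processes are started from the Procfile):

    # one-time CSV import, then start the fetcher; run the API server
    # in a separate shell; assumes the repo root and an activated venv
    python -m scripts.import_feeds my-feeds.csv
    ./run-fetch-rss-feeds.sh
    ./run-server.sh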

All crontab entries are set up by dokku-scripts/crontab.sh (which must be run as root).
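
Purely as an illustration of the schedule described above (the real entries, commands, and times come from dokku-scripts/crontab.sh and will differ):

    # illustrative crontab only; actual entries are installed by dokku-scripts/crontab.sh
    */5 5-23 * * *  python -m scripts.update_feeds               # incremental sync, most of the day
    15 2 * * *      python -m scripts.update_feeds --full-sync   # nightly full sync
    45 2 * * *      python -m scripts.db_archive                 # nightly archive and trim
    5 * * * *       run-gen-daily-story-rss.sh                   # hourly daily-file generation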

NOTE! Cloud backup of the production database must be done manually: see doc/deployment.md.

Pruning of cloud backups is done by system-dev-ops/postgres/prune-backups (which must be installed separately).

Development Docs

Deployment

See doc/deployment.md and dokku-scripts/README.md for procedures and scripts.
