Commit

Merge pull request #255 from edgi-govdata-archiving/202-be-nicer-to-internet-archive-and-also-be-more-strict-when-loading-mementos

Be nicer to Internet Archive and also be more strict when loading mementos
Mr0grog authored Sep 14, 2018
2 parents c0061fd + e63da2d commit 4bb08f8
Showing 33 changed files with 1,563 additions and 260 deletions.
4 changes: 2 additions & 2 deletions .gitignore
@@ -51,8 +51,8 @@ cover/*
*.log

# Sphinx documentation
-doc/build/
-doc/source/generated/*.rst
+docs/build/
+docs/source/generated/*.rst

# PyBuilder
target/
18 changes: 10 additions & 8 deletions docs/source/cdx_api.rst
@@ -1,12 +1,15 @@
.. currentmodule:: web_monitoring.internetarchive

-******************************************************
-Python API to Internet Archive Wayback Machine CDX API
-******************************************************
+**********************************************
+Python API to Internet Archive Wayback Machine
+**********************************************

Search for historical snapshots of a URL. Download metadata about the snapshots
and/or the snapshot content itself.

+We implement Python clients for the CDX and Memento APIs provided by Wayback
+Machine.
+
Tutorial
========

@@ -15,9 +18,8 @@ TO DO
API Documentation
=================

-.. autosummary::
-    :toctree: generated
+.. autoclass:: WaybackClient

-    list_versions
-    search_cdx
-    timestamped_uri_to_version
+    .. automethod:: search
+    .. automethod:: list_versions
+    .. automethod:: timestamped_uri_to_version
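
For orientation, the two Wayback Machine services named in this docs change can be sketched with plain URL construction. This is a minimal illustration, not the package's implementation: `build_cdx_query` and `build_memento_url` are hypothetical helpers, and the sketch assumes the public CDX search endpoint with its `url`, `from`, `to`, and `output` parameters plus the 14-digit timestamp used in playback URLs.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed public endpoints; the package may configure these differently.
CDX_ENDPOINT = 'https://web.archive.org/cdx/search/cdx'
PLAYBACK_ROOT = 'https://web.archive.org/web'
TIMESTAMP_FORMAT = '%Y%m%d%H%M%S'  # 14-digit Wayback timestamp


def build_cdx_query(url, from_date=None, to_date=None):
    """Return a CDX search URL listing captures of `url` (hypothetical helper)."""
    params = {'url': url, 'output': 'json'}
    if from_date is not None:
        params['from'] = from_date.strftime(TIMESTAMP_FORMAT)
    if to_date is not None:
        params['to'] = to_date.strftime(TIMESTAMP_FORMAT)
    return f'{CDX_ENDPOINT}?{urlencode(params)}'


def build_memento_url(url, captured_at):
    """Return the playback URL for a single capture (hypothetical helper)."""
    timestamp = captured_at.strftime(TIMESTAMP_FORMAT)
    return f'{PLAYBACK_ROOT}/{timestamp}/{url}'


if __name__ == '__main__':
    print(build_cdx_query('epa.gov', from_date=datetime(2018, 1, 1)))
    print(build_memento_url('https://www.epa.gov/', datetime(2018, 9, 14, 12)))
```

The CDX query returns metadata about captures, while the playback URL retrieves the content of one capture, which mirrors the "metadata and/or content" split described in the docs text.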

This file was deleted.

This file was deleted.

This file was deleted.

32 changes: 17 additions & 15 deletions web_monitoring/cli.py
@@ -28,22 +28,24 @@ def _add_and_monitor(versions):

 def import_ia(url, *, from_date=None, to_date=None, maintainers=None,
               tags=None, skip_unchanged='resolved-response'):
-    # Pulling on this generator does the work.
     skip_responses = skip_unchanged == 'response'
-    versions = (ia.timestamped_uri_to_version(version.date, version.raw_url,
-                                              url=version.url,
-                                              maintainers=maintainers,
-                                              tags=tags,
-                                              view_url=version.view_url)
-                for version in ia.list_versions(url,
-                                                from_date=from_date,
-                                                to_date=to_date,
-                                                skip_repeats=skip_responses))
-
-    if skip_unchanged == 'resolved-response':
-        versions = _filter_unchanged_versions(versions)
-
-    _add_and_monitor(versions)
+    with ia.WaybackClient() as wayback:
+        # Pulling on this generator does the work.
+        versions = (wayback.timestamped_uri_to_version(version.date,
+                                                       version.raw_url,
+                                                       url=version.url,
+                                                       maintainers=maintainers,
+                                                       tags=tags,
+                                                       view_url=version.view_url)
+                    for version in wayback.list_versions(url,
+                                                         from_date=from_date,
+                                                         to_date=to_date,
+                                                         skip_repeats=skip_responses))
+
+        if skip_unchanged == 'resolved-response':
+            versions = _filter_unchanged_versions(versions)
+
+        _add_and_monitor(versions)


def _filter_unchanged_versions(versions):
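
A note on why `_add_and_monitor(versions)` moves inside the `with` block: `versions` is a lazy generator, so no request is made until something iterates it. The sketch below illustrates the pattern with a hypothetical `FakeClient` stand-in. It assumes, without confirmation from this diff, that leaving the `with` block releases the client's resources so that later calls fail.

```python
class FakeClient:
    """Hypothetical stand-in for a client that owns a network session."""

    def __init__(self):
        self.closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

    def close(self):
        self.closed = True

    def fetch(self, item):
        # Assumption for illustration: a closed client refuses further work.
        if self.closed:
            raise RuntimeError('client is closed')
        return item.upper()


def consume_inside():
    with FakeClient() as client:
        lazy = (client.fetch(item) for item in ['a', 'b'])
        # The generator is drained while the client is still open.
        return list(lazy)


def consume_outside():
    with FakeClient() as client:
        lazy = (client.fetch(item) for item in ['a', 'b'])
    # The client is already closed here, so the first fetch raises.
    return list(lazy)


if __name__ == '__main__':
    print(consume_inside())
    try:
        consume_outside()
    except RuntimeError as error:
        print('failed as expected:', error)
```

Under that assumption, any code that pulls on the generator, including filtering and the final consumer, has to run before the context manager exits.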
