Merge pull request #163 from bellingcat/feat/unittest
CI Unit tests
pjrobertson authored Jan 14, 2025
2 parents 9cdaea8 + 6f10270 commit eebd040
Showing 20 changed files with 514 additions and 149 deletions.
38 changes: 38 additions & 0 deletions .github/workflows/tests-core.yaml
@@ -0,0 +1,38 @@
+name: Core Tests
+
+on:
+  push:
+    branches: [ main ]
+    paths:
+      - src/**
+  pull_request:
+    paths:
+      - src/**
+
+jobs:
+  tests:
+    runs-on: ubuntu-22.04
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.11", "3.12"]
+    defaults:
+      run:
+        working-directory: ./
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install Poetry
+        run: pipx install poetry
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: 'poetry'
+
+      - name: Install dependencies
+        run: poetry install --no-interaction --with dev
+
+      - name: Run Core Tests
+        run: poetry run pytest -ra -v -m "not download"
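
The `paths` filter in the workflow above limits runs to changes under `src/`. A rough Python sketch of that decision follows. It assumes `fnmatch` is an acceptable stand-in for GitHub's own glob matcher (the two are not identical), and `should_run` is a hypothetical helper that does not exist in the repository.

```python
from fnmatch import fnmatch

# Pattern copied from the workflow's `paths` filter.
PATH_FILTERS = ["src/**"]


def should_run(changed_files, patterns=PATH_FILTERS):
    """Return True if any changed file matches any path pattern.

    Approximation: fnmatch's `*` also matches `/`, which is close enough to
    the workflow's `**` for this simple prefix pattern.
    """
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in patterns
    )


print(should_run(["src/auto_archiver/archivers/twitter_archiver.py"]))
print(should_run(["README.md", "pyproject.toml"]))
```

Under this reading, a change that touches only `README.md` or `pyproject.toml` would not start the core test job.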
38 changes: 38 additions & 0 deletions .github/workflows/tests-download.yaml
@@ -0,0 +1,38 @@
+name: Download Tests
+
+on:
+  schedule:
+    - cron: '35 14 * * 1'
+  pull_request:
+    branches: [ main ]
+    paths:
+      - src/**
+
+jobs:
+  tests:
+    runs-on: ubuntu-22.04
+    strategy:
+      fail-fast: false
+      matrix:
+        python-version: ["3.10"] # only run expensive downloads on one (lowest) python version
+    defaults:
+      run:
+        working-directory: ./
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install poetry
+        run: pipx install poetry
+
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: 'poetry'
+
+      - name: Install dependencies
+        run: poetry install --no-interaction --with dev
+
+      - name: Run Download Tests
+        run: poetry run pytest -ra -v -m "download"
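
The cron expression `'35 14 * * 1'` in the workflow above reads as minute 35, hour 14, any day of month, any month, day-of-week 1 (Monday). GitHub evaluates scheduled workflows in UTC. The sketch below computes the next matching time for that one expression. It is not a general cron parser, and `next_scheduled_run` is a hypothetical name used only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fields of the workflow's cron expression '35 14 * * 1'.
CRON_MINUTE = 35
CRON_HOUR = 14
CRON_WEEKDAY = 0  # cron's Monday (1) is 0 in Python's datetime.weekday()


def next_scheduled_run(now):
    """Return the first Monday 14:35 strictly after `now` (a UTC datetime)."""
    candidate = now.replace(
        hour=CRON_HOUR, minute=CRON_MINUTE, second=0, microsecond=0
    )
    # Move forward to Monday of the current week (0 days if already Monday).
    candidate += timedelta(days=(CRON_WEEKDAY - now.weekday()) % 7)
    if candidate <= now:
        # This week's slot has already passed, so take next week's.
        candidate += timedelta(days=7)
    return candidate


merged = datetime(2025, 1, 14, 12, 0, tzinfo=timezone.utc)  # a Tuesday
print(next_scheduled_run(merged).isoformat())
```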
16 changes: 16 additions & 0 deletions README.md
@@ -2,6 +2,8 @@

 [![PyPI version](https://badge.fury.io/py/auto-archiver.svg)](https://badge.fury.io/py/auto-archiver)
 [![Docker Image Version (latest by date)](https://img.shields.io/docker/v/bellingcat/auto-archiver?label=version&logo=docker)](https://hub.docker.com/r/bellingcat/auto-archiver)
+[![Core Test Status](https://github.com/bellingcat/auto-archiver/workflows/Core%20Tests/badge.svg)](https://github.com/bellingcat/auto-archiver/actions/workflows/tests-core.yaml)
+[![Download Test Status](https://github.com/bellingcat/auto-archiver/workflows/Download%20Tests/badge.svg)](https://github.com/bellingcat/auto-archiver/actions/workflows/tests-download.yaml)
 <!-- ![Docker Pulls](https://img.shields.io/docker/pulls/bellingcat/auto-archiver) -->
 <!-- [![PyPI download month](https://img.shields.io/pypi/dm/auto-archiver.svg)](https://pypi.python.org/pypi/auto-archiver/) -->
 <!-- [![Documentation Status](https://readthedocs.org/projects/vk-url-scraper/badge/?version=latest)](https://vk-url-scraper.readthedocs.io/en/latest/?badge=latest) -->
@@ -259,6 +261,20 @@ The "archive location" link contains the path of the archived file, in local sto
 ## Development
 Use `python -m src.auto_archiver --config secrets/orchestration.yaml` to run from the local development environment.

+### Testing
+
+Tests are split using `pytest.mark` into 'core' and 'download' tests. Download tests will hit the network and make API calls (e.g. Twitter, Bluesky etc.) and should be run regularly to make sure that APIs have not changed.
+
+Tests can be run as follows:
+```
+# run core tests
+pytest -ra -v -m "not download" # or poetry run pytest -ra -v -m "not download"
+# run download tests
+pytest -ra -v -m "download" # or poetry run pytest -ra -v -m "download"
+# run all tests
+pytest -ra -v # or poetry run pytest -ra -v
+```
+
 #### Docker development
 working with docker locally:
 * `docker build . -t auto-archiver` to build a local image
134 changes: 128 additions & 6 deletions poetry.lock

Some generated files are not rendered by default.

15 changes: 10 additions & 5 deletions pyproject.toml
@@ -62,15 +62,20 @@ dependencies = [
     "toml (>=0.10.2,<0.11.0)"
 ]

-
-[poetry.group.dev.dependencies]
-autopep8 = "*"
-
+[tool.poetry.group.dev.dependencies]
+pytest = "^8.3.4"
+autopep8 = "^2.3.1"

 [project.scripts]
 auto-archiver = "auto_archiver.__main__:main"

 [project.urls]
 homepage = "https://github.com/bellingcat/auto-archiver"
 repository = "https://github.com/bellingcat/auto-archiver"
-documentation = "https://github.com/bellingcat/auto-archiver"
+documentation = "https://github.com/bellingcat/auto-archiver"
+
+
+[tool.pytest.ini_options]
+markers = [
+    "download: marks tests that download content from the network",
+]
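
The `download` marker registered above is what the `-m "download"` and `-m "not download"` expressions in the workflows and README select on. As a minimal sketch, the three documented invocations can be assembled from one helper. `pytest_args` and the suite names are hypothetical and are not part of the commit. Only the flags and marker expressions come from the diff.

```python
# Marker expressions taken from the commands shown in the commit.
MARKER_EXPRESSIONS = {
    "core": "not download",
    "download": "download",
    "all": None,  # no -m filter: run every test
}


def pytest_args(suite, use_poetry=False):
    """Build the argv list for one of the three documented test runs."""
    if suite not in MARKER_EXPRESSIONS:
        raise ValueError(f"unknown suite: {suite!r}")
    args = ["pytest", "-ra", "-v"]
    expression = MARKER_EXPRESSIONS[suite]
    if expression is not None:
        args += ["-m", expression]
    # Prefix with `poetry run` to match the form used in CI.
    return ["poetry", "run", *args] if use_poetry else args


print(pytest_args("core", use_poetry=True))
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the repository root.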
9 changes: 7 additions & 2 deletions src/auto_archiver/archivers/twitter_archiver.py
@@ -114,6 +114,10 @@ def download_syndication(self, item: Metadata, url: str, tweet_id: str) -> Union
         result = Metadata()
         tweet = r.json()

+        if tweet.get('__typename') == 'TweetTombstone':
+            logger.error(f"Failed to get tweet {tweet_id}: {tweet['tombstone']['text']['text']}")
+            return False
+
         urls = []
         for p in tweet.get("photos", []):
             urls.append(p["url"])
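
The tombstone check added in this hunk can be exercised in isolation. The sketch below is an illustration only: `tombstone_message` is a hypothetical helper, the sample payload is invented to match the keys the diff reads, and the stdlib `logging` module stands in for whatever `logger` the archiver uses. Unlike the diff, it reads the nested keys with `.get()` so that a tombstone with an unexpected shape does not raise `KeyError`.

```python
import logging

logger = logging.getLogger(__name__)


def tombstone_message(tweet):
    """Return the tombstone text if `tweet` is a TweetTombstone, else None."""
    if tweet.get("__typename") != "TweetTombstone":
        return None
    # Defensive lookups: fall back to a placeholder if a level is missing.
    return tweet.get("tombstone", {}).get("text", {}).get("text", "unknown reason")


# Hypothetical payload, shaped after the keys accessed in the diff.
sample = {
    "__typename": "TweetTombstone",
    "tombstone": {"text": {"text": "This Post is unavailable."}},
}

message = tombstone_message(sample)
if message is not None:
    logger.error("Failed to get tweet %s: %s", "1234567890", message)
```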
@@ -135,7 +139,7 @@ def download_syndication(self, item: Metadata, url: str, tweet_id: str) -> Union

             media.filename = self.download_from_url(u, f'{slugify(url)}_{i}{ext}')
             result.add_media(media)

         result.set_title(tweet.get("text")).set_content(json.dumps(tweet, ensure_ascii=False)).set_timestamp(datetime.strptime(tweet["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ"))
         return result.success("twitter-syndication")
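
The `created_at` parsing in the hunk above depends on the exact format string `"%Y-%m-%dT%H:%M:%S.%fZ"`. A small sketch of its behaviour follows. `parse_created_at` is a hypothetical helper, and attaching UTC is an assumption based on the `Z` suffix. The diff itself passes the naive `datetime` straight to `set_timestamp`.

```python
from datetime import datetime, timezone

# Format string copied from the diff; the trailing "Z" is matched literally.
CREATED_AT_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_created_at(value):
    """Parse a `created_at` string into a timezone-aware UTC datetime."""
    naive = datetime.strptime(value, CREATED_AT_FORMAT)
    # strptime yields a naive datetime for this format; marking it as UTC
    # is this sketch's assumption, not behaviour taken from the commit.
    return naive.replace(tzinfo=timezone.utc)


print(parse_created_at("2025-01-14T09:30:00.000Z").isoformat())
```

A value without fractional seconds, such as `2025-01-14T09:30:00Z`, does not match this format and raises `ValueError`.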

@@ -158,7 +162,8 @@ def download_yt_dlp(self, item: Metadata, url: str, tweet_id: str) -> Union[Meta
             .set_timestamp(timestamp)
         if not tweet.get("entities", {}).get("media"):
             logger.debug('No media found, archiving tweet text only')
-            return result.success("twitter-ytdl")
+            result.status = "twitter-ytdl"
+            return result
         for i, tw_media in enumerate(tweet["entities"]["media"]):
             media = Media(filename="")
             mimetype = ""