CI Unit tests #163

Merged 20 commits on Jan 14, 2025
38 changes: 38 additions & 0 deletions .github/workflows/tests-core.yaml
@@ -0,0 +1,38 @@
name: Core Tests

on:
  push:
    branches: [ main ]
    paths:
      - src/**
  pull_request:
    paths:
      - src/**

jobs:
  tests:
    runs-on: ubuntu-22.04
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    defaults:
      run:
        working-directory: ./

    steps:
      - uses: actions/checkout@v4

      - name: Install Poetry
        run: pipx install poetry

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'poetry'

      - name: Install dependencies
        run: poetry install --no-interaction --with dev

      - name: Run Core Tests
        run: poetry run pytest -ra -v -m "not download"
38 changes: 38 additions & 0 deletions .github/workflows/tests-download.yaml
@@ -0,0 +1,38 @@
name: Download Tests

on:
  schedule:
    - cron: '35 14 * * 1'
  pull_request:
    branches: [ main ]
    paths:
      - src/**

jobs:
  tests:
    runs-on: ubuntu-22.04
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.10"] # only run expensive downloads on one (lowest) python version
    defaults:
      run:
        working-directory: ./

    steps:
      - uses: actions/checkout@v4

      - name: Install poetry
        run: pipx install poetry

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'poetry'

      - name: Install dependencies
        run: poetry install --no-interaction --with dev

      - name: Run Download Tests
        run: poetry run pytest -ra -v -m "download"
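
For readers less familiar with cron syntax: `35 14 * * 1` reads as minute 35, hour 14, any day of the month, any month, day-of-week 1 (Monday), and GitHub Actions evaluates schedules in UTC. The sketch below only illustrates that reading; `next_weekly_run` is a hypothetical helper written for this note, not something the workflow or the repository uses.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_run(now: datetime, weekday: int = 0, hour: int = 14, minute: int = 35) -> datetime:
    """Return the first time strictly after `now` that matches a weekly UTC schedule.

    `weekday` uses Python's convention (Monday == 0), which corresponds to
    day-of-week 1 in the cron expression. `now` must be timezone-aware.
    """
    now = now.astimezone(timezone.utc)
    # Start from today's 14:35 UTC, then move forward to the target weekday.
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    # If that moment is not in the future, the next match is one week later.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


# Tuesday 14 Jan 2025, 12:00 UTC: the next run is the following Monday.
print(next_weekly_run(datetime(2025, 1, 14, 12, 0, tzinfo=timezone.utc)))
# prints 2025-01-20 14:35:00+00:00
```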
16 changes: 16 additions & 0 deletions README.md
@@ -2,6 +2,8 @@

[![PyPI version](https://badge.fury.io/py/auto-archiver.svg)](https://badge.fury.io/py/auto-archiver)
[![Docker Image Version (latest by date)](https://img.shields.io/docker/v/bellingcat/auto-archiver?label=version&logo=docker)](https://hub.docker.com/r/bellingcat/auto-archiver)
[![Core Test Status](https://github.com/bellingcat/auto-archiver/workflows/Core%20Tests/badge.svg)](https://github.com/bellingcat/auto-archiver/actions/workflows/tests-core.yaml)
[![Download Test Status](https://github.com/bellingcat/auto-archiver/workflows/Download%20Tests/badge.svg)](https://github.com/bellingcat/auto-archiver/actions/workflows/tests-download.yaml)
<!-- ![Docker Pulls](https://img.shields.io/docker/pulls/bellingcat/auto-archiver) -->
<!-- [![PyPI download month](https://img.shields.io/pypi/dm/auto-archiver.svg)](https://pypi.python.org/pypi/auto-archiver/) -->
<!-- [![Documentation Status](https://readthedocs.org/projects/vk-url-scraper/badge/?version=latest)](https://vk-url-scraper.readthedocs.io/en/latest/?badge=latest) -->
@@ -259,6 +261,20 @@ The "archive location" link contains the path of the archived file, in local sto
## Development
Use `python -m src.auto_archiver --config secrets/orchestration.yaml` to run from the local development environment.

### Testing

Tests are split into 'core' and 'download' tests using `pytest.mark`. Download tests hit the network and make API calls (e.g. to Twitter or Bluesky), so they should be run regularly to check that those APIs have not changed.

Tests can be run as follows:
```
# run core tests
pytest -ra -v -m "not download" # or poetry run pytest -ra -v -m "not download"
# run download tests
pytest -ra -v -m "download" # or poetry run pytest -ra -v -m "download"
# run all tests
pytest -ra -v # or poetry run pytest -ra -v
```
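
As a rough illustration of how the split works, a test opts into the download group with the `download` marker, and anything unmarked is treated as core. The helper and test names below are hypothetical and are not taken from the repository.

```python
import pytest


def parse_tweet_id(url: str) -> str:
    """Hypothetical helper, defined here only so the core test has something to check."""
    return url.rstrip("/").split("/")[-1]


def test_parse_tweet_id():
    # Unmarked, so `pytest -m "not download"` selects it as a core test.
    assert parse_tweet_id("https://x.com/user/status/12345") == "12345"


@pytest.mark.download
def test_fetch_live_tweet():
    # Marked, so `pytest -m "not download"` deselects it and
    # `pytest -m "download"` selects it.
    # A real download test would make a network call here; this placeholder does not.
    assert True
```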

#### Docker development
Working with Docker locally:
* `docker build . -t auto-archiver` to build a local image
134 changes: 128 additions & 6 deletions poetry.lock

Generated file; diff not rendered by default.

15 changes: 10 additions & 5 deletions pyproject.toml
@@ -62,15 +62,20 @@ dependencies = [
     "toml (>=0.10.2,<0.11.0)"
 ]

-
-[poetry.group.dev.dependencies]
-autopep8 = "*"
-
+[tool.poetry.group.dev.dependencies]
+pytest = "^8.3.4"
+autopep8 = "^2.3.1"

 [project.scripts]
 auto-archiver = "auto_archiver.__main__:main"

 [project.urls]
 homepage = "https://github.com/bellingcat/auto-archiver"
 repository = "https://github.com/bellingcat/auto-archiver"
-documentation = "https://github.com/bellingcat/auto-archiver"
+documentation = "https://github.com/bellingcat/auto-archiver"
+
+
+[tool.pytest.ini_options]
+markers = [
+    "download: marks tests that download content from the network",
+]
9 changes: 7 additions & 2 deletions src/auto_archiver/archivers/twitter_archiver.py
@@ -114,6 +114,10 @@ def download_syndication(self, item: Metadata, url: str, tweet_id: str) -> Union
         result = Metadata()
         tweet = r.json()

+        if tweet.get('__typename') == 'TweetTombstone':
+            logger.error(f"Failed to get tweet {tweet_id}: {tweet['tombstone']['text']['text']}")
+            return False
+
         urls = []
         for p in tweet.get("photos", []):
             urls.append(p["url"])
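
The tombstone check indexes `tweet['tombstone']['text']['text']` directly, which assumes that nested shape is always present in the response. A more defensive variant might look like the sketch below. `tombstone_message` and the fallback string are hypothetical, and the payload shape is inferred only from the keys used in the diff above, not from any API documentation.

```python
from typing import Any, Optional

FALLBACK_MESSAGE = "tweet unavailable (no tombstone text)"


def tombstone_message(tweet: dict[str, Any]) -> Optional[str]:
    """Return a human-readable message if `tweet` is a tombstone, else None."""
    if tweet.get("__typename") != "TweetTombstone":
        return None
    # Walk the nested structure without raising if a level is missing or malformed.
    tombstone = tweet.get("tombstone")
    text = tombstone.get("text") if isinstance(tombstone, dict) else None
    if isinstance(text, dict) and text.get("text"):
        return str(text["text"])
    return FALLBACK_MESSAGE
```

With a helper like this, the caller could log the returned message and return `False` whenever the result is not `None`, without risking a `KeyError` on an unexpected payload.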
@@ -135,7 +139,7 @@ def download_syndication(self, item: Metadata, url: str, tweet_id: str) -> Union

             media.filename = self.download_from_url(u, f'{slugify(url)}_{i}{ext}')
             result.add_media(media)

         result.set_title(tweet.get("text")).set_content(json.dumps(tweet, ensure_ascii=False)).set_timestamp(datetime.strptime(tweet["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ"))
         return result.success("twitter-syndication")

@@ -158,7 +162,8 @@ def download_yt_dlp(self, item: Metadata, url: str, tweet_id: str) -> Union[Meta
             .set_timestamp(timestamp)
         if not tweet.get("entities", {}).get("media"):
             logger.debug('No media found, archiving tweet text only')
-            return result.success("twitter-ytdl")
+            result.status = "twitter-ytdl"
+            return result
         for i, tw_media in enumerate(tweet["entities"]["media"]):
             media = Media(filename="")
             mimetype = ""