feat: Add auto-discovery from popular services #225

Merged
merged 1 commit into from
Jan 3, 2025
64 changes: 32 additions & 32 deletions .assets/podcast-archiver-dry-run.svg
14 changes: 7 additions & 7 deletions .assets/podcast-archiver-help.svg
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -62,5 +62,6 @@ repos:
language: system
require_serial: true
pass_filenames: false
always_run: true
files: ^podcast_archiver/config\.py$
types: [python]
29 changes: 29 additions & 0 deletions README.md
@@ -62,12 +62,41 @@ podcast-archiver --dir ~/Podcasts --feed https://feeds.feedburner.com/TheAnthrop

Podcast Archiver expects values to its `--feed/-f` argument to be URLs pointing to an [RSS feed of a podcast](https://archive.is/jYk3E).

If you are not certain whether the link you have for a show you like points to an RSS feed, you can try passing it to Podcast Archiver directly. The archiver supports a variety of links from popular podcast players and platforms, including [Apple Podcasts](https://podcasts.apple.com/us/browse), [Overcast.fm](https://overcast.fm/), [Castro](https://castro.fm/), and [Pocket Casts](https://pocketcasts.com/):

```sh
# Archive from Apple Podcasts URL
podcast-archiver -f https://podcasts.apple.com/us/podcast/black-girl-gone-a-true-crime-podcast/id1556267741
# ... or just the ID
podcast-archiver -f 1556267741

# From Overcast podcast URL
podcast-archiver -f https://overcast.fm/itunes394775318/99-invisible
# ... or episode sharing links (will resolve to all episodes)
podcast-archiver -f https://overcast.fm/+AAyIOzrEy1g
```

#### Supported services

The table below lists most of the supported services and URLs. If you think that some service you like is missing here, [please let me know](https://github.com/janw/podcast-archiver/issues/new)!

| Service | Example URL |
| ------------------------------------- | -------------------------------------------------------------------------------------- |
| Apple Podcasts | <https://podcasts.apple.com/us/podcast/the-anthropocene-reviewed/id1342003491> |
| [Overcast](https://overcast.fm/) | <https://overcast.fm/itunes394775318/99-invisible>, <https://overcast.fm/+AAyIOzrEy1g> |
| [Castro](https://castro.fm/) | <https://castro.fm/podcast/f996ae94-70a2-4d9c-afbc-c70b5bacd120> |
| [SoundCloud](https://soundcloud.com/) | <https://soundcloud.com/chapo-trap-house> |
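
To illustrate the kind of normalization the Apple Podcasts examples imply, here is a minimal sketch that pulls the numeric show ID out of either a full URL or a bare ID. The helper name `extract_apple_podcast_id` and the regular expression are assumptions made for this example; they are not taken from Podcast Archiver's source, which may resolve these links differently.

```python
from __future__ import annotations

import re

# Assumed pattern: Apple Podcasts show URLs end in "/id<digits>".
_APPLE_ID_RE = re.compile(r"/id(\d+)(?:[/?#]|$)")


def extract_apple_podcast_id(value: str) -> str | None:
    """Return the numeric show ID from a URL or a bare ID, else None."""
    value = value.strip()
    if value.isdigit():
        # A bare ID such as "1556267741" is passed through unchanged.
        return value
    if "podcasts.apple.com" not in value:
        return None
    match = _APPLE_ID_RE.search(value)
    return match.group(1) if match else None
```

Both forms from the README examples reduce to the same ID under this sketch, while links from other services return `None` and would need their own handling.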

#### Local files

Feeds can also be "fetched" from a local file:

```bash
podcast-archiver -f file:/Users/janw/downloaded_feed.xml
```
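
For reference, Python's standard `urllib.parse.urlparse` splits such a `file:` URL into a scheme and a filesystem path, which is what allows code to branch on the scheme before deciding whether to make an HTTP request. The sketch below only demonstrates that split; the helper `local_feed_path` is hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse


def local_feed_path(url: str) -> str | None:
    """Return the filesystem path for a file: URL, or None for other schemes."""
    parsed = urlparse(url)
    return parsed.path if parsed.scheme == "file" else None
```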

#### Testing without downloading

To find out if you have the right feed, you can use the `--dry-run` option to print the discovered feed information and the episodes found. It prevents all downloads.

```sh
2 changes: 1 addition & 1 deletion config.yaml.example
@@ -1,5 +1,5 @@
## Podcast-Archiver configuration
-## Generated using podcast-archiver v2.0.2
+## Generated using podcast-archiver v2.1.0

# Field 'feeds': Feed URLs to archive.
#
5 changes: 5 additions & 0 deletions cspell.config.yaml
@@ -53,3 +53,8 @@ words:
- venv
- virtualenv
- willhaus
- rsps
- soundcloud
- logbuch
- netzpolitik
- wochendaemmerung
3 changes: 2 additions & 1 deletion hack/rich-codex.sh
@@ -3,6 +3,7 @@
TMPDIR=$(mktemp -d 2>/dev/null || mktemp -d -t 'tmpdir')

export FORCE_COLOR="1"
export TERM="xterm-256color"
export COLUMNS="120"
export CREATED_FILES="created.txt"
export DELETED_FILES="deleted.txt"
@@ -17,4 +18,4 @@ export PODCAST_ARCHIVER_IGNORE_DATABASE=true
# shellcheck disable=SC2064
trap "rm -rf '$TMPDIR'" EXIT

-exec poetry run rich-codex --terminal-width $COLUMNS --notrim
+poetry run rich-codex --terminal-width $COLUMNS --notrim --terminal-theme DIMMED_MONOKAI
3 changes: 2 additions & 1 deletion podcast_archiver/console.py
@@ -5,7 +5,8 @@

 _theme = Theme(
     {
-        "error": "bold dark_red",
+        "error": "dark_red bold",
+        "errorhint": "dark_red dim",
         "warning": "orange1 bold",
         "warning_hint": "orange1 dim",
         "completed": "dark_cyan bold",
2 changes: 1 addition & 1 deletion podcast_archiver/constants.py
@@ -7,7 +7,7 @@
USER_AGENT = f"{PROG_NAME}/{__version__} (https://github.com/janw/podcast-archiver)"
ENVVAR_PREFIX = "PODCAST_ARCHIVER"

-REQUESTS_TIMEOUT = 30
+REQUESTS_TIMEOUT = (5, 30)

SUPPORTED_LINK_TYPES_RE = re.compile(r"^(audio|video)/")
DOWNLOAD_CHUNK_SIZE = 256 * 1024
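
Assuming this constant is passed to the `requests` library as its `timeout` argument, the tuple form follows that library's `(connect, read)` convention, whereas a single number applies to both phases. The sketch below only illustrates how the two forms relate; `split_timeout` is a hypothetical helper and is not part of Podcast Archiver or `requests`.

```python
from __future__ import annotations


def split_timeout(timeout: float | tuple[float, float]) -> tuple[float, float]:
    """Expand a timeout value into explicit (connect, read) seconds."""
    if isinstance(timeout, tuple):
        connect, read = timeout
    else:
        # A single number covers both the connect and the read phase.
        connect = read = timeout
    return float(connect), float(read)


old_value = split_timeout(30)       # previous setting
new_value = split_timeout((5, 30))  # setting introduced by this change
```

Under that assumption, the change shortens how long a connection attempt may hang (5 s instead of 30 s) while leaving the read timeout at 30 s.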
4 changes: 4 additions & 0 deletions podcast_archiver/exceptions.py
@@ -45,3 +45,7 @@ class NotModified(PodcastArchiverException):
     def __init__(self, info: FeedInfo, *args: object) -> None:
         super().__init__(*args)
         self.info = info
+
+
+class NotSupported(PodcastArchiverException):
+    pass
37 changes: 26 additions & 11 deletions podcast_archiver/models/feed.py
@@ -10,7 +10,7 @@
from pydantic import AliasChoices, BaseModel, ConfigDict, Field, field_validator

from podcast_archiver.constants import MAX_TITLE_LENGTH
-from podcast_archiver.exceptions import NotModified
+from podcast_archiver.exceptions import NotModified, NotSupported
from podcast_archiver.logging import logger, rprint
from podcast_archiver.models.episode import EpisodeOrFallback
from podcast_archiver.models.field_types import LenientDatetime
@@ -90,6 +90,13 @@ def truncate_title(cls, value: str) -> str:
     def field_titles(cls) -> list[str]:
         return [field.title for field in cls.model_fields.values() if field.title]
 
+    @property
+    def alternate_rss(self) -> str | None:
+        for link in self.links:
+            if link.rel == "alternate" and link.link_type == "application/rss+xml":
+                return link.href
+        return None
+
 
 class FeedPage(BaseModel):
     model_config = ConfigDict(arbitrary_types_allowed=True)
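
The `alternate_rss` property in the hunk above returns the first link that is marked `rel="alternate"` with the RSS media type. A standalone sketch of the same lookup follows; the `Link` dataclass and the example URLs are stand-ins invented for this illustration, and only the attribute names `rel`, `link_type`, and `href` come from the diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical stand-in for the project's link model."""

    rel: str
    link_type: str
    href: str


def alternate_rss(links: list[Link]) -> str | None:
    """Return the href of the first alternate RSS link, if any."""
    for link in links:
        if link.rel == "alternate" and link.link_type == "application/rss+xml":
            return link.href
    return None


links = [
    Link(rel="alternate", link_type="text/html", href="https://example.com/show"),
    Link(rel="alternate", link_type="application/rss+xml", href="https://example.com/show/feed.xml"),
]
```

The HTML alternate is skipped because its type does not match, so only a link that advertises an RSS document is considered as a fallback.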
@@ -103,38 +110,46 @@ class FeedPage(BaseModel):
     episodes: list[EpisodeOrFallback] = Field(default_factory=list, validation_alias=AliasChoices("entries", "items"))

     @classmethod
-    def parse_feed(cls, source: str | bytes, alt_url: str | None) -> FeedPage:
+    def parse_feed(cls, source: str | bytes, alt_url: str | None, retry: bool = False) -> FeedPage:
         feedobj = feedparser.parse(source)
         obj = cls.model_validate(feedobj)
-        if obj.bozo and (exc := obj.bozo_exception) and isinstance(exc, SAXParseException):
-            url = source if isinstance(source, str) and not alt_url else alt_url
+        if not obj.bozo:
+            return obj
+
+        if (fallback_url := obj.feed.alternate_rss) and not retry:
+            logger.info("Attempting to fetch alternate feed at %s", fallback_url)
+            return cls.from_url(fallback_url, retry=True)
+
+        url = source if isinstance(source, str) and not alt_url else alt_url
+        if (exc := obj.bozo_exception) and isinstance(exc, SAXParseException):
             rprint(f"Feed content is not well-formed for {url}", style="warning")
-            rprint(f"Continuing processing but here be dragons ({exc.getMessage()})", style="warning_hint")
-        return obj
+            rprint(f"Attempting processing but here be dragons ({exc.getMessage()})", style="warning_hint")
+
+        raise NotSupported(f"Content at {url} is not supported")

     @classmethod
-    def from_url(cls, url: str, *, known_info: FeedInfo | None = None) -> FeedPage:
+    def from_url(cls, url: str, *, known_info: FeedInfo | None = None, retry: bool = False) -> FeedPage:
         parsed = urlparse(url)
         if parsed.scheme == "file":
             return cls.parse_feed(parsed.path, None)

         if not known_info:
-            return cls.from_response(session.get_and_raise(url), alt_url=url)
+            return cls.from_response(session.get_and_raise(url), alt_url=url, retry=retry)

         response = session.get_and_raise(url, last_modified=known_info.last_modified)
         if response.status_code == HTTPStatus.NOT_MODIFIED:
             logger.debug("Server reported 'not modified' from %s, skipping fetch.", known_info.last_modified)
             raise NotModified(known_info)

-        instance = cls.from_response(response, alt_url=url)
+        instance = cls.from_response(response, alt_url=url, retry=retry)
         if instance.feed.updated_time == known_info.updated_time:
             logger.debug("Feed's updated time %s did not change, skipping fetch.", known_info.updated_time)
             raise NotModified(known_info)

         return instance

     @classmethod
-    def from_response(cls, response: Response, alt_url: str | None) -> FeedPage:
-        instance = cls.parse_feed(response.content, alt_url=alt_url)
+    def from_response(cls, response: Response, alt_url: str | None, retry: bool) -> FeedPage:
+        instance = cls.parse_feed(response.content, alt_url=alt_url, retry=retry)
         instance.feed.last_modified = response.headers.get("Last-Modified")
         return instance
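
The control flow of the new `parse_feed` can be summarized as: return well-formed content as is, follow an advertised alternate RSS link at most once, and otherwise give up with `NotSupported`. The sketch below reproduces only that flow with canned data in place of HTTP requests and `feedparser`; the `Page` dataclass, the `PAGES` table, the `load` function, and the example URLs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


class NotSupported(Exception):
    """Stand-in for the exception of the same name in the diff."""


@dataclass
class Page:
    bozo: bool                        # True when the content did not parse as a feed
    alternate_rss: str | None = None  # advertised RSS link, if any


# Canned responses keyed by URL, replacing real fetching and parsing.
PAGES = {
    "https://example.com/show": Page(bozo=True, alternate_rss="https://example.com/feed.xml"),
    "https://example.com/feed.xml": Page(bozo=False),
    "https://example.com/loop": Page(bozo=True, alternate_rss="https://example.com/loop"),
}


def load(url: str, retry: bool = False) -> Page:
    page = PAGES[url]
    if not page.bozo:
        return page
    if page.alternate_rss and not retry:
        # Follow the alternate link once; retry=True blocks further hops.
        return load(page.alternate_rss, retry=True)
    raise NotSupported(f"Content at {url} is not supported")
```

The `retry` flag is what keeps a page that advertises itself as its own alternate from recursing indefinitely: the second attempt raises `NotSupported` instead of following the link again.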