be more lenient when parsing html articles
facundoolano committed Jan 9, 2024
1 parent 674cb5a commit d1892f4
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions feedi/parsers/html.py
@@ -24,8 +24,7 @@ def fetch(url):
     metadata = scraping.all_meta(soup)
 
     title = metadata.get('og:title', metadata.get('twitter:title', getattr(soup.title, 'text')))
-
-    if not title or (metadata.get('og:type') and metadata['og:type'] != 'article'):
+    if not title:
         raise ValueError(f"{url} is missing article metadata")
 
     if 'og:article:published_time' in metadata:
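
To make the effect of the change concrete, here is a minimal sketch comparing the old and new checks on plain dictionaries. The function names `check_strict`, `check_lenient`, and `accepts`, the sample URL, and the sample metadata are hypothetical and are not part of `feedi/parsers/html.py`; only the two conditions are copied from the diff.

```python
def check_strict(url, title, metadata):
    # Condition before the commit: a page is also rejected when it
    # declares an og:type other than 'article'.
    if not title or (metadata.get('og:type') and metadata['og:type'] != 'article'):
        raise ValueError(f"{url} is missing article metadata")
    return title


def check_lenient(url, title, metadata):
    # Condition after the commit: only a missing title is rejected.
    if not title:
        raise ValueError(f"{url} is missing article metadata")
    return title


def accepts(check, url, title, metadata):
    # Hypothetical helper: True if the check passes, False if it raises.
    try:
        check(url, title, metadata)
        return True
    except ValueError:
        return False


# Hypothetical sample: a page with a title that declares og:type 'website'.
url = "https://example.com/post"
metadata = {'og:title': 'Some post', 'og:type': 'website'}
title = metadata.get('og:title')

strict_result = accepts(check_strict, url, title, metadata)
lenient_result = accepts(check_lenient, url, title, metadata)
no_title_result = accepts(check_lenient, url, None, metadata)
```

Under these assumptions, a page with a title and `og:type` set to `website` fails the old check and passes the new one, while a page without any title is still rejected by both.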
