Commit

Merge pull request #274 from yldoctrine/some_website_have_no_title
Add robustness for pages missing title
fhamborg authored Jul 8, 2024
2 parents d79ae58 + 3ec0c35 commit 3896c7d
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions newsplease/helper_classes/parse_crawler.py
@@ -66,8 +66,8 @@ def pass_to_pipeline(
         article['download_date'] = timestamp
         article['source_domain'] = source_domain.encode("utf-8")
         article['url'] = response.url
-        article['html_title'] = response.selector.xpath('//title/text()') \
-            .extract_first().encode("utf-8")
+        extracted_title = response.selector.xpath('//title/text()').extract_first()
+        article['html_title'] = extracted_title.encode("utf-8") if extracted_title is not None else ''
         if rss_title is None:
             article['rss_title'] = 'NULL'
         else:
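
The guard introduced by this change can be sketched in isolation as follows. This is a minimal illustration, not the project's code: `encode_title` is a hypothetical helper that takes the already-extracted string, so it runs without Scrapy. One assumption differs from the commit: the commit falls back to `''` (a `str`) while the other branch yields `bytes`, whereas this sketch returns `b""` so that the result type is the same in both cases.

```python
from typing import Optional


def encode_title(extracted_title: Optional[str]) -> bytes:
    """Encode a page title as UTF-8, tolerating pages without a <title>.

    Hypothetical helper for illustration; not part of news-please.
    """
    # extract_first() returns None when the XPath matches nothing,
    # so calling .encode() on it directly would raise AttributeError.
    if extracted_title is None:
        # Assumption: use empty bytes (not '') to keep the return type uniform.
        return b""
    return extracted_title.encode("utf-8")


if __name__ == "__main__":
    print(encode_title("Example title"))
    print(encode_title(None))
```

Whether an empty `str` or empty `bytes` is the better fallback depends on what downstream pipeline stages expect for `html_title`; the sketch only shows the None check itself.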
