We're currently using courlan via trafilatura for some crawling. When running liveness checks on a host's URL, we're being blocked because of the User-Agent header that gets sent, and we have no way to change it. I noticed some commented-out code in the redirection test, which is_live_page uses, that references user agent headers.
Is there any interest in supporting custom headers, or in setting a different default user agent?
Thanks.
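To illustrate the kind of check being asked for, here is a minimal sketch of a liveness test that sends a caller-supplied User-Agent. It uses only the Python standard library and is not courlan's implementation or API: the names `build_head_request`, `is_live_with_user_agent`, and `DEFAULT_TIMEOUT`, and the "status below 400 means live" rule, are assumptions made for this example.

```python
import urllib.request

DEFAULT_TIMEOUT = 10  # seconds; arbitrary value chosen for this sketch


def build_head_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a HEAD request carrying a custom User-Agent (hypothetical helper)."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent},
        method="HEAD",
    )


def is_live_with_user_agent(
    url: str, user_agent: str, timeout: float = DEFAULT_TIMEOUT
) -> bool:
    """Return True if the URL answers a HEAD request with a status below 400.

    Hypothetical stand-in for a liveness check; not courlan's is_live_page.
    """
    try:
        request = build_head_request(url, user_agent)
        # urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError, timeouts and connection errors;
        # ValueError covers malformed URLs.
        return False
```

A call such as `is_live_with_user_agent("https://example.org", "my-crawler/1.0")` would then send the chosen agent string instead of the client's default. Some servers reject HEAD requests outright, so a real implementation may want to fall back to GET.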
drFerg changed the title from "is_live_url is sometimes failing due to user agent blocking" to "is_live_page is sometimes failing due to user agent blocking" on Aug 29, 2024
adbar changed the title from "is_live_page is sometimes failing due to user agent blocking" to "Support for custom user agents in is_live_page()" on Aug 29, 2024