Contra Costa news has obfuscated URLs #173

Open
Mr0grog opened this issue Jan 15, 2021 · 0 comments
Labels
enhancement New feature or request help wanted Extra attention is needed news Related to scraping news (rather than data)

Mr0grog commented Jan 15, 2021

Some news items in Contra Costa are coming through with obfuscated URLs like:

{
  "id": "https://urldefense.com/v3/__https://youtu.be/PZXjV4tFFdA__;!!LFxATBw!R57VawoklPt_lKiaxV8GyPBZz3SVfJ-u4jcwWzvx-x6ltcNKyKswx1dx4suE$",
  "url": "https://urldefense.com/v3/__https://youtu.be/PZXjV4tFFdA__;!!LFxATBw!R57VawoklPt_lKiaxV8GyPBZz3SVfJ-u4jcwWzvx-x6ltcNKyKswx1dx4suE$",
  "title": "Video:\u00a0Update on COVID-19 Vaccine Distribution",
  "date_published": "2021-01-15T00:00:00-08:00"
}

It might be good to check any URLs that aren’t under the cchealth.org or contracosta.ca.gov domains and, if they are redirects, substitute the redirect target for the original URL we had. For example, the item above would end up with the URL https://youtu.be/PZXjV4tFFdA.
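As a rough sketch of that substitution: in the example above, the target URL appears to be embedded in the wrapper itself, between `__` and `__;`, so it may be possible to recover it without making a request. The pattern below is inferred from that single example and is an assumption, not a documented format; the names `TRUSTED_DOMAINS`, `WRAPPED_URL`, `is_trusted`, and `unwrap_url` are hypothetical. Following the redirect over HTTP would be the more general approach for other kinds of redirects.

```python
import re
from urllib.parse import urlparse

# Domains mentioned in this issue as first-party; URLs on them are left alone.
TRUSTED_DOMAINS = ("cchealth.org", "contracosta.ca.gov")

# Assumed wrapper shape, inferred only from the example item in this issue:
#   https://urldefense.com/v3/__<target URL>__;<opaque suffix>
WRAPPED_URL = re.compile(r"^https://urldefense\.com/v3/__(?P<target>.+?)__;")


def is_trusted(url):
    """Return True if the URL's host is one of the trusted domains or a subdomain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def unwrap_url(url):
    """Return the embedded target of a wrapped URL, or the URL unchanged."""
    if is_trusted(url):
        return url
    match = WRAPPED_URL.match(url)
    if not match:
        return url
    target = match.group("target")
    # Assumption: an asterisk in the target means some characters were
    # replaced by placeholders, which this sketch does not try to decode.
    if "*" in target:
        return url
    return target
```

Both the `id` and `url` fields of the example item carry the wrapped value, so a caller would probably want to run both through `unwrap_url`.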

Nice to have here: while we’re at it, we could add a youtube or video tag to news items that link to YouTube.
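A minimal sketch of the tagging idea, assuming news items are plain dicts and that a `tags` list field would be acceptable (neither the field nor the `add_video_tag` helper exists in the item shown above; both are hypothetical). The host list is also an assumption and may need extending.

```python
from urllib.parse import urlparse

# Hosts treated as YouTube links (assumed list, not exhaustive).
YOUTUBE_HOSTS = {"youtu.be", "youtube.com", "www.youtube.com", "m.youtube.com"}


def add_video_tag(item):
    """Return a copy of the item with a "youtube" tag if its URL points to YouTube."""
    host = (urlparse(item.get("url", "")).hostname or "").lower()
    if host not in YOUTUBE_HOSTS:
        return item
    # "tags" is a hypothetical field; copy it so the input item is not mutated.
    tags = list(item.get("tags", []))
    if "youtube" not in tags:
        tags.append("youtube")
    return {**item, "tags": tags}
```

This would need to run on the final item URL rather than a wrapped one, since the host check only looks at the URL it is given.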

@Mr0grog Mr0grog added enhancement New feature or request help wanted Extra attention is needed news Related to scraping news (rather than data) labels Jan 15, 2021