
Tesla internship validation issue #100

Closed
1 task
johndpjr opened this issue Dec 6, 2022 · 5 comments
Assignees: karammasad
Labels: backend (Deals with the FastAPI and web-scraping backend), bug (Something isn't working), scraping (Involves web scraping)
Milestone: Sprint 1

Comments

johndpjr (Owner) commented Dec 6, 2022

Context

I've noticed that when scraping Tesla, some invalid attributes get written to the db. Sometimes the entire Tesla internship title is written into the period and year fields. See the screenshot below.

TODO

  • Fix the period/year extraction for Tesla postings

Notes

[Screenshot from 2022-12-05 20-20-21: db rows with the full internship title stored in the period and year columns]

johndpjr added the bug, scraping, and backend labels on Dec 6, 2022
johndpjr added this to AgTern on Dec 6, 2022
johndpjr moved this to Backlog in AgTern on Dec 6, 2022
JeremyEastham (Collaborator) commented

I know why this is happening. When the regex tries to pull the period and year out of the title, it falls back to the entire title. It should fall back to an empty string or None/null. I was having trouble specifying an empty string or None/null as the default value for the regex when I was finalizing the big add websites PR. Pydantic doesn't make a meaningful distinction between a non-existent value, an empty value, and a null value in the config, so I tried to write a custom validator for it. Unfortunately, Pydantic validators are unintuitive. It appears that the current config validation is not working correctly.
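
Below is a minimal sketch of that failure mode and one possible fix, assuming Pydantic v1-style validators; the Job model, field names, regexes, and the extract_or_none helper are illustrative assumptions, not AgTern's actual code. The idea is that extraction returns None when the regex doesn't match (instead of falling back to the whole title), and a pre-validator normalizes empty strings to None so missing, empty, and null values all end up as null in the db.

```python
import re
from typing import Optional

from pydantic import BaseModel, validator  # Pydantic v1 API

# Hypothetical patterns for pulling the period and year out of a job title.
PERIOD_RE = re.compile(r"\b(Spring|Summer|Fall|Winter)\b", re.IGNORECASE)
YEAR_RE = re.compile(r"\b(20\d{2})\b")


def extract_or_none(pattern: re.Pattern, title: str) -> Optional[str]:
    """Return the first capture group, or None -- never the whole title."""
    match = pattern.search(title)
    return match.group(1) if match else None


class Job(BaseModel):
    title: str
    period: Optional[str] = None
    year: Optional[str] = None

    @validator("period", "year", pre=True)
    def empty_to_none(cls, value):
        # Normalize empty/blank strings to None so "no match" is stored as
        # null instead of the scraped title leaking into these fields.
        if isinstance(value, str) and not value.strip():
            return None
        return value


if __name__ == "__main__":
    good = "Internship, Software Engineer, Vehicle Firmware (Summer 2023)"
    bad = "Tesla Internship Program"  # no period/year present in the title

    for title in (good, bad):
        job = Job(
            title=title,
            period=extract_or_none(PERIOD_RE, title),
            year=extract_or_none(YEAR_RE, title),
        )
        print(job.title, "->", job.period, job.year)
    # First title yields "Summer"/"2023"; second yields None/None rather than
    # the entire title being written into period and year.
```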

johndpjr moved this from Backlog to Todo in AgTern on Apr 2, 2023
johndpjr moved this from Todo to Backlog in AgTern on Apr 2, 2023
johndpjr added this to the Sprint 1 milestone on Oct 3, 2023
johndpjr moved this from Backlog to Todo in AgTern on Oct 3, 2023
johndpjr added the needs more detail (An incomplete ticket) label on Oct 3, 2023
johndpjr (Owner, Author) commented Oct 3, 2023

Hopefully this will be solved by #136, so I might hold off on completing this ticket until we get that implemented.

johndpjr removed the needs more detail label on Oct 3, 2023
karammasad self-assigned this on Oct 8, 2023
karammasad moved this from Todo to In Progress in AgTern on Oct 8, 2023
karammasad (Collaborator) commented

@johndpjr Now that ticket #136 won't be done at this time, should I go ahead and attempt to solve this bug, or is this being put off further?

johndpjr (Owner, Author) commented Oct 8, 2023

Does the following run config work? python3 -m backend --dev --scrape-only --include-companies Tesla --save-jobs

johndpjr closed this as completed on Oct 8, 2023
github-project-automation bot moved this from In Progress to Done in AgTern on Oct 8, 2023
johndpjr reopened this on Oct 9, 2023
github-project-automation bot moved this from Done to Todo in AgTern on Oct 9, 2023
johndpjr moved this from Todo to In Progress in AgTern on Oct 9, 2023
johndpjr (Owner, Author) commented Nov 1, 2023

With the new backend, this issue will (hopefully) go away!

johndpjr closed this as not planned on Nov 1, 2023
github-project-automation bot moved this from In Progress to Done in AgTern on Nov 1, 2023