Remove HTML sanitization from Moped ETL (BeautifulSoup) #20801

Open
mddilley opened this issue Jan 27, 2025 · 1 comment · May be fixed by cityofaustin/atd-moped#1538
Assignees
Labels
Need: 2-Should Have May be painful to leave out, but the solution is still viable Product: Moped A comprehensive mobility project tracking platform for Austin, Texas Service: Dev Infrastructure and engineering Type: Bug Report Something is not right Workgroup: TPW Transportation & Public Works Department

Comments

@mddilley

mddilley commented Jan 27, 2025

We previously updated the Moped AGOL ETL to use BeautifulSoup so that invalid HTML would no longer be rejected by the AGOL API (cityofaustin/atd-moped#1428). This applies to the project_status_update field, which is entered through rich-text editing. #20842 will update a setting at the AGOL feature service level to handle this task. This issue is a follow-up to test bypassing the BeautifulSoup HTML handling in the ETL and then remove that code if there are no issues.

Please avoid testing on Monday Feb. 10th since dashboards and maps powered by this dataset will be live demoed.

In Scope

  • Test bypassing the project status notes HTML handling code
  • If the ETL works without it, remove the project status notes checks and the BeautifulSoup dependency
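
One way to run the bypass test without deleting code first is to gate the sanitizing step behind a flag. The sketch below is only an illustration: `prepare_status_update`, `skip_flag_from_env`, and the `SKIP_HTML_SANITIZATION` variable are hypothetical names, not taken from the Moped ETL, and the `sanitize` argument stands in for the existing BeautifulSoup handling.

```python
import os
from typing import Callable, Mapping, Optional


def prepare_status_update(
    value: Optional[str],
    sanitize: Callable[[str], str],
    skip_sanitization: bool = False,
) -> Optional[str]:
    """Return the project_status_update value to send to AGOL.

    `sanitize` is a stand-in for the current BeautifulSoup-based handling.
    When `skip_sanitization` is True, the raw value is passed through untouched.
    """
    if value is None or skip_sanitization:
        return value
    return sanitize(value)


def skip_flag_from_env(env: Mapping[str, str] = os.environ) -> bool:
    # Hypothetical flag name, assumed for this sketch only.
    return env.get("SKIP_HTML_SANITIZATION", "").lower() in {"1", "true", "yes"}
```

If the ETL runs cleanly with the flag on, the sanitizing branch and the BeautifulSoup dependency can then be removed together.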
@mddilley mddilley added Need: 2-Should Have May be painful to leave out, but the solution is still viable Product: Moped A comprehensive mobility project tracking platform for Austin, Texas Service: Dev Infrastructure and engineering Type: Bug Report Something is not right Workgroup: TPW Transportation & Public Works Department labels Jan 27, 2025
@mddilley mddilley changed the title from "Extend the handling of invalid HTML by the Moped AGOL ETL to other fields" to "Remove HTML sanitization from Moped ETL (BeautifulSoup)" Jan 29, 2025
@mateoclarke mateoclarke self-assigned this Feb 6, 2025
@mateoclarke
Contributor

From Mike:

you will know the script failed if you see one of these API responses:

  • {'code': 400, 'message': '', 'details': ['Field project_status_update has invalid html content.']} - [see this post](https://austininnovation.slack.com/archives/CHZE6BC6L/p1727189785934479?thread_ts=1727122151.675579&cid=CHZE6BC6L)
  • ValueError: {'code': 504, 'message': 'Your request has timed out.', 'details': []} (the wayyyy less helpful error) - [see this post](https://austininnovation.slack.com/archives/CHZE6BC6L/p1727122989795069?thread_ts=1727122151.675579&cid=CHZE6BC6L)
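
When testing, a small check like the following could flag these two response shapes. This is a minimal sketch, not code from the ETL: `classify_agol_failure` is a hypothetical helper, and it assumes the error has already been extracted as a dict (in the 504 case the dict arrives wrapped in a `ValueError`).

```python
from typing import Optional


def classify_agol_failure(response: dict) -> Optional[str]:
    """Return a short label for the two failure shapes quoted above, else None."""
    code = response.get("code")
    details = response.get("details") or []
    # The 400 case names the offending field in its details list.
    if code == 400 and any("invalid html content" in str(d).lower() for d in details):
        return "invalid_html"
    # The 504 case carries no details, so the code is the only signal.
    if code == 504:
        return "timeout"
    return None
```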
