Skip to content
This repository has been archived by the owner on Oct 10, 2024. It is now read-only.

modify status_code check to be more stringent #3

Open
rahulbot opened this issue Jan 18, 2024 · 5 comments
Open

modify status_code check to be more stringent #3

rahulbot opened this issue Jan 18, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@rahulbot
Copy link

rahulbot commented Jan 18, 2024

I'm tracking down a weird bug on a big query, running search locally connected via tunnel to prod ES index. I'm seeing a log message that seems to indicate the query is being chunked into lots of chunks, and then some of them are returning 404 but not throwing a real error.

I don't know from where, but log message I see look like this:

http://localhost:8010 "POST /v1/mediacloud_search_text_/search/overview HTTP/1.1" 200 194840
http://localhost:8010 "POST /v1/mediacloud_search_text_
/search/overview HTTP/1.1" 200 194840
http://localhost:8010 "POST /v1/mediacloud_search_text_*/search/overview HTTP/1.1" 404 30

Note the 404 on the last one. While tracking this down I noted:

if r.status_code >= 500:
raise RuntimeError("API Server Error {}: a bad query string could have triggered this. Endpoint: {},"
" Params: {}".format(r.status_code, endpoint_url, params))

Why is this >=500? When I change it to !=200 locally I get a much more helpful error. I'm worried this could be swallowing 40x errors and returning incorrect data due to the query chunking in place.

@rahulbot rahulbot added bug Something isn't working question Further information is requested labels Jan 18, 2024
@rahulbot rahulbot changed the title swallowing 40x errors? swallowing 40x errors silently? Jan 18, 2024
@pgulley
Copy link
Member

pgulley commented Jan 18, 2024

This is a remnant of when the API lived in IA's servers- I agree, there's a better pattern. I'll address

@pgulley
Copy link
Member

pgulley commented Jan 18, 2024

Indeed, this change surfaced a new error in the testing circuit to chase down

@pgulley
Copy link
Member

pgulley commented Jan 18, 2024

Ok, so- the 404 errors are returned by the api when the query returns nothing- we're using 404 parsed as 'not found'- that was why we didn't attempt to catch them earlier. I believe that the behavior you described here is probably actually because those specific chunks happened to return nothing- I would expect any other issue to surface as a 500 error.

At the very least, changing this means putting some other verbiage in place to catch 404 specifically, since that's how the api returns "0"- but really I'm thinking that supressing 404 should be the intended behavior here.

@rahulbot
Copy link
Author

Yeah. Super confusing to me. Anyway perhaps safest to do list of valid response codes and raise error if not in list?

@rahulbot rahulbot added enhancement New feature or request and removed bug Something isn't working labels Jan 22, 2024
@rahulbot
Copy link
Author

Note: non-urgent because I was wrong and the current behavior is not a bug. Proposed enhancement is to have a safe-list of valid response codes (if r.status_code not in [200, 404]:?).

@rahulbot rahulbot changed the title swallowing 40x errors silently? modify status_code check to be more stringent Jan 22, 2024
@rahulbot rahulbot removed the question Further information is requested label Jan 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants