-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Usually instant query is occasionally extremely slow #707
Comments
Not sure what the issue is atm but just wanted to give an anecdote. I had a similar issue in the past and it turned out that the planner was choosing different indexes across different analyze runs. This is because my data was so big and the values so skewed in certain directions that different samples made it think different query paths were cheaper. When you say your table is heavily indexed it makes me think this may be worth looking into. The more indexes you have the more chances the planner has of choosing the wrong one. You might not see it in psql unless you run the query/analyze a lot of times in a row. Or this might not be the issue at all. But just wanted to give you something else to consider. |
@tomasz-tomczyk can you try with these two configs and let us know if it changes things? Btw, thank you for the well written report. :) |
I know what you mean, we hit it too in the past. However, I ran it several thousand times in a loop looking for any instance taking more than 5s and never hit it :(
Thanks for the suggestion - I assume I can pass these do the
If that's the case, that unfortunately didn't seem to help There may be something to do with the fact that due to the lack of |
@tomasz-tomczyk can you try setting it under your repo configuration in |
For the record, I added @josevalim I've added it to the repo configs and haven't been able to reproduce it just yet - including running x1000 in a loop on 7 different pods at the same time, so I'm cautiously optimistic... 🤞 Not sure I fully understand what this does, but reading the linked thread and references there it might have other side effects on the other queries - i'll monitor the app today and weekend and report back next week if anything happens. |
@tomasz-tomczyk I would try both options in isolation and see the impact of each one. But basically, PostgreSQL is choosing a poor query strategy, probably based on the connection cache, and the option forces it to not do that. Also, how big is your pool size? That could help explain why it takes about 200 queries for the issue to appear:
|
I was seeing the same issue and tried setting the above properties. Seems to work fine in dev, put in prod (v15 and later v16.4) I get |
What is your production database? Is it actual Postgres or something that talks Postgres? I can confirm it exists back on PG 12: https://www.postgresql.org/docs/12/runtime-config-query.html |
It is managed by our cloud provider. Guess I'll need to ask them... Thanks @josevalim |
I've tried them on their own:
I was going to leave just the latter in, unless you'd recommend to keep the
That assessment seems spot on. We started off with pool size of 15 and the team initially assumed we're hitting a capacity on the pool size and over time of this issue expanded it to 50 - and now I was able to replicate the issue after ~250 attempts so you're exactly right. It also explains why I was able to get it to replicate faster earlier on, before the team made the change to the pool size. Not sure if this is the right place to feedback but it'd be extremely helpful to know that a query has timed out waiting for a connection in the pool vs timing out in Postgres (if that's possible) I'd be happy closing this issue. Thanks a lot for the input! Do we think it'd be helpful/possible to pass in the |
To be clear,
Please open up an issue and thanks for the follow up! |
Elixir version
1.17.2
Database and Version
PostgreSQL 15.7
Postgrex Version
0.18.0
Current behavior
We have a query doing
ilike
search on a table with 44M rows; it's heavily indexed and normally returns in ~100ms.Recently, we started seeing an issue where a search with less than 3 characters can occasionally take more than 15s hitting our configured Repo timeout.
In simplified terms, we search on
clients_patients
table which has the names of clients & patients and has an index on thecoalesce
'd names, so we can do fairly quick text search. We then join on the client & patient tables to get the full results. We limit to 15 records, so the overall data returned from Postgres isn't huge.The issue is seemingly gone if we remove the JOINs.
ORDER BY id
)psql
is always instantEcto.Adapters.SQL.query!
is always instantEcto.Adapters.SQL.explain
withanalyze
andbuffers
is always instant too, so I've been unable to see what the query plan for the "slow" version ispg_stat_activity
tableThrough adding lots of IO.inspects, we traced it to this line where the
msg_send
takes place, but it gets stuck waiting onrecv_bind
On a fresh Kubernetes pod (i.e. pod that hasn't had this query run, but others would have), the issue doesn't start taking place until the query is executed ~200 times and after that it's fairly regular 1 in 10-30 attempts will hit the timeout.
I also checked:
Process.info(self(), :message_queue_len)
when it's waiting to receive the data and it's0
every time, on both fast and slow attempts.Expected behavior
Would expect the query to be consistently fast 😅
Resolution
The issue went away after I added the following to all Repo configs:
The text was updated successfully, but these errors were encountered: