Lacinia hangs on parallel batching and non-batching queries #12
Hi, Thanks for the detailed report. I'll have to look into it closely, but my first instinct is that the pedestal interceptor is starting a superlifter instance per request, but as you have two queries in your request it would be shared between them, possibly in parallel, and I suspect that to be the cause of the issues. Perhaps it should create an instance per query - this should be possible by inspecting the lacinia payload.
I wasn't sure if that was the case, as I couldn't replicate the issue on your pet example; also my knowledge of dataloader & urania is limited. But now that I have more time to think about it, sharing the same superlifter across multiple parallel queries would probably cause other subtle problems (i.e. two parallel resolvers sharing the same bucket). So perhaps separating them is the way to go no matter what? I found out you can get the query info from the lacinia context. Being a clojure newbie, I would love to hear what you think. Thanks!
Looking closer at the lacinia API, I found out you can write to the context map in a way such that only nested resolvers have access to the added keys. I feel this is the better way to inject superlifter.
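For example, lacinia's `com.walmartlabs.lacinia.resolve/with-context` attaches extra context that only nested selections see. A sketch of using it to scope a superlifter per query (the resolver, `fetch-customers` and `start-superlifter-for-query` are hypothetical names, not part of either library):

```clojure
(require '[com.walmartlabs.lacinia.resolve :as resolve])

;; hypothetical top-level resolver: wrap the resolved value with extra context
;; that only the nested (child) resolvers will see
(defn customers-resolver [context args _parent]
  (let [customers (fetch-customers args)] ;; hypothetical data fetch
    (resolve/with-context customers
      ;; hypothetical per-query superlifter instance
      {:superlifter (start-superlifter-for-query)})))
```

Child resolvers under this field would then read `:superlifter` from their own context, without it leaking to sibling queries in the same request.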
I've looked into this a bit and I found I could get inconsistent results when running multiple queries at the same time, but it didn't lock up for me. I started to look into how to create a superlifter per query but it's not as straightforward as I'd hoped. My plan was to create a superlifter per query at the pedestal interceptor layer, with a key that could then be derived from the lacinia context inside the resolvers.
One more thing to mention in the meantime: I would generally recommend a time trigger, just as a way of avoiding hanging indefinitely in unexpected scenarios. It could be a big interval, like 500ms.
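Concretely, that safety net might be configured like this (a sketch assuming superlifter's documented options map; `:interval` is in milliseconds, and the 3/500 values are illustrative):

```clojure
(require '[superlifter.api :as s])

;; sketch: the :default bucket fetches when 3 muses are queued, but an interval
;; trigger also fires every 500ms so queued muses can never sit in limbo forever
(def superlifter-opts
  {:buckets {:default {:triggers {:queue-size {:threshold 3}
                                  :interval   {:interval 500}}}}})

(def superlifter-context (s/start! superlifter-opts))
```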
I have been busy at my job and haven't had time to work on my side project for the past two weeks. These hangs only happen in cases where my parallel resolvers trigger buckets with the same ID.
Hi, Could you try the latest snapshot? While #15, or having a separate superlifter per query, would probably solve your issue here, it's not the most elegant and would also make it slightly less efficient. I'm going to have a think about how to solve #13, which I hope will fix your use case too. The hanging on error is, I think, probably a different issue - the promise should be rejected and it should be able to continue in the normal way. I'll check this separately. Thanks
I upgraded to 0.1.2-SNAPSHOT. After a few hours, I found out that the change in logging config was the reason why the bug went away. I can reproduce the bug consistently now with my old logging config.
Hi @bday2009, If I understand correctly, it does not hang when you log superlifter at debug with 0.1.2? And with 0.1.2-SNAPSHOT it does not hang regardless of logging levels? Are you seeing 'overwriting bucket' in your logs? And does that happen just before a hang? Thanks for helping me get to the bottom of all this.
Hi there, It hangs on both versions when I changed the logging level for superlifter. I should have written "After a few hours, I found out that the change in logging config was the sole reason why the bug went away, not the change in version" :)
I think I was able to reproduce this with parallel queries and just a size trigger on the bucket. I have some ideas about the cause too, so I think I can make some progress on this. Thanks again for your input.
Note here that we have 3*2 fetches. The first 3 are enqueued and a fetch is triggered, but another muse is enqueued before the fetch can take place. The fetch therefore fetches 4 muses in one go. The remaining 2 muses when enqueued are not enough to reach the trigger's threshold so they sit in limbo. |
See https://github.com/oliyh/superlifter/blob/master/src/superlifter/core.cljc#L86

Possible fix: the swap function needs to be composed with the triggers that fire based on the state of the queue, so that updating the queue and deciding to fetch happen as one atomic operation.
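To make that fix concrete: combine the enqueue with the threshold check in a single atomic transition, so exactly one caller observes the crossing and performs the fetch. A minimal self-contained sketch (invented names, not superlifter's actual code):

```clojure
;; bucket-atom holds {:queue [...]}; swap-vals! gives us both the state before
;; and after the enqueue, so the "did this enqueue cross the threshold?" check
;; is race-free: exactly one enqueuer sees the crossing and gets the queue back
(defn enqueue-and-check! [bucket-atom muse threshold]
  (let [[old new] (swap-vals! bucket-atom update :queue conj muse)]
    (when (and (< (count (:queue old)) threshold)
               (>= (count (:queue new)) threshold))
      (:queue new))))
```

With threshold 3, the first two calls return nil and the third returns all three queued muses, regardless of how the calls interleave across threads.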
A further problem related to overwriting buckets - if the overwritten bucket has 1 item in it, it will be fetched when that bucket is shut down. That might mean that when its two siblings are added into the successor bucket, it never reaches its threshold. I think we need to make overwritten buckets combine their queue with the new one, by enqueuing everything into the new bucket.
Buckets are contained in an atom and themselves contain an atom for the queue. I think perhaps the queue atom should be removed and instead the bucket should be an immutable data structure; that way we don't have two atom boundaries to think about. This seems to be a problem already with overwriting buckets: the old bucket can be cleaned out, but it's possible that another thread still has a reference to its queue and can enqueue items even while it is being shut down. Instead we need fully atomic operations on buckets.
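A sketch of that single-atom shape (names invented for illustration): buckets are plain data inside one atom, and draining a bucket is one atomic swap, so no thread can enqueue into a queue that is being cleaned out.

```clojure
;; all buckets live in one atom as immutable data
(def buckets (atom {}))

(defn enqueue! [bucket-id muse]
  (swap! buckets update-in [bucket-id :queue] (fnil conj []) muse))

(defn drain! [bucket-id]
  ;; atomically empty the queue and return its former contents; a concurrent
  ;; enqueue! lands either entirely before or entirely after this transition
  (let [[old _new] (swap-vals! buckets assoc-in [bucket-id :queue] [])]
    (get-in old [bucket-id :queue] [])))
```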
Hi @bday2009, I have spent quite a bit of time on this and found several issues. I believe them all to be resolved now - could you try the latest version? Thanks
Thank you for your efforts. I quickly tried the new version and still ran into the problem. The request hangs with both orderings of the combined query. There are 3 customers, and the queried customer has 6 addresses.

Querying a single customer also fails now; it was working in 0.1.2.
Hi, I think I understand what's going on with your first example - you are creating a bucket of size 3 for customer-addresses. Your second and third examples are confusing; they seem to indicate that there is no trigger for that bucket.
First query now starts working, but it still hangs sometimes. Second and third queries seem to work now. Was adding logging the only change you made for this version?
Hi, Thanks once again for your quick feedback. I added some logging, but I also ensured that muses go into the default bucket if the named bucket does not exist, as I suspected that your singular customerById does not create the customer-addresses bucket. This is borne out by your working example.
This one works because the singular one comes first, goes into the default bucket (with a threshold of 1), and is therefore fetched. The plural customers one comes after, creates the bucket of three, fills it to the threshold, and it's then fetched and all is good. If they happen to be reversed, as in your second example, the plural one runs first, creates the bucket and fills it with three muses which are fetched, and then the singular one goes into that bucket but sits there forever because the threshold is three. What you 'should' be doing for consistency is creating the customer-addresses bucket in the singular customerById resolver as well.
I believe I have removed the race conditions from superlifter, so adding/removing logging should not affect it now; rather, what remains is a deterministic issue depending on the order in which it is called by the resolvers (which itself is not deterministic, hence why the macro-level behaviour is not deterministic).
Hi again @bday2009, I have added a new kind of trigger called elastic. Its usage can be seen in the example, which I have updated. I have updated the readme on that branch to describe the elastic trigger, and to suggest better use of the existing queue-size trigger. This is now deployed as a new snapshot. Hope this makes sense and works for you!
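For anyone following along, an elastic-trigger parent resolver might look roughly like this (a sketch: `fetch-pet-ids` is a hypothetical fetch, and the `update-trigger!` call shape follows the pseudocode later in this thread):

```clojure
(require '[superlifter.api :as s :refer [with-superlifter]]
         '[promesa.core :as prom])

;; sketch: the parent resolver grows the :pet-details elastic threshold by the
;; number of children it is about to produce, so the bucket fires once all of
;; their muses have been enqueued
(defn resolve-pets [context _args _parent]
  (with-superlifter context
    (-> (prom/promise (fetch-pet-ids)) ;; hypothetical: ids of the children
        (s/update-trigger! :pet-details :elastic
                           (fn [trigger-opts pet-ids]
                             (update trigger-opts :threshold + (count pet-ids)))))))
```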
Hi @oliyh, thank you for your hard work. I think I understand this; however, I have a couple of questions I hope you can help clarify:
Hi,
I think there are ways to improve the lacinia integration; I created #16 as one such way - if you have other suggestions I'd like to hear them. Hope this helps. I may have misunderstood your question (2); please clarify if I have.
I'd like to clarify (2) and (3):

But yeah, your (2) and (3) answers addressed another one of my concerns that I didn't mention in the earlier comment.
Thanks for clarifying, I understand now. Yes, I agree - I already ran into this issue with the example project test, which does the same query twice with different aliases. There are three pets, but the pet details resolver is called 6 times. By setting a bucket of 3 it filled twice, which isn't bad, but it would be better if it knew it was going to be 6 (or zero, as you mention). I am hoping that this information is available in the lacinia context; furthermore, my idea in #16 was that the pet details resolver would declare its own bucket in the schema, that way superlifter could look ahead and set up the correct-sized buckets for any child resolvers for you. This all may or may not be possible; for now hopefully it's at least fully functional at this low/medium level and we can aspire to a more beautiful high-level API for lacinia.
Lacinia already has a helper to look ahead at the child selections - see selections-seq. So you could write it now like this (selections-seq doesn't return a set but a seq, so the membership test uses `some`):

```clojure
(with-superlifter context
  (cond-> (prom/promise {:id (:id args)})
    ;; only bump the threshold when the query actually selects :pet-details
    (some #{:pet-details} (selections-seq context))
    (s/update-trigger! :pet-details :elastic
                       (fn [trigger-opts _pet-ids]
                         (update trigger-opts :threshold inc)))))
```
Hello there. First of all, thank you for this awesome library - it's the only one I could find that does what I need and supports lacinia out of the box.

I'm having some issues using it for a SQL-backed GraphQL API. It's based on the pet example in this repo; I added one more query to fetch a single record. It's a very naive implementation, as I'm not sure how to approach this. Ultimately I'd like to have different mechanisms of resolving customer addresses for the single non-batching (customer_by_id) and batching (customers) cases. I haven't achieved this; in the example below, both cases use the same query (`select * from addresses where customer_id in ...`), and the API hangs inconsistently when I send parallel queries. Hope to get some guidance on this. Thank you in advance.

This is an excerpt of my schema:
My resolvers:
I go to graphiql and send this combined query:
Half of the time it resolves just fine (when the second query resolves first); the other half, the request hangs in pending status. If I do the queries separately, they resolve fine.