CDI context propagation #167
Addendum: the practice of capturing/restoring values inside a request context (as opposed to just propagating the entire request context) appears to cause values to never be destroyed, which is a horrible source of bugs, with DB connections obtained more than once per request, and never released.
(CDI) Application context is not part of context propagation here, as that context is backed by a concurrent map and all instances are shared across the application. This has implications for users that define such beans. Opposed to that are request scoped beans, stored in a thread-local map. Current propagation is achieved by copying req. scoped instances from the current active context to the new thread where the context is activated (without any link to the original HTTP request, by the way). Note that contexts can be manipulated by events, users or frameworks and can end abruptly, leading to destruction of contextual instances during context deactivation and destruction. As for what you are trying to achieve - the "happens-before" sequence of events - this seems like a more common problem that could affect more contexts than just CDI. I might be missing something but...
- this creates the immutable snapshot, in your example with no req. bean active
- this applies the snapshot (makes sure there is a req. context active, but with no beans) and runs something on it
- here you are running on the same snapshot, i.e. it is "reset" into whatever state was in the snapshot; if you want to preserve the state, you would need to capture the snapshot after
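The capture/apply semantics described above can be illustrated with a self-contained sketch. This is not the Weld implementation; it uses a plain map to stand in for the CDI request context, and all names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotDemo {
    // stand-in for the thread's active request context
    static Map<String, String> requestContext = new HashMap<>();

    // capture: copy the current state into an immutable snapshot
    static Map<String, String> capture() {
        return Map.copyOf(requestContext);
    }

    // apply: reset the active context to the snapshot's state
    static void apply(Map<String, String> snapshot) {
        requestContext = new HashMap<>(snapshot);
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = capture(); // taken before any bean exists

        requestContext.put("bean", "A");          // a bean is lazily created later

        apply(snapshot);                          // a later stage applies the snapshot
        // the bean created after capture is gone: the context was "reset"
        System.out.println(requestContext.containsKey("bean")); // false
    }
}
```

This is the "reset into whatever state was in the snapshot" behavior: anything added to the context after capture is invisible to stages that run on the snapshot.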
Well, remember CP was created to alleviate the lack of pipeline-locals (the equivalent of thread-locals for reactive pipelines), so taking a CDI snapshot rather than propagating the entire request scope makes it completely unusable. Again, please consider the actual use-cases rather than theoretical aspects around concurrent execution, which is not what we're trying to solve here.
That would mean we were wrong from the start, because even the javadoc of
Snapshot being immutable makes a lot of sense IMO. I don't really see a fault with purely CDI context propagation for this issue. Like I said, other contexts are likely to suffer from similar problems as you described here for CDI, simply due to the immutable nature of snapshots.
The intent of documenting ThreadContextSnapshot instances as immutable is reflected in the two sentences after it in the JavaDoc,
If we consider what would happen if a ThreadContextSnapshot were to instead store state (about how to restore a previous context) and then the snapshot was applied to two threads at once, that information would be overwritten and it would become impossible to properly restore one of the threads to its previous state after the task/action completes. This was the original reason for the language stating immutable. That said, it is still quite valuable for most context types to have the context itself behave as immutable. A good example is Security context. When you create a completion stage via MP Context Propagation with Security context propagation enabled, you are intending for the action to run as the user (USER1) that is currently on the thread. If you happen to log in a different user (USER2) after that point (or within a prior stage to which you also propagated context which runs first), you still want the stage that you created under USER1 to run as USER1, not a different user. Here's an example of the above,
There are other context types, however, such as Transaction that could be viewed as semi-immutable. Transaction context can be considered immutable in that the snapshot is of a particular transaction ID which never changes. It is that exact transaction ID which is re-established on the thread where the completion stage action runs. The transaction's state, however, can vary, and individual stages have the ability to mark it rollback-only or outright commit it or roll it back, and it is up to the application developer to ensure to do the right thing and coordinate between stages. This sort of context propagation might be thought of as an immutable pointer to the actual transaction, whose state can change. If I'm understanding the debate here, it's about which model CDI context propagation ought to follow.
Right. Spot on with the Transaction context. The fundamental property of the CDI Request Context, like its Application Context, is that it's a mutable map, where lazy values can appear, and which must be destroyed at the end of its lifecycle. Those two contexts only differ in the span of their lifecycle, who can view them (everyone vs a single request), and how they are implemented (but that's irrelevant to their behaviour). At the moment, the CDI ThreadContextProvider uses snapshots of values to capture/restore not the request context, but its current state, which breaks lazy values and destroy semantics (there's never an end for them since they can be captured). I don't think this is the right way to do it, and it breaks so many things and semantics that it's hard to argue it should be. Perhaps it's indeed the snapshot immutability text that needs to be changed to explain those two models?
I was reading through what we currently have documented for CDI context, as well as experimenting with our current implementation (based on Weld) to see how it behaves in this area. The only discussion of the meaning of CDI context appears on the JavaDoc for the CDI constant on ThreadContext,
It uses the somewhat ambiguous wording "preserved state" rather than a more precise term like "snapshot of state as of the point in time when context was captured" or "shared state across all operations participating in the request". The comments in the code example that follow do seem to hint at preserving state across multiple completion stage actions, but aren't explicit enough. There is sufficient ambiguity for an implementation to interpret it either way, and we should definitely clarify this section of doc as well as the language around snapshots. Our immediate concern should be around whether our implementations are doing the best thing for users. When I look into the current behavior of our Weld-based implementation of CDI context, I'm seeing similar results to what @FroMage was pointing out. In two separate requests I tried the following. Request 1:
Request 2:
The second request (bean not used first) is outputting "DEFAULT_VALUE" rather than "DEFAULT_VALUE,A,B,C". In my opinion the behavior of request 2 above is counterintuitive, less useful, and likely to make writing reactive applications more difficult. @manovotn do you know if it would be possible with Weld to achieve the opposite behavior and directly access the request from the other threads rather than operating on a copy of it? Are there other side-effects here that I'm not thinking of? Or other reasons why it would be undesirable to switch it?
What @manovotn said is absolutely true from the CDI point of view (immutable copy propagation). For context propagation, it was made to copy, not share. I do see that what @FroMage and @njr-11 are trying to demonstrate looks less useful, as the chain is reverted back to the beginning. I am wondering whether Weld can be updated to propagate the context in this fashion: 1->2->3. Right now it seems to do 1->2 and then 1->3.
@njr-11 directly accessing the request context won't work because it operates on thread locals. Also, the original task is backed by an actual HTTP request which serves as a storage for beans (and we have a local cache to speed things up for repeated queries); any subsequent propagation means you are on a thread that no longer has access to the HTTP request and no way of linking to it. This in practice means we cannot update the actual storage even if we wanted to. So how it works: It goes along the same lines for session and conversation scopes, just with the HTTP session instead. Why don't we instead discuss the fact that both our impls already override the Futures and can manipulate them in some way?
@manovotn I think propagating context after each action meets my expectations as well, since we are fixing the problem by relaying instead of going back to the starting point.
Note that what I suggested above doesn't solve the problem with things like … I am mentioning this because it is what @FroMage originally used when coming up with this issue.
Thanks for the replies - I was worried about the possibility of additional complications. Unfortunately, the approach of re-capturing context between stages isn't workable because the completion stage programming model allows for stages to depend upon multiple other stages. The linear model where each stage follows exactly one stage is an oversimplification. For example, in a scenario where an
It would be wrong for
to apply the CDI context for
We cannot have the former re-capturing CDI context from actionB and the latter from actionC.
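The multi-parent shape of completion-stage graphs can be seen with plain CompletableFuture, no CDI involved; a stage that depends on two parents has no single predecessor from which to re-capture context (stage names here are illustrative):

```java
import java.util.concurrent.CompletableFuture;

public class DiamondDemo {
    public static void main(String[] args) {
        CompletableFuture<String> stageA = CompletableFuture.completedFuture("A");

        // two stages both depend on stageA (1->2 and 1->3)
        CompletableFuture<String> stageB = stageA.thenApply(s -> s + "B");
        CompletableFuture<String> stageC = stageA.thenApply(s -> s + "C");

        // one stage depends on BOTH stageB and stageC ((2,3)->4):
        // if context were re-captured "after each action", which parent's
        // context should this stage run with?
        CompletableFuture<String> stageD =
                stageB.thenCombine(stageC, (b, c) -> b + "|" + c);

        System.out.println(stageD.join()); // AB|AC
    }
}
```

The linear model where each stage follows exactly one stage breaks down at stageD, which is the core of the objection to re-capturing between stages.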
Yes, I'm not sure how we can turn an implementation of CDI scope context propagation from an immutable context copy to a shared mutable context without just implementing context propagation as a shared mutable context. As far as the spec is concerned, we could clarify that both styles of context propagation are possible (mutable or immutable). As far as the implementation of CDI context propagation, the issue is that the current behaviour does not work as expected, because:
Because of that, we're going to change the implementation of ArC context propagation to propagate the entire request context, as it mutates, and tell users not to attempt to modify it in parallel, or else take care of synchronisation themselves. We can make the request context itself thread-safe (just as the application context is thread-safe), but as the CDI team pointed out, this won't magically make user request beans thread-safe themselves, while user application beans are traditionally written to be thread-safe. So it's not enough to make the request context thread-safe, but it at least allows users to access it from parallel threads if they want to (which we don't encourage). Now, as far as the Weld context propagation is concerned, I'm not sure what's best. I don't think the current behaviour works well in either parallel or reactive settings, so I'm not sure it should stay as-is. But that's not necessarily a spec issue, unless we start specifying what context providers should do, which I'm not sure is a good idea.
From a user perspective I am in favor of the behavior where we snapshot the pointer to the context rather than snapshotting the CDI contextual instances that have been activated at the time the snapshot is taken. Behaviorally, this code example @njr-11 posted earlier:

```java
CompletableFuture<Void> cf = appCDIExecutor.runAsync(() -> {
    requestBean.setState(requestBean.getState() + ",A");
}).thenRunAsync(() -> {
    requestBean.setState(requestBean.getState() + ",B");
}).thenRunAsync(() -> {
    requestBean.setState(requestBean.getState() + ",C");
});
cf.join();
```

I think should give the results: …

From an implementation side, I don't think this is possible given the current Weld APIs. In order for this to work, I think we would need an API that can set/get the entire … I was able to hack into the set/getBeanStore methods via setAccessible(true) and the above test case gave us the desired behavior. However, I'm not sure if this will have any undesirable side-effects. CC @manovotn
Current APIs in Weld totally don't allow that, as they weren't designed that way in the first place. As for the approach @aguibert tried, that's a step closer; here are some thoughts off the top of my head:
To be fair, I am not sure how to adapt Weld to this. There are tons of complexities and optimizations and several layers of caching. And MP CP not being clear on what use cases are supported doesn't help either. I can see the sequential scenario (the reactive use case Stephane brought up here) as sensible and doable at some cost. But other use cases seem too crazy and dangerous if we were to share the whole storage for propagation...
That's certainly understandable that we would want to avoid the overhead when context propagation isn't used. What about using the request for context capture (which is always synchronous) to lazily replace the particular request scope's HashMap with ConcurrentHashMap?
If I understand correctly here, you mean 2 threads operating on the same request context both lazily access a bean for the first time, causing both to create a new instance; one overwrites the other's attempt to store it, and both continue using their own copy unaware of the other. Could this be addressed by replacing map.get / check null / map.put(key, newInstance) with map.putIfAbsent(key, newInstance)? Or map.computeIfAbsent(key, functionToCreateNewInstance) if creating 2 instances and discarding one is itself a problem? Both of these operations are available on the standard Map API as of Java 8 and would be available regardless of whether the underlying hashmap is concurrent or not. Would this be doable? I'm not familiar with the CDI impl so I don't know how pervasive these changes would be.
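As an aside on the Map APIs mentioned: on a ConcurrentHashMap (though not on a plain HashMap under concurrent access), computeIfAbsent runs the mapping function at most once per key, so the double-creation race doesn't occur at the map level. A minimal sketch, with an Object standing in for a contextual instance (the follow-up replies discuss why this alone may not address every CDI-level concern):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyInitDemo {
    public static void main(String[] args) throws Exception {
        ConcurrentHashMap<String, Object> beanStore = new ConcurrentHashMap<>();
        AtomicInteger creations = new AtomicInteger();

        // two threads lazily request the same "bean" at the same time
        Runnable access = () -> beanStore.computeIfAbsent("myBean", key -> {
            creations.incrementAndGet(); // stands in for bean creation + @PostConstruct
            return new Object();
        });

        Thread t1 = new Thread(access);
        Thread t2 = new Thread(access);
        t1.start(); t2.start();
        t1.join(); t2.join();

        // ConcurrentHashMap.computeIfAbsent is atomic: the mapping function ran
        // at most once, so only one instance exists and both threads share it
        System.out.println(creations.get()); // 1
    }
}
```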
Eventually the map would have the correct one, but you have already created two beans which might communicate with other beans (due to dependencies) and which might have post constructs that already triggered twice.
That was my first thought as well but unfortunately, that won't do it. You still already created two instances. And if you try to use
Today on the weekly hangout, @manovotn, @njr-11, and I discussed a possible solution for this problem that combines a few ideas:
Proposed solution: eagerly register handles to all ContextualInstances (e.g.
Reading over this again, here are a few things coming to mind:
Only the creator of the "request" can know when it ends. In the case of HTTP requests, only the container knows when it will end (sync or async) and only it can terminate it and notify of the ability to destroy things attached to it. It also has that responsibility: it must do it. So trust that it does and stop worrying about it ;) Also, any thread still using the request after it has been terminated should expect bad consequences. This is not something that should be supported.
CRAZY IDEA: for the reactive use case we could also COPY the contextual instances from one stage to the next one (i.e. something like …). In theory, this should be much easier to implement.
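A minimal sketch of this copy-forward idea, with plain maps standing in for per-stage bean stores (all names hypothetical, not Weld API):

```java
import java.util.HashMap;
import java.util.Map;

public class CopyForwardDemo {
    public static void main(String[] args) {
        // stage 1 runs with an empty store and lazily creates a bean
        Map<String, String> store1 = new HashMap<>();
        store1.put("requestBean", "A");

        // instead of restoring the ORIGINAL snapshot for stage 2,
        // copy stage 1's finished store forward
        Map<String, String> store2 = new HashMap<>(store1);
        store2.put("requestBean", store2.get("requestBean") + ",B");

        Map<String, String> store3 = new HashMap<>(store2);
        store3.put("requestBean", store3.get("requestBean") + ",C");

        // mutations made in earlier stages are visible to later ones
        System.out.println(store3.get("requestBean")); // A,B,C
    }
}
```

Note that this composes naturally only when each stage has exactly one predecessor to copy from, which is why it targets the sequential/reactive case.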
@mkouba I like that idea; it seems implementable without requiring any changes to Weld too, which is nice
This was discussed a bit earlier. The problem is that the CompletionStage programming model allows more than just a linear progression. One completion stage can depend upon the completion of a couple of different stages. For example, you might have 1->2; 1->3; (2,3)->4; 1->5; (4,5)->6 I'm not too concerned about the destroy aspects of this and agree with the comment from @FroMage about it being a user error if they arrange stages that might run after the scope has ended.
Yes, Martin stated that it would be for the 'reactive' approach, which is the sequential (or linear, if you will) one.
The approach where completion of one stage depends on the completion of a couple of other stages is every bit as reactive. I am not in favor of a solution that only works properly for a favored subset of reactive patterns when the programming model itself (CompletionStage) provides and promotes other patterns that would be busted by that solution.
Given that there doesn't appear to be any solution to this in the near term, I'm thinking of taking the following approach for our product to prevent users from getting into unexpected situations: Continue using the provided Weld interfaces as is, but detect when a bean is used for the first time within a request by a completion stage action, and fail the stage with an error indicating to the user that the bean must be accessed prior to capturing the context/creating the completion stage. This will hopefully avoid applications writing code that gets them into trouble with CDI context propagation, and allow for a better solution in the future, which we can continue working on under this issue.
That's every bit as user-hostile as leaving it in its current state. Users don't necessarily know all the beans they use and their dependencies through and through (coming from a 3rd party lib, for instance). They may not even have the power to access some of them directly because of that. With that mindset we might as well just do the eager creation approach and say you shouldn't rely on post construct or initializer methods. You would get sensible propagation (since you would have a complete set of beans passed around), though at the cost of limitations and some overhead on instance creation, which may or may not be huge depending on how many beans you have there and how bestial they are.
It may just be a difference in philosophies. Although the user does not like to experience limitations or failures, I believe we are doing more of a service to users in alerting them upfront and even limiting function in order to prevent the possibility of data inconsistency that can happen without their awareness. I like the eager creation approach and the associated idea with storing references that Andy proposed, and I see the approach of raising errors in the near term as a step toward being able to adopt the eager creation/references approach in the long term. I get the impression that it will require some in-depth changes in Weld. Being able to do something more limited in the near term (detect & raise an error about inconsistent data), while remaining fully compatible with future direction, seems like a good strategy to take so that we can get GA compatible implementations out there for users to start taking advantage of for the scenarios that are currently stable/possible.
I wanted to share my use case and implementation strategy, as I attempted to implement this myself some time ago for parallel data loading. I usually have named CDI producer methods that represent some contextual data, e.g. the currently logged in user, or a list of data in the language of the user.

```java
@Named("currentUser")
@Produces
@RequestScoped
public User getUser(@Named("currentUserId") Integer userId) {
    return entityManager.find(User.class, userId);
}

@Named("selectableLanguages")
@Produces
@RequestScoped
public List<Language> getUser(@Named("currentUser") User user) {
    return Language.getLanguages(user.getLocale());
}
```

I enjoy these beans being lazily loaded, as I only do heavy processing when it's actually used, and at the same time I avoid doing heavy processing multiple times, as the beans are only constructed once and then shared. The only downside to this approach is that the first time the bean is accessed, e.g. in the UI, the rendering is blocked, waiting for the heavy processing to finish. So I thought, since I usually know most beans that are going to be needed, I could try to pre-load these beans in parallel before actually starting rendering. This is when I started experimenting with context propagation. I implemented something that collected all bean names that are necessary for a request and, prior to executing the rendering, loaded all beans in parallel, causing data to be populated in the bean store such that the rendering part could just take the data without having to wait any further. I wanted to contribute the context propagation parts as an extension to DeltaSpike, but it wasn't accepted unfortunately, probably due to the hacks it used to get it working. The basic idea is to capture context references that can be attached/detached to/from the current thread, such that the original bean store is mutated. I think this implements what @FroMage is expecting, which is also what I am expecting. I hope my use case and DeltaSpike implementation help you, and I would really like to see context propagation done with a shared mutable bean store to enable the mentioned use cases.
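The preloading strategy described above can be sketched without CDI or DeltaSpike: a shared concurrent store plus CompletableFuture to warm the lazy entries in parallel before "rendering" reads them. All names here are hypothetical stand-ins, not the actual DeltaSpike extension:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

public class PreloadDemo {
    // shared mutable "bean store" for one request
    static ConcurrentHashMap<String, Object> beanStore = new ConcurrentHashMap<>();

    // lazy lookup: compute on first access, then share
    // (like a @RequestScoped producer method)
    static Object lookup(String name, Supplier<Object> producer) {
        return beanStore.computeIfAbsent(name, k -> producer.get());
    }

    public static void main(String[] args) {
        // known-needed "beans" are warmed in parallel before rendering starts
        CompletableFuture.allOf(
            CompletableFuture.runAsync(() -> lookup("currentUser", () -> "alice")),
            CompletableFuture.runAsync(() -> lookup("selectableLanguages",
                                                    () -> List.of("en", "de")))
        ).join();

        // rendering now takes the data without blocking on heavy processing
        System.out.println(lookup("currentUser", () -> "should-not-run"));
    }
}
```

The key property is that the preloading threads mutate the same store the rendering thread later reads, i.e. a shared mutable bean store rather than per-thread copies.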
@beikov thanks for sharing the use case and the original DS contribution. Also note that … Furthermore, it would need some way (a config key?) to switch behaviour so that users who don't care about context propagation won't have to run on thread-safe structures and concurrent collections, which are generally slower than their non-thread-safe counterparts. If you are brave enough and have some time on your hands, you can try and compose something for Weld. PRs are most welcome ;-)
Good to know that this doesn't solve the issue fully. My use case fortunately isn't affected by that as it doesn't initialize request scoped beans concurrently.
Up until scheduling the first runnable, the bean storage can stay unsynchronized though. You will only have to replace the bean storage with a synchronized version later and so AFAICS you can avoid the cost for users that don't require the request scope to be propagated or mutable.
I would be brave enough, but am lacking time currently. I still hope my use case and experiment encourage someone to work on this.
I didn't see anywhere in the spec where we define how CDI should propagate its context with CP. In particular, the current implementation of CDI context propagation turns out to have confusing semantics, in that it does not capture the request context entirely, but only the values currently added to the context.
My opinion was that it was obvious that when we want the request context to be captured, it has to be the entire request context (which is a mutable Map), which means being able to see all the modifications to it that happened before, not just the current values defined in the request context.
Instead, it was implemented as capturing the values currently active at that point, which is wrong IMO and pretty useless, unfortunately. So perhaps we need to clarify this in the spec?
To explain why we want to propagate the entire mutable context, here's an imperative example:
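The imperative example itself isn't reproduced here; what follows is a hedged reconstruction of the flow it describes, where `a()` lazily adds a request-scoped value that `b()` later reads, with a plain map standing in for the request context (all names hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class ImperativeDemo {
    // stands in for the request context active during one HTTP request
    static Map<String, String> requestContext = new HashMap<>();

    // a() lazily creates a request-scoped value
    static void a() { requestContext.put("requestBean", "created-by-a"); }

    // b() reads the same request context later in the request
    static String b() { return requestContext.get("requestBean"); }

    public static void main(String[] args) {
        a();
        // the mutation a() made "happens-before" b() and is visible to it
        System.out.println(b()); // created-by-a
    }
}
```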
It's pretty obvious that the request context (like the application context) is a mutable `Map` that gets mutated, and "happens-before" modifications are visible to later instructions. It's a context that follows the current request.

Now here's the same application in a reactive fashion:
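The reactive variant likewise isn't reproduced here; this is a hedged reconstruction of the failure mode the issue describes, where a snapshot of the context's values is taken before `a()` runs (plain maps and CompletableFuture standing in for CDI and MP CP):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class ReactiveDemo {
    // stands in for the request context
    static Map<String, String> requestContext = new HashMap<>();

    public static void main(String[] args) {
        // a snapshot of the current VALUES is taken when the pipeline is built,
        // i.e. before a() has added anything
        Map<String, String> snapshot = Map.copyOf(requestContext);

        CompletableFuture<Void> cf = CompletableFuture
            // a(): lazily adds a request-scoped value
            .runAsync(() -> requestContext.put("requestBean", "created-by-a"))
            .thenRun(() -> {
                // value-snapshot propagation restores the (empty) snapshot
                // before running the next stage...
                requestContext = new HashMap<>(snapshot);
                // ...so b() sees the bean vanish from its request scope
                System.out.println(requestContext.get("requestBean")); // null
            });
        cf.join();
    }
}
```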
To any reactive user, it's pretty obvious that this should work exactly the same as the imperative case. Except with the current implementation of CDI context propagation, the request context is captured before `a()` adds things to it, and restored empty so that `b()` has request-scoped beans vanish from its request scope.

In practice, this causes the request scope to not really be propagated in a reactive pipeline (since "happens-before" mutations are not always visible). It differs from the imperative code, and does not follow the principle of least surprise (beans vanishing from the request context and potentially later coming back to it are very hard to debug).
It also makes lazy request-scoped beans unusable with reactive applications, since making them properly follow the request lifecycle would require eagerly loading them all at the start of the request, which is very bad for performance.
Finally, note that this is not at all consistent with the application scope, which is propagated entirely, with the right "happens-before" mutations visible.
I didn't think this would be something the spec should define, since it's CDI-specific, and I thought it pretty obvious, but perhaps we can make this generic and specify that while context propagation is `ThreadContextProvider`-specific, we encourage behaviour to follow the "happens-before" semantics for mutable contexts, so that reactive code behaves the same way as imperative code.