CaffeineCache as backingmap can have inconsistent data after eviction #136
I am not familiar with Coherence's code, but you might be describing this scenario in our Guava migration guide.
Caffeine cannot directly resolve this without internal access to the hash table, which would require maintaining patches to replace internal APIs (like …).
Thanks, Ben. I don't think this will be a big problem for Caffeine itself, or even when using Caffeine as a front/local cache. However, when it is used as a backing map there are two layers holding the entries, and they should hold the same entries. Since eviction is not a synchronous operation in Caffeine, I am hoping this could somehow be handled in the Coherence code, so that the change is reflected in the Coherence layer synchronously, or in some other way. I haven't read all the code, so these are just some thoughts.
@aseovic helped me quite a bit, but admittedly I didn't really know what I was doing when writing the adapter. I tried to write tests, so maybe someone can reproduce the problem in isolation. Even though the interfaces provide good separation, there might be some implicit implementation detail that depends on the older LocalCache's behavior.
@lyiu18 This one is tricky...
What I don't understand, though, is why you are evicting individual entries and then calling clear right afterwards. That said, the only reason to use …

Basically, calling … will remove the entry from both … Does that address your concern, @lyiu18?
I wonder if something like this would be worth changing to?

    public void evict(Object oKey)
        {
        f_cache.asMap().computeIfPresent(oKey, (k, oValueOld) ->
            {
            notifyEviction(k, oValueOld, RemovalCause.EXPIRED);
            return null;
            });
        }

Currently it simply sets the entry's TTL to zero, which implicitly forces it to be expired and eventually evicted. For whatever reason, I didn't consider …
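As an aside, here is a minimal standalone Caffeine sketch (not the adapter code) of why the TTL-to-zero approach is only eventually consistent: the entry becomes logically expired immediately, but the physical removal and the EXPIRED notification only happen once maintenance runs. The builder settings, class name, and keys below are illustrative only.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.Expiry;
import com.github.benmanes.caffeine.cache.RemovalCause;
import java.util.concurrent.TimeUnit;

public class ExpireNowDemo {
  public static void main(String[] args) {
    Cache<String, String> cache = Caffeine.newBuilder()
        .executor(Runnable::run) // run maintenance and notifications on the calling thread
        .expireAfter(new Expiry<String, String>() {
          public long expireAfterCreate(String k, String v, long now) { return Long.MAX_VALUE; }
          public long expireAfterUpdate(String k, String v, long now, long dur) { return dur; }
          public long expireAfterRead(String k, String v, long now, long dur) { return dur; }
        })
        .removalListener((String k, String v, RemovalCause cause) ->
            System.out.println("removed " + k + " (" + cause + ")"))
        .build();

    cache.put("key-1", "value-1");

    // Setting the remaining TTL to zero makes the entry logically expired,
    // but the physical removal and the EXPIRED notification happen only when
    // maintenance runs, not at the moment the expiry is changed.
    cache.policy().expireVariably().ifPresent(p ->
        p.setExpiresAfter("key-1", 0, TimeUnit.NANOSECONDS));

    cache.cleanUp(); // forces the pending maintenance, which fires the listener
  }
}
```

With the computeIfPresent variant above, by contrast, the mapping is removed inside the map operation itself, so the caller observes the removal synchronously.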
To Aleks's question:

What I meant by two layers is that I think some data is held in PartitionedCache/ReadWriteBackingMap directly. In fact, when clear is called, the code will eventually call the PartitionedCache#Storage method

    public Set collectKeySet(PartitionSet partMask, boolean fSnapshot)

This method uses the PartitionedKeyIndex instead of the internal map (the ReadWriteBackingMap / Caffeine map), so the data is different. I assume the index map only gets updated when the Caffeine map actually performs the eviction and invokes the callbacks. If it directly queried the Caffeine map, I assume it would be fine. I didn't test thoroughly, but when my debug window displays all the entries in the Caffeine map, the evicted key is not there (though I am not sure whether that is a timing issue or not).

As for using …: actually that is exactly what I used. It eventually goes to the PartitionedCache#Storage method removePrimaryResource, which calls evict on the internal map, and that is where the problem started to show.

To Ben's question:

I was thinking about the same change, but I am not familiar enough with the code to see how it should be changed, or whether it will make PartitionedCache (and its index, etc.) update quickly; it may be worth a try.

Some background on how we found this problem: it was actually by accident. This is one of our test cases, which is why we used a dummy cache loader that throws an exception (if we used a real cache loader you would probably see nothing, but there would be an extra, wasted cache-loading operation), and the clear call is used to clear out the cache to prepare for the next unit test.

My main concern is not just this case; it is mainly how we can make sure the data stays consistent across the different layers (PartitionedCache and its internal CaffeineCache), and whether this can cause more problems than just this one. It seems the evict function schedules the work to be completed asynchronously; is there any other function with a similar effect? Or is there some function in PartitionedCache that needs to be modified, etc.?
In Caffeine's eviction, yes. It tries to scale concurrent, independent writes rather than perform them in lock-step, which is especially bad for computations that may call a user's cache loader. Therefore we split the hash table and the policy to operate under different locks and synchronize the two using a ring buffer, so that after the hash table update (the external view) writers can schedule their changes against the policy (the internal view) and the two asynchronously become consistent.

If lock-step behavior is required, then the adapter should (a) use its own lock for all writes and (b) call …

The proposed change to Coherence's …
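Here is a minimal sketch of the lock-step approach described above. Two assumptions go beyond the comment: that the elided call is Cache.cleanUp(), Caffeine's public method for running any pending maintenance, and that a simple wrapper (the name LockStepCache is made up here) is an acceptable way to serialize writes. This is illustrative, not the Coherence adapter's actual code.

```java
import com.github.benmanes.caffeine.cache.Cache;

// Illustrative wrapper: all writes go through one lock and force Caffeine's
// pending maintenance to run before returning, so the policy (internal view)
// catches up with the hash table (external view) in lock-step.
final class LockStepCache<K, V> {
  private final Cache<K, V> cache;
  private final Object lock = new Object();

  LockStepCache(Cache<K, V> cache) {
    this.cache = cache;
  }

  void put(K key, V value) {
    synchronized (lock) {
      cache.put(key, value);
      cache.cleanUp(); // drain the pending work (evictions, expirations) now
    }
  }

  void evict(K key) {
    synchronized (lock) {
      cache.invalidate(key);
      cache.cleanUp();
    }
  }
}
```

The obvious trade-off is that this gives up exactly the concurrent-write scalability described above.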
Yes, if it is changed to a notifyEvicted call there, it does pass the tests I have.
I see... I didn't realize it eventually ends up calling evict. That said, Ben's proposed change seems reasonable, and if it passes your tests we should be good to go. We'll discuss it internally tomorrow and assign someone to make the change and backport it to all the code lines that have Caffeine support.

Thanks, @ben-manes, for figuring out the solution, and @lyiu18 for reporting it.
Using Caffeine as the backing map, I saw a possible inconsistency after evicting an entry from the cache.

The cache setup is as follows: instead of using Caffeine directly as shown in the documentation, it is added inside a read-write-backing-map, since I need to integrate a cache store.

A dummy cache loader is used, which throws an exception when accessed; this is only to make the problem easy to see (a repro sketch follows at the end of this description).

Insert data into the cache normally.

Evict an entry from the cache right before issuing cache.clear.

The evict will eventually reach this line:
https://github.com/oracle/coherence/blob/main/prj/coherence-core-components/src/main/java/com/tangosol/coherence/component/util/daemon/queueProcessor/service/grid/partitionedService/partitionedCache/Storage.java#L8919
It seems CaffeineCache records the eviction but may not finish the operation synchronously.

The cache clear then hits this line:
https://github.com/oracle/coherence/blob/main/prj/coherence-core-components/src/main/java/com/tangosol/coherence/component/util/daemon/queueProcessor/service/grid/partitionedService/partitionedCache/Storage.java#L1937
which returns the keys, including the key that was just evicted.

The code that follows tries to remove the keys one by one, finds the evicted key already gone from the cache, and then tries to reload it from the cache loader, which triggers the exception from the dummy cache loader.

Is this intended behavior? My main concern is whether there are more cases where we will see this inconsistency, causing more trouble.
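For concreteness, here is a rough sketch of the repro described above. The cache name, class names, and cache configuration are assumptions on my part, and the eviction step happens server-side against the backing map, so it is only indicated by a comment; the dummy loader simply throws so that any unexpected load is visible.

```java
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.net.cache.AbstractCacheLoader;

public class EvictThenClearRepro {

    /** Dummy loader: any load proves Coherence faulted in a key it thought was present. */
    public static class ThrowingCacheLoader extends AbstractCacheLoader {
        @Override
        public Object load(Object oKey) {
            throw new IllegalStateException("unexpected load for key " + oKey);
        }
    }

    public static void main(String[] args) {
        // "test-cache" is assumed to be configured with a read-write-backing-map
        // whose internal map is the Caffeine-backed scheme and whose
        // cachestore-scheme points at ThrowingCacheLoader.
        NamedCache<String, String> cache = CacheFactory.getCache("test-cache");

        cache.put("key-1", "value-1");   // 1. insert data normally

        // 2. evict "key-1" on the storage member; this eventually reaches
        //    Storage.removePrimaryResource, which calls evict on the internal
        //    Caffeine map (server-side, so only indicated here)

        cache.clear();                   // 3. collectKeySet() still includes "key-1",
                                         //    the per-key remove finds it gone, and the
                                         //    dummy loader is consulted and throws
    }
}
```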