[RFC] Provide an explicit caching API #947
Conversation
@Morendil gave it another try after your initial feedback :) Any further advice would be highly appreciated.
This still isn't the right tack IMO. It leaves the existing relationships between instances untouched, with the Variable -> Holder -> Cache linkage roughly reproducing the Variable -> Holder -> Storage scheme. To sketch it out briefly, I believe the Cache belongs to the Simulation (one Cache per Simulation), and functions as a "two-dimensional dictionary", or to put it another way, a dictionary whose keys are tuples.

The Cache functionality should be relatively compact, in LOC terms, at parity with existing functionality; where it starts getting potentially more complex is when we add masking.

So the way I'd approach this work would be to take the point of view of the Simulation, and pretend Holders and Storage and all this stuff don't exist at all. From the perspective of the Simulation, all I need is a way to answer "do I already have a value for this variable at this period that I can use, or do I need to compute it?", and of course a way to register "I've just computed a value for this variable at this period, so I'll store it to avoid computing it again".

Separately, test that the Cache state of a simulation can be persisted and loaded again. This might be as simple as serializing and deserializing the array; while we're still not clear on how that feature is used in practice and exactly what needs it supports, I suspect the persistence medium is an implementation detail, and the current implementation doesn't best reflect those needs. So shunting that feature "to the side" gives us more flexibility.

Since this is all essentially a matter of side effects, mocks would make some sense to me in testing this: request the same computation twice on a Simulation and check that only the first request causes a call to
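The comment above can be sketched as code. Everything here (the `Simulation` and `Cache` classes, the `compute` callable) is hypothetical and only illustrates the one-Cache-per-Simulation, tuple-keyed dictionary idea, together with the mock-based test described at the end:

```python
from unittest.mock import Mock

class Cache:
    """Hypothetical sketch: a dictionary whose keys are (variable, period) tuples."""

    def __init__(self):
        self._values = {}  # {(variable, period): value}

    def get(self, variable, period):
        # "Do I already have a value for this variable at this period?"
        return self._values.get((variable, period))

    def put(self, variable, period, value):
        # "I've just computed a value, store it to avoid computing it again."
        self._values[(variable, period)] = value

class Simulation:
    """Hypothetical sketch: one Cache per Simulation, no Holders or Storage."""

    def __init__(self, compute):
        self.cache = Cache()
        self._compute = compute  # the actual formula evaluation, injected

    def calculate(self, variable, period):
        cached = self.cache.get(variable, period)
        if cached is not None:
            return cached
        value = self._compute(variable, period)
        self.cache.put(variable, period, value)
        return value

# Mock-based test: the second identical request must not trigger a computation.
compute = Mock(return_value=[1.0, 2.0])
simulation = Simulation(compute)
simulation.calculate("salary", "2019-01")
simulation.calculate("salary", "2019-01")
assert compute.call_count == 1
```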
Thanks a lot, it's starting to look like a roadmap.
If `calculate` is for example: `calculate = Callable[[variable, period]] -> [...]`
then: `cache = Callable[[f, Callable[[variable, period]]] -> Callable[[variable, period]] -> [...]`
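Assuming `cache` is such a higher-order function, here is a minimal Python sketch; the names, signatures, and string-typed variable/period are illustrative only, not OpenFisca's API:

```python
from typing import Any, Callable, Dict, Tuple

# (variable, period) -> value
Calculate = Callable[[str, str], Any]

def cache(calculate: Calculate) -> Calculate:
    """Wrap a calculate function so repeated requests reuse the stored value."""
    state: Dict[Tuple[str, str], Any] = {}

    def cached_calculate(variable: str, period: str) -> Any:
        key = (variable, period)
        if key not in state:
            state[key] = calculate(variable, period)
        return state[key]

    return cached_calculate
```

A simulation could then do `calculate = cache(calculate)` once, leaving the formulas themselves free of caching code.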
DiskStorage always persists on disk, and you can dump and restore as expected; it works quite nicely (there's a test for that™). What happens is that all stored files are destroyed once the cache object is garbage collected. As for array serializing and deserializing, I think it's being done like: numpy.dump(enum.possible_values).
EnumArray(file, enum). In my humble perspective, it would be simpler to just: hashed_array = tuple(map(tuple, array))
array = numpy.asarray(hashed_array)
### in the case you'd like to have
{(variable, period, hashed_mask): numpy.ndarray} One additional test would be to do a restore from a set of fixture files that are independent of the test case. In other words, extract holders from calculate, for example, and wrap calculate with a HOF that does the job. At that point, if you suppress the cache, things will continue to work, just without caching, as there won't be a single line of caching code inside the functions themselves.
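The tuple round-trip suggested above can be checked in a few lines; the variable and period values in the key are placeholders:

```python
import numpy

# A 1-D mask becomes a hashable tuple, usable inside a dict key, and can be
# restored with numpy.asarray.
mask = numpy.array([True, False, True])
hashed_mask = tuple(mask.tolist())
restored = numpy.asarray(hashed_mask, dtype=bool)
assert (restored == mask).all()

# Usable as part of a {(variable, period, hashed_mask): ndarray} state:
state = {("salary", "2019-01", hashed_mask): numpy.array([100, 0, 500])}
assert ("salary", "2019-01", hashed_mask) in state

# For a 2-D array, tuple(map(tuple, array)) does the same job.
two_d = numpy.array([[1, 2], [3, 4]])
hashed_2d = tuple(map(tuple, two_d))
assert (numpy.asarray(hashed_2d) == two_d).all()
```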
Here's what I've been thinking about masks; I can't grasp if I truly understood the idea @Morendil and @sandcha: In [95]: v = 1
In [96]: p = 2
In [97]: m = tuple(map(bool, numpy.array([True, False, True])))
In [98]: key = (v, p, m)
In [99]: k = numpy.array([100, 200, 500])
In [100]: state = {key: k}
In [101]: numpy.array(key[2]) * state[key]
Out[101]: array([100, 0, 500]) In this example, you only need the key portion to be hashable, for example if you want to cache transformations of a vector as part of the key.
Another example: import numpy
from typing import NamedTuple

class Key(NamedTuple):
    variable: int = 2
    period: int = 3
    mask: numpy.ndarray = numpy.array([True, True, False, True])

class Cache:
    def __init__(self):
        self.state = {}

    def put(self, key: Key, value: numpy.ndarray) -> None:
        self.state[(key.variable, key.period, tuple(map(bool, key.mask)))] = value

    def get(self, key: Key) -> numpy.ndarray:
        return self.state[(key.variable, key.period, tuple(map(bool, key.mask)))]

cache = Cache()
value = numpy.array([10, 20, 30, 40])
cache.put(Key(), value)
result = cache.get(Key())
print(cache.state)
print(result)
print(value)
>>> {(2, 3, (True, True, False, True)): array([10, 20, 30, 40])}
>>> [10 20 30 40]
>>> [10 20 30 40]
@maukoquiroga With respect to masking, see the prior discussion which outlined the algorithm. When we spoke yesterday I implied that the mask would be part of the cache key, but that was an oversimplification, so it's possible we'll end up dealing with just

That doesn't diminish the value of simplifying the Holders/Storage tangle. Also, one important thing missing from that roadmap, which should have high priority, is figuring out how we're going to remove all the calls to

There are two clusters: one of them around age computations (which is why I worked on the age-related PR in France in the first place), the other around "inversions", such as the Reform that computes gross salary based on net.

As @sandcha noted, the way of dealing with ages that I suggested has some downsides, but it helps get rid of get_holder calls; @sandcha has proposed that we implement masking so that we can perform the age computations in France without the awkwardness. But implementing masking requires having that Cache API, which requires getting rid of holders: we can't allow circular dependencies in our Mikado map! So perhaps a better plan is to temporarily change the age computations to that awkward method and later make them more flexible again.

I would suggest that inversions should be removed entirely from France (and explicitly documented as unsupported) for the time being, as they are not properly "reforms" (speculative encodings of counterfactual law) but explorations of an alternative
I don't think of that as a goal; it would certainly be nice. In practice, the caching logic is currently interleaved with the spiral detection logic in
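For illustration only, here is one way the two concerns could be disentangled: spiral detection as its own wrapper around the computation, with no caching code inside it. This is a hypothetical sketch, not the current openfisca-core implementation:

```python
def detect_spirals(calculate):
    """Hypothetical wrapper: raise if a (variable, period) computation
    re-enters itself, independently of any caching layer."""
    in_progress = set()

    def wrapped(variable, period):
        key = (variable, period)
        if key in in_progress:
            raise RecursionError("spiral detected for %s" % (key,))
        in_progress.add(key)
        try:
            return calculate(variable, period)
        finally:
            in_progress.discard(key)

    return wrapped
```

A simulation could then compose wrappers, e.g. caching around spiral detection around the raw computation, so each concern stays testable and replaceable on its own.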
Thanks a lot @Morendil for your feedback. I'm closing this, as the RFC showed there might be a better strategy to deal with caching and getting rid of holders 😃
Connected to #887
Connected to #891

What's inside?

A possible implementation of #891, trying to keep current contracts as they are now.

This is a proposition made to separate concerns between caching and storing logic, so as to move all caching-related tasks from `Holder` to `Cache`, and all storing logic from everywhere to `MemoryStorage` and `DiskStorage`. Assuming those low-level classes were meant to be private, the new `Cache` class respects the previous classes' contracts, and warns the user of deprecated methods to be definitely removed in the future.

Technical changes

- `MemoryStorage`: move storage logic here.
- `DiskStorage`: move storage logic here.
- Introduce the `Cache` class.

What's next?

- Move the remaining caching tasks from `Holder` to `Cache`, notably the 'lookup' in `calculate`, `set_input*`, and `Holder`.

Thanks for your feedback!