Two options exist for the memory manager:

Static memory allocator - if we know the schedule of the graph computations, we can determine at graph finalization exactly when each node goes out of scope, and therefore when its memory can be recycled and used for other nodes. All of this can be done statically: the memory manager maps an `id: usize` to an offset in an internal preallocated buffer. The size of the buffer is fully known before the function is called, and the buffer can be persistent.
Example: `f = tanh(a + MatrixMul(b, c) + d)`. In graph or SSA form we get something like `n0 = MatrixMul(b, c)`, `n1 = a + n0 + d`, `n2 = tanh(n1)`. If all of the tensors are the same size, a single buffer suffices: `n0`, `n1` and `n2` can all point to the same memory location. Since `f` is known, this can be decided before `f` is ever called (much like what a standard compiler does when allocating registers).
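The planning step described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed final design: the `Node`, `Plan` and `plan_static` names are hypothetical, nodes are assumed to be listed in schedule order with inputs referring to earlier ids, and the `in_place` flag (whether a node may overwrite an input that dies at that node) is an assumption added so that the `n0`, `n1`, `n2` example can share one slot.

```rust
/// One scheduled node (hypothetical type for this sketch).
pub struct Node {
    pub size: usize,        // bytes needed for the node's output
    pub inputs: Vec<usize>, // ids of earlier nodes read by this node
    pub in_place: bool,     // assumption: may overwrite an input that dies here
}

/// Result of planning: one offset per node id plus the total buffer size.
pub struct Plan {
    pub offsets: Vec<usize>,
    pub buffer_size: usize,
}

/// Return the blocks of inputs whose last reader is node `i` to the free list.
fn release_dead_inputs(
    i: usize,
    nodes: &[Node],
    last_use: &[usize],
    offsets: &[usize],
    released: &mut [bool],
    free: &mut Vec<(usize, usize)>,
) {
    for &inp in &nodes[i].inputs {
        if last_use[inp] == i && !released[inp] {
            released[inp] = true;
            free.push((offsets[inp], nodes[inp].size));
        }
    }
}

/// Assign every node an offset in a single buffer, before anything executes.
pub fn plan_static(nodes: &[Node]) -> Plan {
    let n = nodes.len();

    // last_use[id] = index of the last node in the schedule that reads `id`.
    let mut last_use: Vec<usize> = (0..n).collect();
    for (i, node) in nodes.iter().enumerate() {
        for &inp in &node.inputs {
            last_use[inp] = i;
        }
    }

    let mut free: Vec<(usize, usize)> = Vec::new(); // recycled (offset, size) blocks
    let mut released = vec![false; n];
    let mut offsets = vec![0usize; n];
    let mut buffer_size = 0usize;

    for (i, node) in nodes.iter().enumerate() {
        // In-place nodes may reuse the memory of inputs that die at this node.
        if node.in_place {
            release_dead_inputs(i, nodes, &last_use, &offsets, &mut released, &mut free);
        }

        // First fit: reuse a recycled block if one is large enough, else grow.
        let slot = free.iter().position(|&(_, size)| size >= node.size);
        offsets[i] = match slot {
            Some(k) => {
                let (off, size) = free.swap_remove(k);
                if size > node.size {
                    // Keep the unused tail of the block available.
                    free.push((off + node.size, size - node.size));
                }
                off
            }
            None => {
                let off = buffer_size;
                buffer_size += node.size;
                off
            }
        };

        // Otherwise dead inputs only become reusable by later nodes.
        if !node.in_place {
            release_dead_inputs(i, nodes, &last_use, &offsets, &mut released, &mut free);
        }
    }

    Plan { offsets, buffer_size }
}
```

For the example, three equally sized nodes with `n0` having no managed inputs and `n1`, `n2` marked `in_place` all receive the same offset, so the buffer is the size of a single tensor. The sketch uses first fit and does not merge adjacent free blocks; a real planner would probably want a better packing strategy.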
Dynamic memory allocator - at run time the function "requests" memory of certain sizes from the memory manager, which keeps its own buffer, probably divided into buckets, and hands out free slots. This, however, requires calling back into the memory manager to free slots once they are no longer needed.
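For comparison, a bucketed run-time manager might look something like the sketch below. Everything here is hypothetical (`BucketPool`, `request`, `release`, `arena_size`), and rounding sizes up to powers of two is just one assumed bucketing scheme; the manager hands out offsets into an arena rather than real pointers to keep the example small.

```rust
use std::collections::HashMap;

/// Hypothetical bucketed pool that hands out offsets into one growable arena.
pub struct BucketPool {
    free: HashMap<usize, Vec<usize>>, // bucket size -> free offsets in that bucket
    arena_size: usize,                // total bytes handed out so far
}

impl BucketPool {
    pub fn new() -> Self {
        BucketPool { free: HashMap::new(), arena_size: 0 }
    }

    /// Assumed bucketing rule: round the request up to a power of two.
    fn bucket(size: usize) -> usize {
        size.max(1).next_power_of_two()
    }

    /// Called by the running function when it needs memory for a node.
    pub fn request(&mut self, size: usize) -> usize {
        let b = Self::bucket(size);
        // Reuse a free slot of the right bucket if one exists.
        if let Some(off) = self.free.get_mut(&b).and_then(|slots| slots.pop()) {
            return off;
        }
        // Otherwise grow the arena by one bucket.
        let off = self.arena_size;
        self.arena_size += b;
        off
    }

    /// The explicit callback: the caller must report when a slot is dead.
    pub fn release(&mut self, offset: usize, size: usize) {
        self.free.entry(Self::bucket(size)).or_default().push(offset);
    }

    /// Bytes currently reserved; only known after running, not before.
    pub fn arena_size(&self) -> usize {
        self.arena_size
    }
}
```

Note that the total memory is only discovered as `request` calls arrive, and every consumer has to remember to call `release`, which is the bookkeeping cost mentioned above.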
Of the two, the static allocator (1) is preferable: it makes better use of memory than the dynamic one (2), and it can tell you the memory needed before execution. Option 1 has been used in MXNet, which is likely why it achieves some of the best memory footprints, while option 2 is used in most other frameworks.