One general approach to solving this that piques my interest is a "tree-shaking" algorithm. The user would specify which collections must be output. Podio then starts by marking these collections as "live". It follows all associations backwards, across all collections, marking everything it encounters as live. Anything not live at the end does not get written. This would strike a good balance between saving space and preserving associations and hence data integrity. If the runtime cost is too high, we could reduce it by having the user specify exactly which collections need to be pruned.
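To make the marking step concrete, here is a minimal sketch in Python. It does not use any podio API: the event is modelled as a plain dict from collection name to a list of objects, and each object is just the list of `(collection, index)` pairs it refers to. The function name `mark_live`, the collection names, and the reading of "backwards" as "from the requested objects to the objects they refer to" are all assumptions made for illustration.

```python
from collections import deque


def mark_live(event, required):
    """Return the set of (collection, index) pairs reachable from `required`.

    `event` is a hypothetical stand-in for an event: a dict mapping a
    collection name to a list of objects, where each object is the list of
    (collection, index) pairs it has associations to.
    """
    live = set()
    queue = deque()

    # Everything in a user-requested collection is live from the start.
    for name in required:
        for index in range(len(event[name])):
            live.add((name, index))
            queue.append((name, index))

    # Follow associations until no new objects are discovered.
    while queue:
        name, index = queue.popleft()
        for ref in event[name][index]:
            if ref not in live:
                live.add(ref)
                queue.append(ref)

    return live


# Toy event: one cluster pointing at two of three hits, plus an
# unrequested track pointing at the remaining hit.
event = {
    "Clusters": [[("Hits", 0), ("Hits", 2)]],
    "Hits": [[], [], []],
    "Tracks": [[("Hits", 1)]],
}
live = mark_live(event, ["Clusters"])
```

In this toy event only the cluster and the two hits it points to end up live; the track and the hit it refers to would not be written. Each object is visited at most once, so the cost grows with the number of live objects and associations rather than with the full event size.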
That sounds like an interesting approach. I think it could work, although there might be some edge cases to consider. One potential issue is the following: all objects are identified by their ObjectID, consisting of a collectionID and an index into that collection. We would have to make sure that these are properly set before any writing happens. I think (and this needs to be verified) that this should work, because the final setting of all of these happens in prepareForWrite, right before we write things; i.e. as long as things are pruned before that, we should be able to get the index set correctly.
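The index concern can be illustrated with a small sketch. This is not what prepareForWrite does internally (that still needs to be checked); it only shows why pruning has to happen before the indices are fixed: removing an object shifts the index of everything after it, so every reference into that collection has to be remapped. The function `prune_and_reindex` and the dict-based event layout are hypothetical, and collection names stand in for the collectionID.

```python
def prune_and_reindex(event, live):
    """Drop objects that are not live and remap references to the new indices.

    `event` maps a collection name to a list of objects; each object is the
    list of (collection, index) pairs it refers to. `live` must be closed
    under references (every object a live object points to is also live),
    otherwise the lookup below raises KeyError.
    """
    # First pass: assign each surviving object its index in the pruned collection.
    new_index = {}
    for name, objects in event.items():
        next_free = 0
        for old in range(len(objects)):
            if (name, old) in live:
                new_index[(name, old)] = next_free
                next_free += 1

    # Second pass: copy the survivors and rewrite their references.
    pruned = {}
    for name, objects in event.items():
        kept = []
        for old, refs in enumerate(objects):
            if (name, old) in live:
                kept.append([(c, new_index[(c, i)]) for (c, i) in refs])
        pruned[name] = kept
    return pruned


event = {
    "Clusters": [[("Hits", 0), ("Hits", 2)]],
    "Hits": [[], [], []],
    "Tracks": [[("Hits", 1)]],
}
live = {("Clusters", 0), ("Hits", 0), ("Hits", 2)}
pruned = prune_and_reindex(event, live)
```

After pruning, the hit that used to sit at index 2 is at index 1, and the cluster's reference is rewritten accordingly. If the indices had already been frozen before pruning, that reference would point past the end of the pruned collection.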
See also discussion in: key4hep/k4FWCore#226