Description
It is hard to diagnose a "stuck" timely dataflow computation, in which some capability (or perhaps message) in the system prevents forward progress. The progress tracking already holds fairly clear information about which pointstamps have non-zero accumulation. Extracting and presenting this information may not be a "visualization" strictly speaking, but it would be a useful starting point.
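As a rough illustration of the kind of extraction meant here, the sketch below sums a stream of pointstamp updates and keeps only those with a non-zero accumulation. The `Pointstamp` struct, its fields, and the `outstanding` function are hypothetical names made up for this example, with `u64` timestamps assumed for simplicity. They are not timely's internal progress-tracking types.

```rust
use std::collections::BTreeMap;

/// Hypothetical pointstamp: a location in the dataflow graph plus a timestamp.
/// Illustrative only; these are not timely's internal types.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Pointstamp {
    pub operator: usize,
    pub port: usize,
    pub time: u64,
}

/// Sums `(pointstamp, delta)` updates and returns only the pointstamps
/// whose accumulated count is non-zero.
pub fn outstanding(updates: &[(Pointstamp, i64)]) -> BTreeMap<Pointstamp, i64> {
    let mut counts = BTreeMap::new();
    for (pointstamp, delta) in updates {
        *counts.entry(pointstamp.clone()).or_insert(0) += *delta;
    }
    // Anything that cancelled out to zero is not holding the computation back.
    counts.retain(|_, count| *count != 0);
    counts
}
```

Whatever remains in the returned map is a candidate for what is preventing forward progress, and because the map is ordered by operator, port, and time it could be printed directly as a small report.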
@antiguru recently had a similar issue: he wanted to "complete" a dataflow without simply exiting the worker (to take some measurements), but when he attempted this the dataflow never reported completion. The root cause was ultimately a forgotten input that had been left unclosed.
One idiom that seemed helpful here was to imagine a version of the dataflow graph that reports, for example, whether each operator has been tombstoned (closed completely, memory reclaimed). This would reveal who is keeping a dataflow open, which is a coarser version of the question of what is holding a dataflow back. We might also look for similar idioms that let people ask, for a given timestamp or frontier, which operators have moved past it and which have not, revealing where in the dataflow graph a time is "stuck".
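To make the two questions concrete, here is a minimal sketch under some loud assumptions: timestamps are totally ordered `u64` values, each operator's frontier is summarized as an `Option<u64>`, and an empty frontier (`None`) stands in for "tombstoned". The `OperatorStatus` type and both functions are hypothetical and do not correspond to anything timely exposes today.

```rust
/// Hypothetical per-operator status; not a timely API.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct OperatorStatus {
    pub name: String,
    /// `None`: the frontier is empty and the operator is complete.
    /// `Some(t)`: the operator may still produce output at times >= t.
    pub frontier: Option<u64>,
}

/// Names of operators that are not yet complete, i.e. who is keeping
/// the dataflow open.
pub fn still_open(operators: &[OperatorStatus]) -> Vec<&str> {
    operators
        .iter()
        .filter(|op| op.frontier.is_some())
        .map(|op| op.name.as_str())
        .collect()
}

/// Names of operators that have not moved past `time`, i.e. whose
/// frontier is still at or before it.
pub fn stuck_at(operators: &[OperatorStatus], time: u64) -> Vec<&str> {
    operators
        .iter()
        .filter(|op| matches!(op.frontier, Some(t) if t <= time))
        .map(|op| op.name.as_str())
        .collect()
}
```

With partially ordered timestamps the frontier would be an antichain rather than a single value, and the comparison in `stuck_at` would need to check whether any frontier element is less than or equal to the queried time.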