Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a profiler leveraging the contextual logging #3888

Open
geo2a opened this issue May 22, 2024 · 3 comments
Open

Create a profiler leveraging the contextual logging #3888

geo2a opened this issue May 22, 2024 · 3 comments

Comments

@geo2a
Copy link
Collaborator

geo2a commented May 22, 2024

Motivation

When we try to understand why proofs are slow, we only have either very coarse-grained or very find-grained timing data. The coarse-grained data is either wall-clock time of running kontrol and other tools, or, at the level of kore-rpc-booster, the --log-stats option which times the requests. The fine-grained tool is the GHC profiler, which gives us Haskell-level cost-centres.

To better inform our decisions on what to work on to improve performance, we need something that sits in between.

Use cases

  • Users of the backend (semantics writers) want to know:
    • whether and how often simplification rules apply
    • how often booster aborts on rewrite rules (falling back to kore-rpc)
  • backend implementors want to know:
    • how much time is spent in different phases of execute and simplify requests (internal and external)
    • statistics about the terms being rewritten or simplified (probably based on indexes to group them)

Functionality

We would like to have a proof-level profiling tool that would allow us to find out, for both Booster and Kore:

  • how much time is spent in the steps of rewrite rule and equation application, both for all rules and for specific rules/sets of rules:
    • matching
    • checking the side condition
  • how much is spent in calls to the SMT solver

Additionally, for Booster we would like to know:

  • how much is spent in calls to LLVM
  • simplifier cache utilisation
  • Nice to have: statistics about the current configuration, perhaps for certain cells only. To know how big are the terms.

Data source

We have implemented the contextual JSON logging, which, in conjunction with timestamps, should give us all the information we need. We may need to disable the buffering of the timestamps in fast-logger or, of that is not possible, figure out another to provide timestamps.

Tool workflow and design

The tool's input will be the JSON log file, with every line being a json object. We should design the tool to support streaming the JSONs from the file so that we do not load it all into memory.

The log file will be generated by running the server with context logging enabled. This can be either done directly by the downstream tooling, i.e. kontrol prove --profile or by a script that consumes bug reports and runs the server with logging enabled.

The output of the tool will likely be a Markdown/HTML document.

@ehildenb
Copy link
Member

I would start very coarse grained, and only make it more fine-grained as we discover things. I would start with:

  • Time spent selecting rewrite rules (includes rule index calculation, unification, side-condition checking).
  • Time spent in old backend.
  • Time spent simplifying configurations.

Then, whichever of those cost-centers is the biggest, break down into smaller cost-centers.

I guess the granularity can (or should) be tunable in the tool which reads and processes the logs, rather than having to re-run the proofs. So that's good because you can have very fast iteration on looking at cost-centers even for a long-running proof (if we store the log file).

@ehildenb
Copy link
Member

Also, maybe visualizing the cost-centers and transit between them is useful. So building a graph where each type of log event we care about is a node, and edges between nodes are weighted with how much cumulative time is spent on that edge. That can give us an overall idea of how control flow is broken down in a more fine-grained manner. @jberthold made some diagrams like that for the old backend a long time ago.

@jberthold
Copy link
Member

Also, maybe visualizing the cost-centers and transit between them is useful. So building a graph where each type of log event we care about is a node, and edges between nodes are weighted with how much cumulative time is spent on that edge. That can give us an overall idea of how control flow is broken down in a more fine-grained manner. @jberthold made some diagrams like that for the old backend a long time ago.

The code to process the event logs has moved to https://github.com/runtimeverification/kore-prof/tree/master/src/TSM (currently unmaintained). The legacy kore rewrite engine had (and still has) calls to insert "markers" into the event log, see here for an example (search for traceMarkerIO used in marker functions).
Only the rewrite rule application step instrumentation was merged to master back then.

I found an old file showing such a graph for the rule application during a test (evm-semantics, IIRC):
image

@jberthold jberthold assigned jberthold and unassigned jberthold May 30, 2024
rv-jenkins pushed a commit that referenced this issue Jun 20, 2024
* Introduces a JSON-enabled type for the log context
* Uses the new type in all  booster logging
* Modifies `count-aborts` to parse the json data using the new type.
Performance is on par after switching to TH-derived `FromJSON`
instances.


Conversion of `kore-rpc` logging is left for future work.
Part of  #3888

---------

Co-authored-by: github-actions <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants