RFC: Address graph building performance and memory usage #927

MauricioUyaguari · 2022-03-09T20:11:23Z

Overview

The problem: when loading a huge project, Studio can take several minutes to load or, in the worst case, crash the browser, depending on the project size. We potentially need to address this problem in 3 aspects: performance optimization, memory usage optimization, and scale-up resource.

We did some profiling. It looks like the perceived performance of the graph builder is impacted most visibly by SDLC being slow to return entities and by the fact that we introduce mobx when we don't really have to (we could enable mobx when we start change detection instead). We crudely tried trimming down the source information coming back from SDLC (using an older project structure) and removing makeObservable in the metamodels and the graph. In total, this boosted the perceived performance by 30-35%. This is significant, but not enough, as it doesn't really address 2 problems:

Users are clueless about what takes so long. We should give more feedback about this to improve the perceived performance, e.g. add a loading bar with info on the loading stages.
We still build the full graph, so it's just a matter of time before users hit the memory-usage ceiling and crash the browser. We should address this problem as well. Some ideas include building a partial graph or lazy-building.

Implementation plan

Profiling

Turning off mobx in metamodels: timing for graph-building improves by 10-20%
Turning off source information seems to improve entity loading time by 50%: we say "seems" because we tested using a different project structure, rather than using the same project structure and just pruning source information.
Comparing engine (compilation) and Studio (graph-building): apparently, engine is much faster than Studio. Studio performs badly in the first phase (deserializing and indexing/initializing); other than that, the performance is somewhat acceptable.

Performance optimization

Immediate measure before optimization

Before embarking on performance improvements, we should fix the UX by being transparent about the graph builder progress. Hence, we must give better feedback and monitoring around graph-building:

Log the build time for each step in more detail (for system, dependencies, generation)
Produce a graph builder report, including step timings and the number of elements
Send telemetry about these to monitor the timing
When coming back from text mode or after compilation, we should probably send telemetry as well
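
To make the logging and report items above concrete, here is a minimal sketch of a step timer that accumulates timings and element counts into one report object. `StepTimer` and `GraphBuilderReport` are hypothetical names used only for illustration, not Studio's actual API; the clock is injectable purely so the behavior can be checked deterministically.

```typescript
// Hypothetical report shape for illustration; not Studio's actual types.
type GraphBuilderReport = {
  timings: Record<string, number>;
  elementCount: Record<string, number>;
};

class StepTimer {
  private last: number;
  private readonly now: () => number;
  readonly report: GraphBuilderReport = {
    timings: {},
    elementCount: { total: 0 },
  };

  // `now` defaults to the wall clock, but can be replaced in tests
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
    this.last = now();
  }

  // Record the time elapsed since the previous step (or since construction)
  record(step: string): number {
    const current = this.now();
    const elapsed = current - this.last;
    this.report.timings[step] = elapsed;
    this.last = current;
    return elapsed;
  }

  // Add `n` elements of the given kind and keep the running total in sync
  count(kind: string, n: number): void {
    const counts = this.report.elementCount;
    counts[kind] = (counts[kind] ?? 0) + n;
    counts["total"] = (counts["total"] ?? 0) + n;
  }
}
```

The resulting `report` object could then be logged or attached to a telemetry event once the build completes.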

Identify and focus on critical workflow performance optimization

Since we need to optimize around the graph-loading experience, we should first ask: what is the purpose of building the graph? Is it really the blocker and the prerequisite for all else? If not, which critical flows can we optimize to improve UX?

Taxonomy

For Taxonomy, the critical flow is building dataspaces, so we should improve dataspace loading: #936

Simplify the algorithm to render diagrams, so we don't need to build out the graph - Feature request: Improve UX when loading dataspaces #936
Optimize Legend Query Setup for Create New Queries and Queries from Services #967

Query

For Query, the critical flow is building the graph, and there's currently no way to avoid this. However, there are certain tasks we can do to avoid relying too much on the graph, and we could potentially consider disabling processing for parts of the graph that are not needed for query builder, such as file generation, text, diagram, etc.
MAYBE we only build UML and diagrams, but we could potentially look at the diagrams to work out which models we need to build? - Feature request: Improve UX when loading dataspaces #936
Move the property mappedness check to engine - Feature request: Make use of engine for property mappedness check in query builder #1069
Optimize Legend Query Setup for Create New Queries and Queries from Services #967

Studio

For Studio, the graph builder serves the purpose of building editor forms (analytics and linking) and change detection. As such, we could:

Separate change detection from graph-builder: we could then move forward with separating mobx from the graph-builder flow and activate observability after graph-builder completes - separate mobx from metamodel and graph logic (part 1) #1000
Feature request: Allow viewing elements in grammar or JSON form when loading the graph #1071
Enhancement: remove graph building in text mode to improve performance #966

Optimization by deferred/lazy loading

Here we can consider doing something like Pure, where the graph unfolds as we access it. This requires some rewrite at the meta level of how we access properties of the metamodel. The initial graph is only built partially, perhaps only enough to build out the package tree; then, as we access the elements, we start building the graph.
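
As a rough illustration of the idea (not how Pure or Studio actually implement it), the sketch below indexes entities by path up front and only builds an element the first time it is accessed. `LazyGraph`, `Entity`, and `BuiltElement` are hypothetical stand-ins for the real protocol and metamodel types, and the "build" step is reduced to a placeholder.

```typescript
// Hypothetical minimal shapes; not the actual Studio protocol/metamodel types.
type Entity = { path: string; content: Record<string, unknown> };
type BuiltElement = { path: string; name: string };

class LazyGraph {
  private readonly entities = new Map<string, Entity>();
  private readonly built = new Map<string, BuiltElement>();
  // Number of elements actually built so far (for inspection only)
  buildCount = 0;

  constructor(entities: Entity[]) {
    // Cheap first pass: index by path only, build nothing
    for (const entity of entities) {
      this.entities.set(entity.path, entity);
    }
  }

  // Enough information to render a package tree without building elements
  get paths(): string[] {
    return Array.from(this.entities.keys()).sort();
  }

  // Build on first access, then serve the cached element
  getElement(path: string): BuiltElement | undefined {
    const cached = this.built.get(path);
    if (cached !== undefined) {
      return cached;
    }
    const entity = this.entities.get(path);
    if (entity === undefined) {
      return undefined;
    }
    this.buildCount += 1;
    const separatorIndex = path.lastIndexOf("::");
    const element: BuiltElement = {
      path,
      name: separatorIndex === -1 ? path : path.substring(separatorIndex + 2),
    };
    this.built.set(path, element);
    return element;
  }
}
```

A real implementation would also have to resolve references between elements lazily, which is the part that requires rewriting how metamodel properties are accessed.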

More to discuss. Check with @pierredebelen and @kevin-m-knight-gs on how it's been done in Pure

Loading/Saving time optimization

Feature request: Improve SDLC operations' load time #1052

Micro-optimization code

There are micro code constructs that are written in a slightly inefficient manner, which can add up and hurt performance significantly:

This is a silly one IMHO, but worth trying: replace forEach with for ... of .... Technically map also suffers the same penalty, but it's a big refactor to stop using map. Arguably, we don't need to try these, but we could also look at replacing find with a Set lookup when possible. - I think it's really silly and not worth exploring at all.
https://leanylabs.com/blog/js-forEach-map-reduce-vs-for-for_of/
https://github.com/thlorenz/v8-perf/blob/master/language-features.md#iterating-maps-and-sets-via-foreach-and-callbacks
https://github.com/thlorenz/v8-perf/blob/master/language-features.md#array-builtins
https://medium.com/@ExplosionPills/map-vs-for-loop-2b4ce659fb03
When building the graph, the logic to resolve references using the section index takes a lot of time. We could potentially turn this off when there is no section index, especially for lambdas (V1_resolvePathsInRawLambda); consider the usefulness of TEMPORARY__disableRawLambdaResolver - optimize path and element resolution when building graph #1068
We noticed a performance bottleneck when we index and initialize the graph. This could be due to the Package.addChild() and Package.getOrCreatePackage() methods (the former could be improved by using push() instead of being so defensive; the latter could be optimized by constructing, at top level, a map of path to package for quicker lookup; remember to clean up on delete and rename though). - improve graph builder performance by bypassing duplication checks in tree indexing #973
~~Optimize getExtraBuilderOrThrow (maybe with caching) as this is used a lot in the builders~~ - we already did caching
Optimize methods in DependencyManager, e.g. getOwnProfile = (path: string): Profile | undefined => this.models.map((dep) => dep.getOwnProfile(path)).find(isNonNullable); is expensive. - optimize path and element resolution when building graph #1068
Also, we should optimize methods such as getNullableElement() and getNullablePackage(), for example by maintaining an index of elements (including packages) that is not observable; we also need to take rename/remove/addElement() into consideration - optimize path and element resolution when building graph #1068
Skipping serialization might not be a bad idea, but should we really do that? - Feature request: Optimize graph builder by skipping protocol (de)serialization #972
Skip unnecessary post-processing. It uses the getElement method and is only used for freezing (read-only), so we can remove it in non-editable modes - disable editing specific post processing for non editing graph #992 and skip post processing for dependencies #1001
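
To illustrate the DependencyManager item above: the `map(...).find(isNonNullable)` pattern queries every dependency model and allocates an intermediate array even when the first model already has the answer. A minimal sketch of an early-exit alternative follows; `findFirstDefined`, `DependencyModel`, and `DependencyManagerSketch` are hypothetical names, and the real classes carry far more state than shown here.

```typescript
// Hypothetical minimal shapes; the real DependencyManager and Profile differ.
type Profile = { path: string };
interface DependencyModel {
  getOwnProfile(path: string): Profile | undefined;
}

// Return the first non-nullable lookup result, stopping at the first hit
function findFirstDefined<T, R>(
  items: readonly T[],
  lookup: (item: T) => R | null | undefined,
): R | undefined {
  for (const item of items) {
    const result = lookup(item);
    if (result !== null && result !== undefined) {
      return result;
    }
  }
  return undefined;
}

class DependencyManagerSketch {
  private readonly models: DependencyModel[];

  constructor(models: DependencyModel[]) {
    this.models = models;
  }

  // Same result as models.map(...).find(isNonNullable), without the extra work
  getOwnProfile(path: string): Profile | undefined {
    return findFirstDefined(this.models, (dep) => dep.getOwnProfile(path));
  }
}
```

Whether this matters in practice depends on how many dependency models there are and how often the method is called during graph building.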

Memory usage optimization

Investigate memory leaks:
- Bug: Investigate potential memory issue with Legend Query and Legend Taxonomy #1092
- Bug: Memory-leak issue when working with large model in text-mode #257
Consider turning off keepAlive for certain things like class subtypes and supertypes to avoid leaks - https://medium.com/terria/when-and-why-does-mobxs-keepalive-cause-a-memory-leak-8c29feb9ff55

Scale-up resource

Consider using Electron to escape the browser's resource limitations - #718. This would also enable more optimizations, such as local file access and deploying SDLC locally (better stability than targeting gitlab).


akphi · 2022-03-12T05:11:48Z

We should check #257 again after closing this 🤞

MauricioUyaguari · 2022-03-18T18:43:25Z

@akphi Here are the logs for the 20k, making the change from addUniqueEntry to .push on addChild in Package
packageAddChildrenSlowExample.txt
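
For readers without the log file, here is a minimal sketch of why this change can matter. It assumes `addUniqueEntry` scans the array before appending (the implementation below is an assumption, not copied from the codebase), so adding n children to one package costs on the order of n² comparisons, whereas checking for duplicate paths once up front and then using a plain `push` keeps each insert constant-time. `findDuplicates` and `PackageSketch` are hypothetical names.

```typescript
// Assumed behavior of addUniqueEntry: a linear scan before every append
function addUniqueEntry<T>(array: T[], value: T): boolean {
  if (array.indexOf(value) !== -1) {
    return false;
  }
  array.push(value);
  return true;
}

// One-off duplicate detection over all element paths, done before indexing
function findDuplicates(paths: readonly string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const path of paths) {
    if (seen.has(path)) {
      duplicates.add(path);
    } else {
      seen.add(path);
    }
  }
  return Array.from(duplicates);
}

// Hypothetical stand-in for Package: children are appended without a scan
class PackageSketch {
  readonly children: string[] = [];

  addChild(path: string): void {
    this.children.push(path);
  }
}
```

The trade-off is that `addChild` no longer protects itself, so the up-front check (or an equivalent guard) has to run wherever elements can be added.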

akphi · 2022-03-18T21:10:02Z

Thanks to @MauricioUyaguari we have the following comparisons using CDM models (1417 elements):

Local Chrome/Engine
- GRAPH_BUILDER_SYSTEM_BUILT - 178 ms
- GRAPH_BUILDER_DATA_MODEL_PARSED - 248 ms
- GRAPH_BUILDER_COMPLETED - 2545 ms
- GRAPH_INITIALIZED - 2594 ms
- ENGINE COMPILATION - 151 ms
Local Node/Engine
- GRAPH_BUILDER_SYSTEM_BUILT - 122 ms
- GRAPH_BUILDER_DATA_MODEL_PARSED - 118 ms
- GRAPH_BUILDER_COMPLETED - 1686 ms
- ENGINE COMPILATION - 178 ms
SDLC Entities as JSON (Project Structure <8) vs Entities as Grammar (Project Structure >= 8) Server https://legend-acct.finos.org
- SDLC JSON - 1.03s
- SDLC Grammar - 2.91s
SDLC Entities as JSON (Project Structure <8) vs Entities as Grammar (Project Structure >= 8) Server LOCAL
- SDLC JSON - 899 ms (3.77 MB)
- SDLC Grammar - 2.33 s (7.26 MB)
- SDLC Grammar no source information - 2.11 s (3.10 MB)

repost since I updated the description of the RFC

akphi · 2022-04-20T05:12:12Z

After #1068, I think we can call this satisfactory. The latest stat I got from CDM is

{
  "timings": {
    "GRAPH_INITIALIZED": 799
  },
  "dependencies": {
    "timings": {
      "GRAPH_BUILDER_ELEMENTS_DESERIALIZED": 2,
      "GRAPH_BUILDER_ELEMENTS_INDEXED": 1,
      "GRAPH_BUILDER_SECTION_INDICES_BUILT": 1,
      "GRAPH_BUILDER_DOMAIN_MODELS_BUILT": 0,
      "GRAPH_BUILDER_STORES_BUILT": 1,
      "GRAPH_BUILDER_MAPPINGS_BUILT": 2,
      "GRAPH_BUILDER_CONNECTIONS_AND_RUNTIMES_BUILT": 1,
      "GRAPH_BUILDER_SERVICES_BUILT": 1,
      "GRAPH_BUILDER_OTHER_ELEMENTS_BUILT": 0,
      "GRAPH_BUILDER_COMPLETED": 9,
      "GRAPH_DEPENDENCIES_FETCHED": 11
    },
    "elementCount": {
      "total": 0
    },
    "otherStats": {
      "projectCount": 0
    }
  },
  "graph": {
    "timings": {
      "GRAPH_BUILDER_ELEMENTS_DESERIALIZED": 325,
      "GRAPH_BUILDER_ELEMENTS_INDEXED": 74,
      "GRAPH_BUILDER_SECTION_INDICES_BUILT": 1,
      "GRAPH_BUILDER_DOMAIN_MODELS_BUILT": 336,
      "GRAPH_BUILDER_STORES_BUILT": 1,
      "GRAPH_BUILDER_MAPPINGS_BUILT": 0,
      "GRAPH_BUILDER_CONNECTIONS_AND_RUNTIMES_BUILT": 1,
      "GRAPH_BUILDER_SERVICES_BUILT": 1,
      "GRAPH_BUILDER_OTHER_ELEMENTS_BUILT": 23,
      "GRAPH_BUILDER_COMPLETED": 762
    },
    "elementCount": {
      "total": 1417,
      "other": 2,
      "sectionIndex": 0,
      "association": 0,
      "class": 1045,
      "enumeration": 366,
      "function": 0,
      "profile": 1,
      "measure": 0,
      "store": 0,
      "mapping": 0,
      "connection": 0,
      "runtime": 0,
      "service": 0
    },
    "otherStats": {}
  },
  "generations": {
    "timings": {
      "GRAPH_BUILDER_ELEMENTS_DESERIALIZED": 1,
      "GRAPH_BUILDER_ELEMENTS_INDEXED": 2,
      "GRAPH_BUILDER_SECTION_INDICES_BUILT": 1,
      "GRAPH_BUILDER_DOMAIN_MODELS_BUILT": 1,
      "GRAPH_BUILDER_STORES_BUILT": 1,
      "GRAPH_BUILDER_MAPPINGS_BUILT": 0,
      "GRAPH_BUILDER_CONNECTIONS_AND_RUNTIMES_BUILT": 1,
      "GRAPH_BUILDER_SERVICES_BUILT": 1,
      "GRAPH_BUILDER_OTHER_ELEMENTS_BUILT": 0,
      "GRAPH_BUILDER_COMPLETED": 8
    },
    "elementCount": {
      "total": 0
    },
    "otherStats": {
      "generationCount": 0
    }
  }
}

So we have reduced the time from 8000ms -> ~800ms, that's a ~10x improvement. I'd say this is a rough number; we do a bunch of caching in several places (CDM isn't the best model to show this improvement), so it might even go up to ~15x in some cases.

This takes a crap load of refactoring, grinding, and fixing bugs. Thanks folks @MauricioUyaguari @YannanGao-gs @gayathrir11 🦄 !!!!

I will close this issue for now, as the rest of the optimizations already have their own threads and can be left in the backlog to tackle over time.

MauricioUyaguari added the Type: Feature Request label Mar 9, 2022

github-actions bot added the Studio Core Team Opened by a member of the Studio core team label Mar 9, 2022

MauricioUyaguari modified the milestones: 5.0.0, Marathon Mar 9, 2022

MauricioUyaguari added Component: Graph Manager Issues related to graph processing and management (including interaction with engine server) logic Difficulty: Challenging Type: Refactor Type: Enhancement labels Mar 9, 2022

MauricioUyaguari assigned MauricioUyaguari, akphi, YannanGao-gs, gayathrir11 and pierredebelen Mar 9, 2022

MauricioUyaguari added the Priority: PURE Migration label Mar 9, 2022

akphi changed the title ~~Feature request: Improve Graph Building Performance and Memory leakage~~ Feature request: Improve graph building performance and avoid memory leakage Mar 11, 2022

akphi mentioned this issue Mar 12, 2022

Feature request: Improve UX when loading dataspaces #936

Closed

5 tasks

akphi added Component: Performance and removed Component: Performance labels Mar 12, 2022

akphi pinned this issue Mar 17, 2022

akphi changed the title ~~Feature request: Improve graph building performance and avoid memory leakage~~ Discussion: Improve graph building performance and avoid memory leakage Mar 17, 2022

akphi changed the title ~~Discussion: Improve graph building performance and avoid memory leakage~~ RFC: Address graph building performance and memory usage Mar 17, 2022

akphi added Type: Discussion and removed Type: Feature Request labels Mar 17, 2022

akphi unassigned pierredebelen Mar 17, 2022

MauricioUyaguari mentioned this issue Mar 18, 2022

change Package children to be set of PackageableElement instead of array for performance #965

Closed

3 tasks

akphi added the Type: Mega-thread Tracker for multiple related issues label Mar 21, 2022

MauricioUyaguari mentioned this issue Mar 22, 2022

move duplicate element check to before first pass #968

Closed

3 tasks

This was referenced Mar 22, 2022

Feature request: Optimize graph builder by skipping protocol (de)serialization #972

Open

improve graph builder performance by bypassing duplication checks in tree indexing #973

Merged

akphi mentioned this issue Apr 10, 2022

Feature request: Metamodel verifier #288

Closed

akphi closed this as completed Apr 20, 2022

akphi modified the milestones: Marathon, 5.0.0 Apr 20, 2022

akphi unpinned this issue Apr 20, 2022

This was referenced Apr 20, 2022

Bug: Investigate potential memory issue with Legend Query and Legend Taxonomy #1092

Closed

Iteration Plan for Version 5.0.0 (Mar-Apr 2022) #933

Closed

akphi mentioned this issue Jun 25, 2022

Feature request: Make use of engine for property mappedness check in query builder #1069

Closed

14 tasks
