-
Notifications
You must be signed in to change notification settings - Fork 114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Address graph building performance and memory usage #927
Comments
We should check #257 again after closing this 🤞 |
@akphi Here are the logs for the 20k, making the change from addUniqueEntry to .push on addChild in Package |
Thanks to @MauricioUyaguari we have the following comparisons using
repost since I updated the description of the RFC |
After #1068, I think we can call this satisfactory. The latest stat I got from CDM is {
"timings": {
"GRAPH_INITIALIZED": 799
},
"dependencies": {
"timings": {
"GRAPH_BUILDER_ELEMENTS_DESERIALIZED": 2,
"GRAPH_BUILDER_ELEMENTS_INDEXED": 1,
"GRAPH_BUILDER_SECTION_INDICES_BUILT": 1,
"GRAPH_BUILDER_DOMAIN_MODELS_BUILT": 0,
"GRAPH_BUILDER_STORES_BUILT": 1,
"GRAPH_BUILDER_MAPPINGS_BUILT": 2,
"GRAPH_BUILDER_CONNECTIONS_AND_RUNTIMES_BUILT": 1,
"GRAPH_BUILDER_SERVICES_BUILT": 1,
"GRAPH_BUILDER_OTHER_ELEMENTS_BUILT": 0,
"GRAPH_BUILDER_COMPLETED": 9,
"GRAPH_DEPENDENCIES_FETCHED": 11
},
"elementCount": {
"total": 0
},
"otherStats": {
"projectCount": 0
}
},
"graph": {
"timings": {
"GRAPH_BUILDER_ELEMENTS_DESERIALIZED": 325,
"GRAPH_BUILDER_ELEMENTS_INDEXED": 74,
"GRAPH_BUILDER_SECTION_INDICES_BUILT": 1,
"GRAPH_BUILDER_DOMAIN_MODELS_BUILT": 336,
"GRAPH_BUILDER_STORES_BUILT": 1,
"GRAPH_BUILDER_MAPPINGS_BUILT": 0,
"GRAPH_BUILDER_CONNECTIONS_AND_RUNTIMES_BUILT": 1,
"GRAPH_BUILDER_SERVICES_BUILT": 1,
"GRAPH_BUILDER_OTHER_ELEMENTS_BUILT": 23,
"GRAPH_BUILDER_COMPLETED": 762
},
"elementCount": {
"total": 1417,
"other": 2,
"sectionIndex": 0,
"association": 0,
"class": 1045,
"enumeration": 366,
"function": 0,
"profile": 1,
"measure": 0,
"store": 0,
"mapping": 0,
"connection": 0,
"runtime": 0,
"service": 0
},
"otherStats": {}
},
"generations": {
"timings": {
"GRAPH_BUILDER_ELEMENTS_DESERIALIZED": 1,
"GRAPH_BUILDER_ELEMENTS_INDEXED": 2,
"GRAPH_BUILDER_SECTION_INDICES_BUILT": 1,
"GRAPH_BUILDER_DOMAIN_MODELS_BUILT": 1,
"GRAPH_BUILDER_STORES_BUILT": 1,
"GRAPH_BUILDER_MAPPINGS_BUILT": 0,
"GRAPH_BUILDER_CONNECTIONS_AND_RUNTIMES_BUILT": 1,
"GRAPH_BUILDER_SERVICES_BUILT": 1,
"GRAPH_BUILDER_OTHER_ELEMENTS_BUILT": 0,
"GRAPH_BUILDER_COMPLETED": 8
},
"elementCount": {
"total": 0
},
"otherStats": {
"generationCount": 0
}
}
} So we have reduced the times from 8000ms -> ~800ms, that's ~10x improvement. I'd say this is rough number, we do a bunch of caching in several places (which CDM isn't the best to show this improvement), so it might can even go up to ~15x in some case. This takes a crap load of refactoring, grinding, and fixing bugs. Thanks folks @MauricioUyaguari @YannanGao-gs @gayathrir11 🦄 !!!! I will close this issue for now as the rest of the optimizations have already have their own threads and can be left in backlog to tackle over time. |
Overview
The problem: when loading a
huge
project, depending on the size,Studio
can take several minutes to load or in the worst case, crash the browser. We need to address this problem in, potentially, 3 aspects:performance optimization
andmemory usage optimization
andscale-up resource
.We did some
profiling
, looks like the perceived performance of graph builder is impacted most visibly by SDLC being slow to return entities and the fact that we introducemobx
when we don't really have to (we could enablemobx
when we start change detection instead). We tried to crudely lean downsource information
coming back inSDLC
(using an older project structure) and removingmakeObservable
in meta models and the graph. This in total boost the perceived performance by30-35%
. This is significant, but not enough as it doesn't really address 2 problems:perceived performance
: i.e. add loading bar with info on loading stagesfull graph
so it's just the matter of time before users hit the ceiling of memory-usage and crash the browser. So we should address this problem as well. Some ideas include: build partial graph or lazy-build.Implementation plan
Profiling
mobx
in metamodels: timing forgraph-building
improves10-20%
source information
seems to improve entities loading time50%
: we say "seems" because we test using a different project structure, not on the same project structure and just pruning source information.engine
(compilation) andstudio
(graph-building): apparently, engine is much faster thanStudio
,Studio
performs badly at first phase: deserialize and indexing/initializing, other than that, the performance is somewhat acceptablePerformance optimization
Immediate measure before optimization
Before embarking to improve performance, we should
fix
the UX by being transparent about the graph builder progress. Hence, we must give better feedback and monitoring around graph-building:system
,dependencies
,generation
)coming back from text-mode
orcompilation
, we probably send telemetry as wellIdentify and focus on critical workflow performance optimization
Since we need to optimize around the graph-loading experience, we first should ask the question:
What is the purpose of building the graph, is it really the blocker and the pre-requisite for all else?
If not, what critical flows can we optimize to improve UXTaxonomy
For
Taxonomy
, the critical flow is buildingdatapsace
, so we should improve dataspace loading: #936Query
Query
, the critical flow is to build the graph, there's currently no way to avoid this. However, there are certain tasks we can do to avoid relying too much on the graph and potentially we could consider disable processing for certain parts of the graph that is not needed for query builder, such asfile generation
,text
,diagram
, etc.MAYBE
we only build UML and diagrams, but we could potentially look at the diagrams to scan which models we need to build? - Feature request: Improve UX when loading dataspaces #936mappedness
check toengine
- Feature request: Make use ofengine
for property mappedness check in query builder #1069Studio
For
Studio
, graph builder serves the purpose of building editor form (analytics and linking) and change detection, as such, we could:mobx
from graph-builder flow, we will activateobservability
after graph-builder completes - separatemobx
from metamodel and graph logic (part 1) #1000grammar
orJSON
form when loading the graph #1071Optimization by deferred/lazy loading
Here we can consider doing something like
Pure
, where the graph unfolds as we access them. This requires some rewrite at meta level on how weaccess
properties of metamodel. The initial graph is only built partially, only enough to build out the package tree perhaps, then as we access the elements, we start building the graphLoading/Saving time optimization
Micro-optimization code
There are micro code-construct which is written in a slightly inefficient manner, but can add-up and hurt performance significantly:
This is a- I think it's really silly and not worth exploring at all.silly
one IMHO, but worth trying: replaceforEach
byfor ... of ...
. Technicallymap
also suffer the same penalty, but it's a big thing to refactor to stop using map. Arguably, we don't need to try these, but we could also look for replacingfind
usingSet
when possible.https://leanylabs.com/blog/js-forEach-map-reduce-vs-for-for_of/
https://github.com/thlorenz/v8-perf/blob/master/language-features.md#iterating-maps-and-sets-via-foreach-and-callbacks
https://github.com/thlorenz/v8-perf/blob/master/language-features.md#array-builtins
https://medium.com/@ExplosionPills/map-vs-for-loop-2b4ce659fb03
V1_resolvePathsInRawLambda
, consider the usefulness ofTEMPORARY__disableRawLambdaResolver
- optimize path and element resolution when building graph #1068Package.addChild()
andPackage.getOrCreatePackage()
method (the former could be improved by usingpush()
instead of being so defensive, the latter could be optimized by constructing at top level a map of path to packages for quicker lookup, remember to cleanup in delete and rename though). - improve graph builder performance by bypassing duplication checks in tree indexing #973Optimize- we already did cachinggetExtraBuilderOrThrow
(maybe with caching) as this is used a lot in the buildersDependencyManager
, e.g.getOwnProfile = (path: string): Profile | undefined => this.models.map((dep) => dep.getOwnProfile(path)).find(isNonNullable);
is expensive. - optimize path and element resolution when building graph #1068getNullableElement()
andgetNullablePackage()
, for example, maintain an index of elements (including packages), that is notobservable
and we also need to takerename/remove/addElement()
into consideration - optimize path and element resolution when building graph #1068getElement
method and only used to freeze read only method. We can remove this in non editable modes. disable editing specific post processing for non editing graph #992 and skip post processing for dependencies #1001Memory usage optimization
Legend Query
andLegend Taxonomy
#1092consider
turning offkeepAlive
for certain things like classsubtypes
andsupertypes
to avoid leakage - https://medium.com/terria/when-and-why-does-mobxs-keepalive-cause-a-memory-leak-8c29feb9ff55Scale-up resource
Consider using
Electron
to escape browser's resource limitation - #718: This will also enable us to do more optimization, such as local-file access, SDLC deployment locally (better stability than targetinggitlab
)The text was updated successfully, but these errors were encountered: