-
-
Notifications
You must be signed in to change notification settings - Fork 11
Perf: Pre-merge global elements at normalization time #99
Comments
I looked into this initially and it's more complicated than it seems. Even though global configs are all merged together for every file, the order in which they appear matters because you might have a global config followed by a config with We could definitely pre-merge global configs that are next to each other, though. |
Sure, I saw that scenario as well, and the code I demo-ed above does actually account for that 😄 As we go down the array, any global configs are merged together, but that merged global is then merged into any subsequent non-global element in the array - the globals don't "skip" any of the elements. In effect, what we're trying to accomplish is:
That of course is extremely inefficient (quadratic at least). The algorithm I described above is linear, but I believe accomplishes the same effect. To work through an example step by step: [
{ files: ['**/*.js'] },
{ ignores: ['.git'] },
{ rules: { a: 1 } },
{ files: ['tests/**/*.js'], rules: { a: 2, b: 1 } },
{ rules: { b: 2, c: 1 } },
{ rules: { c: 2 } },
{ files: { a: 2 } },
{ ignores: ['node_modules', 'dist'] }
]
//----------
[
{ files: ['**/*.js'] },
{ ignores: ['.git'] },
{ rules: { a: 1 } },
{ files: ['tests/**/*.js'], rules: { a: 2, b: 1 } },
{ rules: { b: 2, c: 1 } },
{ rules: { c: 2 } },
{ files: { a: 2 } },
//--{ ignores: ['node_modules', 'dist'] }-- //<--- global ignores, remove and add to globalIgnores, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = undefined;
]
//----------
[
{ files: ['**/*.js'] },
{ ignores: ['.git'] },
{ rules: { a: 1 } },
{ files: ['tests/**/*.js'], rules: { a: 2, b: 1 } },
{ rules: { b: 2, c: 1 } },
{ rules: { c: 2 } },
{ files: { a: 2 } } //<--- not global, keep with no merge as !currentGlobalToMerge, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = undefined;
]
//----------
[
{ files: ['**/*.js'] },
{ ignores: ['.git'] },
{ rules: { a: 1 } },
{ files: ['tests/**/*.js'], rules: { a: 2, b: 1 } },
{ rules: { b: 2, c: 1 } },
//--{ rules: { c: 2 } },-- //<--- global, remove and merge into currentGlobalToMerge, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = { rules: { c: 2 } };
{ files: { a: 2 } }
]
//----------
[
{ files: ['**/*.js'] },
{ ignores: ['.git'] },
{ rules: { a: 1 } },
{ files: ['tests/**/*.js'], rules: { a: 2, b: 1 } },
//--{ rules: { b: 2, c: 1 } },-- //<--- global, remove and merge into currentGlobalToMerge, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = { rules: { b: 2, c: 2 } };
{ files: { a: 2 } }
]
//----------
[
{ files: ['**/*.js'] },
{ ignores: ['.git'] },
{ rules: { a: 1 } },
{ files: ['tests/**/*.js'], rules: { a: 2, b: 2, c: 2 } }, //<--- not global, keep and merge currentGlobalToMerge in, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = { rules: { b: 2, c: 2 } };
{ files: { a: 2 } }
]
//----------
[
{ files: ['**/*.js'] },
{ ignores: ['.git'] },
//--{ rules: { a: 1 } },-- //<--- global, remove and merge into currentGlobalToMerge, globalIgnores = [['node_modules', 'dist']], currentGlobalToMerge = { rules: { a: 1, b: 2, c: 2 } };
{ files: ['tests/**/*.js'], rules: { a: 2, b: 2, c: 2 } },
{ files: { a: 2 } }
]
//----------
[
{ files: ['**/*.js'] },
//--{ ignores: ['.git'] },-- //<--- global ignores, remove and add to globalIgnores, globalIgnores = [['node_modules', 'dist'], ['.git']], currentGlobalToMerge = { rules: { a: 1, b: 2, c: 2 } };
{ files: ['tests/**/*.js'], rules: { a: 2, b: 2, c: 2 } },
{ files: { a: 2 } }
]
//----------
[
{ files: ['**/*.js'], rules: { a: 1, b: 2, c: 2 } }, //<--- not global, keep and merge currentGlobalToMerge in, globalIgnores = [['node_modules', 'dist'], ['.git']], currentGlobalToMerge = { rules: { a: 1, b: 2, c: 2 } };
{ files: ['tests/**/*.js'], rules: { a: 2, b: 2, c: 2 } },
{ files: { a: 2 } }
]
//----------
[
{ ignores: ['.git', 'node_modules', 'dist'] }, //<-- insert the merged global ignores in their correct order
{ files: ['**/*.js'], rules: { a: 1, b: 2, c: 2 } },
{ files: ['tests/**/*.js'], rules: { a: 2, b: 2, c: 2 } },
{ files: { a: 2 } }
]
//---------- Hopefully that all makes sense! |
Ah gotcha -- lots of issues to go through so it's tough to get deep into the details on each one. I think I follow what you're proposing. My concern is that we are introducing complexity for a theoretical performance advantage, and those tradeoffs are usually tricky. If you're up for creating a prototype to see what kind of perf advantage we might see, I'd be happy to evaluate it. |
TLDR: We get at least a 60% speedup with this approach 😄 OK, so, I've thrown together a prototype that merges the globals as described, and at the same time calculates the global ignores, Obviously there's some work that could be done to make this a bit nicer, and a lot of I've confirmed all tests are passing. I've benchmarked against two sets of data, both derived from the ESLint codebase. First, the set of files that are linted by ESLint in the ESLint codebase (derived by basically console.log'ing in the right spot during file enumeration), and second, a full file listing of every file in the ESLint codebase (excluding node_modules). The benchmarks below show the total time to
So, for files that we know have a configuration (i.e. case 1), we're seeing somewhere between a 75-85% speedup in config resolution (and this is with all the regular caching in place). For all files, including a huge number that are to be ignored (and might not even be sent through Also, I know we're talking about ~50ms on the ESLint codebase here, but every little helps with performance stuff, and for larger codebases, the absolute number would probably grow very quickly (ESLint is very conservative with the number of files it seems to use - many codebases I've seen would split the same code into a much larger set of files!) I'm happy to turn this into a proper implementation if you'd like to move forward. (I should note that the code on the branch is not usable as-is, as it's been written on my employer's time/resources, and I don't have authorization for it to be licensed for OSS use, so please don't move forward with it directly. I'll have to go through some internal processes to get the right approvals to implement this properly and allow the release/licensing of any code 😄) |
Interesting, thanks for doing that. It's definitely worth exploring further. Can you share the absolute numbers for each run? That would help put these results in context. |
The total millisecond times are at the end of the line if you scroll to the right - or is there a different measure you'd like to see? |
Ah sorry, I was misreading. What I was looking for was a before/after comparison. If I'm reading this correctly, it seems like all of the examples are pre-merging configs? I'd like to see that side by side with the current version's numbers. |
Yep, the The other lines use a local checkout of eslint with a small update to |
Ah! Thanks. I totally missed that there was a number all the way over on the right for that line. |
Since global elements will always end up being applied to every file (if they are going to be processed at all), then all the global elements { !files } are going to need to be merged again and again for every file config fetched (that's not cached).
Rather than repeat this for potentially every file, I'd propose that during the normalization process, all global elements be pre-merged backwards through the array and eliminated.
This obviously has the benefit of each individual file having to process fewer array elements when its config is fetched, and the number of merge operations therefore required. And depending on the merge function of individual schemas, it could significantly reduce the work they have to do overall if the global elements apply things that reduce the amount of processing required.
It does have some upfront cost, but we're using a O(n) operation once at normalisation time to reduce the n in a O(n * fileCount) at getConfig time.
This process could also be used to gather all the global ignores and eliminate those elements as well, doing the work that
this.ignores
usually has to do for "free".So, to be more concrete, here's some very quick and dirty example code of what this might look like:
Happy to put together a PR with properly thought out code for this if you think it's worth implementing.
The text was updated successfully, but these errors were encountered: