Likely O(n^2) behaviour for `watchTree` #101

Janiczek · 2021-02-10T19:43:41Z

I've had a situation where I ran watchTree with default config over a directory with 170k files and 15k dirs.

The recursive watch registration process ate ~400% CPU (as reported by htop) for about 20s, after which the CPU usage fell down to normal idle levels.

After trying to remove node_modules (erm...) the directory count went to 1k, and the CPU usage on startup was ~100% for about 2 seconds.

This makes me think there's some O(n^2) behaviour in the fsnotify code. (For example, inotifywait CLI tool has no such CPU usage when doing the same task.)

Here's a relevant snippet of profile information from the run with 15k dirs:

COST CENTRE                                                                 MODULE                                  SRC                                                              no.       entries  %time %alloc   %time %alloc

   watchTree                                                                System.FSNotify                         src/System/FSNotify.hs:183:1-44                                   9801          1    0.0    0.0    99.9  100.0
    threadChan                                                              System.FSNotify                         src/System/FSNotify.hs:(190,1)-(208,9)                            9803          1    0.0    0.0    99.9  100.0
     threadChan.\                                                           System.FSNotify                         src/System/FSNotify.hs:(191,41)-(208,9)                           9804          1    0.0    0.0    99.9  100.0
      listenRecursive                                                       System.FSNotify.Linux                   src/System/FSNotify/Linux.hs:(100,3)-(178,57)                     9811          1    0.0    0.0    99.9  100.0
       listenRecursive.listenRec                                            System.FSNotify.Linux                   src/System/FSNotify/Linux.hs:(122,7)-(126,47)                     9812          1    0.0    0.0    99.9  100.0
        findDirs                                                            System.FSNotify.Path                    src/System/FSNotify/Path.hs:(64,1)-(65,68)                        9860          1    0.0    0.0    99.9  100.0
         findAllDirs                                                        System.FSNotify.Path                    src/System/FSNotify/Path.hs:(52,1)-(55,36)                        9884      11696    0.0    0.0    99.9  100.0
          findImmediateDirs                                                 System.FSNotify.Path                    src/System/FSNotify/Path.hs:49:1-70                               9886          0   18.0   22.3    99.9  100.0
           fileDirContents                                                  System.FSNotify.Path                    src/System/FSNotify/Path.hs:(33,1)-(39,22)                        9887      11696    5.7    1.1    81.8   77.7
            getDirectoryContentsPath                                        System.FSNotify.Path                    src/System/FSNotify/Path.hs:(24,1)-(30,21)                        9888      11696    7.0    6.1    75.9   76.6
             getDirectoryContentsPath.exists                                System.FSNotify.Path                    src/System/FSNotify/Path.hs:27:5-68                               9900      72406   68.9   70.5    68.9   70.5

The other code paths not present above took ~0% of the measured time.

Let me know if I can help in some other way.

The text was updated successfully, but these errors were encountered:

thomasjm · 2021-02-11T00:39:32Z

Hmm, I don't see anything that looks O(N^2) but a couple things jump out at me in the recursive watch code

The recursive watch code first gathers all subdirectories into a list before adding any watches, which is probably not great for memory usage and may result in higher CPU usage due to garbage collection (?)
The code then loops over these subdirectories and adds a watch to each of them. Each of these operations requires waiting on a global MVar, and also creating one new MVar per watch. Either of these things might be causing high CPU.

We could improve 1) by gathering directories and adding watches in a streaming fashion, perhaps with pipes or conduit. We could perhaps improve 2) by restructuring our use of MVars, but it would have to be done carefully.

Any chance you could create a flamegraph with https://hackage.haskell.org/package/ghc-prof-flamegraph ?

Janiczek · 2021-02-11T07:29:49Z

I will try to find time to generate the flamegraph and get back to you.

By the way you should be able to reproduce this behaviour solely based on the number of directories watched over -- tens of thousands of directories should provoke the behaviour.

By the way 2, my friend @turboMaCk has tried a continuation-passing-style approach to listing contents of a directory recursively that seems to be pretty performant even on higher file/dir counts:

https://github.com/turboMaCk/unix-recursive

EDIT: given the profile info in the original post, I think your point 2 won't be the culprit, since it didn't show up in the profile info. It all seems to be in directory manipulation, canonicalization etc.

Janiczek · 2021-02-11T08:06:55Z

GitHub didn't let me upload a SVG, so here is a gist:
https://gist.github.com/Janiczek/f3bb8a1be8841b163aa233164bc67856
At the bottom there's a flamegraph attached.

thomasjm · 2021-02-11T08:36:49Z

Thanks! Interesting how the biggest amount of time is actually in the getDirectoryContentsPath.exists function. I just pushed a commit that might help when newer versions of System.Directory are available: 4bb7d27.

Hmm, this directory listing function will still realize all the directories in memory at once, right? If we're optimizing that then I'd rather do it in a streaming fashion so the peak memory usage is low.

Janiczek · 2021-02-11T08:51:56Z

@thomasjm There's also a possibility (that I'm currently doing in my app) of filtering certain folder IDs and not recursing deeper into them. I don't know how that fits into the FSNotify API / responsibilities, but perhaps it could be a configuration parameter. FilePath -> Bool for whether you should care about a folder, eg. folder /= "node_modules" or not $ Glob.match ignoredDirPattern folder.

That could cut a large chunk of unneeded work in certain usecases (like mine).

Janiczek · 2021-02-11T08:53:37Z

@thomasjm Is there a way I could try out the mentioned commit without a proper release? Does Stack support specifying the version with a commit hash perhaps?

thomasjm · 2021-02-11T09:12:45Z

filtering certain folder IDs and not recursing deeper into them

Well, you could use the non-recursive watch function to set the watches on exactly the folders you want.

Does Stack support specifying the version with a commit hash perhaps?

Yes, like this in stack.yaml:

extra-deps:
- git: https://github.com/haskell-fswatch/hfsnotify.git
  commit: 4bb7d27e3217207537f544e6e6602bfe55faf930

Hopefully the work in progress I have going on in master branch doesn't cause problems, I was working on some breaking changes for the next release. If it does you may need to cherry-pick--you can point to a local checkout in stack.yaml like this:

packages:
- .
- /path/to/your/hfsnotify/checkout

turboMaCk · 2021-02-11T09:20:51Z

sorry to jump in. I'm sort of in the middle of identifying different pieces which could be optimized to work better on large FS trees. I'm yet to identify exactly where the wins are but I'll let you know if I find something either by opening PR if there are some quick wins or by opening an issue where we can discuss more specific details.

turboMaCk · 2021-02-28T14:18:03Z

I had a deeper look into this issue both by analyzing isolated things as well as directly analyzing the performance of code @Janiczek observed this behavior on. For the record it's this commit Janiczek/elmid@8cf9636 but I also needed to change few hard-coded paths to make it possible for me to use.

I'm able to reproduce higher than optimal CPU loads but as far as I can say code isn't O(n^2). It scales close to O(n) based on number of directories and files - that's being said I did not put much more effort into analyzing big O once I discovered the scaling is acceptable.

For this and similar as this cases the biggest problem is inability of watchTree to filter directories to which to recurse into. The primary cost is simply recursion into directories and files which are then ignored using predicate anyway.

I also have few points to @thomasjm's first reply:

Hmm, I don't see anything that looks O(N^2) but a couple things jump out at me in the recursive watch code
1. The recursive watch code first gathers all subdirectories into a list before adding any watches, which is probably not great for memory usage and may result in higher CPU usage due to garbage collection (?)
2. The code then loops over these subdirectories and adds a watch to each of them. Each of these operations requires waiting on a global MVar, and also creating one new MVar per watch. Either of these things might be causing high CPU.
We could improve 1) by gathering directories and adding watches in a streaming fashion, perhaps with pipes or conduit. We could perhaps improve 2) by restructuring our use of MVars, but it would have to be done carefully.

I spent quite some time trying to look for what is the optimal way to implement listing. See results of my measurements https://github.com/turboMaCk/unix-recursive#comparision-with-other-libraries
- TLDR is that something like pipes/conduit would actually make performance worse but they do make it easy to keep lazyio and memory under control
- memory profile of fsnotify is actually great considering the work it's doing
is likely one of the which slow fsnotify down but there will likely be a tradeoff with different implementations.

thomasjm · 2021-03-01T01:03:34Z

Thanks for the information @turboMaCk, interesting stuff. A few thoughts/questions:

It looks like a large amount of the performance improvement in your lib explained by just the switch from FilePath to RawFilePath. Although it's a little hard to compare. Are you just recording overall CPU usage samples on your machine? It might help to smooth it out. Or to be really fancy, you could consider perf.
Any chance you could run this same test on the System.FSNotify.Path.findDirs function of hfsnotify? I'd be especially curious if the new commit I mentioned above makes a difference, since it's looking like functions such as doesFileExist are a lot more expensive than I would have expected.
Could you explain this statement a bit more?

pipes/conduit would actually make performance worse but they do make it easy to keep lazyio and memory under control

I'm confused because your results for dirstream, the one which uses pipes, seems to show the opposite (relatively good CPU usage but leaking memory)

Janiczek · 2021-03-01T07:19:02Z

@thomasjm I know the answer to 1: it's the psrecord tool that only records one process ID (and optionally its children). So the graphs in @turboMaCk's repo don't show the whole system but just the measured apps.

turboMaCk · 2021-03-01T08:50:38Z

Yes there is a huge difference between using packed ByteString an [Char]/String. Once you get close to optimal implementation the overhead of linked list is quite a killer in both CPU time and memory but also for these measurements I'm essentially creating a list of all paths on good portion of my $HOME. I'm not recording overall CPU just the process as @Janiczek said. Psrecord is like perf. Here is a script I used to measure
I'll try to measure two versions before and after the commit.

I'm confused because your results for dirstream, the one which uses pipes, seems to show the opposite (relatively good CPU usage but leaking memory)

Yes but it took the same program 70s which is 10x longer to finish than RawFilePath implementation and more that twice as long as FilePath one. Which is still quite a good result imo but it's also hardest implementation to use and it has most dependencies including deprecated system-filepath.

thomasjm · 2021-03-01T10:19:37Z

Oh I see, I didn't notice the total time for the dirstream one. Makes sense, although without knowing more about why it's behaving that way I don't think we can rule out a performant pipes implementation being possible.

Anyway, it would be good to see your measurements on the hfsnotify functions but here's the way I'm thinking about this -- please let me know how it matches up with your priorities.

I think we could invest effort in optimizing the crucial section here, which seems to be the findDirs function. Figuring out which existence check functions are most performant seems prudent. But it seems the thing that would make the biggest change would be to change to using RawFilePath or similar. Which I think is a great idea, but it would be a bit of work to do it without changing the API of this library. (At some point RawFilePaths would need to be decoded into FilePaths and any errors or decoding discrepancies dealt with.)

On the other hand, it sounds like what you really need is a way to avoid running over gigantic directories like node_modules in the first place, right? It's always better to not create watches you don't need, since each watch comes with a kernel memory cost and brings you closer to hitting your inotify watch limit. The fact that you're getting perf problems from this directory traversal suggests that you're creating too many watches :P. Doing a filtered version of watchTree like @Janiczek mentioned would not be difficult.

turboMaCk · 2021-03-01T17:54:32Z

So I have a result for an implementation from master/HEAD. I've ran all other tests as well and basically the performance of other implementations is same as in link I posted so I won't spam this issue further.

So CPU time is on par with dirstream, however current implementation consumes near 10GB of ram which is consistent with forcing the list too early from my other measurements.

I also agree with all what you say in last post. I think in ideal case there would be 2 separate issues and PRs coming from this issue.

a) provide new recursive API can help folks avoid into this trap of doing more work than is obvious/needed.
b) optimize System.FSNotify.Path functions to produce results lazily

I can help you with tackling these if you want. I think especially b) should be doable without changes in API even of this internal module. a) will involve some API design

Makes sense, although without knowing more about why it's behaving that way I don't think we can rule out a performant pipes implementation being possible.

While I think using conduit or pipes is a good way to develop application I personally don't like the idea to use it for library internals unless really necessary. Libraries with less dependencies easier to maintain and are often preferred by developers because they don't make as many changes for them.

Reproducing my results:

All the code including fsnotify implementation can be found there: https://github.com/turboMaCk/unix-recursive/tree/test-fsnotify-functions (make sure you're using branch test-fsnotify-functions).

just be aware

this won't work on windows because of posix specific functionality in my implementation
bin flag needs to be passed during compilation to produce binaries. with stack this is --flag unix-recursive:bin
I have nix integration for stack enabled in stack.yaml. If you want to use stack but don't use nix use --no-nix flag
measure.sh [filepath] could be used to compare all implementations
- script might need some small adjustments in paths based on platform (unless you use nixos)
it's good idea to run it at least twice because kernels FS cache can have an effect on performance of this type of IO.

EDIT:

Sorry one more thing I forget to mention. I think the fact that recursive directory walking is not as optimal as it should be at the moment is really not that big deal because with current watchTree implemetnation, fsnotify attaches watcher to all these files so there simply always will be resources allocated for all inodes it walks through.

thomasjm · 2021-03-02T03:45:20Z

Okay, I made a stab at an efficient implementation. There are a few TODOs before it's ready to ship but the tests are passing. Want to try it? I looked at your lib for reference on the dirstream stuff :)

https://github.com/haskell-fswatch/hfsnotify/blob/master/src/System/FSNotify/Linux.hs#L193

For the filtering stuff, I think we want to just add a new watchTree function that takes an extensible options object:

watchTree' :: WatchManager -> RecursiveListeningOptions -> FilePath -> ActionPredicate -> Action -> IO StopListening

data RecursiveListeningOptions = RecursiveListeningOptions {
  recursiveListeningOptionsDirectoryFilter :: FilePath -> Bool -- IO Bool ?
  }

defaultRecursiveListeningOptions = RecursiveListeningOptions {
  recursiveListeningOptionsDirectoryFilter = const True
  }

I haven't had time to do the filtering stuff yet (it will need to be added to each platform) but a PR would be welcome if you want it soon.

georgefst · 2021-10-07T21:41:43Z

But it seems the thing that would make the biggest change would be to change to using RawFilePath or similar.

If this happens, could we please also export the underlying RawFilePath versions of everything? Then the existing API can just wrap all those functions in pack and unpack.

That's what I've always wanted from this library - I end up converting all the FilePaths to/from RawFilePath in my own code anyway.

EDIT: The time of an abstract OsPath is finally almost upon us, so probably best to wait for that.

Ericson2314 · 2022-06-30T00:06:13Z

https://man7.org/linux/man-pages/man7/fanotify.7.html might be able to help with this?

turboMaCk mentioned this issue Mar 1, 2021

compare with fsnotify implementation turboMaCk/unix-recursive#17

Draft

thomasjm mentioned this issue Jul 19, 2021

perfomance in large projects #10

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Likely O(n^2) behaviour for `watchTree` #101

Likely O(n^2) behaviour for `watchTree` #101

Janiczek commented Feb 10, 2021 •

edited

Loading

thomasjm commented Feb 11, 2021

Janiczek commented Feb 11, 2021 •

edited

Loading

Janiczek commented Feb 11, 2021 •

edited

Loading

thomasjm commented Feb 11, 2021

Janiczek commented Feb 11, 2021

Janiczek commented Feb 11, 2021

thomasjm commented Feb 11, 2021

turboMaCk commented Feb 11, 2021

turboMaCk commented Feb 28, 2021 •

edited

Loading

thomasjm commented Mar 1, 2021

Janiczek commented Mar 1, 2021

turboMaCk commented Mar 1, 2021

thomasjm commented Mar 1, 2021

turboMaCk commented Mar 1, 2021 •

edited

Loading

thomasjm commented Mar 2, 2021 •

edited

Loading

georgefst commented Oct 7, 2021 •

edited

Loading

Ericson2314 commented Jun 30, 2022

Likely O(n^2) behaviour for watchTree #101

Likely O(n^2) behaviour for watchTree #101

Comments

Janiczek commented Feb 10, 2021 • edited Loading

thomasjm commented Feb 11, 2021

Janiczek commented Feb 11, 2021 • edited Loading

Janiczek commented Feb 11, 2021 • edited Loading

thomasjm commented Feb 11, 2021

Janiczek commented Feb 11, 2021

Janiczek commented Feb 11, 2021

thomasjm commented Feb 11, 2021

turboMaCk commented Feb 11, 2021

turboMaCk commented Feb 28, 2021 • edited Loading

thomasjm commented Mar 1, 2021

Janiczek commented Mar 1, 2021

turboMaCk commented Mar 1, 2021

thomasjm commented Mar 1, 2021

turboMaCk commented Mar 1, 2021 • edited Loading

Reproducing my results:

thomasjm commented Mar 2, 2021 • edited Loading

georgefst commented Oct 7, 2021 • edited Loading

Ericson2314 commented Jun 30, 2022

Likely O(n^2) behaviour for `watchTree` #101

Likely O(n^2) behaviour for `watchTree` #101

Janiczek commented Feb 10, 2021 •

edited

Loading

Janiczek commented Feb 11, 2021 •

edited

Loading

Janiczek commented Feb 11, 2021 •

edited

Loading

turboMaCk commented Feb 28, 2021 •

edited

Loading

turboMaCk commented Mar 1, 2021 •

edited

Loading

thomasjm commented Mar 2, 2021 •

edited

Loading

georgefst commented Oct 7, 2021 •

edited

Loading