Holy RAM, batman! (Out of memory errors and excessively high memory usage) #3347
Comments
Smells like a memory leak indeed. However note that much of the performance comes from caching - trade-off is between memory consumption and performance. Not saying there's no room to consume less memory for the same performance, but RD isn't, and never was, intended to be lightweight at all. |
@retailcoder I'm totally down with RD never ever being lightweight. I wouldn't expect it, really. However, I think it is worthwhile to point out that I can load 2,282,482 records in an Access datasheet and only have 25-30 MB in the working set. I can do other operations such as grouping much faster. You can try it out yourself by playing w/ I know this is very much an apples-to-oranges comparison because I'm comparing a C++ application against a C# application, but I'd also point out that Access is fundamentally disk-based, which is an order of magnitude slower than RAM, and we still see good performance nonetheless. I am not well-qualified to offer criticism, but it does not appear to me that we are using a database; we primarily use in-memory collections. That makes me think many operations we are doing are always going to be expensive because it isn't a database and has no index to reduce the search space. It might be possible that RD would benefit more from using sqlite or something similar than from a caching service that is entirely in memory, given the large number of objects RD needs to have and that comparatively only a small subset gets invalidated after some event. Having tables and indexes would allow you to quickly prune and re-populate the subsets, likely in spite of saving to a much, much slower subsystem. Just a thought. |
Most of our lookups are hash matches in dictionary keys, an essentially O(1) operation that completes pretty much instantaneously regardless of how many thousands of keys we're looking at... And it's all in RAM indeed. An empty project with standard library references is working with ~50-60K declarations, and we cache a number of aggregates too. Each declaration is an object that contains information about its name, type, accessibility, references in user code, etc.; this metadata is fundamental for Rubberduck to understand the code. To be honest it never even occurred to me that we could use a database and literally query the code. That's a very, very interesting idea... |
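To illustrate the point about hash lookups, here is a minimal Python sketch (Rubberduck itself is written in C#). The `Declaration` fields and the `(module, name)` key are hypothetical, chosen only to mirror the kind of metadata described above, not Rubberduck's actual types. It shows why a dictionary lookup stays roughly constant-time however many declarations are cached, while every cached object still lives in RAM.

```python
from dataclasses import dataclass, field


@dataclass
class Declaration:
    # Hypothetical fields, loosely modeled on the metadata described above.
    name: str
    declaration_type: str
    accessibility: str
    module: str
    references: list = field(default_factory=list)


def build_cache(declarations):
    """Index declarations by (module, name) for average O(1) lookup."""
    return {(d.module, d.name): d for d in declarations}


# Roughly the order of magnitude mentioned for an empty project.
declarations = [
    Declaration(f"Member{i}", "Procedure", "Public", f"Module{i % 100}")
    for i in range(50_000)
]
cache = build_cache(declarations)

# One hash lookup; its cost does not grow with the number of keys.
found = cache[("Module42", "Member42")]
print(found.declaration_type)
```

The trade-off discussed in this thread is visible here: the lookup is cheap, but all 50,000 objects (plus the dictionary itself) must stay resident for it to work.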
In that case, if you do use a database, your ideal database should support hash indexes so as to keep the O(1) operation. I can't recall off the cuff if sqlite has a hash index. I wrongly(?) assumed that you were also doing querying because in the quick fix inspection window, we group stuff and display only the first 100. That's an example of querying where a B-tree index can help cut down on the search space because you don't want to literally loop over every single member of the collection in this case. You only want a scrolling window of data. From the reading I did, whenever you invalidate a module reference, you have to delete all references from that module, then re-build them. That's another example where a B-tree index would win out over a hash index -- just delete all references with module = "x", done in a sequential scan. Besides, it's much easier to optimize a database than it is code (at least that's the case for me). |
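The two queries described above (dropping every reference for one module, and fetching only a window of 100 rows) can be sketched with Python's built-in `sqlite3` module. The `refs` table, its columns, and the index name are assumptions made up for this illustration; they are not Rubberduck's schema. SQLite's ordinary indexes are B-trees, so the `WHERE module = ?` filter can use the index instead of visiting every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE refs ("
    " id INTEGER PRIMARY KEY,"
    " module TEXT NOT NULL,"
    " identifier TEXT NOT NULL)"
)
# An ordinary SQLite index is a B-tree keyed on the indexed column.
conn.execute("CREATE INDEX ix_refs_module ON refs (module)")

rows = [(f"Module{i % 50}", f"ident{i}") for i in range(10_000)]
conn.executemany("INSERT INTO refs (module, identifier) VALUES (?, ?)", rows)

# Invalidate one module: remove only its references, leave the rest alone.
deleted = conn.execute(
    "DELETE FROM refs WHERE module = ?", ("Module7",)
).rowcount

# A "scrolling window": fetch only the first 100 rows in display order.
page = conn.execute(
    "SELECT module, identifier FROM refs ORDER BY module, id LIMIT 100"
).fetchall()

remaining = conn.execute("SELECT COUNT(*) FROM refs").fetchone()[0]
print(deleted, len(page), remaining)
```

Whether this beats a dictionary of in-memory objects for Rubberduck's workload is exactly the open question in this thread; the sketch only shows the shape of the queries.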
Oh, I was merely talking about the |
FWIW @bclothier sqlite does not maintain a hash-index. It does maintain some indexes, but it's IMO not suited to replace the mass of frequently updated information that Rubberduck needs. SQLite is simply not intended for the load of more or less completely refreshing a whole database whenever RD reparses. |
Just to give an apple-pear comparison... I fired up Visual Studio 2015 and loaded the Rubberduck solution (what else?). Initial loading took ~245 MB of working set. Building it bumps it up to ~280 MB. Running code analysis on it adds some more, up to ~315-320 MB. This is keeping in mind that Visual Studio is much more complicated and likely has to handle far more references & declarations to support code analysis (which might not be as sophisticated as ReSharper). That said, it's safe to say it doesn't need to increase by more than 100 MB just to parse or to do inspection. Whether Visual Studio is doing this all in-memory or not, I don't know. That said, I question whether you really need to be "completely refreshing a whole database whenever RD reparses." Given that I can only type so fast and that I can only change one module at a time (which remains true even if I'm using VBIDE to automate my changes), you're always going to be invalidating a subset. Even if I clicked And if you are always going to work in subsets, you aren't going to get good performance from in-memory collections that can only support hash indexing; you need a range seek so you can enumerate only that subset, done. Maybe your answer is an in-memory collection that supports B-tree indices. Maybe it's really just a boring old memory leak. I don't know. But I really don't think we need that much RAM based on what I've observed. |
Visual Studio is also at version 15 or so (we work with v6.0 essentially), with Roslyn being the single most efficient compiler/analyzer out there, written by an army of brains. We are not Microsoft, or JetBrains, and we're not hundreds of contributors, we're a handful, doing this part-time when it's possible. It is a thoroughly unfair comparison. VS & Roslyn are integrated, they can parse a single modified line of code. Rubberduck does what it can with the shitty VBIDE API, and parses with ANTLR, and because it's not integrated, the smallest granular unit we can parse is an entire module - anything else and we can kiss token positions good-bye. So yes, a subset - but not half as granular as a single modified line of code. That said, I don't think we would have to ditch & recreate the whole database every time. A db could be a very interesting solution for project metadata, per-project and per-user settings, and for storage of the thousands of COM declarations, which definitely make up the bulk of the RAM we're eating up. |
@retailcoder I apologize if I've overstepped. You're absolutely right that VS has many more resources behind it than RD does, so what the RD people have achieved is nothing short of phenomenal. I clearly didn't think about ANTLR vs Roslyn. I did want to get a sense of what is possible, hence my comparison. That said, I'll stop putting on big britches now. |
@bclothier I didn't mean to come off rude or even annoyed - at the end of the day RD is eating up a ton of RAM, and likely leaking some; memory leaks are a critical issue, and embedding an actual database isn't a completely crazy idea at all. I think I'd go with SQL Server Express though. Gotta think about how to deploy RD to the website too (that online parser in the inspections page is performing terribly ATM). Thanks for your ideas! 😄 |
#3405 could be related? We should reevaluate this after a fix for that is merged |
Just to follow up on a newer version (admittedly a dev version of 2.1.6542.12529) Access loaded (34 MB) I think there's a clear memory leak with the Fix. When I got to the "Loading Code Inspector", the CI toolwindow was much more responsive than when I originally reported the issue, but it does get slower and slower each time I fix an inspection. |
Addendum from the chat where Mat asked whether it's the quickfix or the reparse --- I noticed that the same project had receded from its original peak of 641 MB to about 500 MB. Each parse adds 20 MB, and doing them in rapid succession can keep adding more and more, but after a bit they do get released, which is likely the result of the GC's delayed cleanup. |
As per a suggestion in the chat, here's the test again with code explorer totally unwired from the startup process... Access loaded (35 MB) Seems to suggest it's not the UI but the inspections themselves or the parsing that's expensive. Addendum: As a comparison, loading the same project but with Rubberduck disabled (there are other addins still enabled), the memory climbs from 35MB to ~115MB, suggesting that pre-parse, RD adds ~100 MB to load. |
As this is banging around in the background, a thought occurred to me. If a change is made to use a DB in the background, many things could be stored in the DB permanently. There's no need to reparse and rebuild all the objects in EXCEL.EXE until/unless you change versions of Office. Many of the DLLs, OLBs, TLBs, etc that are referenced in a project don't change very often (if ever). A meta-metadata tag could be kept which indicates which version is being referenced in a project, and if that version is already in the DB, it never needs to be parsed. I would presume that should speed up parse times considerably. It might increase RD ship size by including a pre-populated DB, but that, IMHO, is a minor concern, or, RD could build the library of pre-parsed references and never touch them again (until a version flag is updated). |
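The idea of never re-parsing a referenced library whose version has not changed can be sketched as a cache keyed by library name and version. This is a Python illustration only: `parse_library`, the `library_cache` table, and the version strings are all hypothetical stand-ins, and a real implementation would store far richer data than a JSON list of names.

```python
import json
import sqlite3


def parse_library(name, version):
    """Stand-in for the expensive step of loading a library's declarations."""
    parse_library.calls += 1
    return [f"{name}.Member{i}" for i in range(3)]


parse_library.calls = 0


def get_declarations(conn, name, version):
    """Return cached declarations for (name, version); parse only on a miss."""
    row = conn.execute(
        "SELECT payload FROM library_cache WHERE name = ? AND version = ?",
        (name, version),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    declarations = parse_library(name, version)
    conn.execute(
        "INSERT INTO library_cache (name, version, payload) VALUES (?, ?, ?)",
        (name, version, json.dumps(declarations)),
    )
    conn.commit()
    return declarations


# ":memory:" keeps the demo self-contained; a file path would persist the cache.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE library_cache ("
    " name TEXT, version TEXT, payload TEXT,"
    " PRIMARY KEY (name, version))"
)

first = get_declarations(conn, "Excel", "16.0")   # miss: parses
second = get_declarations(conn, "Excel", "16.0")  # hit: no parse
third = get_declarations(conn, "Excel", "17.0")   # hypothetical new version: parses again
print(parse_library.calls)
```

The version tag plays the role of the "meta-metadata" mentioned above: as long as it matches a stored row, the parse step is skipped.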
Keep in mind that while we certainly could store the metadata, this means we are now trading performance for footprint. As already discussed in the first few posts of the issue, reading data off a hard drive (usually the slowest subsystem of a computer) is glacial in comparison to reading the same off a memory address within RAM. At the very least, we would be able to use to load the data quickly without going through the motions of parsing the references of built-in libraries. However, it will still end up in memory if we want good performance. For this discussion to be meaningful, we need a concrete description of the changes required to make this better. |
Since I love idiotic suggestions, but ones that could solve problems, couldn't we use some sort of in-memory DB like SQLite? I don't know how it would perform against a regular hash table, but it could provide a means to better organize the data and thereby prevent leakage, and you don't have the speed limit of a regular HDD. In case of a rebuild, just rebuild the DB. I don't know whether, for this project, the DB would end up using less RAM though. https://www.sqlite.org/inmemorydb.html Edit: nvm, between the time I opened the page and the time I wrote this post, bclothier wrote his post |
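One way to get a feel for how an in-memory SQLite database compares with a plain hash table is a small timing experiment. The sketch below is in Python with made-up table and key names, so it says nothing about C# or about Rubberduck's real data; it only shows how such a comparison could be set up. No timings are quoted here because they depend on the machine.

```python
import sqlite3
import timeit

N = 20_000
names = [f"Member{i}" for i in range(N)]

# Plain hash table: name -> value.
table = {name: i for i, name in enumerate(names)}

# In-memory SQLite database holding the same pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE decl (name TEXT PRIMARY KEY, value INTEGER)")
conn.executemany("INSERT INTO decl VALUES (?, ?)", table.items())


def dict_lookup(name):
    return table[name]


def sqlite_lookup(name):
    row = conn.execute(
        "SELECT value FROM decl WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


key = "Member12345"
dict_time = timeit.timeit(lambda: dict_lookup(key), number=10_000)
sqlite_time = timeit.timeit(lambda: sqlite_lookup(key), number=10_000)
print(f"dict: {dict_time:.4f}s  sqlite: {sqlite_time:.4f}s")
```

Both lookups return the same value; the interesting part is the ratio between the two timings, and how memory use compares once the data set is realistic.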
@retailcoder Oh, I somehow missed it ! Once I go back to a proper computer, I'll gladly join the project, seems quite interesting, and pretty active so far ! |
Ha! I love the suggestion. I did suggest a database before, but I'm not sure if I explicitly suggested an in-memory database. To get the best use, we would need to index the properties we use the most in the inspections and whatever. But there's also the other thing --- if you read earlier, @retailcoder points out that most of the lookups are hash lookups and therefore an O(1) operation. Hard to beat that. And by introducing a database, we pay a bit more in index maintenance. This might be a very good thing for the non-user-defined objects, since they won't change, but not so helpful for user-defined objects. To make it even more concrete, we should look closely at the |
Hi, @Imh0t3b As for reduce coupling, see below:
In fact I'm a physician :) and amateur programmer. More than a decade ago I managed to create a Patient Manager App for myself and a few others using Outlook Contacts with a lot of Contact.UserProperties as a "DB" (in the .pst) for an Electronic Patient Record - not a legal one, of course - producing all needed Reports (prescriptions, referrals, etc), stamping my medical stamp and my signature, producing automatic text, diagnosis, bla-bla, but helping me a lot. It was almost all procedural code (!) but functioned very well. As always, new requirements from the client (me!) continuously pushed the envelope, and I started to split my God UserForms and God modules into smaller and smaller classes, respecting SRP more and more, but multiplying their number. And I said Good news, because I suppose that I'm able to see RB functioning now because of some fine tuning that you RB guys did, since my developer machine's setup didn't change at all in the last 2 years. |
Don't you read the blog? Always be explicit (not only on references)! ;) Any reason you don't switch to Office x64, as that solves all "Out Of Memory" issues? Ever tried MS Access (only as a frontend)? That is the easiest way to create forms bound to data.
Yes they are great!! One cannot thank them enough!! Thank you @retailcoder , thank you @bclothier, thank you @Vogel612, thank you @MDoerner and thank you to all others that "Made Rubberduck" great (not again ;) ) If I had to pay for the knowledge I gained from you, my boss would be broke! But as they don't want our money, we can pay back with our time by contributing solutions to the easier issues! Hacktober is coming soon! |
Yeah, sorry. I should have applied what I've learned the hard way about always being explicit with references!
Well, I'm from a resource-limited country... I still have more than 50% of the machines on x86 because of hardware and/or license limitations...
Thanks to you all for the hard and fruitful (duckful?) work ;) |
With reference to my closed issue above, it's a bit disappointing to realize that my main project is by all means too big for RD, with VB6 apparently running into the 32-bit 2GB limit. Nevertheless, I will try to contribute some observations, which may or may not be of value. When I load VB6 with my project and RB, Windows Task Manager shows a memory footprint of 86.5MB, and after clicking the Parse button it steadily climbs to approx. 1250MB before RB throws the out of memory exception. That may not be so useful as it simply means that we are hitting the ceiling before the task is done. Other observations: Opening Code Explorer doesn't seem to consume much memory, but... if it is docked at the right edge of VB and I change the size by dragging the bottom border up and down, it seems to add 1-3MB of memory use each time, although there seems to be an upper limit of about 10MB added. Leaving the IDE idle for some time seems to release only about half of that memory. Closing the CE doesn't change that. W/o clicking the Parse button, opening RB settings seems to add about 15 MB of memory consumption, of which nothing is given back when closing Settings. Opening it again adds another 15 MB approx., and another 15 MB next time, etc. Very little if any of this is given back after leaving VB6 idle for some time. I don't know if these are signs of memory leaks or not, but I thought I'd leave my observations anyway. |
Regarding the memory added & eventually released --- that's normal given how .NET objects are garbage-collected. The ducky may already be done with the objects, but they won't actually leave memory until the .NET runtime decides to run its garbage collection in the background. I am assuming that VB6 isn't "large address aware", in which case a typical 32-bit program would be only allowed 1 GB of memory, more or less, even though there could be more memory available. Running as an add-in also brings problems because we are running in the host's memory, over which we may not have control. There is another discussion (#5176) which would move that work out of the host's memory space, but that requires a lot of work to make happen. |
Surprisingly, I have had success! I unloaded 2 other Add-ins I had running but was not really using, like the VB6 Add-in toolbar (not sure what it's doing really), and to my surprise RD was able to complete its process of both parsing and resolving references as well as running inspections. However, Win10's Task Manager shows a memory footprint of 1275.8 MB for the 'Visual Basic (32 bit)' process, or when looking at the Resource Monitor VB6.EXE prints in KB Now if I understand this correctly, is "Shareable" what VB6 has left to "work with"? Hopefully, there will be enough resources left to actually do something with RD as well. Unfortunately, I cannot load CS2013 at the same time, and I miss the tabbed UI in VB6 already, but for now I will have to alternate depending on the work that needs to be done. Anyway, this is a "Brontosaurus" project, both in size and age, so there are probably optimizations that can be done and maybe RD can help with that. Happy I finally made it anyway. |
Ok even greater success and thanks to @Imh0t3b for the 22 Aug tip above on " patching msaccess.exe for LARGEADDRESSAWARE (maybe vb6.exe too)!"! I did that, I patched vb6.exe as described in that link. After that, I can now load both CodeSmart 2013 and Rubberduck, click RB Parse button and vb6 happily eats 1500+ MB while RD gets the job done. Mission accomplished! |
To add to this discussion and maybe help others who wonder why when they are clearly using 64bit Office they're getting memory issues: But on a whim, I ran the LAA tool on I may run it on the rest of the RD DLLs (possibly the parsing one since that is used consistently), but just the one made a difference. Some interesting tidbits:
HOWEVER, MAGICALLY (seemingly) the memory issues poofed out of existence. The memory use jumped to ~2.4+GB (with RD loaded; without is closer to ~100MB), but stability and speed increased dramatically, and I haven't had a memory error in 2 days. I'm going to keep monitoring this but the bottom line is that this appears to have done the trick. |
That is totally unexpected. However, I'd add that we should also revert the files so as to prove that the memory errors re-appear, and thus that this is not a fluke that was just coincidental to you running the LAA tool. |
To confirm, you want me to revert the files (even though the tool didn't do anything)? Revert to not LAA, or just reinstall RD? |
Either would work, yes. If it was really the LAA tool (even though it reported it did nothing) then reverting should bring back the memory errors and re-running it should then make them go away again, which would be stronger proof that there's something afoot with the LAA thingee. |
Alright, I'll give this a go and let you know; one moment, digs around in computer for files |
Alright. I did it. Was not able to change the flag; got a write error, in fact.
I cannot explain it. But, it clearly did something. RD info:
Immediate Window output
Note: "rddllfile" is a string constant I set to C:\Users\USERNAME\AppData\Local\Rubberduck\Rubberduck.dll to make it easier to use the Immediate Window.
SetLaaFlag rddllfile,DisplayLaaStatusOnly
LAA is enabled.
SetLaaFlag rddllfile,TurnOffLaa
LAA is enabled.
Switching OFF LAA
(Failed error write to file)
SetLaaFlag rddllfile,TurnOnLaa
LAA is enabled.
Doing nothing |
Mind linking to the LAA tool you're using? |
Sure thing! I ran this via Excel (because you can't run this in the same Application you're trying to set), if that helps any. Direct link to DL: modLargeAddressAware.zip |
Thanks. Having read the source code, it makes even less sense, because in this case it really does nothing beyond reading the LAA flag from the file. I had surmised that maybe it was reporting "doing nothing" but in actuality doing something. That doesn't seem to be the case, so I'm not able to explain why just running the LAA tool affects it so. If it was simply because the LAA tool was reading the flag, then clearing the value (and failing) should not have made it run slower again. |
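For readers wondering what "reading the LAA flag from the file" involves: the flag is the `IMAGE_FILE_LARGE_ADDRESS_AWARE` bit (0x0020) in the Characteristics field of the PE file's COFF header. The Python sketch below is not the linked tool's code; the function names are made up for this example, and it is demonstrated on a tiny synthetic header rather than a real DLL so that it runs anywhere. It only reads the bit and never writes to the file.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020


def characteristics_offset(image: bytes) -> int:
    """Locate the COFF Characteristics field inside a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("not a DOS/PE image")
    # e_lfanew at 0x3C holds the file offset of the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF header follows the signature; Characteristics is
    # its last 2-byte field, at offset 18 within that header.
    return pe_offset + 4 + 18


def is_large_address_aware(image: bytes) -> bool:
    (flags,) = struct.unpack_from("<H", image, characteristics_offset(image))
    return bool(flags & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def make_fake_image(flags: int) -> bytes:
    """Build a synthetic header for demonstration; not a loadable executable."""
    image = bytearray(0x80 + 24)
    image[:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x80)
    image[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<H", image, 0x80 + 4 + 18, flags)
    return bytes(image)


print(is_large_address_aware(make_fake_image(0x0122)))  # bit 0x20 set
print(is_large_address_aware(make_fake_image(0x0102)))  # bit 0x20 clear
```

To check a real file, the same function could be applied to the file's bytes, for example `is_large_address_aware(open(path, "rb").read())`, where `path` is whatever DLL or EXE is of interest.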
I agree, I am also flummoxed. But in my anecdotal test size of 1, it seemed to work. I'll keep an eye on memory use for a bit and see if anything changes, but, other than my machine was placebo satisfied, I've got nothing. |
Try editbin from Visual Studio, I use this to set the flags on my software. |
I used Here is what I get:
NOTE: All the main RD DLL files appear to have the LAA flag set already. Here is the full header for the main DLL in question:
|
As an FYI - I did the same thing and it did not make any difference in performance. |
I mean, I expect nearly no one else will have the same experience. I personally figured it would be another road to nowhere. But, if it was a fluke or not,... I swear it did do something. I haven't had an out of memory error since I did, and I was getting them left and right. |
I think the host has to be LAA aware. All office apps are except Access, which is supposed to get LAA this September. If you have patched your MSACCESS.EXE then it would load all extensions as LAA, ready or not. |
That's what I thought, except 64bit |
After observing random "System resources exceeded" or "Out of memory" with my usercode which didn't quite make sense, I later ran quick fixes. After applying changes, I went to start the application and immediately got the
System resources exceeded
despite having NOT run any VBA code. Suspicious, I restarted Access, then opened Access. Noted the working set memory at 8 MB.
Open the VBA editor, and the working set rockets to 175 MB.
Run parse, the working set goes up to 300 MB!
Open quick inspector, and refresh, the working set climbs to 475 MB!
I might be going out on a limb, but it sure looks to me like something's hemorrhaging hemorrhages.