Extreme CPU and Memory Load since 2.5 Build 1579 #829
I see something similar on an older dual-core desktop system (Fedora 28, OpenJDK Runtime Environment (build 1.8.0_201-b09)). I have other things running on the box, but its resting load was typically 0.25-0.40. On S1597 and previous builds it is 0.80-1.00, with Java taking 43% RAM. I thought this started with S1566 and the start of the ESH integration builds, but it could have been a later build as suggested by the OP. I see similar high load in my test environment with an i7. |
I would like to have a look at it.
Can you tell me what needs to be done to reproduce the problem?
|
I'm running 1592 on an APU2 with Debian 9 and don't see that problem either. Maybe it is specific to openHABian or a certain Java version? |
Same problem for me, on an RPi with the Oracle VM and on Ubuntu 18.04 x64 with Oracle in place of OpenJDK. |
Might this be caused by recent changes to the server-sent events part? Each event bus event is sent via SSE. |
Which changes do you refer to? I remember two (both done by me):
None of these "should" be related. I assume it would be simpler if someone provided the (minimal) steps to reproduce the problem with a newly downloaded distribution. |
Same problem for me, but on an RPi with openHABian: after installing and running the Simatic binding, the CPU load rises over 200%; eventually the system slows down and after some time becomes faulty. |
Still a problem on #1607. The system is extremely sluggish; Z-Wave and HTTP events take 5-10 minutes to show up in the logs. |
Hi all, you can either try to identify the causing thread by running the
Or, when your system has been running for a while, you could make a heap dump and analyze where all the memory is consumed with the Eclipse Memory Analyzer Tool (MAT). If you configure the runtime such that it will always write a heap dump on
E.g. in the provided heap dump for openhab/openhabian#469 there was a Thanks in advance. |
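For anyone following along, here is a rough sketch of both suggestions. The commands are standard Linux/JDK tooling; the openHAB-specific paths and the use of EXTRA_JAVA_OPTS are assumptions, not taken from this thread.

```
# Identify the hottest thread in the openHAB Java process (<PID> is the process id):
top -H -p <PID>                      # note the TID of the busiest thread
printf '%x\n' <TID>                  # convert that TID to hex
jstack <PID> | grep -A 20 'nid=0x<TID_HEX>'   # locate its Java stack trace

# Make the JVM write a heap dump automatically on OutOfMemoryError, e.g. by
# adding these flags to EXTRA_JAVA_OPTS (file location depends on the install):
EXTRA_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/openhab2"
```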
Adding some info from what I'm seeing on a system running build 1606. This system was restarted at about 07:50:00. Edit: There are no openHAB UIs open during the collection of any of these stats. Edit: The system has an Intel Celeron 4-core 1.60 GHz CPU and 4 GB RAM. Here are the top 8 CPU-consuming threads as of 08:48.
Looking at memory using pidstat (
Looking at CPU load using pidstat (
Looking at the top CPU consuming thread, its CPU time is growing pretty rapidly. This system is running a relatively light load.
By comparison, on another system running an older build (1502) that is much more heavily loaded than the above system, the CPU time is growing at a much lower rate.
I know there's an issue with memory consumption when karaf tries to load a bogus binding (see here), but I ruled that out as a possible cause here. |
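For reference, the pidstat output above (truncated in the copy) was presumably produced with something along these lines; the exact flags and interval are assumptions:

```
# Memory usage of the openHAB process, sampled every 10 seconds
pidstat -r -p <PID> 10

# Per-thread CPU usage, to see which thread is consuming the time
pidstat -u -t -p <PID> 10
```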
Let's have a look at the events...
|
@maggu2810 Ok. I'm on vacation this week and we're heading out for a few hours. I can do some more analysis later when we get back. All I see is this every 10 seconds from the Unifi and GlobalCache bindings.
|
This is after running for nearly a week on S1603...
|
So it's a WeakCache somewhere that causes the memory leak, and the event manager is using too much CPU. Every item change/update causes an event. But the only listeners that I know of are the rule engine (and the DSL rule engine as well, I guess) and SSE. SSE got changed a bit, but not in a way that it should behave much differently than before. What happened, however, was the upgrade of Xtend, so the DSL rule engine has changed. |
In case this matters, I only have one DSL rule that I use for testing. |
@maggu2810 Here's a snip of the log using the logging from your pull request. I waited for the system to start up and reach "steady state".
|
The event subscribers are notified by the event handler using an async safe caller, so the time the event subscribers consume to handle the event should not be part of the event manager thread at all. In your log that's about 9-10 seconds of sleep and then much less than 1 second of load. |
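To illustrate why handler time should not land on the event manager thread, here is a minimal sketch of the idea of asynchronous dispatch; this is only an illustration, not the actual ESH SafeCaller API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// The dispatcher thread only enqueues the callback; the subscriber's work runs
// on a pool thread, so its CPU time is not charged to the dispatcher thread.
class AsyncDispatcher {
    private final ExecutorService handlerPool = Executors.newFixedThreadPool(4);

    void dispatch(Runnable subscriberCallback) {
        handlerPool.submit(subscriberCallback); // returns immediately
    }
}
```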
To my amusement I also spotted quite a few large-numbered Xtext entries in the given screenshots. Still a dream to get rid of that.
@cweitkamp Can you debug the constructor call of |
Not much time for debugging, but here is what I got from Google: Lines 70 to 73 in 306165d
|
I tried to find a way to automate the tracing of specific function calls. I checked the line numbers of the code I am interested in for my currently used Java version.
Every time this method is called, I would like a stack trace, and then the debugger should continue. Let's use the command line debugger
Set monitors ("execute command each time the program stops") to dump the stacktrace and continue the execution from the breakpoint:
After that set the breakpoints:
Now you will get a trace of all calls. If you want to look at other method calls, just change your breakpoints. |
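Putting the steps above together, a session could look roughly like this; the breakpoint target is only a placeholder for whatever method you want to trace, and the JDWP port is an assumption:

```
# Attach jdb to a JVM started with
#   -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000
jdb -attach 8000

# Monitors: executed each time the program stops - dump the stack, then resume
> monitor where
> monitor cont

# Breakpoint on the method whose callers you want to trace (placeholder name)
> stop in com.example.SomeClass.someMethod
```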
Nice |
Not sure if this is helpful. If you need a longer trace, let me know. These were being produced every 10 seconds, presumably to coincide with the events that are being generated as shown in one of my previous posts.
|
As far as I can judge, the WeakCache's CacheKey and CacheValue are created because of the |
I would be very interested if the high load is caused by #763 |
@maggu2810 I reverted that commit. The rate of CPU consumption of the ESH-OSGiEventManager thread looks like it's back to what it was before the change.
However, the gradual uptick of memory consumption has not changed, so perhaps we are seeing symptoms from more than one cause? |
@maggu2810 I installed VisualVM, but I'm not quite sure what to look for. If you could give me some pointers on things I should look at, I can post the results here. I took a few screen snips of the VisualVM panels. Maybe this will provide some clues where to look further. Note that the version I'm running has the CombinedClassLoader reverted. From what I can see, the ESH-OSGiEventManager CPU time is very low and is growing only very, very slowly. Given the info below, I'd suggest that the CPU time used by ESH-OSGiEventManager with the CombinedClassLoader change, while high, is not a contributor to the memory and CPU issues I'm seeing. Screenshots: Monitor panel (note the huge memory growth spurts), CPU panels, Memory. Some additional info from pidstat.
CPU
|
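One thing that would help pin down those memory growth spurts is a heap dump taken at a "high" point, which MAT can then break down by retained size. A sketch using standard JDK tools (paths are placeholders):

```
# Take an on-demand heap dump of the running openHAB process
jcmd <PID> GC.heap_dump /tmp/openhab-heap.hprof

# Alternative with jmap; 'live' forces a GC first so only reachable objects remain
jmap -dump:live,format=b,file=/tmp/openhab-heap.hprof <PID>
```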
After reverting #763 in S1607, I'm also seeing much reduced load and memory usage. I'll run some stats tomorrow. |
What does that mean? Why a CCE (ClassCastException) if it's the same class... |
Classes differ if they are supplied by different classloaders. |
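A minimal, self-contained illustration of that point; the jar path and class name are placeholders, not taken from openHAB:

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderIdentityDemo {
    public static void main(String[] args) throws Exception {
        URL[] jar = { new URL("file:/path/to/bundle.jar") }; // placeholder jar

        // Two independent loaders (no shared parent that could load this class)
        ClassLoader a = new URLClassLoader(jar, null);
        ClassLoader b = new URLClassLoader(jar, null);

        Class<?> fromA = a.loadClass("com.example.Foo"); // placeholder class
        Class<?> fromB = b.loadClass("com.example.Foo");

        System.out.println(fromA == fromB);              // false: distinct classes
        Object o = fromA.getDeclaredConstructor().newInstance();
        fromB.cast(o);                                   // throws ClassCastException
    }
}
```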
@mhilbush As you are such a good tester, can you also give this one a try: #597 (comment) |
I tend to agree. I do not see the need for the safeCaller (and especially for a proxy) and think it was only used because "it was available". Working with |
After reverting 763 on S1607, ESH-OSGiEventManager has been tamed...
Markus, I'm about to put in your changes from #861 and your test workaround for SSEActivator. Should 763 stay reverted? |
Thanks. I see it now. I'll disregard this one.
If I can reproduce the issue again, I'll try it, assuming I'm able to reproduce the problem consistently like before. As @openhab-5iver has confirmed the taming of ESH-OSGiEventManager after reverting CombinedClassLoader, I don't think I need to test with CombinedClassLoader reverted. FWIW, I see no other ill effects of ESH-OSGiEventManager without the reverted CombinedClassLoader (other than it being a CPU hog with cpu time of 7630749 after a couple hours of run time at 31.4 events per second [more on this later]). From my latest testing... After uninstalling restdocs, upgrading to build 1607, and restarting openHAB, I'm running much more stable now. This issue with restdocs I've seen before. 🙄 At 31.4 events/second, memory is holding steady at ~15.4% of 8 GB, and CPU is ranging between 25% and 85% on one core. Note I'm measuring events per second this way, with all event types in
I'm going to add restdocs back into the mix to confirm that the CPU/memory problem does not reoccur (as was the case with the issue cited above). Barring any issues with that, I'm going to test #861. |
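The exact command got cut off above; a plausible way to arrive at an events-per-second figure (the log path and interval are assumptions) would be:

```
# Count new lines in events.log over a minute and divide by the interval
BEFORE=$(wc -l < /var/log/openhab2/events.log)
sleep 60
AFTER=$(wc -l < /var/log/openhab2/events.log)
echo "scale=1; ($AFTER - $BEFORE) / 60" | bc
```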
Initial results with #861...
- Events per second: 31.4 (measured as described in previous post) -- same as before
- CPU ranging between 5% and 30% (on one core) -- compared with 25-85%
- Memory < 8% (out of 8 GB total) -- compared with > 15%
- OH-OSGiEventManager thread CPU time growing VERY slowly (measured at 10 second intervals)
As compared to before.
|
So, it looks much better. |
No, it looks MUCH better!! 😁 |
Just to state, there are three threads:
The last two should continue with "-1", but just look at the prefix. The events that are received from the OSGi event admin are placed in a queue. So, to be more precise, you need to add the CPU time of these three threads. |
Should 763 still be reverted? |
IMHO: We just don't need to use the safe caller here (and perhaps other places). |
On 10 second intervals as before. And still MUCH better.
I built org.openhab.core from a new clone with just #861. So, this still has the CombinedClassLoader, right? |
Correct. |
@mhilbush Perhaps you can add a summary of your findings in #861. Perhaps one of the other @openhab/core-maintainers likes it too and merges the changes. |
Certainly! I'll do that now. |
Summarized here. |
Maybe just a side issue, but with the snapshot (1607) as well as my production environment (2.4), ESH-usb-serial-discovery-linux-sysfs-1 takes a lot of load, though I do not use any binding that uses a USB or serial device. |
@bjoernbrings By default, it polls every 5 seconds (which seems a bit aggressive to me). Fortunately, the thoughtful developer made it configurable. 😄 I'm going to try changing it in
Edit: Sure enough, if you want to tame |
Just in case you need another benchmark, I'm running 2.5.0~S1601-1 (Build 1601) on an RPi 3B+ that has been running OH for almost 15 days (1,296,000 seconds) now. Here's an overview of the most resource-hungry processes (build 1601).
|
@maggu2810, should this be closed now that #861 is merged, or are you waiting for some more feedback? |
IMHO it can be closed as the issue should be resolved. |
For me the issue is gone with the new build. Thank you for the hard work. |
Since Build 1579 there is extremely high CPU load and memory consumption.
OSGiEventManager uses nearly the complete CPU.
Reverting to 1578 instantly fixes this.
Even a fresh install ends up in this state, with no bundle found that causes it.
Tested on an i5 and an RPi3; both show the same behaviour, resulting in a nearly unusable system.