V6: high sustained CPU and memory load from pihole-FTL #2194
Comments
Do you only see high load when accessing the web interface or also during normal operation? |
I was just gonna create an issue about this as well. I'm having the same issue on 3 different systems. All running pihole on docker. One arm64 two amd64 hosts. Memory usage seems to be close to v5 on all of them. CPU usage though increases quite a bit while using the web interface. Anything live updating (like recent queries and total queries on the home page) also spikes the CPU a lot more than v5. |
No difference between the two. I’m seeing CPU spikes as high as 69%. This is a secondary, so very little traffic. |
Also seeing CPU usage as high as 60% during normal operation.
Debug logs: https://tricorder.pi-hole.net/Hx4uwChN/ |
Just adding to the conversation: some graphs with the container's disk/CPU/memory usage as of the upgrade, reported by the Proxmox host (using Unbound as the recursive DNS resolver; not sure if it's relevant to understanding the issue, though...). Network usage is pretty much the same as v5. Also noticing some random stutter in DNS resolution, maybe related to the huge spike in resource usage (when it happens, resolution takes about 5-10 seconds, compared to the nearly instantaneous response expected for a LAN install). Happy to provide more info as requested by you devs. Thanks in advance. |
A somewhat larger need in RAM is expected for v6.0. It mainly comes from an additional in-memory database we need to use so we can offer the new server-side Query Log feature with reasonable responsiveness. It just needs a few extra B-trees to live in memory to be fast. I am pretty unsure where the disk-reads are coming from. Could you please use something like |
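The RAM cost of those extra B-trees can be made concrete with a small sketch. This is a hedged illustration, not FTL's actual schema or index set: each index added to an in-memory SQLite database is another B-tree of pages that must live in RAM, which `page_count * page_size` approximates.

```python
import sqlite3

# Hedged sketch (not FTL's actual schema): each index on an in-memory SQLite
# database is another B-tree of pages that must be held in RAM.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE queries (id INTEGER PRIMARY KEY, ts INTEGER, domain TEXT)")
con.executemany(
    "INSERT INTO queries(ts, domain) VALUES (?, ?)",
    [(i, f"host{i}.example.com") for i in range(20_000)],
)

def db_bytes(con):
    # page_count * page_size approximates the database's in-memory footprint
    pages = con.execute("PRAGMA page_count").fetchone()[0]
    size = con.execute("PRAGMA page_size").fetchone()[0]
    return pages * size

before = db_bytes(con)
con.execute("CREATE INDEX idx_ts ON queries(ts)")          # one extra B-tree
con.execute("CREATE INDEX idx_domain ON queries(domain)")  # and another
after = db_bytes(con)
print(f"table only: {before} bytes, with two indexes: {after} bytes")
```

The footprint grows with each index, which is the trade-off being made for a responsive Query Log.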
This is htop with thread names; not seeing sustained spikes at the moment, but quite a lot of CPU time has been used on the thread. I need a reboot to query IO % for the container, because of a kernel config flag I need to enable on the host (
One observation from today: CPU consumption was low until I accessed the dashboard via browser, then CPU increased dramatically. Here is htop output with the dashboard being accessed:
Here is htop with the dashboard closed:
|
Here is what you actually asked for, but CPU usage is now low:
|
Could you please run
It will probably need some time (and may require
Does this solve your slowness issues? |
I'm not seeing high CPU utilization, nor issues with the web interface. Would it make sense to run this after doing the 6.x update, or is something similar going on in the background, and perhaps the source of the high CPU? |
Here is another highish CPU snapshot:
|
I ended up here through Reddit, where a bunch more people are reporting the same issue - https://www.reddit.com/r/pihole/comments/1iss62l/pihole_v6_extremely_slow_gui_high_cpu_usage/

I really wish major upgrades via pihole -up would at least prompt to continue. Some of us had custom nginx/fpm/doh/dot/vpn setups that got completely hosed.

Same issue here ever since upgrading. As soon as I point clients to it, requests start lagging, there are intermittent outages, and it pegs a single CPU. All 10.255.255 IPs in the recording are local to Pi-hole, and it spikes on each request. Screen.Recording.2025-02-20.at.6.00.58.PM.mov

If I move /etc/pihole/pihole-FTL.db, then it doesn't get recreated. Here's my debug data https://tricorder.pi-hole.net/hD9JvUof/ |
Why not? Do you have any logs for us?
No, moving the corrupted database away is what is currently being suggested. |
This issue has been mentioned on Pi-hole Userspace. There might be relevant details there: https://discourse.pi-hole.net/t/web-interface-slow-after-update-from-5-to-6/76280/17 |
@DL6ER did you already rule out that it can be the database size? Is there a quick way to fill a query db to 1.5 GiB with Pi-hole v6? I'm happy to test a bit on my RPi Zero W. |
I tested this myself, blowing up the database with identical copies of the same query (only the ID incrementing) to almost 20 GB without any issues. Only when at least one of the indexes is corrupted do we get these issues, as the database then has no other way than reading the entire table and performing a manual search. This seems to be what is happening in virtually all comparable cases I have seen so far. It remains totally unclear to me why the upgrade of |
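The full-table-read behavior described here can be demonstrated in miniature. In this sketch the index is simply absent rather than corrupted, and the schema is an illustrative assumption, not FTL's actual one; the point is how the query plan degrades from an indexed SEARCH to a whole-table SCAN:

```python
import sqlite3

# Sketch of the failure mode described above: when an index is unusable
# (here: simply absent), SQLite must SCAN the whole table; with the index
# it can SEARCH the B-tree instead. Schema is illustrative, not FTL's.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE queries (id INTEGER PRIMARY KEY, timestamp INTEGER, domain TEXT)")
con.executemany(
    "INSERT INTO queries(timestamp, domain) VALUES (?, ?)",
    [(i, "example.com") for i in range(50_000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN's last column describes the chosen access path
    return con.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[-1]

q = "SELECT count(*) FROM queries WHERE timestamp > 49000"
before_plan = plan(q)  # without an index: full-table read
con.execute("CREATE INDEX idx_timestamp ON queries(timestamp)")
after_plan = plan(q)   # with the index: B-tree lookup
print(before_plan)
print(after_plan)
```

On a multi-GB table, the SCAN variant is what shows up as sustained reads and pegged CPU.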
Can you give me a script to blow up a v5 query db? 1.5 GiB is sufficient I guess, so I do not need to raise my test VM's disk size 😅. Maybe I can replicate it when upgrading to v6.
Yeah I thought that maybe the library used by the web UI, or the way the web UI does the query call might somehow have a different result, compared to the CLI call. But I missed that you already found an actual corruption to be the cause. |
https://github.com/pi-hole/FTL/blob/v5.25.2/test/pihole-FTL.db.sql should generate a minimal v5 database. |
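Starting from a minimal database like that, a blow-up script along the lines described earlier (one row duplicated with only the ID incrementing, until a target file size is reached) might look like the following. The column set here is a simplified assumption, not the real v5 `queries` schema, and the 10 MiB target is kept small for demonstration:

```python
import os
import sqlite3
import tempfile

# Hypothetical blow-up script: duplicate one query row (only the INTEGER
# PRIMARY KEY auto-increments) until the file reaches a target size.
# Simplified columns; the real v5 pihole-FTL.db schema has more fields.
TARGET = 10 * 1024 * 1024  # 10 MiB here; scale up to ~1.5 GiB for real tests

path = os.path.join(tempfile.mkdtemp(), "pihole-FTL-test.db")
con = sqlite3.connect(path)
con.execute(
    """CREATE TABLE queries
       (id INTEGER PRIMARY KEY, timestamp INTEGER, type INTEGER,
        status INTEGER, domain TEXT, client TEXT)"""
)
con.commit()

row = (1700000000, 1, 2, "blow-up.example.com", "192.168.0.2")
while os.path.getsize(path) < TARGET:
    con.executemany(
        "INSERT INTO queries(timestamp, type, status, domain, client) VALUES (?, ?, ?, ?, ?)",
        [row] * 50_000,
    )
    con.commit()

rows = con.execute("SELECT count(*) FROM queries").fetchone()[0]
print(f"{rows} rows, {os.path.getsize(path) / 1024**2:.0f} MiB")
```

Raising `TARGET` and pointing the path at a test VM's copy of the database would replicate the inflation without needing real traffic.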
Idk, it would just say the db file was missing is all.
I tried again, this time it recreated, but the file was barely 400MB to begin with and it passed integrity checks:
I'll point some more clients at it now and see if it still has outages/sluggish perf |
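The integrity check mentioned above is SQLite's built-in `PRAGMA integrity_check`, which walks the table and index B-trees. A minimal sketch, run here against a throwaway demo database rather than a real /etc/pihole/pihole-FTL.db (stop pihole-FTL or work on a copy before checking the real file):

```python
import os
import sqlite3
import tempfile

# Build a small healthy demo database to check; on a real install you would
# point this at a copy of /etc/pihole/pihole-FTL.db instead.
path = os.path.join(tempfile.mkdtemp(), "pihole-FTL.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE queries (id INTEGER PRIMARY KEY, domain TEXT)")
con.execute("CREATE INDEX idx_domain ON queries(domain)")
con.commit()

# integrity_check verifies table and index B-trees; a single "ok" row
# means no corruption was found, otherwise it lists the problems.
result = [r[0] for r in con.execute("PRAGMA integrity_check")]
print(result)
```

A database can pass this check and still perform poorly for other reasons (e.g. missing indexes), which may be why the 400MB file above looked clean.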
I'm still experiencing the same sluggish performance once it reaches around 5-30 q/s. I tried checking the UI, but it's unbearable when DNS queries aren't being served either. It's running on a dual-core AMD EPYC (3GHz) KVM that isn't showing performance issues otherwise. Looking at htop I did catch a short-lived zstate thread that keeps coming up every 60s or so, but nothing else stands out aside from the constant 99-100% CPU usage. Stracing the process, I see a ton of nonstop pread()s. The pihole-FTL.db that got created automatically has the following indexes
The rest of the strace seems to suggest it's bogging down going through the gravity db for non-cached lookups (I'm blocking ~14M domains). There's a bunch of checks for missing -wal/-journal files
And my gravity.db is pretty big
pihole -g -c -d seems to end with an error:
I added a 4GB swap file and ran pihole -g -c -d again; it didn't even touch the swap or run low enough on RAM to need it, and the tree building process finished. After that, though, gravity.db is 1.5GB, whereas it was 750MB prior to the run with the same lists/number of domains being blocked:
|
Adding the unused swap fixed the issue for me; the culprit was the gravity tree build failure. I don't see the reason for having to add swap when it didn't even get used. It's almost as if the process tries to reserve memory and fails, yet the system had 1GB free, which should have been enough considering it didn't touch the swap that got added. I didn't dig any further into it since I already wasted enough time. These types of major release updates with possible breaking changes should not be pushed out to the masses with a simple 'pihole -up'. I'm sure plenty of us had custom setups running for years which were now affected, and hours were wasted as a result. Please consider a normal production release cycle for major releases, i.e. leave it up to the end user. Think dist upgrades. |
I am sorry for the issues. We had a rather long beta period with a pretty large number of participants so we hoped to have covered many special cases. It appears not... Looking at your |
Are you sure this all comes from |
All containers stopped only Pihole running. |
Thank you for all the hard work put into this project! Any thoughts on the tree build fail without swap? This also seems to be a commonly reported issue now, and it was the direct reason for my cpu pegged FTL |
I can only assume that it is used for a very brief moment. Maybe it is a bug in |
I also could not reproduce it, but I was also not able to create a large query database yet. Is there an easy way to blow it up quickly to 1.5 GiB? I tried running |
I only seemed to run into the pegged cpu issue when gravity didn't finish building its tree, and there were 5-30 concurrent queries happening for "new records". Note I had ~14m domains in gravity. If you can skip the tree build then run something like |
What is the solution to this now? I have restored v5 from Backup, the v6 is unusable due to high CPU load.... |
If CPU load is again high after the update:

sudo systemctl stop pihole-FTL
sudo mv /etc/pihole/pihole-FTL.db /etc/pihole/pihole-FTL.db.bak
sudo systemctl start pihole-FTL

You can also remove the old database, if the old query logs are not important for you:

sudo rm /etc/pihole/pihole-FTL.db.bak |
Running Pi-hole v6 (fresh install) on a Raspi Zero. Had no issues running previous Pi-hole versions on this Pi; have done it since at least 2020. According to
Immediately, the CPU shot back up to 95-99% and continues to stay there. Edit: I realized that my 8GB SD card could've been problematic here, so I installed a fresh instance of dietpi on a brand new 128GB SD card. Installed pihole fresh. Restored from backup via teleporter, then ran And of course, CPU usage % remains pinned at >95% due to Edit 2: I attempted to increase the swapfile to 12GB then re-run the sqlite index creation, but got the same memory error:
The maximum swap usage during the index creation attempt was ~2.5GB, nowhere near 12GB (I watched it the whole time via Edit 3: I copied While this fixed my problem for now, I sincerely hope this issue is temporary. It seems that I'd have to re-do these steps after every weekly gravity update which is far from ideal. |
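An offline rebuild like the one described (copying the database to a machine with more RAM, recreating the index there, then copying it back) can be sketched as follows. The schema, index name, and pragma choices are illustrative assumptions, not FTL's actual ones; the pragmas shown are SQLite's standard knobs for keeping index builds off RAM:

```python
import os
import sqlite3
import tempfile

# Sketch of an offline index rebuild on a copy of the database. Schema and
# index name are illustrative; on a real install you would open a copy of
# /etc/pihole/pihole-FTL.db with the pihole-FTL service stopped.
path = os.path.join(tempfile.mkdtemp(), "pihole-FTL.db")
con = sqlite3.connect(path)
con.execute("CREATE TABLE queries (id INTEGER PRIMARY KEY, timestamp INTEGER, domain TEXT)")
con.executemany(
    "INSERT INTO queries(timestamp, domain) VALUES (?, ?)",
    [(i, "example.com") for i in range(100_000)],
)
con.commit()

# Spill index-build scratch space to disk instead of RAM, and cap the
# page cache (negative value = size in KiB, so -2048 is ~2 MiB).
con.execute("PRAGMA temp_store = FILE")
con.execute("PRAGMA cache_size = -2048")
con.execute("CREATE INDEX IF NOT EXISTS idx_queries_timestamp ON queries(timestamp)")
con.commit()

names = [r[1] for r in con.execute("PRAGMA index_list('queries')")]
print(names)
```

Whether such pragmas would have avoided the out-of-memory failure on the Zero is untested here, but they are the usual first lever when an index build exhausts RAM.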
How ... many ... domains are there in your gravity database?
That's absolutely true; we have pi-hole/pi-hole#5977 as a permanent fix for this. Could you please try whether
? |
Migrating from v5 to v6 has been a nightmare. I was unaware a major version was live and updated as usual. Everything broke. Then everything kinda worked fine, but sometimes I would have RAM spikes. I used to have 128MB RAM allocated for years using v5. I'm also using Proxmox, without swap, so I gradually increased the RAM for v6 to 256MB. I have around 2 million domains.
Restarting the service after building the tree is now working fine. Guys, something ain't right |
I'm having the same issue. PiHole v5 has been problem free for over a year on a Proxmox LXC w/ 256M of RAM. Upon upgrade to v6, gravity update ("updating tree" step) causes maxed out CPU and RAM usage and I have to force restart the container. I have about 3M domains across various blocklists. Increasing RAM to 2G allows everything to succeed in just a couple seconds, but that seems unnecessary for a DNS server and I'd rather have that as e.g. storage cache.
The process that was using 100% CPU before
though I suspect this is from the previous step. Also, not to pile on, but yes it would have been nice to have this major update handled a little better. I have PiHole auto-updating with a script (not a great idea I know) and it took itself out for a few hours until I was able to restore from backup since it also broke all the custom |
Thank you for your comment; pi-hole/pi-hole#5977 is indeed not covering |
I did just open a separate PR for this in #2320. Could you please try whether
fixes this for you?
We have discussed this in the past with many users. We are by no means hardliners on our way and are currently discussing this further in pi-hole/pi-hole#6008. This is what the past two weeks - and also your particular experience - seem to tell me: try as hard as possible to prevent users from shooting themselves in the foot while we are not there to debug it. Stuff falling over is one thing, but this happening when you do not expect it (because you have forgotten the auto-upgrade script exists, or because you thought "what should happen?") should be suppressed. Opinions about this differ within the Pi-hole developer team. |
Just tried out
Meaning all the list parsing is working fine (even on If I make
Re: Auto-updating -- My update script is just a cron job that by default runs as root, so requiring |
This line is missing the
which is at the very heart of pi-hole/pi-hole#5977. Without this, it's no surprise what you are seeing. |
Would I still be able to switch back to stable after trying the development branch? |
If you use Tree view in
This branch should "finally" fix it without the need for any changes on core (#2321).
Sure, you can always use
to go back to the stable release, however, please, instead, run the command
for testing (can be undone with |
I will be able to try it in a few hours when I'm back. The development core branch still has the pragma file set, should I switch to master for core? |
It doesn't matter, but I will, nonetheless, create a reverting PR just for cleanliness in the code (no need to set a default value). |
Versions
Platform
Expected behavior
Modest memory and CPU consumption
Actual behavior / bug
High sustained memory and CPU consumption by pihole-FTL. Very slow web interface response.
Steps to reproduce
Steps to reproduce the behavior:
Debug Token
https://tricorder.pi-hole.net/KqhHPC9x/
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
Updated to 6.x from 5.x via pihole -up.