Fast memory leak #53

Open
fluxionary opened this issue Jun 1, 2019 · 31 comments
Labels
bad crash (Easy way to crash the server), bug (Something isn't working), high priority (Critical issue affecting the daily functioning of the server), performance (Something to do w/ server performance), server (Administrative task to do on the server)

Comments

@fluxionary
Member

BlS is leaking memory, which has so far resulted in taking out the entire server instance (not just BlS) at least once.

  1. We need to track down the memory leak, though this is exceedingly difficult.
  2. We need to enforce some way of limiting the amount of memory BlS allocates, so that it does not affect the performance of the other games running on the same server.

Item 2 can be achieved by using ulimit or something to the same effect. The following (old) links have descriptions of several solutions:

@fluxionary added the bug, bad crash, high priority, and server labels Jun 1, 2019
@fluxionary
Member Author

fluxionary commented Jun 8, 2019

I talked to Terumoc, and he had some insight into this. He told me to run collectgarbage("count") to measure how much memory Lua is using. This has shown that the issue is not a mod: Lua memory usage stays around 120-150 MB no matter how much memory the server is chewing up (there's a button in the Admin HQ).

He also reported that this is apparently a known issue in minetest 5.0+, having something to do with pathing for entities. I haven't found the reference to that issue yet, but I'll put it here when I can.

@fluxionary
Member Author

note: watchdog mod which reboots the server if it starts lagging too much
https://git.rudin.io/minetest/watchdog/src/master/init.lua

@fluxionary
Member Author

fancy-schmansy monitoring mod
https://github.com/thomasrudin-mt/monitoring

@luk3yx pinned this issue Jun 11, 2019
@ExeVirus
Collaborator

ExeVirus commented Jul 3, 2019

I'll see what I can find in the core game on entity pathing. Perhaps I can help fix the known issue.

@ragulanramkumar unpinned this issue Jul 13, 2019
@ragulanramkumar pinned this issue Jul 13, 2019
@fluxionary
Member Author

I've found that this might be caused by our use of a flat file for our authentication DB: minetest/minetest#7279
Note that updating is already tracked in #33

@billy-s : At your convenience, could you migrate the auth file to SQLite? I'd do it myself, but I can't shut down the server without it starting up again immediately...

@fluxionary
Member Author

And as a further note, while the nightly reboots are annoying, we can get by for another couple weeks if you're busy.

@ragulanramkumar
Collaborator

Can a backup be made today? I can migrate the database, but I don't want to run the risk of it going wrong.

@fluxionary
Member Author

@xerox123official Backup of what? I've discovered I can use the sqlite ".backup" command to back up most of the DBs while the server is running, but I can't do that for the main map, because it's constantly locked. Backing up the auth file should be quite easy, though, since it's a flat file.

While we're on the topic, it'd be great if we had a regular backup procedure again..
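
For the SQLite databases that can be copied while the server runs, the online backup mechanism behind the ".backup" command is also exposed in Python as sqlite3.Connection.backup. A minimal sketch, assuming hypothetical file paths; it does not show whether this copes with the constantly locked map database:

```python
import sqlite3


def backup_sqlite(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database to dest_path via the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Copy 1000 pages per step so other connections get a chance
        # to use the source database between steps.
        src.backup(dest, pages=1000)
    finally:
        dest.close()
        src.close()
```

A regular backup procedure could call this once per database from a scheduled job, for example backup_sqlite("players.sqlite", "backups/players.sqlite") with whatever file names the world directory actually uses.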

@ragulanramkumar
Collaborator

ragulanramkumar commented Jul 16, 2019

To stop the server from starting up again, just kill the restart script (it's called something like start_mt.sh), then shut down the server. When you want to restart the server, just run that script and send it to the background with ./path/to/script.sh &

@fluxionary
Member Author

@xerox123official Luk gave me sudo access to run Billy's backup script, which I think does all that stuff itself, correct? I plan on running it around 5AM UTC when the server is quietest. I'll post a note when I do.

@ragulanramkumar
Collaborator

Yup, you can schedule it in Billy's crontab or something

@luk3yx
Member

luk3yx commented Jul 16, 2019

To be clear: They can only use sudo to run the script, nothing else.

@fluxionary
Member Author

The migration went well. We'll have to check back in a day to see if memory is still leaking, though.

@krypticbit
Collaborator

I'll see if I can change the backup script so that it doesn't require sudo (I didn't think it did)

@krypticbit
Collaborator

It seems that the latest backup is from July 17 (there are three of them from that date, actually), so at least the script works when run manually.

@krypticbit
Collaborator

Hmm, it appears that I can't change the script to not need sudo; the script needs to be run as me so that minetest runs under my name. If it doesn't run under my name, it won't work next time. For now, sudo -u billys <command> should be fine.

@fluxionary
Member Author

The memory leak has not gone away, and if anything, it is worse than before. The server had chewed up 20GB of memory in 24 hours when it was rebooted.

I've created a topic on the minetest forums to solicit help: https://forum.minetest.net/viewtopic.php?f=6&t=22882
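
To put a number on growth like this, one option is to sample the server's resident memory from /proc at intervals. A rough sketch, assuming a Linux host; the function names are made up for illustration:

```python
def rss_kib(pid: int) -> int:
    """Return the resident set size of `pid` in KiB, read from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:     123456 kB".
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS entry for pid {pid}")


def growth_kib_per_hour(first_kib: float, second_kib: float,
                        seconds_apart: float) -> float:
    """Turn two RSS samples taken `seconds_apart` apart into KiB per hour."""
    return (second_kib - first_kib) * 3600.0 / seconds_apart
```

Calling rss_kib with the server's PID at a fixed interval and feeding two samples into growth_kib_per_hour gives a leak rate. For reference, 20GB in 24 hours works out to roughly 0.83GB per hour.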

@ExeVirus
Collaborator

ExeVirus commented Aug 7, 2019

Something to note, I run many of the mods of this server on my family's local server for the game, and I have never once had a memory leak issue.

My guess then is that the build for Linux (I'm running Windows) may have a memory leak, or one of the dependencies introduces a memory leak.

I will take it upon myself to collect all the mods from the bls_mods page (are there others besides lasers I should be aware of?) and put them into a test client (5.1) and run that on both Windows (7) and Debian 18 LTS.

I'm unsure if I'll find anything, but if I don't, then the only code this could be coming from would be the net code, which is something I cannot easily test myself.

@fluxionary
Member Author

fluxionary commented Aug 7, 2019

No need to "collect" all the mods - clone the bls_mods repo, then run "git submodule update --recursive --init".
Thanks for your effort, but I don't think you'll find anything if the server isn't heavily using all the available mods. I've never noticed the leak in my local clone of the server, but (1) I don't have the BlS map and (2) I never run the server for all that long...

Also note that LS-Wonderland, on the same host but running minetest 0.4.17.1, has recently been experiencing a memory leak as well, though it is much slower than ours.

@niwla23

niwla23 commented Aug 13, 2019

Have you ever tried switching the engine?
So recompiling everything?
Or what happens when using another map?

@niwla23

niwla23 commented Aug 13, 2019

Also, if it's at spawn, why not just switch to the new spawn?
It looks finished.

@fluxionary
Member Author

@niwla23 The issue has followed us through at least a couple upgrades to the minetest engine. I've tried replicating the issue on my local world, but I've had no luck. Switching to new spawn will have exactly no impact on this issue.

@fluxionary changed the title from "Memory leak" to "Fast memory leak" Aug 29, 2019
@fluxionary
Member Author

fluxionary commented Aug 29, 2019

update:

  1. This seems to really ramp up when more players who run large techpack/terumet factories are active. We need to investigate why.
  2. @billy-s we could still mitigate the issue through use of ulimit. We should make sure that our software sucking up memory can't crash other software running on the same hardware (see the earlier links to stackexchange for instructions). I'd suggest limiting its memory usage to 8 GB.
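
A rough sketch of what such a cap could look like, using Python's resource module to set the same address-space limit that ulimit -v sets in a shell. The launcher function and the command passed to it are hypothetical, not how the server is currently started:

```python
import resource
import subprocess

EIGHT_GIB = 8 * 1024 ** 3  # the 8gb cap suggested above, in bytes


def run_with_memory_cap(command, limit_bytes=EIGHT_GIB, **popen_kwargs):
    """Start `command` with its address space capped at `limit_bytes`."""

    def apply_limit():
        # Runs in the child process just before exec, so only the
        # launched server (and its children) inherit the limit.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.Popen(command, preexec_fn=apply_limit, **popen_kwargs)
```

The limit applies to virtual address space rather than resident memory, so a process that hits it sees failed allocations and will most likely exit instead of dragging the rest of the host down. The restart script would then bring it back up.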

@fluxionary
Member Author

Also, LS-Wonderland, hosted on the same machine, seems to have memory-leak issues as well. However, they are much slower and require much less frequent reboots: their weekly reboot cycle has taken care of it every time but once. This is despite LS-Wonderland being a creative server w/ few mods in common w/ blocky.

@thomasrudin

thomasrudin commented Aug 29, 2019

@fluxionary I asked you this on the forums: is there a way you can use the same engine (minetestserver) and run this locally with your modpack?

That way you can run valgrind and the other fancy analysis tools without interrupting the main world.
If there are memory leaks, they should be detectable even in singleplayer.
https://stackoverflow.com/questions/5134891/how-do-i-use-valgrind-to-find-memory-leaks

> note: watchdog mod which reboots the server if it starts lagging too much
> https://git.rudin.io/minetest/watchdog/src/master/init.lua

This does not help your issue; that was a problem with the pathfinder.

> fancy-schmansy monitoring mod
> https://github.com/thomasrudin-mt/monitoring

This may help visualize things, but I don't think the problem is in the Lua code...

EDIT: how do you compile the minetest code? A closer look into this might provide some insights.

@fluxionary
Member Author

@thomasrudin My earlier attempt to run valgrind was fruitless, because (1) it slowed the game down too much to do anything and (2) my local server doesn't get the usage blocky does - I don't have a ton of players w/ large factories or large buildings, and the leak only really shows up when the server's been busy for a long time. I've got some ideas on how to get around those issues, but I've been doing other stuff.
@luk3yx Can you answer the question about how the code was compiled?

@fluxionary
Member Author

I'm fairly certain that the issue is, at least in part, related to players running large factories, or having tons of stocked smartshops. Since Futureismine left the server, the memory leak has been much less pronounced (e.g. we can run a few days now without having to reboot...)

@luk3yx
Member

luk3yx commented Sep 5, 2019

I have forgotten the exact cmake options I used, however it was something similar to this:

$ cmake . -DRUN_IN_PLACE=FALSE -DBUILD_CLIENT=FALSE -DBUILD_SERVER=TRUE
-- *** Will build version 5.1.0-dev ***
-- Using GMP provided by system.
-- Using bundled JSONCPP library.
-- Using LuaJIT provided by system.
-- cURL support enabled.
-- GetText enabled; locales found: be;ca;cs;da;de;dv;eo;es;et;fr;he;hu;id;it;ja;jbo;kk;kn;ko;ky;lt;ms;nb;nl;pl;pt;pt_BR;ro;ru;sl;sr_Cyrl;sv;sw;tr;uk;zh_CN;zh_TW
-- Freetype enabled.
-- ncurses console enabled.
-- PostgreSQL backend enabled
-- PostgreSQL includes: /usr/include/postgresql;/usr/include/postgresql/10/server
-- LevelDB backend enabled.
-- Redis backend enabled.
-- SpatialIndex not found!
-- Locale blacklist applied; Locales used: ca;cs;da;de;dv;eo;es;et;fr;hu;id;it;ja;jbo;kk;kn;lt;ms;nb;nl;pl;pt;pt_BR;ro;ru;sl;sr_Cyrl;sv;sw;tr;uk
-- Configuring done
-- Generating done
-- Build files have been written to: /[...]/minetest5

Although LevelDB and PostgreSQL support is enabled, the server uses map.sqlite.

@fluxionary
Member Author

I haven't monitored this whatsoever since the move to multicraft; I'm curious to see if anything's changed.

@fluxionary
Member Author

Yup, it's still a thing: about 9.5 times as much memory as the next-largest server.

top - 23:44:23 up 276 days,  8:17,  3 users,  load average: 0.71, 0.71, 0.78
Tasks: 300 total,   1 running, 298 sleeping,   1 stopped,   0 zombie
%Cpu(s):  9.2 us,  0.6 sy,  0.0 ni, 90.0 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :  32182.8 total,    260.6 free,  20330.5 used,  11591.7 buff/cache
MiB Swap:   1533.0 total,      0.0 free,   1533.0 used.   3078.4 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                         
2992953 billys    20   0   13.9g  13.5g  11372 S  56.8  42.8   1232:14 multicraftserve   

@Lanhild added the performance label Nov 5, 2020
@fluxionary
Member Author

recent data:

29790 billys    20   0   20.6g  20.1g  49308 S  53.0  15.9   4108:20 minetestserver                                                    
30472 noah      20   0   10.9g  10.3g  20296 S  21.9   8.2   1523:49 multicraftserve                                                   
13600 noah      20   0 4137868   2.4g  28512 S  39.1   1.9 657:11.68 minetestserver                                                    
24666 srinivas  20   0 4224976   2.3g  11272 S  10.9   1.8  27663:00 minetestserver                                                    
19284 ivan      20   0 2695100   2.0g  17284 S  64.6   1.6 123:57.73 multicraftserve                                                   
 2436 prismo    20   0 2532124   1.9g  11032 S  45.0   1.5   1779:54 multicraftserve                                                   
24656 prismo    20   0 2452060   1.9g  17100 S  79.1   1.5 161:49.40 multicraftserve                                                   
20120 trainta+  20   0 3792680   1.9g  24024 S  34.4   1.5   5462:34 minetestserver                                                    
15691 pteroda+  20   0 2347576   1.7g   6548 S  55.6   1.4 832:45.12 multicraftserve                                                   
14715 medic     20   0 3169040   1.5g  83928 S   9.9   1.2 769:58.06 minetestserver                                                    
31774 1hit      20   0 2055988   1.5g   8224 S   1.3   1.2 448:11.16 multicraftserve                                                   
28870 pteroda+  20   0 1922180   1.3g   6436 S   2.0   1.0 770:45.23 multicraftserve                                                   
 1355 kiwi      20   0 1745572   1.2g  50828 S  21.5   0.9  31:41.34 multicraftserve                                                   
10388 pteroda+  20   0 1765456   1.2g   8076 S   5.0   0.9  68:35.63 minetestserver                                                    
16579 billys    20   0 1466976   1.1g   7804 S   0.7   0.9  45:13.00 minetestserver                                                    
17404 cg        20   0 2968960   1.0g   7628 S   1.3   0.8 797:43.55 multicraftserve                                                   
12378 cora      20   0 1391004 997956   5556 S   0.7   0.8 135:54.42 minetest                                                          
 3679 pteroda+  20   0 1531820 975840   6256 S   1.3   0.7 978:06.56 multicraftserve                                                   
14744 kako      20   0 1415400 932648  16220 S  19.9   0.7  91:41.86 multicraftserve                                                   
16087 santy     20   0 2712036 921724  72796 S  14.6   0.7 308:25.68 multicraftserve  

The logs give no clear reason why the BlS server process gets restarted. This is absolutely still an issue.
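
For comparing processes in a listing like the one above, here is a small sketch that parses rows of top output and sorts them by RES. It assumes top's default scaling, where a bare number is KiB and the m/g/t suffixes mark larger units; the function names are hypothetical:

```python
# Multipliers for the suffixes top puts on scaled memory values,
# expressed in KiB. A bare number in the RES column is taken as KiB.
_SUFFIX_TO_KIB = {"m": 1024, "g": 1024 ** 2, "t": 1024 ** 3}


def res_to_kib(field: str) -> float:
    """Convert a RES field such as '20.1g' or '997956' to KiB."""
    suffix = field[-1].lower()
    if suffix in _SUFFIX_TO_KIB:
        return float(field[:-1]) * _SUFFIX_TO_KIB[suffix]
    return float(field)


def res_by_process(top_rows: str):
    """Return (pid, command, RES in KiB) for each process row, largest first."""
    rows = []
    for line in top_rows.splitlines():
        cols = line.split()
        # Process rows have 12 columns and start with a numeric PID;
        # header and summary lines are skipped.
        if len(cols) < 12 or not cols[0].isdigit():
            continue
        rows.append((int(cols[0]), cols[11], res_to_kib(cols[5])))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

On the rows above this puts PID 29790 first, at roughly twice the RES of the next process (20.1g against 10.3g).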


8 participants