Show bgp ipv4 json detail command increases the Linux VM size dramatically #16643

Open · 2 tasks done
pguibert6WIND opened this issue Aug 23, 2024 · 6 comments
Labels
bgp triage Needs further investigation

Comments

@pguibert6WIND
Member

pguibert6WIND commented Aug 23, 2024

Description

On a Linux device that has received a full BGP feed of 900K prefixes, dumping the detailed JSON output to a file causes a dramatic increase in bgpd's virtual memory size.

root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root        7905  9.2 10.2 1224200 1044208 ?     Ssl  18:20   1:46 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp
root@dut-sureau-nianticvf:~# vtysh -c "show bgp ipv4 json detail" > /tmp/showbgpipv4detailjson.txt
root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root        7905 11.7 24.1 2638212 2457428 ?     Ssl  18:20   2:22 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp
dut-sureau-nianticvf(config)# debu bgp memory dump-show-bgp-route
number of gc occurence for 'show bgp route': 100u

Virtual memory size went from 1224200 KB to 2638212 KB.
Resident memory size went from 1044208 KB to 2457428 KB.

Version

10.0
I think the problem happens with all routes.

How to reproduce

Get a full-route setup and wait for stabilisation in the zebra RIB.
Then issue the command above against bgpd.

Expected behavior

I don't expect an increase in VM size.

Actual behavior

Dramatic increase in VM size.

Additional context

This is a full-route extract from a router peering with a single device.
In a real ISP scenario, however, there may be multiple peerings, and increasing the number of peers increases the memory used.

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.
@mjstapp
Contributor

mjstapp commented Aug 23, 2024

Will the work in the open PR about memory footprint in vtysh show commands (#16498) help with this? Have you tried that diff in this scenario?

@pguibert6WIND
Member Author

pguibert6WIND commented Aug 23, 2024

Will the work in the open PR about memory footprint in vtysh show commands (#16498) help with this? Have you tried that diff in this scenario?

The result is slightly better, but the increase is not eliminated.
Virtual memory still goes from 1224212 KB to 2137220 KB.
Resident memory still goes from 1044440 KB to 1959064 KB.

root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root       10874 54.9 10.2 1224212 1044440 ?     Ssl  08:25   1:42 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp
root@dut-sureau-nianticvf:~# time vtysh -c "show bgp ipv4 json detail" > /tmp/showbgpipv4detailjson.txt       

real    0m30.286s
user    0m2.796s
sys     0m5.565s
root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root        7702 75.7 19.2 2137220 1959064 ?     Ssl  08:14   2:23 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp

My fear is still memory fragmentation.
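
A way I could test that hypothesis (a minimal sketch, assuming glibc; show_bgp_done_hook() is hypothetical and not an existing bgpd function): ask the allocator to return its free pages after the show command completes. If the resident size then drops sharply, the growth was free-but-unreturned heap rather than a leak.

#include <malloc.h>

void show_bgp_done_hook(void) /* hypothetical hook run after the show */
{
	malloc_stats(); /* glibc: print arena totals to stderr */
	malloc_trim(0); /* ask glibc to return free heap pages to the kernel */
	malloc_stats(); /* compare the totals before and after trimming */
}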

@ton31337 ton31337 added the bgp label Aug 23, 2024
@ton31337
Member

What does the leak sanitizer say when running that command?

@hawicz

hawicz commented Aug 25, 2024

Assuming I'm looking at the right code, I'm guessing that command emits JSON that looks a bit like:

{"vrfs": {   "<vrfname>": {
	"protocols": {
		"<zebra_route_string_i>": "<NHT_RM_NAME>",
		...
	}
   },
   ...
}
  • json-c base json_object size: 40 bytes
  • json_object_object: 48 (base object + hash table ptr) + 56 (hash table) + #entries * (40 + avg key size + avg entry object size)
  • each entry in the vrfs.<vrfname>.protocols object will be a json_object_string: 48 bytes + length of string

If there are 900k of those, each with a 10-byte key and value, you should expect roughly 92MB for in-memory object storage, and definitely no more than that to serialize the object to a string.
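
For what it's worth, that estimate is easy to sanity-check outside of FRR with a standalone probe (a sketch, not FRR code; it assumes glibc >= 2.33 for mallinfo2() and builds with gcc probe.c -ljson-c):

#include <json-c/json.h>
#include <malloc.h>
#include <stdio.h>

int main(void)
{
	size_t before = mallinfo2().uordblks;
	json_object *protocols = json_object_new_object();

	/* 900k entries, each with a ~10-byte key and a ~10-byte string value */
	for (int i = 0; i < 900000; i++) {
		char key[32];
		snprintf(key, sizeof(key), "key-%06d", i);
		json_object_object_add(protocols, key,
				       json_object_new_string("value-1234"));
	}

	printf("tree uses ~%zu MB\n",
	       (mallinfo2().uordblks - before) >> 20);
	json_object_put(protocols); /* drop the whole tree */
	return 0;
}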

If you find that your use of the json-c library uses much more than that, open a new issue over in the json-c project (i.e. please don't just piggy-back on json-c/json-c#552)

@pguibert6WIND
Member Author

pguibert6WIND commented Aug 26, 2024


Hi Eric, thanks for the quick update.

As an example, please find an extract of what the output looks like. The route entry below represents one of the 993276 entries present.

{
 "vrfId": 0,
 "vrfName": "default",
 "tableVersion": 993276,
 "routerId": "165.16.221.64",
 "defaultLocPrf": 100,
 "localAS": 65500,
 "routes": {
		"0.0.0.0/0":{
				"prefix": "0.0.0.0/0",
				"version": "1",
				"advertisedTo":{
					"165.16.221.65":{
					"hostname":"dut2-sureau-nianticvf"
					}
				},
				"paths":[{
					"aspath":{
						"string":"37721 3257",
						"segments":[{
							"type":"as-sequence",
							"list":[37721,3257]
						}],
						"length":2
					},
					"origin":"IGP","valid":true,"version":1,
					"bestpath":{
						"overall":true,
						"selectionReason":"First path received"
					},
					"community":{
						"string":"37721:4000 37721:4006 37721:4200 37721:4230",
						"list":[
							"37721:4000","37721:4006","37721:4200","37721:4230"
						]},
					"lastUpdate":{
						"epoch":1724653537,"string":"Mon Aug 26 08:25:37 2024\n"
					},
					"nexthops":[{
						"ip":"165.16.221.66","hostname":"dut2-sureau-nianticvf","afi":"ipv4","metric":0,
						"accessible":true,"used":true
					}],
					"peer":{
						"peerId":"165.16.221.65",
						"routerId":"165.16.221.65","hostname":"dut2-sureau-nianticvf","type":"external"
					}
				}]
		},

The whole file is https://drive.google.com/file/d/1NnXSUX_wuKN2Zcu8r1b8jkjbg63kG8Jx/view?usp=sharing
Basically, it is a list of paths, each with many different options.

Thanks also for the numbers provided.
The json-c functionality itself works very well.
I was a bit clumsy in raising a comment directly on the json-c repository, and I apologise for that.
My guess is that memory management is the problem on Linux, and that limiting memory usage by all available means can help reduce the footprint.

I did some experiments on memory management:

  • I tried to separate the memory used for the show command from the rest (bgpd: generate json by default for [l]communities and aspaths #16654).
  • I tried to avoid the json_object_get() call (json_object_lock() in FRRouting) on aspath, large communities, and communities, and replaced the current JSON object with a JSON object holding a simple string (see the sketch at the end of this comment):
    before:
					"community":{
						"string":"37721:4000 37721:4006 37721:4200 37721:4230",
						"list":[
							"37721:4000","37721:4006","37721:4200","37721:4230"
						]},
...
					"aspath":{
						"string":"37721 3257",
						"segments":[{
							"type":"as-sequence",
							"list":[37721,3257]
						}],
						"length":2
					},

after:

					"community":{
						"string":"37721:4000 37721:4006 37721:4200 37721:4230",
                                         }
...
					"aspath":{
						"string":"37721 3257",
					},

With this change, the global VM size was 2043920 KB instead of 2137220 KB.

  • I would like to reuse the memory blocks used by each prefix. Why do malloc/free 993276 times, when each iteration dumps essentially the same prefix structure?

It is this last experiment for which I need some help on the json-c APIs available to build such a scheme.
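
For reference, the second experiment above amounts to something like the following (a sketch with illustrative names: path_json and attr->aspath->str / attr->aspath->json stand in for the actual FRR fields):

/* before: attach the cached, fully expanded aspath object by taking a
 * reference (json_object_lock() is FRR's name for json-c's json_object_get()) */
json_object_object_add(path_json, "aspath",
		       json_object_lock(attr->aspath->json));

/* after: build a small throwaway object holding only the rendered string,
 * so it is freed together with the rest of the show output */
json_object *aspath_json = json_object_new_object();
json_object_object_add(aspath_json, "string",
		       json_object_new_string(attr->aspath->str));
json_object_object_add(path_json, "aspath", aspath_json);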

@pguibert6WIND
Member Author

As an additional test, without changing the JSON model, I could see that the vty_json_no_pretty() function takes a lot of memory.

text = json_object_to_json_string_ext(json, options);
json_object_free(json);

If the serialization call is not made, the virtual memory footprint is far better.
Virtual memory size increased from 1663864 KB to 1704664 KB, instead of to 2043920 KB.

root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root       59828 28.3 14.5 1663864 1485696 ?     Ssl  12:02   1:47 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp

root@dut-sureau-nianticvf:~# time vtysh -c "show bgp ipv4 json detail" > /tmp/showbgpipv4detailjson.txt

real    0m24.767s
user    0m1.152s
sys     0m1.152s

root@dut-sureau-nianticvf:~# ps -aux | grep bgpd
root       59828 31.6 14.9 1704664 1526640 ?     Ssl  12:02   2:10 /usr/bin/bgpd -A 127.0.0.1 -M snmp -M rpki -M bmp

Finding out how to optimize the display could help resolve this spike in VM size.
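
One possible direction (an illustrative sketch, not current FRR code) is to serialize and free each route's object as soon as it is rendered, instead of accumulating one huge top-level object and serializing it with a single json_object_to_json_string_ext() call. Here build_one_route_json() is a hypothetical helper producing the per-prefix object shown earlier in this thread:

/* render one prefix at a time: the peak then holds a single route's tree
 * plus its serialized text, instead of the whole table twice over */
json_object *route_json = build_one_route_json(dest); /* hypothetical helper */

vty_out(vty, "%s,\n",
	json_object_to_json_string_ext(route_json,
				       JSON_C_TO_STRING_NOSLASHESCAPE));

json_object_free(route_json); /* same free used in the snippet above */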
