Memcached Benchmark

Dor Laor edited this page Feb 19, 2015 · 12 revisions

The following describes the details of the Memcached benchmark so that it can be reproduced. Let us know if anything is missing.

Latest Results (Feb 19)

Raw data:

CPU   Seastar Memcached with DPDK   Stock Memcached (multi process)   Stock Memcached (multi threaded)
2     553,175                       350,844                           321,287
4     1,021,918                     615,270                           573,149
6     1,703,790                     857,428                           709,502
8     2,149,162                     1,102,417                         741,356
10    2,629,885                     1,335,069                         608,014
12    2,870,919                     1,528,598                         608,968
14    3,217,044                     1,726,642                         440,658
16    3,460,167                     1,887,060                         603,479
18    4,049,397                     2,167,573                         902,192
20    4,426,457                     2,281,064                         1,128,469

As you can see, Seastar's Memcached server is about 4X faster than the stock multi-threaded Memcached at 20 CPUs. The latter suffers from various locking issues, especially the mutex_trylock busy-wait loop. To squeeze more performance out of stock Memcached, we ran it as multiple single-threaded processes that share nothing. This is not a fair comparison, since memory is not shared this way and the client takes on extra responsibility and complexity. Even with this approach, Seastar outperforms stock Memcached by about 2X.
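
The ratios quoted above can be rechecked from the 20-CPU row of the raw data. The following is a small sketch; the `speedup` helper and the variable names are made up for illustration and are not part of the benchmark scripts.

```shell
# Throughput at 20 CPUs, copied from the raw data table.
seastar=4426457
stock_multiprocess=2281064
stock_threaded=1128469

# speedup A B: print A/B rounded to two decimals.
# awk is used because bash arithmetic is integer-only.
speedup() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

echo "Seastar vs. stock multi threaded: $(speedup "$seastar" "$stock_threaded")x"
echo "Seastar vs. stock multi process:  $(speedup "$seastar" "$stock_multiprocess")x"
```

This prints 3.92x and 1.94x, in line with the rounded 4X and 2X figures. The same helper can be applied to any other row of the table.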

It is worth noting that Seastar was designed for much more complex scenarios than Memcached and should excel even more when a high level of parallelism is needed.

Collectd/Graphite statistics

The stats were retrieved using Graphite and the internal collectd client during a run with 4 cores. The top right graph shows the packet coalescing rate: as the load increases (the top left graph shows idle time shrinking to zero), each packet processing round handles 30 packets.

The bottom right graph shows the number of tasks executed per core. The number is 1,250,000/sec. Note that this counts Seastar tasks, not Memcached operations. The bottom left graph shows the number of network packets each core handles (in this setup, between 200k/s and 250k/s).

Test bed:

  • Server 1: Memcache server
  • Server 2: Memcache Client - memaslap

Software

Server 1 (Stock Memcached) Setup:

  • Memcached version 1.4.17
  • One single-threaded Memcached process per CPU
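
The exact command lines used for the stock multi-process runs are not recorded on this page. As a rough sketch of what "one single-threaded process per CPU" could look like, the helper below prints one memcached command per core. The function name, the port numbering starting at 11211, and the use of taskset for pinning are all assumptions; `-d` (daemonize), `-t` (threads) and `-p` (TCP port) are standard memcached options.

```shell
# stock_memcached_cmds NCPU [BASE_PORT]
# Print (do not run) one memcached command line per CPU.
# Assumptions, not taken from the benchmark notes: process i listens on
# BASE_PORT + i and is pinned to core i with taskset.
stock_memcached_cmds() {
    local ncpu=$1 base_port=${2:-11211} i
    for ((i = 0; i < ncpu; ++i)); do
        echo "taskset -c $i memcached -d -t 1 -p $((base_port + i))"
    done
}

# Show the commands for a 4-CPU run.
stock_memcached_cmds 4
```

With a layout like this, each client has to be pointed at one of the ports, which is the extra client-side complexity mentioned in the results discussion.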

Server 1 (Seastar Memcached with DPDK) Setup:

  1. Fetch DPDK from upstream (support for i40e is not sufficient in 1.8.0)
  2. Update config/common_linuxapp:
     • set CONFIG_RTE_MBUF_REFCNT to 'n'
     • set CONFIG_RTE_MAX_MEMSEG=4096
  3. Follow the instructions from the Seastar readme on DPDK installation for 1.8.0
  4. Define hugepages: 2048,2048 pages
  5. Compile Seastar
  6. Run the server: sudo build/release/apps/memcached/memcached --network-stack native --dpdk-pmd --dhcp 0 --host-ipv4-addr $seastar_ip --netmask-ipv4-addr 255.255.255.0 --collectd 0 --smp $cpu
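
The results table covers $cpu values from 2 to 20 in steps of 2. A minimal sketch of generating the server command for each of those runs is shown below. It only prints the commands, using the same flags as the command above; the function name is made up, and 192.0.2.10 is a placeholder address standing in for $seastar_ip.

```shell
# seastar_memcached_cmd IP CPU
# Print the Seastar memcached command line for a given address and core count.
seastar_memcached_cmd() {
    local ip=$1 cpu=$2
    echo "sudo build/release/apps/memcached/memcached --network-stack native --dpdk-pmd --dhcp 0 --host-ipv4-addr $ip --netmask-ipv4-addr 255.255.255.0 --collectd 0 --smp $cpu"
}

# One command per row of the results table (2, 4, ..., 20 cores).
# 192.0.2.10 is a placeholder, not the address used in the benchmark.
for cpu in $(seq 2 2 20); do
    seastar_memcached_cmd 192.0.2.10 "$cpu"
done
```

Since memcached is restarted for each test, each printed command corresponds to one separate run.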

Server 2 (memaslap) Setup

  • memaslap from libmemcached-1.0.18
  • Disable irqbalance
  • Fix the irq smp_affinity of the 40Gb card so that each interrupt is handled on a single CPU
  • Run memaslap. For $cpu < 6:
for ((i = 0; i < 12; ++i)); do taskset -c $i memaslap -s $seastar_ip:11211 -t 60s -T 1 -c 60 -X 64 & done

    For $cpu >= 6:
for ((i = 0; i < 52; ++i)); do taskset -c $i memaslap -s $seastar_ip:11211 -t 60s -T 1 -c 60 -X 64 & done
  • Verify there are no misses in each test, and restart memcached for each test
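
The IRQ pinning step in the list above is not spelled out on this page. One possible way to do it, sketched here, is to spread the card's IRQs round-robin over the client CPUs. This sketch writes to /proc/irq/<n>/smp_affinity_list (which takes a CPU number) rather than the hex-mask smp_affinity file named above; that substitution, the function name, and the IRQ numbers in the example are assumptions. The helper only prints the commands, which would need to be run as root with irqbalance disabled.

```shell
# pin_irqs_cmds NCPU IRQ...
# Print one command per IRQ, assigning CPUs 0..NCPU-1 round-robin so that
# each interrupt is handled by exactly one CPU.
pin_irqs_cmds() {
    local ncpu=$1 n=0 irq
    shift
    for irq in "$@"; do
        echo "echo $((n % ncpu)) > /proc/irq/$irq/smp_affinity_list"
        n=$((n + 1))
    done
}

# Example with made-up IRQ numbers; the real ones for the 40Gb card
# can be looked up in /proc/interrupts.
pin_irqs_cmds 4 120 121 122 123 124
```

In the example, IRQs 120 to 123 land on CPUs 0 to 3 and IRQ 124 wraps around to CPU 0.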

Hardware

Same as HTTPD Test
