OSv Case Study: Memcached
Memcached is a popular in-memory key-value store. Many high-profile websites use it to cache the results of database queries and prepared page sections, significantly boosting these sites' performance.
An unmodified memcached on OSv was able to handle about 20% more requests per second than the same memcached version on Linux. A modified memcached, designed to use OSv-specific network APIs, had nearly four times the throughput.
Memcached is a good case study for OSv for several reasons:
- Memcached is a real and popular application, used both on physical servers and in the cloud. It is not a toy application or a benchmark.
- Memcached makes high demands on the operating system. It needs to handle a huge number of TCP or UDP requests, doing very little computation on each. It needs to manage a lot of memory filled with small objects, and it needs as much free memory as possible for the actual data. We believe that OSv is a better operating system for applications on the cloud, so we expect that a memcached VM ("virtual appliance") built on OSv can significantly outperform the traditional Linux-based VM.
- The peak throughput (requests-per-second) of a memcached server is normally limited only by the efficiency of the software (OS and memcached). It is not limited by unrelated factors like disk speed.
As we show below, the performance of an OSv-based memcached VM is significantly better than what is achievable on a Linux-based VM. The standard memcached performed better on OSv than on Linux (answering about 20% more requests per second). But OSv can do even better by not being bound by the 30-year-old Unix networking APIs available on Linux: we can get almost 4 times the performance of the Linux+memcached baseline by writing a memcached server that uses the lower-overhead APIs available in OSv.
Memcached's protocol supports both UDP and TCP. Each has different advantages and disadvantages; TCP is slower but more reliable, UDP is faster but only works for small (packet-sized) requests and responses. When the cached values are small, and request/response loss is acceptable (as is true when we remember that memcached is just a cache), UDP has the potential to provide better performance, and indeed companies like Facebook report using it.
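To make the "packet-sized" constraint concrete, here is a minimal sketch of how a request is framed in memcached's UDP protocol: each datagram starts with an 8-byte header of four 16-bit big-endian fields (request ID, sequence number, total number of datagrams, and a reserved field that must be 0), followed by the ordinary text-protocol command. The function names `build_get_datagram` and `split_datagram` are hypothetical helpers for illustration, not part of memcached or OSv.

```python
import struct

# Memcached UDP frame header: request id, sequence number,
# total datagrams in the message, reserved (must be 0).
FRAME_HEADER = struct.Struct("!HHHH")


def build_get_datagram(request_id: int, key: str) -> bytes:
    """Wrap a text-protocol GET in a single-datagram UDP frame."""
    header = FRAME_HEADER.pack(request_id, 0, 1, 0)
    return header + b"get " + key.encode("ascii") + b"\r\n"


def split_datagram(datagram: bytes):
    """Return (request_id, sequence, total, payload) for a received frame."""
    request_id, sequence, total, _reserved = FRAME_HEADER.unpack_from(datagram)
    return request_id, sequence, total, datagram[FRAME_HEADER.size:]
```

A client would pass the result of `build_get_datagram` to `socket.sendto` on a UDP socket and use the echoed request ID to match the reply to the request. There is no retransmission, which is the loss the paragraph above accepts for a cache.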
This is why we decided to initially focus on performance of memcached on UDP. We also plan to look at TCP (which OSv also supports, of course), but later.
For a benchmark, we chose memaslap from libmemcached. This fairly well-known benchmark (available, for example, in a Fedora package) repeatedly sends a configurable number of memcached requests, 10% of which are "SET" and 90% are "GET", and measures the achieved throughput (requests per second). We ran memaslap on a different physical host from the one running the memcached VM, with the two hosts connected by a direct 40 Gbps link. We gave memcached enough memory for the memaslap benchmark to have zero cache misses for the duration of the test (e.g., we found 5 GB of cache to be enough for measurements of 30 seconds).
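The measurement itself is simple to picture. The following is a rough sketch of a closed-loop load generator with the same 10% SET / 90% GET mix. It is not how memaslap is implemented (memaslap runs many concurrent requests across several threads), and `request_mix`, `measure_throughput`, and the injectable `send` and `clock` parameters are hypothetical names chosen for illustration.

```python
import itertools
import time


def request_mix(set_every=10):
    """Yield 'set' once every `set_every` requests and 'get' otherwise."""
    for i in itertools.count():
        yield "set" if i % set_every == 0 else "get"


def measure_throughput(send, duration_s, clock=time.monotonic):
    """Call send(op) back to back for duration_s seconds.

    Returns the number of completed requests divided by the nominal
    duration, i.e. requests per second.
    """
    mix = request_mix()
    start = clock()
    done = 0
    while clock() - start < duration_s:
        send(next(mix))
        done += 1
    return done / duration_s
```

Here `send` would issue one request to the server and wait for its reply. Passing a fake `clock` makes the loop deterministic for testing.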
All our benchmarks below use a 1-CPU VM. Ideally, memcached's performance should scale with the number of CPUs, but achieving this requires a multi-queue network card, which is not often available on virtual machines. So initially we focused our effort on getting the best memcached performance on a single-CPU VM.
The host was a single-socket quad-core Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz. It ran the KVM hypervisor and vhost-net, KVM's paravirtual network driver. Though the guest received a single CPU, the host actually used additional CPUs on its behalf, for processing the paravirtual I/O requests and for processing physical network interrupts. For each setup separately, we chose the pinning of those threads and interrupts to physical CPUs that achieved the best performance.
For the Linux-based memcached VM we took an up-to-date Linux distribution, Fedora 20, with its included Linux kernel and memcached server. With this setup, and with the firewall disabled, we achieved 104,394 requests per second.
When we ran the same version of memcached on OSv, we measured about 20% better performance: 127,275 requests per second.
A 20% performance improvement is nice, but we set out to demonstrate that it is possible to build a memcached virtual appliance using OSv with performance at least double that of the baseline, still using only a single CPU.
Some performance can be gained by rewriting memcached more efficiently. We wrote a memcached clone, which supports the subset of the memcached protocol needed for the benchmark (namely, UDP, and "GET" and "SET" operations). This memcached clone still used the Linux socket APIs, and achieved 161,740 requests/sec.
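To give a sense of how small the required protocol subset is, here is a minimal sketch of a handler for single-datagram UDP "GET" and "SET" requests. It is not the clone's actual code: `handle_datagram` and the dictionary-based `store` are hypothetical, expiry times are ignored, and multi-key GETs and `noreply` are not handled.

```python
import struct

# UDP frame header: request id, sequence, total datagrams, reserved.
FRAME_HEADER = struct.Struct("!HHHH")


def handle_datagram(store: dict, datagram: bytes) -> bytes:
    """Process one GET or SET request and return the reply datagram."""
    request_id, _seq, _total, _reserved = FRAME_HEADER.unpack_from(datagram)
    body = datagram[FRAME_HEADER.size:]
    command_line, _, rest = body.partition(b"\r\n")
    parts = command_line.split()

    if len(parts) == 2 and parts[0] == b"get":
        key = parts[1]
        if key in store:
            flags, value = store[key]
            reply = b"VALUE %b %d %d\r\n%b\r\nEND\r\n" % (
                key, flags, len(value), value)
        else:
            reply = b"END\r\n"  # cache miss: empty result
    elif len(parts) == 5 and parts[0] == b"set":
        # set <key> <flags> <exptime> <bytes>; exptime is ignored here
        key, flags, size = parts[1], int(parts[2]), int(parts[4])
        store[key] = (flags, rest[:size])
        reply = b"STORED\r\n"
    else:
        reply = b"ERROR\r\n"

    # Echo the request id so the client can match reply to request.
    return FRAME_HEADER.pack(request_id, 0, 1, 0) + reply
```

With the socket APIs, a single-threaded server would wrap this in a loop of `recvfrom`, `handle_datagram`, and `sendto` on one UDP socket.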
That's an impressive 55% improvement over the baseline, but not the end of the story. The socket APIs supported by Linux are great, and have served us well for 30 years, but have numerous overheads that prevent memcached from achieving truly mind-blowing performance. For example, a UDP socket can be used concurrently by multiple threads, so the implementation needs to use a lock to protect it - even if we know that our application will never access the socket from more than one thread. Also, the various layers of the TCP/IP and socket stack are important for a full-featured operating system running a variety of servers, but they only slow down a VM whose single purpose is to provide one UDP service - memcached.
So we wrote a new OSv-specific memcached clone which bypasses the socket APIs and most of the TCP/IP stack. This VM achieved a whopping 406,750 requests per second - 3.9 times the baseline performance.
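The quoted ratios can be checked directly from the measured request rates. The short sketch below recomputes each configuration's speedup over the Linux baseline; the dictionary labels are informal descriptions, not official names. Note that the "20%" figure for unmodified memcached on OSv is a rounding: the measured ratio is closer to 22%.

```python
BASELINE = 104_394  # requests/sec: memcached on Linux (Fedora 20)

RESULTS = {
    "unmodified memcached on OSv": 127_275,
    "memcached clone using socket APIs": 161_740,
    "osv-memcached bypassing socket APIs": 406_750,
}


def speedup(rate, baseline=BASELINE):
    """Throughput relative to the Linux baseline."""
    return rate / baseline


for name, rate in RESULTS.items():
    print(f"{name}: {speedup(rate):.2f}x baseline")
```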
Our new osv-memcached is still a limited prototype: It currently only supports UDP, only supports "GET" and "SET" commands as required by the benchmark, and is limited to MTU-sized requests and responses (no IP fragmentation). However, we expect that these missing features can be added without sacrificing any of the performance that we've achieved.
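As a rough illustration of what "MTU-sized" means in practice, the sketch below computes how many bytes remain for the memcached command or reply in one unfragmented datagram. It assumes a standard 1500-byte Ethernet MTU and an IPv4 header without options; the MTU used in these tests is not stated, and `max_memcached_udp_payload` is a hypothetical helper.

```python
IPV4_HEADER = 20             # bytes, assuming no IP options
UDP_HEADER = 8               # bytes
MEMCACHED_FRAME_HEADER = 8   # bytes, memcached's UDP frame header


def max_memcached_udp_payload(mtu: int = 1500) -> int:
    """Bytes left for the memcached text (command line, key and value)."""
    return mtu - IPV4_HEADER - UDP_HEADER - MEMCACHED_FRAME_HEADER
```

Under these assumptions a single request or response, including the command line and key, has to fit in 1464 bytes.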
- CPU: single socket, 4 cores Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
- RAM: 32 GB
- NIC: Mellanox Technologies MT27500 Family (ConnectX-3)
- Client: memaslap -s <server IP> -T 3 --concurrency 120 -t 30s --udp
- Server:
  - Unmodified memcached: memcached -u root -t 1 -m 5048
  - Modified memcached: osv-memcached.so -m 5048