update README #412

Merged
merged 1 commit into from
Nov 1, 2024
1 change: 1 addition & 0 deletions .golangci.yml
@@ -12,6 +12,7 @@ linters-settings:
gosec:
excludes:
- G404 # it is okay to use math/rand at times.
- G115 # presents false positives for conversion

linters:
disable-all: true
121 changes: 24 additions & 97 deletions README.md
@@ -1,5 +1,5 @@
# Ristretto
[![Go Doc](https://img.shields.io/badge/godoc-reference-blue.svg)](http://godoc.org/github.com/dgraph-io/ristretto)
[![Go Doc](https://img.shields.io/badge/godoc-reference-blue.svg)](https://pkg.go.dev/github.com/dgraph-io/ristretto/v2)
[![ci-ristretto-tests](https://github.com/dgraph-io/ristretto/actions/workflows/ci-ristretto-tests.yml/badge.svg)](https://github.com/dgraph-io/ristretto/actions/workflows/ci-ristretto-tests.yml)
[![ci-ristretto-lint](https://github.com/dgraph-io/ristretto/actions/workflows/ci-ristretto-lint.yml/badge.svg)](https://github.com/dgraph-io/ristretto/actions/workflows/ci-ristretto-lint.yml)
[![Coverage Status](https://coveralls.io/repos/github/dgraph-io/ristretto/badge.svg?branch=main)](https://coveralls.io/github/dgraph-io/ristretto?branch=main)
@@ -11,6 +11,7 @@ The motivation to build Ristretto comes from the need for a contention-free cach

[Dgraph]: https://github.com/dgraph-io/dgraph


## Features

* **High Hit Ratios** - with our unique admission/eviction policy pairing, Ristretto's performance is best in class.
@@ -22,56 +23,33 @@ The motivation to build Ristretto comes from the need for a contention-free cach
* **Metrics** - optional performance metrics for throughput, hit ratios, and other stats.
* **Simple API** - just figure out your ideal `Config` values and you're off and running.


## Status

Ristretto is production-ready. See [Projects using Ristretto](#projects-using-ristretto).

## Table of Contents

- [Ristretto](#ristretto)
- [Features](#features)
- [Status](#status)
- [Table of Contents](#table-of-contents)
- [Usage](#usage)
- [Example](#example)
- [Config](#config)
- [Benchmarks](#benchmarks)
- [Hit Ratios](#hit-ratios)
- [Search](#search)
- [Database](#database)
- [Looping](#looping)
- [CODASYL](#codasyl)
- [Throughput](#throughput)
- [Mixed](#mixed)
- [Read](#read)
- [Write](#write)
- [Projects Using Ristretto](#projects-using-ristretto)
- [FAQ](#faq)
- [How are you achieving this performance? What shortcuts are you taking?](#how-are-you-achieving-this-performance-what-shortcuts-are-you-taking)
- [Is Ristretto distributed?](#is-ristretto-distributed)

## Usage

### Example

```go
package main

import (
"fmt"

"github.com/dgraph-io/ristretto"
"github.com/dgraph-io/ristretto/v2"
)

func main() {
cache, err := ristretto.NewCache(&ristretto.Config[string,string]{
cache, err := ristretto.NewCache(&ristretto.Config[string, string]{
NumCounters: 1e7, // number of keys to track frequency of (10M).
MaxCost: 1 << 30, // maximum cost of cache (1GB).
BufferItems: 64, // number of keys per Get buffer.
})
if err != nil {
panic(err)
}
defer cache.Close()

// set a value with a cost of 1
cache.Set("key", "value", 1)
@@ -91,65 +69,12 @@ func main() {
}
```

### Config

The `Config` struct is passed to `NewCache` when creating Ristretto instances (see the example above).

**NumCounters** `int64`

NumCounters is the number of 4-bit access counters to keep for admission and eviction. We've seen good performance in setting this to 10x the number of items you expect to keep in the cache when full.

For example, if you expect each item to have a cost of 1 and MaxCost is 100, set NumCounters to 1,000. Or, if you use variable cost values but expect the cache to hold around 10,000 items when full, set NumCounters to 100,000. The important thing is the *number of unique items* in the full cache, not necessarily the MaxCost value.
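The sizing rule above can be worked through in a few lines. This is a minimal sketch: `suggestedNumCounters` and `nextPowerOfTwo` are hypothetical helpers, not part of Ristretto. The power-of-two step only illustrates the note in the `Config` comments in `cache.go` that the counter count is rounded up internally.

```go
package main

import "fmt"

// suggestedNumCounters applies the 10x rule of thumb from the text:
// about ten counters per item expected in a full cache.
// Hypothetical helper, not part of Ristretto.
func suggestedNumCounters(expectedItems int64) int64 {
	return expectedItems * 10
}

// nextPowerOfTwo returns the smallest power of two that is >= n (for n >= 1).
// It illustrates the rounding mentioned in the Config comments; it is an
// assumption about the arithmetic, not Ristretto's actual code.
func nextPowerOfTwo(n int64) int64 {
	p := int64(1)
	for p < n {
		p <<= 1
	}
	return p
}

func main() {
	// Cost of 1 per item with MaxCost 100: about 100 items when full.
	small := suggestedNumCounters(100)
	fmt.Println(small, nextPowerOfTwo(small)) // 1000 1024

	// Variable costs, but about 10,000 items when full.
	large := suggestedNumCounters(10000)
	fmt.Println(large, nextPowerOfTwo(large)) // 100000 131072
}
```

The input is the expected number of unique items, not MaxCost, which matches the emphasis in the paragraph above.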

**MaxCost** `int64`

MaxCost is how eviction decisions are made. For example, if MaxCost is 100 and a new item with a cost of 1 increases total cache cost to 101, 1 item will be evicted.

MaxCost can also be used to denote the max size in bytes. For example, if MaxCost is 1,000,000 (1MB) and the cache is full with 1,000 1KB items, a new item (that's accepted) would cause 5 1KB items to be evicted.

MaxCost could be anything as long as it matches how you're using the cost values when calling Set.
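To make the cost accounting concrete, here is a toy cost-bounded store. It is only a sketch: the `budget` and `entry` types are hypothetical, and it evicts the oldest entry first, which is not Ristretto's policy (the FAQ names SampledLFU). Only the MaxCost arithmetic is the point.

```go
package main

import "fmt"

// entry is a hypothetical cached item with a caller-assigned cost.
type entry struct {
	key  string
	cost int64
}

// budget is a toy cost-bounded store. Oldest-first eviction is an
// assumption made for brevity; it is NOT how Ristretto picks victims.
type budget struct {
	maxCost int64
	used    int64
	entries []entry
}

// add stores e, then evicts from the front until used <= maxCost or only
// the new entry remains. It returns the number of entries evicted.
func (b *budget) add(e entry) int {
	b.entries = append(b.entries, e)
	b.used += e.cost
	evicted := 0
	for b.used > b.maxCost && len(b.entries) > 1 {
		b.used -= b.entries[0].cost
		b.entries = b.entries[1:]
		evicted++
	}
	return evicted
}

func main() {
	b := &budget{maxCost: 100}
	for i := 0; i < 100; i++ {
		b.add(entry{key: fmt.Sprintf("k%d", i), cost: 1})
	}
	// The store is full at cost 100; one more unit of cost evicts one item.
	fmt.Println(b.add(entry{key: "new", cost: 1})) // 1
	fmt.Println(b.used)                            // 100
}
```

The same loop works whether cost means "one per item" or "bytes", as long as the costs passed in use the same unit as `maxCost`.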

**BufferItems** `int64`

BufferItems is the size of the Get buffers. The best value we've found for this is 64.

If for some reason you see Get performance decreasing with lots of contention (you shouldn't), try increasing this value in increments of 64. This is a fine-tuning mechanism and you probably won't have to touch this.

**Metrics** `bool`

Metrics is true when you want real-time logging of a variety of stats. The reason this is a Config flag is because there's a 10% throughput performance overhead.

**OnEvict** `func(hashes [2]uint64, value interface{}, cost int64)`

OnEvict is called for every eviction.

**KeyToHash** `func(key interface{}) [2]uint64`

KeyToHash is the hashing algorithm used for every key. If this is nil, Ristretto has a variety of [defaults depending on the underlying interface type](https://github.com/dgraph-io/ristretto/blob/master/z/z.go#L19-L41).

Note that if you want 128bit hashes you should use the full `[2]uint64`,
otherwise just fill the `uint64` at the `0` position and it will behave like
any 64bit hash.
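As an illustration of the 64-bit case, the sketch below hashes string keys with FNV-1a from the standard library and leaves the second value at 0. It uses the two-value return form, `(uint64, uint64)`, that the v2 `Config.KeyToHash` field has in this PR's `cache.go`. The function name `hash64` and the choice of FNV are assumptions for the example, not Ristretto's defaults.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hash64 is a hypothetical KeyToHash-style function for string keys.
// It fills only the first value and returns 0 for the second, which the
// text describes as behaving like a plain 64-bit hash.
func hash64(key string) (uint64, uint64) {
	h := fnv.New64a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error.
	return h.Sum64(), 0
}

func main() {
	h1, h2 := hash64("key")
	fmt.Printf("%#x %d\n", h1, h2)
}
```

A function with this shape could be assigned to `KeyToHash` in a `Config[string, V]`. A 128-bit hash would populate both return values instead.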

**Cost** `func(value interface{}) int64`

Cost is an optional function you can pass to the Config in order to evaluate
item cost at runtime, and only for the Set calls that aren't dropped (this is
useful if calculating item cost is particularly expensive and you don't want to
waste time on items that will be dropped anyways).

To signal to Ristretto that you'd like to use this Cost function:

1. Set the Cost field to a non-nil function.
2. When calling Set for new items or item updates, use a `cost` of 0.
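The two steps above amount to a simple rule: a non-zero `cost` passed to Set is used as given, and a `cost` of 0 defers to the Cost function when one is set. The sketch below shows that rule in isolation. `resolveCost` is a hypothetical helper written for this example, not Ristretto's internal code, and it ignores when the evaluation happens.

```go
package main

import "fmt"

// resolveCost mirrors the convention described above: an explicit non-zero
// cost wins; a cost of 0 defers to costFn when costFn is non-nil.
// Illustrative only; not Ristretto's implementation.
func resolveCost(value string, cost int64, costFn func(string) int64) int64 {
	if cost == 0 && costFn != nil {
		return costFn(value)
	}
	return cost
}

func main() {
	// A cheap stand-in for an expensive cost calculation.
	byLength := func(v string) int64 { return int64(len(v)) }

	fmt.Println(resolveCost("hello", 0, byLength)) // 5: deferred to costFn
	fmt.Println(resolveCost("hello", 3, byLength)) // 3: explicit cost wins
	fmt.Println(resolveCost("hello", 0, nil))      // 0: no costFn configured
}
```

According to the text, the cache only evaluates the function for Set calls that are not dropped, which is what makes it worthwhile when computing the cost is expensive.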

## Benchmarks

The benchmarks can be found in https://github.com/dgraph-io/benchmarks/tree/master/cachebench/ristretto.

### Hit Ratios

#### Search
### Hit Ratio for Search

This trace is described as "disk read accesses initiated by a large commercial
search engine in response to various web search requests."
@@ -158,7 +83,7 @@ search engine in response to various web search requests."
<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20Search%20(ARC-S3).svg">
</p>

#### Database
### Hit Ratio for Database

This trace is described as "a database server running at a commercial site
running an ERP application on top of a commercial database."
@@ -167,46 +92,41 @@ running an ERP application on top of a commercial database."
<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20Database%20(ARC-DS1).svg">
</p>

#### Looping
### Hit Ratio for Looping

This trace demonstrates a looping access pattern.

<p align="center">
<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20Glimpse%20(LIRS-GLI).svg">
</p>

#### CODASYL
### Hit Ratio for CODASYL

This trace is described as "references to a CODASYL database for a one hour
period."
This trace is described as "references to a CODASYL database for a one hour period."

<p align="center">
<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Hit%20Ratios%20-%20CODASYL%20(ARC-OLTP).svg">
</p>

### Throughput

All throughput benchmarks were run on an Intel Core i7-8700K (3.7GHz) with 16GB
of RAM.

#### Mixed
### Throughput for Mixed Workload

<p align="center">
<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Throughput%20-%20Mixed.svg">
</p>

#### Read
### Throughput for Read Workload

<p align="center">
<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Throughput%20-%20Read%20(Zipfian).svg">
</p>

#### Write
### Throughput for Write Workload

<p align="center">
<img src="https://raw.githubusercontent.com/dgraph-io/ristretto/master/benchmarks/Throughput%20-%20Write%20(Zipfian).svg">
</p>


## Projects Using Ristretto

Below is a list of known projects that use Ristretto:
@@ -216,13 +136,20 @@ Below is a list of known projects that use Ristretto:
- [Vitess](https://github.com/vitessio/vitess) - Database clustering system for horizontal scaling of MySQL
- [SpiceDB](https://github.com/authzed/spicedb) - Horizontally scalable permissions database


## FAQ

### How are you achieving this performance? What shortcuts are you taking?

We go into detail in the [Ristretto blog post](https://blog.dgraph.io/post/introducing-ristretto-high-perf-go-cache/), but in short: our throughput performance can be attributed to a mix of batching and eventual consistency. Our hit ratio performance is mostly due to an excellent [admission policy](https://arxiv.org/abs/1512.00727) and SampledLFU eviction policy.
We go into detail in the [Ristretto blog post](https://blog.dgraph.io/post/introducing-ristretto-high-perf-go-cache/),
but in short: our throughput performance can be attributed to a mix of batching and eventual consistency. Our hit ratio
performance is mostly due to an excellent [admission policy](https://arxiv.org/abs/1512.00727) and SampledLFU eviction policy.

As for "shortcuts," the only thing Ristretto does that could be construed as one is dropping some Set calls. That means a Set call for a new item (updates are guaranteed) isn't guaranteed to make it into the cache. The new item could be dropped at two points: when passing through the Set buffer or when passing through the admission policy. However, this doesn't affect hit ratios much at all as we expect the most popular items to be Set multiple times and eventually make it in the cache.
As for "shortcuts," the only thing Ristretto does that could be construed as one is dropping some Set calls. That means
a Set call for a new item (updates are guaranteed) isn't guaranteed to make it into the cache. The new item could be
dropped at two points: when passing through the Set buffer or when passing through the admission policy. However, this
doesn't affect hit ratios much at all as we expect the most popular items to be Set multiple times and eventually make
it in the cache.
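Dropping a Set at the buffer can be pictured as a non-blocking send: if the buffer is full, the item is discarded rather than making the caller wait. The sketch below shows that general Go pattern with a buffered channel. The names `trySend` and `buf` are hypothetical, and this is an assumption about the shape of the technique, not a copy of Ristretto's Set path.

```go
package main

import "fmt"

// trySend attempts a non-blocking send. It reports false when the buffer
// is full, in which case the item is dropped instead of blocking the caller.
func trySend(buf chan string, item string) bool {
	select {
	case buf <- item:
		return true
	default:
		return false
	}
}

func main() {
	buf := make(chan string, 2) // tiny buffer so a drop is easy to trigger

	fmt.Println(trySend(buf, "a")) // true
	fmt.Println(trySend(buf, "b")) // true
	fmt.Println(trySend(buf, "c")) // false: buffer full, item dropped
}
```

The trade-off matches the FAQ answer: callers never stall on a full buffer, and a popular key that is dropped once is likely to be Set again later.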

### Is Ristretto distributed?

62 changes: 50 additions & 12 deletions cache.go
@@ -43,6 +43,7 @@ func zeroValue[T any]() T {
return zero
}

// Key is the generic type constraint for the keys of the cache's key-value pairs.
type Key = z.Key

// Cache is a thread-safe implementation of a hashmap with a TinyLFU admission
@@ -98,7 +99,15 @@ type Config[K Key, V any] struct {
// counter for the bloom filter). Note that the number of counters is
// internally rounded up to the nearest power of 2, so the space usage
// may be a little larger than 3 bytes * NumCounters.
//
// We've seen good performance in setting this to 10x the number of items
// you expect to keep in the cache when full.
NumCounters int64

// MaxCost is how eviction decisions are made. For example, if MaxCost is
// 100 and a new item with a cost of 1 increases total cache cost to 101,
// 1 item will be evicted.
//
// MaxCost can be considered as the cache capacity, in whatever units you
// choose to use.
//
@@ -107,40 +116,69 @@ type Config[K Key, V any] struct {
// the `cost` parameter for calls to Set. If new items are accepted, the
// eviction process will take care of making room for the new item and not
// overflowing the MaxCost value.
//
// MaxCost could be anything as long as it matches how you're using the cost
// values when calling Set.
MaxCost int64

// BufferItems determines the size of Get buffers.
//
// Unless you have a rare use case, using `64` as the BufferItems value
// results in good performance.
//
// If for some reason you see Get performance decreasing with lots of
// contention (you shouldn't), try increasing this value in increments of 64.
// This is a fine-tuning mechanism and you probably won't have to touch this.
BufferItems int64
// Metrics determines whether cache statistics are kept during the cache's
// lifetime. There *is* some overhead to keeping statistics, so you should
// only set this flag to true when testing or throughput performance isn't a
// major factor.

// Metrics is true when you want a variety of stats about the cache.
// There is some overhead to keeping statistics, so you should only set this
// flag to true when testing or throughput performance isn't a major factor.
Metrics bool
// OnEvict is called for every eviction and passes the hashed key, value,
// and cost to the function.

// OnEvict is called for every eviction with the evicted item.
OnEvict func(item *Item[V])

// OnReject is called for every rejection done via the policy.
OnReject func(item *Item[V])

// OnExit is called whenever a value is removed from cache. This can be
// used to do manual memory deallocation. Would also be called on eviction
// and rejection of the value.
// as well as on rejection of the value.
OnExit func(val V)

// KeyToHash function is used to customize the key hashing algorithm.
// Each key will be hashed using the provided function. If keyToHash value
// is not set, the default keyToHash function is used.
//
// Ristretto has a variety of defaults depending on the underlying interface type
// (https://github.com/dgraph-io/ristretto/blob/master/z/z.go#L19-L41).
//
// Note that if you want 128bit hashes you should use both of the values
// in the return of the function. If you want to use 64bit hashes, you can
// just return the first uint64 and return 0 for the second uint64.
KeyToHash func(key K) (uint64, uint64)
// Cost evaluates a value and outputs a corresponding cost. This function
// is ran after Set is called for a new item or an item update with a cost
// param of 0.

// Cost evaluates a value and outputs a corresponding cost. This function is run
// after Set is called for a new item or an item is updated with a cost param of 0.
//
// Cost is an optional function you can pass to the Config in order to evaluate
// item cost at runtime, and only when the Set call isn't going to be dropped. This
// is useful if calculating item cost is particularly expensive and you don't want to
// waste time on items that will be dropped anyways.
//
// To signal to Ristretto that you'd like to use this Cost function:
// 1. Set the Cost field to a non-nil function.
// 2. When calling Set for new items or item updates, use a `cost` of 0.
Cost func(value V) int64

// IgnoreInternalCost set to true indicates to the cache that the cost of
// internally storing the value should be ignored. This is useful when the
// cost passed to set is not using bytes as units. Keep in mind that setting
// this to true will increase the memory usage.
IgnoreInternalCost bool
// TtlTickerDurationInSec set the value of time ticker for cleanup keys on ttl

// TtlTickerDurationInSec sets the interval, in seconds, of the ticker that cleans up keys on TTL expiry.
TtlTickerDurationInSec int64
}

@@ -152,7 +190,7 @@ const (
itemUpdate
)

// Item is passed to setBuf so items can eventually be added to the cache.
// Item is a full representation of what's stored in the cache for each key-value pair.
type Item[V any] struct {
flag itemFlag
Key uint64
8 changes: 0 additions & 8 deletions sketch.go
@@ -14,14 +14,6 @@
* limitations under the License.
*/

// This package includes multiple probabalistic data structures needed for
// admission/eviction metadata. Most are Counting Bloom Filter variations, but
// a caching-specific feature that is also required is a "freshness" mechanism,
// which basically serves as a "lifetime" process. This freshness mechanism
// was described in the original TinyLFU paper [1], but other mechanisms may
// be better suited for certain data distributions.
//
// [1]: https://arxiv.org/abs/1512.00727
package ristretto

import (
4 changes: 2 additions & 2 deletions z/rtutil_test.go
@@ -285,13 +285,13 @@ func BenchmarkCPUTicks(b *testing.B) {

// goos: linux
// goarch: amd64
// pkg: github.com/dgraph-io/ristretto/z
// pkg: github.com/dgraph-io/ristretto/v2/z
// BenchmarkFastRand-16 1000000000 0.292 ns/op
// BenchmarkRandSource-16 1000000000 0.747 ns/op
// BenchmarkRandGlobal-16 6822332 176 ns/op
// BenchmarkRandAtomic-16 77950322 15.4 ns/op
// PASS
// ok github.com/dgraph-io/ristretto/z 4.808s
// ok github.com/dgraph-io/ristretto/v2/z 4.808s
func benchmarkRand(b *testing.B, fab func() func() uint32) {
b.RunParallel(func(pb *testing.PB) {
gen := fab()