diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4f4915d..c3c2c61 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 
 * `hbf` FIRs, symmetric FIRs, half band filters, HBF decimators and interpolators
+* `iir::Pid`, `iir::Filter`: builders for PID coefficients and the collection of standard Biquad filters
+* `iir::Biquad::{HOLD, IDENTITY, proportional}`
+* `iir::Biquad` getter/setter
+* `iir`: support for other integers (i8, i16, i128)
+* `iir::Biquad`: support for reduced DF1 state and DF2T state
+* `svf`: state variable filter
+
+### Removed
+
+* `iir::Vec5` type alias has been removed.
+* `iir_int`: integrated into `iir`.
+
+### Changed
+
+* `iir`: The biquad IIR filter API has been reworked. `IIR` has been renamed to `Biquad`.
 
 ## [0.10.0](https://github.com/quartiq/idsp/compare/v0.9.2..v0.10.0) - 2023-07-20
 
diff --git a/Cargo.toml b/Cargo.toml
index 17c1f69..ae150bf 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -16,16 +16,10 @@ num-complex = { version = "0.4.0", features = ["serde"], default-features = fals
 num-traits = { version = "0.2.14", features = ["libm"], default-features = false}
 
 [dev-dependencies]
-easybench = "1.0"
 rand = "0.8"
-ndarray = "0.15"
 rustfft = "6.1.0"
 # futuredsp = "0.0.6"
 # sdr = "0.7.0"
 
-[[bench]]
-name = "micro"
-harness = false
-
 [profile.release]
 debug = 1
diff --git a/README.md b/README.md
index e6eadac..f8ec790 100644
--- a/README.md
+++ b/README.md
@@ -1,49 +1,92 @@
 # Embedded DSP algorithms
 
 [![GitHub release](https://img.shields.io/github/v/release/quartiq/idsp?include_prereleases)](https://github.com/quartiq/idsp/releases)
+[![crates.io](https://img.shields.io/crates/v/idsp.svg)](https://crates.io/crates/idsp)
 [![Documentation](https://img.shields.io/badge/docs-online-success)](https://docs.rs/idsp)
 [![QUARTIQ Matrix Chat](https://img.shields.io/matrix/quartiq:matrix.org)](https://matrix.to/#/#quartiq:matrix.org)
 [![Continuous Integration](https://github.com/quartiq/idsp/actions/workflows/ci.yml/badge.svg)](https://github.com/quartiq/idsp/actions/workflows/ci.yml)
 
 This crate contains some tuned DSP algorithms for general and especially embedded use.
-Many of the algorithms are implemented on integer datatypes for reasons that become important in certain cases:
-
-* Speed: even with a hard floating point unit integer operations are faster.
-* Accuracy: single precision FP has a 24 bit mantissa, `i32` has full 32 bit.
-* No rounding errors.
-* Natural wrap around (modulo) at the integer overflow: critical for phase/frequency applications.
-* Natural definition of "full scale".
+Many of the algorithms are implemented on integer (fixed point) datatypes.
 
 One comprehensive user for these algorithms is [Stabilizer](https://github.com/quartiq/stabilizer).
 
-## Cosine/Sine `cossin`
-
-This uses a small (128 element or 512 byte) LUT, smart octant (un)mapping, linear interpolation and comprehensive analysis of corner cases to achieve a very clean signal (4e-6 RMS error, 9e-6 max error, 108 dB SNR typ), low spurs, and no bias with about 40 cortex-m instruction per call. It computes both cosine and sine (i.e. the complex signal) at once given a phase input.
+## Fixed point
 
-## Two-argument arcus-tangens `atan2`
+### Cosine/Sine
 
-This returns a phase given a complex signal (a pair of in-phase/`x`/cosine and quadrature/`y`/sine). The RMS phase error is less than 5e-6 rad, max error is less than 1.2e-5 rad, i.e. 20.5 bit RMS, 19.1 bit max accuracy. The bias is minimal.
+[`cossin()`] uses a small (128 element or 512 byte) LUT, smart octant (un)mapping, linear interpolation and comprehensive analysis of corner cases to achieve a very clean signal (4e-6 RMS error, 9e-6 max error, 108 dB SNR typ), low spurs, and no bias with about 40 cortex-m instructions per call. It computes both cosine and sine (i.e. the complex signal) at once given a phase input.
 
-## ComplexExt
+### Two-argument arcus-tangens
 
-An extension trait for the `num::Complex` type featuring especially a `std`-like API to the two functions above.
+[`atan2()`] returns a phase given a complex signal (a pair of in-phase/`x`/cosine and quadrature/`y`/sine). The RMS phase error is less than 5e-6 rad, max error is less than 1.2e-5 rad, i.e. 20.5 bit RMS, 19.1 bit max accuracy. The bias is minimal.
 
 ## PLL, RPLL
 
-High accuracy, zero-assumption, fully robust, forward and reciprocal PLLs with dynamically adjustable time constant and arbitrary (in the Nyquist sampling sense) capture range.
+[`PLL`], [`RPLL`]: High accuracy, zero-assumption, fully robust, forward and reciprocal PLLs with dynamically adjustable time constant and arbitrary (in the Nyquist sampling sense) capture range, and noise shaping.
 
-## Unwrapper, Accu, saturating_scale
+## `Unwrapper`, `Accu`, `saturating_scale()`
 
+[`Unwrapper`], [`Accu`], [`saturating_scale()`]:
 Tools to handle, track, and unwrap phase signals or generate them.
 
-## iir_int, iir
+## Float and Fixed point
+
+## IIR/Biquad
+
+[`iir::Biquad`] are fixed point (`i8`, `i16`, `i32`, `i64`) and floating point (`f32`, `f64`) biquad IIR filters.
+Robust and clean clipping and offset (anti-windup, no derivative kick, dynamically adjustable gains and gain limits) suitable for PID controller applications.
+Three kinds of filter action are supported: Direct Form 1, Direct Form 2 Transposed, and Direct Form 1 with noise shaping.
+Coefficient sharing for multiple channels.
+
+### Comparison
+
+This is a rough feature comparison of several available `biquad` crates, with no claim to completeness, accuracy, or even fairness.
+TL;DR: `idsp` is slower but offers more features.
+
+| Feature\Crate | [`biquad-rs`](https://crates.io/crates/biquad) | [`fixed-filters`](https://crates.io/crates/fixed-filters) | `idsp::iir` |
+|---|---|---|---|
+| Floating point `f32`/`f64` | ✅ | ❌ | ✅ |
+| Fixed point `i32` | ❌ | ✅ | ✅ |
+| Parametric fixed point `i32` | ❌ | ✅ | ❌ |
+| Fixed point `i8`/`i16`/`i64`/`i128` | ❌ | ❌ | ✅ |
+| DF2T | ✅ | ❌ | ✅ |
+| Limiting/Clamping | ❌ | ✅ | ✅ |
+| Fixed point accumulator guard bits | ❌ | ❌ | ✅ |
+| Summing junction offset | ❌ | ❌ | ✅ |
+| Fixed point noise shaping | ❌ | ❌ | ✅ |
+| Configuration/state decoupling/multi-channel | ❌ | ❌ | ✅ |
+| `f32` parameter audio filter builder | ✅ | ✅ | ✅ |
+| `f64` parameter audio filter builder | ✅ | ❌ | ✅ |
+| Additional filters (I/HO) | ❌ | ❌ | ✅ |
+| `f32` PI builder | ❌ | ✅ | ✅ |
+| `f32/f64` PI²D² builder | ❌ | ❌ | ✅ |
+| PI²D² builder limits | ❌ | ❌ | ✅ |
+| Support for fixed point `a1=-2` | ❌ | ❌ | ✅ |
+
+Three crates have been compared when processing 4x1M samples (4 channels) with a biquad lowpass.
+Hardware was `thumbv7em-none-eabihf`, `cortex-m7`, code in ITCM, data in DTCM, caches enabled.
+
+| Crate | Type, features | Cycles per sample |
+|---|---|---|
+| [`biquad-rs`](https://crates.io/crates/biquad) | `f32` | 11.4 |
+| `idsp::iir` | `f32`, limits, offset | 15.5 |
+| [`fixed-filters`](https://crates.io/crates/fixed-filters) | `i32`, limits | 20.3 |
+| `idsp::iir` | `i32`, limits, offset | 23.5 |
+| `idsp::iir` | `i32`, limits, offset, noise shaping | 30.0 |
+
+## State variable filter
 
-`i32` and `f32` biquad IIR filters with robust and clean clipping and offset (anti-windup, no derivative kick, dynamically adjustable gains).
+[`svf`] is a simple IIR state variable filter simultaneously providing highpass, lowpass,
+bandpass, and notch filtering of a signal.
 
-## Lowpass, Lockin
+## `Lowpass`, `Lockin`
 
-Fast, infinitely cascadable, first- and second-order lowpass and the corresponding integration into a lockin amplifier algorithm.
+[`Lowpass`], [`Lockin`] are fast, infinitely cascadable, first- and second-order lowpass and the corresponding integration into a lockin amplifier algorithm.
 
 ## FIR filters
 
+[`hbf::HbfDec`], [`hbf::HbfInt`], [`hbf::HbfDecCascade`], [`hbf::HbfIntCascade`]:
 Fast `f32` symmetric FIR filters, optimized half-band filters, half-band filter decimators and integators and cascades.
+These are used in [`stabilizer-stream`](https://github.com/quartiq/stabilizer-stream) for online PSD calculation on a log
+frequency scale for arbitrarily large amounts of data.
diff --git a/benches/micro.rs b/benches/micro.rs
deleted file mode 100644
index 9fb552d..0000000
--- a/benches/micro.rs
+++ /dev/null
@@ -1,103 +0,0 @@
-use core::f32::consts::PI;
-
-use easybench::bench_env;
-
-use idsp::{atan2, cossin, iir, iir_int, Filter, Lowpass, PLL, RPLL};
-
-fn atan2_bench() {
-    let xi = (10 << 16) as i32;
-    let xf = xi as f32 / i32::MAX as f32;
-
-    let yi = (-26_328 << 16) as i32;
-    let yf = yi as f32 / i32::MAX as f32;
-
-    println!(
-        "atan2(yi, xi): {}",
-        bench_env((yi, xi), |(yi, xi)| atan2(*yi, *xi))
-    );
-    println!(
-        "yf.atan2(xf): {}",
-        bench_env((yf, xf), |(yf, xf)| yf.atan2(*xf))
-    );
-}
-
-fn cossin_bench() {
-    let zi = -0x7304_2531_i32;
-    let zf = zi as f32 / i32::MAX as f32 * PI;
-    println!("cossin(zi): {}", bench_env(zi, |zi| cossin(*zi)));
-    println!("zf.sin_cos(): {}", bench_env(zf, |zf| zf.sin_cos()));
-}
-
-fn rpll_bench() {
-    let mut dut = RPLL::new(8);
-    println!(
-        "RPLL::update(Some(t), 21, 20): {}",
-        bench_env(Some(0x241), |x| dut.update(*x, 21, 20))
-    );
-    println!(
-        "RPLL::update(Some(t), sf, sp): {}",
-        bench_env((Some(0x241), 21, 20), |(x, p, q)| dut.update(*x, *p, *q))
-    );
-}
-
-fn pll_bench() {
-    let mut dut = PLL::default();
-    println!(
-        "PLL::update(Some(t), 12, 12): {}",
-        bench_env(Some(0x241), |x| dut.update(*x, 12))
-    );
-    println!(
-        "PLL::update(Some(t), sf, sp): {}",
-        bench_env((Some(0x241), 21), |(x, p)| dut.update(*x, *p))
-    );
-}
-
-fn iir_int_bench() {
-    let dut = iir_int::IIR::default();
-    let mut xy = iir_int::Vec5::default();
-    println!(
-        "int_iir::IIR::update(s, x): {}",
-        bench_env(0x2832, |x| dut.update(&mut xy, *x))
-    );
-}
-
-fn iir_f32_bench() {
-    let dut = iir::IIR::<f32>::default();
-    let mut xy = iir::Vec5::default();
-    println!(
-        "int::IIR::<f32>::update(s, x): {}",
-        bench_env(0.32241, |x| dut.update(&mut xy, *x, true))
-    );
-}
-
-fn iir_f64_bench() {
-    let dut = iir::IIR::<f64>::default();
-    let mut xy = iir::Vec5::default();
-    println!(
-        "int::IIR::<f64>::update(s, x): {}",
-        bench_env(0.32241, |x| dut.update(&mut xy, *x, true))
-    );
-}
-
-fn lowpass_bench() {
-    let mut dut = Lowpass::<1>::default();
-    println!(
-        "Lowpass::<1>::update(x, k): {}",
-        bench_env((0x32421, 14), |(x, k)| dut.update(*x, &[*k]))
-    );
-    println!(
-        "Lowpass::<1>::update(x, 14): {}",
-        bench_env(0x32421, |x| dut.update(*x, &[14]))
-    );
-}
-
-fn main() {
-    atan2_bench();
-    cossin_bench();
-    rpll_bench();
-    pll_bench();
-    iir_int_bench();
-    iir_f32_bench();
-    iir_f64_bench();
-    lowpass_bench();
-}
diff --git a/rustfmt.toml b/rustfmt.toml
new file mode 100644
index 0000000..16bdde9
--- /dev/null
+++ b/rustfmt.toml
@@ -0,0 +1 @@
+format_code_in_doc_comments = true
diff --git a/src/accu.rs b/src/accu.rs
index 343361d..6bf2d91 100644
--- a/src/accu.rs
+++ b/src/accu.rs
@@ -1,5 +1,6 @@
 use num_traits::ops::wrapping::WrappingAdd;
 
+/// Wrapping Accumulator
 #[derive(Copy, Clone, Default, PartialEq, Eq, Debug)]
 pub struct Accu<T> {
     state: T,
@@ -7,6 +8,7 @@ pub struct Accu<T> {
 }
 
 impl<T> Accu<T> {
+    /// Create a new accumulator with given initial state and step.
     pub fn new(state: T, step: T) -> Self {
         Self { state, step }
     }
diff --git a/src/complex.rs b/src/complex.rs
index 87c6c58..27915b4 100644
--- a/src/complex.rs
+++ b/src/complex.rs
@@ -4,11 +4,17 @@ use super::{atan2, cossin};
 
 /// Complex extension trait offering DSP (fast, good accuracy) functionality.
 pub trait ComplexExt<T, U> {
+    /// Unit magnitude from angle
     fn from_angle(angle: T) -> Self;
+    /// Square of magnitude
     fn abs_sqr(&self) -> U;
+    /// Log2 approximation
     fn log2(&self) -> T;
+    /// Angle
     fn arg(&self) -> T;
+    /// Saturating addition
     fn saturating_add(&self, other: Self) -> Self;
+    /// Saturating subtraction
     fn saturating_sub(&self, other: Self) -> Self;
 }
 
@@ -20,8 +26,8 @@ impl ComplexExt<i32, u32> for Complex<i32> {
     /// ```
     /// use idsp::{Complex, ComplexExt};
     /// Complex::<i32>::from_angle(0);
-    /// Complex::<i32>::from_angle(1 << 30);  // pi/2
-    /// Complex::<i32>::from_angle(-1 << 30);  // -pi/2
+    /// Complex::<i32>::from_angle(1 << 30); // pi/2
+    /// Complex::<i32>::from_angle(-1 << 30); // -pi/2
     /// ```
     fn from_angle(angle: i32) -> Self {
         let (c, s) = cossin(angle);
@@ -97,6 +103,7 @@ impl ComplexExt<i32, u32> for Complex<i32> {
 
 /// Full scale fixed point multiplication.
 pub trait MulScaled<T> {
+    /// Scaled multiplication for fixed point
     fn mul_scaled(self, other: T) -> Self;
 }
 
diff --git a/src/filter.rs b/src/filter.rs
index db1a85b..0433cc5 100644
--- a/src/filter.rs
+++ b/src/filter.rs
@@ -1,4 +1,9 @@
+/// Single input single output i32 filter
 pub trait Filter {
+    /// Filter configuration type.
+    ///
+    /// While the filter struct owns the state,
+    /// the configuration is decoupled to allow sharing.
     type Config;
     /// Update the filter with a new sample.
     ///
@@ -16,6 +21,9 @@ pub trait Filter {
     fn set(&mut self, x: i32);
 }
 
+/// Nyquist zero
+///
+/// Filter with a flat transfer function and a transfer function zero at Nyquist.
 #[derive(Copy, Clone, Default)]
 pub struct Nyquist(i32);
 impl Filter for Nyquist {
@@ -34,6 +42,7 @@ impl Filter for Nyquist {
     }
 }
 
+/// Repeat another filter
 #[derive(Copy, Clone)]
 pub struct Repeat<const N: usize, T>([T; N]);
 impl<const N: usize, T: Filter> Filter for Repeat<N, T> {
@@ -54,6 +63,7 @@ impl<const N: usize, T: Default + Copy> Default for Repeat<N, T> {
     }
 }
 
+/// Combine two different filters in cascade
 #[derive(Copy, Clone, Default)]
 pub struct Cascade<T, U>(T, U);
 impl<T: Filter, U: Filter> Filter for Cascade<T, U> {
diff --git a/src/hbf.rs b/src/hbf.rs
index 757837f..ad404b5 100644
--- a/src/hbf.rs
+++ b/src/hbf.rs
@@ -1,3 +1,7 @@
+//! Half-band filters and cascades
+//!
+//! Used to perform very efficient high-dynamic range rate changes by powers of two.
+
 use core::{
     iter::Sum,
     ops::{Add, Mul},
@@ -457,12 +461,18 @@ impl Default for HbfDecCascade {
 }
 
 impl HbfDecCascade {
+    /// Set cascade depth
+    ///
+    /// Sets the number of HBF filter stages to apply.
     #[inline]
     pub fn set_depth(&mut self, n: usize) {
         assert!(n <= 4);
         self.depth = n;
     }
 
+    /// Cascade depth
+    ///
+    /// The number of HBF filter stages to apply.
     #[inline]
     pub fn depth(&self) -> usize {
         self.depth
@@ -543,7 +553,7 @@ impl Filter for HbfDecCascade {
 #[derive(Copy, Clone, Debug)]
 pub struct HbfIntCascade {
     depth: usize,
-    pub stages: (
+    stages: (
         HbfInt<
             'static,
             f32,
@@ -586,11 +596,17 @@ impl Default for HbfIntCascade {
 }
 
 impl HbfIntCascade {
+    /// Set cascade depth
+    ///
+    /// Sets the number of HBF filter stages to apply.
     pub fn set_depth(&mut self, n: usize) {
         assert!(n <= 4);
         self.depth = n;
     }
 
+    /// Cascade depth
+    ///
+    /// The number of HBF filter stages to apply.
     pub fn depth(&self) -> usize {
         self.depth
     }
diff --git a/src/iir.rs b/src/iir.rs
deleted file mode 100644
index 3217bc9..0000000
--- a/src/iir.rs
+++ /dev/null
@@ -1,158 +0,0 @@
-use serde::{Deserialize, Serialize};
-
-use super::{abs, copysign, macc};
-use core::iter::Sum;
-use num_traits::{clamp, Float, One, Zero};
-
-/// IIR state and coefficients type.
-///
-/// To represent the IIR state (input and output memory) during the filter update
-/// this contains the three inputs (x0, x1, x2) and the two outputs (y1, y2)
-/// concatenated. Lower indices correspond to more recent samples.
-/// To represent the IIR coefficients, this contains the feed-forward
-/// coefficients (b0, b1, b2) followd by the negated feed-back coefficients
-/// (-a1, -a2), all five normalized such that a0 = 1.
-pub type Vec5<T> = [T; 5];
-
-/// IIR configuration.
-///
-/// Contains the coeeficients `ba`, the output offset `y_offset`, and the
-/// output limits `y_min` and `y_max`. Data is represented in variable precision
-/// floating-point. The dataformat is the same for all internal signals, input
-/// and output.
-///
-/// This implementation achieves several important properties:
-///
-/// * Its transfer function is universal in the sense that any biquadratic
-///   transfer function can be implemented (high-passes, gain limits, second
-///   order integrators with inherent anti-windup, notches etc) without code
-///   changes preserving all features.
-/// * It inherits a universal implementation of "integrator anti-windup", also
-///   and especially in the presence of set-point changes and in the presence
-///   of proportional or derivative gain without any back-off that would reduce
-///   steady-state output range.
-/// * It has universal derivative-kick (undesired, unlimited, and un-physical
-///   amplification of set-point changes by the derivative term) avoidance.
-/// * An offset at the input of an IIR filter (a.k.a. "set-point") is
-///   equivalent to an offset at the output. They are related by the
-///   overall (DC feed-forward) gain of the filter.
-/// * It stores only previous outputs and inputs. These have direct and
-///   invariant interpretation (independent of gains and offsets).
-///   Therefore it can trivially implement bump-less transfer.
-/// * Cascading multiple IIR filters allows stable and robust
-///   implementation of transfer functions beyond bequadratic terms.
-///
-/// # Serialization/Deserialization/Miniconf
-///
-/// `{"y_offset": y_offset, "y_min": y_min, "y_max": y_max, "ba": [b0, b1, b2, a1, a2]}`
-///
-/// * `y0` is the output offset code
-/// * `ym` is the lower saturation limit
-/// * `yM` is the upper saturation limit
-///
-/// IIR filter tap gains (`ba`) are an array `[b0, b1, b2, a1, a2]` such that the
-/// new output is computed as `y0 = a1*y1 + a2*y2 + b0*x0 + b1*x1 + b2*x2`.
-/// The IIR coefficients can be mapped to other transfer function
-/// representations, for example as described in <https://arxiv.org/abs/1508.06319>
-#[derive(Copy, Clone, Debug, Default, Deserialize, Serialize)]
-pub struct IIR<T> {
-    pub ba: Vec5<T>,
-    pub y_offset: T,
-    pub y_min: T,
-    pub y_max: T,
-}
-
-impl<T: Float + Zero + One + Sum<T>> IIR<T> {
-    pub fn new(gain: T, y_min: T, y_max: T) -> Self {
-        let mut ba = [T::zero(); 5];
-        ba[0] = gain;
-        Self {
-            ba,
-            y_offset: T::zero(),
-            y_min,
-            y_max,
-        }
-    }
-
-    /// Configures IIR filter coefficients for proportional-integral behavior
-    /// with gain limit.
-    ///
-    /// # Arguments
-    ///
-    /// * `kp` - Proportional gain. Also defines gain sign.
-    /// * `ki` - Integral gain at Nyquist. Sign taken from `kp`.
-    /// * `g` - Gain limit.
-    pub fn set_pi(&mut self, kp: T, ki: T, g: T) -> Result<(), &str> {
-        let ki = copysign(ki, kp);
-        let g = copysign(g, kp);
-        let (a1, b0, b1) = if abs(ki) < T::epsilon() {
-            (T::zero(), kp, T::zero())
-        } else {
-            let c = if abs(g) < T::epsilon() {
-                T::one()
-            } else {
-                T::one() / (T::one() + ki / g)
-            };
-            let a1 = (T::one() + T::one()) * c - T::one();
-            let b0 = ki * c + kp;
-            let b1 = ki * c - a1 * kp;
-            if abs(b0 + b1) < T::epsilon() {
-                return Err("low integrator gain and/or gain limit");
-            }
-            (a1, b0, b1)
-        };
-        self.ba.copy_from_slice(&[b0, b1, T::zero(), a1, T::zero()]);
-        Ok(())
-    }
-
-    /// Compute the overall (DC feed-forward) gain.
-    pub fn get_k(&self) -> T {
-        self.ba[..3].iter().copied().sum()
-    }
-
-    // /// Compute input-referred (`x`) offset from output (`y`) offset.
-    pub fn get_x_offset(&self) -> Result<T, &str> {
-        let k = self.get_k();
-        if abs(k) < T::epsilon() {
-            Err("k is zero")
-        } else {
-            Ok(self.y_offset / k)
-        }
-    }
-    /// Convert input (`x`) offset to equivalent output (`y`) offset and apply.
-    ///
-    /// # Arguments
-    /// * `xo`: Input (`x`) offset.
-    pub fn set_x_offset(&mut self, xo: T) {
-        self.y_offset = xo * self.get_k();
-    }
-
-    /// Feed a new input value into the filter, update the filter state, and
-    /// return the new output. Only the state `xy` is modified.
-    ///
-    /// # Arguments
-    /// * `xy` - Current filter state.
-    /// * `x0` - New input.
-    pub fn update(&self, xy: &mut Vec5<T>, x0: T, hold: bool) -> T {
-        let n = self.ba.len();
-        debug_assert!(xy.len() == n);
-        // `xy` contains       x0 x1 y0 y1 y2
-        // Increment time      x1 x2 y1 y2 y3
-        // Shift               x1 x1 x2 y1 y2
-        // This unrolls better than xy.rotate_right(1)
-        xy.copy_within(0..n - 1, 1);
-        // Store x0            x0 x1 x2 y1 y2
-        xy[0] = x0;
-        // Compute y0 by multiply-accumulate
-        let y0 = if hold {
-            xy[n / 2 + 1]
-        } else {
-            macc(self.y_offset, xy, &self.ba)
-        };
-        // Limit y0
-        let y0 = clamp(y0, self.y_min, self.y_max);
-        // Store y0            x0 x1 y0 y1 y2
-        xy[n / 2] = y0;
-        y0
-    }
-}
diff --git a/src/iir/biquad.rs b/src/iir/biquad.rs
new file mode 100644
index 0000000..4450913
--- /dev/null
+++ b/src/iir/biquad.rs
@@ -0,0 +1,494 @@
+use num_traits::{AsPrimitive, Float};
+use serde::{Deserialize, Serialize};
+
+use crate::Coefficient;
+
+/// Biquad IIR filter
+///
+/// A biquadratic IIR filter supports up to two zeros and two poles in the transfer function.
+/// It can be used to implement a wide range of responses to input signals.
+///
+/// The Biquad performs the following operation to compute a new output sample `y0` from a new
+/// input sample `x0` given its configuration and previous samples:
+///
+/// `y0 = clamp(b0*x0 + b1*x1 + b2*x2 - a1*y1 - a2*y2 + u, min, max)`
+///
+/// This implementation saves storage and improves caching opportunities by decoupling
+/// filter configuration (coefficients, limits and offset) from filter state
+/// and thus supports both (a) sharing a single filter between multiple states ("channels") and (b)
+/// rapid switching of filters (tuning, transfer) for a given state without copying either
+/// state or configuration.
+///
+/// # Filter architecture
+///
+/// Direct Form 1 (DF1) and Direct Form 2 transposed (DF2T) are the only IIR filter
+/// structures with a single summing junction (an effective one in the case of DF2T);
+/// this allows clamping of the output before feedback.
+///
+/// DF1 allows atomic coefficient change because only inputs and outputs are pipelined.
+/// The summing junction pipelining of DF2T would require incremental
+/// coefficient changes and is thus less amenable to online tuning.
+///
+/// DF2T needs less state storage (2 instead of 4). This is in addition to the coefficient
+/// storage (5 plus 2 limits plus 1 offset).
+///
+/// DF2T is less efficient and accurate for fixed-point architectures as quantization
+/// happens at each intermediate summing junction in addition to the output quantization. This is
+/// especially true for common `i64 + i32 * i32 -> i64` MACC architectures.
+/// One could use wide state storage for fixed point DF2T but that would negate the storage
+/// and processing advantages.
+///
+/// # Coefficients
+///
+/// `ba: [T; 5] = [b0, b1, b2, a1, a2]` is the coefficients type.
+/// To represent the IIR coefficients, this contains the feed-forward
+/// coefficients `b0, b1, b2` followed by the feed-back coefficients
+/// `a1, a2`, all five normalized such that `a0 = 1`.
+///
+/// The summing junction of the filter also receives an offset `u`.
+///
+/// The filter applies clamping such that `min <= y <= max`.
+///
+/// See [`crate::iir::Filter`] and [`crate::iir::Pid`] for ways to generate coefficients.
+///
+/// # Fixed point
+///
+/// Coefficient scaling (see [`Coefficient`]) is fixed and optimized such that -2 is exactly
+/// representable. This is tailored to low-passes, PID, II etc, where the integration rule is
+/// [1, -2, 1].
+///
+/// There are two guard bits in the accumulator before clamping/limiting.
+/// While this isn't enough to cover the worst case accumulator, it does catch many real world
+/// overflow cases.
+///
+/// # State
+///
+/// To represent the IIR state (input and output memory) during [`Biquad::update()`]
+/// the DF1 state contains the two previous inputs and outputs `[x1, x2, y1, y2]`
+/// concatenated. Lower indices correspond to more recent samples.
+///
+/// In the DF2T case the state contains `[b1*x1 + b2*x2 - a1*y1 - a2*y2, b2*x1 - a2*y1]`.
+///
+/// In the DF1 case with first order noise shaping, the state contains `[x1, x2, y1, y2, e1]`
+/// where `e1` is the accumulated quantization error.
+///
+/// # PID controller
+///
+/// The IIR coefficients can be mapped to other transfer function
+/// representations, for example PID controllers as described in
+/// <https://hackmd.io/IACbwcOTSt6Adj3_F9bKuw> and
+/// <https://arxiv.org/abs/1508.06319>.
+///
+/// Using a Biquad as a template for a PID controller achieves several important properties:
+///
+/// * Its transfer function is universal in the sense that any biquadratic
+///   transfer function can be implemented (high-passes, gain limits, second
+///   order integrators with inherent anti-windup, notches etc) without code
+///   changes preserving all features.
+/// * It inherits a universal implementation of "integrator anti-windup", also
+///   and especially in the presence of set-point changes and in the presence
+///   of proportional or derivative gain without any back-off that would reduce
+///   steady-state output range.
+/// * It has universal derivative-kick (undesired, unlimited, and un-physical
+///   amplification of set-point changes by the derivative term) avoidance.
+/// * An offset at the input of an IIR filter (a.k.a. "set-point") is
+///   equivalent to an offset at the summing junction (in output units).
+///   They are related by the overall (DC feed-forward) gain of the filter.
+/// * It stores only previous outputs and inputs. These have direct and
+///   invariant interpretation (independent of coefficients and offset).
+///   Therefore it can trivially implement bump-less transfer between any
+///   coefficients/offset sets.
+/// * Cascading multiple IIR filters allows stable and robust
+///   implementation of transfer functions beyond biquadratic terms.
+#[derive(Copy, Clone, Debug, Deserialize, Serialize, PartialEq, PartialOrd)]
+pub struct Biquad<T> {
+    ba: [T; 5],
+    u: T,
+    min: T,
+    max: T,
+}
+
+impl<T: Coefficient> Default for Biquad<T> {
+    fn default() -> Self {
+        Self {
+            ba: [T::ZERO; 5],
+            u: T::ZERO,
+            min: T::MIN,
+            max: T::MAX,
+        }
+    }
+}
+
+impl<T: Coefficient> From<[T; 5]> for Biquad<T> {
+    fn from(ba: [T; 5]) -> Self {
+        Self {
+            ba,
+            ..Default::default()
+        }
+    }
+}
+
+impl<T, C> From<&[C; 6]> for Biquad<T>
+where
+    T: Coefficient + AsPrimitive<C>,
+    C: Float + AsPrimitive<T>,
+{
+    fn from(ba: &[C; 6]) -> Self {
+        let ia0 = C::one() / ba[3];
+        Self::from([
+            T::quantize(ba[0] * ia0),
+            T::quantize(ba[1] * ia0),
+            T::quantize(ba[2] * ia0),
+            // ba[3] is a0: a0*ia0 = 1 is implicit and not stored
+            T::quantize(ba[4] * ia0),
+            T::quantize(ba[5] * ia0),
+        ])
+    }
+}
+
+impl<T, C> From<&Biquad<T>> for [C; 6]
+where
+    T: Coefficient + AsPrimitive<C>,
+    C: 'static + Copy,
+{
+    fn from(value: &Biquad<T>) -> Self {
+        let ba = value.ba();
+        [
+            ba[0].as_(),
+            ba[1].as_(),
+            ba[2].as_(),
+            T::ONE.as_(),
+            ba[3].as_(),
+            ba[4].as_(),
+        ]
+    }
+}
+
+impl<T: Coefficient> Biquad<T> {
+    /// A "hold" filter that ingests input and maintains output
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let mut xy = [0.0, 1.0, 2.0, 3.0];
+    /// let x0 = 7.0;
+    /// let y0 = Biquad::HOLD.update(&mut xy, x0);
+    /// assert_eq!(y0, 2.0);
+    /// assert_eq!(xy, [x0, 0.0, y0, y0]);
+    /// ```
+    pub const HOLD: Self = Self {
+        ba: [T::ZERO, T::ZERO, T::ZERO, T::NEG_ONE, T::ZERO],
+        u: T::ZERO,
+        min: T::MIN,
+        max: T::MAX,
+    };
+
+    /// A unity gain filter
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let x0 = 3.0;
+    /// let y0 = Biquad::IDENTITY.update(&mut [0.0; 4], x0);
+    /// assert_eq!(y0, x0);
+    /// ```
+    pub const IDENTITY: Self = Self::proportional(T::ONE);
+
+    /// A filter with the given proportional gain at all frequencies
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let x0 = 2.0;
+    /// let k = 5.0;
+    /// let y0 = Biquad::proportional(k).update(&mut [0.0; 4], x0);
+    /// assert_eq!(y0, x0 * k);
+    /// ```
+    pub const fn proportional(k: T) -> Self {
+        Self {
+            ba: [k, T::ZERO, T::ZERO, T::ZERO, T::ZERO],
+            u: T::ZERO,
+            min: T::MIN,
+            max: T::MAX,
+        }
+    }
+
+    /// Filter coefficients
+    ///
+    /// IIR filter tap gains (`ba`) are an array `[b0, b1, b2, a1, a2]` such that
+    /// [`Biquad::update(&mut xy, x0)`] returns
+    /// `y0 = clamp(b0*x0 + b1*x1 + b2*x2 - a1*y1 - a2*y2 + u, min, max)`.
+    ///
+    /// ```
+    /// # use idsp::Coefficient;
+    /// # use idsp::iir::*;
+    /// assert_eq!(Biquad::<i32>::IDENTITY.ba()[0], <i32 as Coefficient>::ONE);
+    /// assert_eq!(Biquad::<i32>::HOLD.ba()[3], -<i32 as Coefficient>::ONE);
+    /// ```
+    pub fn ba(&self) -> &[T; 5] {
+        &self.ba
+    }
+
+    /// Mutable reference to the filter coefficients.
+    ///
+    /// See [`Biquad::ba()`].
+    ///
+    /// ```
+    /// # use idsp::Coefficient;
+    /// # use idsp::iir::*;
+    /// let mut i = Biquad::default();
+    /// i.ba_mut()[0] = <i32 as Coefficient>::ONE;
+    /// assert_eq!(i, Biquad::IDENTITY);
+    /// ```
+    pub fn ba_mut(&mut self) -> &mut [T; 5] {
+        &mut self.ba
+    }
+
+    /// Summing junction offset
+    ///
+    /// This offset is applied to the output `y0` summing junction
+    /// on top of the feed-forward (`b`) and feed-back (`a`) terms.
+    /// The feedback samples are taken at the summing junction and
+    /// thus also include (and feed back) this offset.
+    pub fn u(&self) -> T {
+        self.u
+    }
+
+    /// Set the summing junction offset
+    ///
+    /// See [`Biquad::u()`].
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let mut i = Biquad::default();
+    /// i.set_u(5);
+    /// assert_eq!(i.update(&mut [0; 4], 0), 5);
+    /// ```
+    pub fn set_u(&mut self, u: T) {
+        self.u = u;
+    }
+
+    /// Lower output limit
+    ///
+    /// Guaranteed minimum output value.
+    /// The value is inclusive.
+    /// The clamping also cleanly affects the feedback terms.
+    ///
+    /// For fixed point types, during the comparison,
+    /// the lowest two bits of value and limit are truncated.
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// assert_eq!(Biquad::<i32>::default().min(), i32::MIN);
+    /// ```
+    pub fn min(&self) -> T {
+        self.min
+    }
+
+    /// Set the lower output limit
+    ///
+    /// See [`Biquad::min()`].
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let mut i = Biquad::default();
+    /// i.set_min(4);
+    /// assert_eq!(i.update(&mut [0; 4], 0), 4);
+    /// ```
+    pub fn set_min(&mut self, min: T) {
+        self.min = min;
+    }
+
+    /// Upper output limit
+    ///
+    /// Guaranteed maximum output value.
+    /// The value is inclusive.
+    /// The clamping also cleanly affects the feedback terms.
+    ///
+    /// For fixed point types, during the comparison,
+    /// the lowest two bits of value and limit are truncated.
+    /// The behavior is as if those two bits were 0 in the case
+    /// of `min` and one in the case of `max`.
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// assert_eq!(Biquad::<i32>::default().max(), i32::MAX);
+    /// ```
+    pub fn max(&self) -> T {
+        self.max
+    }
+
+    /// Set the upper output limit
+    ///
+    /// See [`Biquad::max()`].
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let mut i = Biquad::default();
+    /// i.set_max(-5);
+    /// assert_eq!(i.update(&mut [0; 4], 0), -5);
+    /// ```
+    pub fn set_max(&mut self, max: T) {
+        self.max = max;
+    }
+
+    /// Compute the overall (DC/proportional feed-forward) gain.
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// assert_eq!(Biquad::proportional(3.0).forward_gain(), 3.0);
+    /// ```
+    ///
+    /// # Returns
+    /// The sum of the `b` feed-forward coefficients.
+    pub fn forward_gain(&self) -> T {
+        self.ba[0] + self.ba[1] + self.ba[2]
+    }
+
+    /// Compute input-referred (`x`) offset.
+    ///
+    /// ```
+    /// # use idsp::Coefficient;
+    /// # use idsp::iir::*;
+    /// let mut i = Biquad::proportional(3);
+    /// i.set_u(3);
+    /// assert_eq!(i.input_offset(), <i32 as Coefficient>::ONE);
+    /// ```
+    pub fn input_offset(&self) -> T {
+        self.u.div_scaled(self.forward_gain())
+    }
+
+    /// Convert input (`x`) offset to equivalent summing junction offset (`u`) and apply.
+    ///
+    /// In the case of a "PID" controller the response behavior of the controller
+    /// to the offset is "stabilizing", and not "tracking": its frequency response
+    /// is exclusively according to the lowest non-zero [`crate::iir::Action`] gain.
+    /// There is no high order ("faster") response as would be the case for a "tracking"
+    /// controller.
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let mut i = Biquad::proportional(3.0);
+    /// i.set_input_offset(2.0);
+    /// let x0 = 0.5;
+    /// let y0 = i.update(&mut [0.0; 4], x0);
+    /// assert_eq!(y0, (x0 + i.input_offset()) * i.forward_gain());
+    /// ```
+    ///
+    /// ```
+    /// # use idsp::Coefficient;
+    /// # use idsp::iir::*;
+    /// let mut i = Biquad::proportional(-<i32 as Coefficient>::ONE);
+    /// i.set_input_offset(1);
+    /// assert_eq!(i.u(), -1);
+    /// ```
+    ///
+    /// # Arguments
+    /// * `offset`: Input (`x`) offset.
+    pub fn set_input_offset(&mut self, offset: T) {
+        self.u = offset.mul_scaled(self.forward_gain());
+    }
+
+    /// Direct Form 1 Update
+    ///
+    /// Ingest a new input value into the filter, update the filter state, and
+    /// return the new output. Only the state `xy` is modified.
+    ///
+    /// ## `N=4` Direct Form 1
+    ///
+    /// `xy` contains:
+    /// * On entry: `[x1, x2, y1, y2]`
+    /// * On exit:  `[x0, x1, y0, y1]`
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let mut xy = [0.0, 1.0, 2.0, 3.0];
+    /// let x0 = 4.0;
+    /// let y0 = Biquad::IDENTITY.update(&mut xy, x0);
+    /// assert_eq!(y0, x0);
+    /// assert_eq!(xy, [x0, 0.0, y0, 2.0]);
+    /// ```
+    ///
+    /// ## `N=5` Direct Form 1 with first order noise shaping
+    ///
+    /// `xy` contains:
+    /// * On entry: `[x1, x2, y1, y2, e1]`
+    /// * On exit:  `[x0, x1, y0, y1, e0]`
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let mut xy = [1, 2, 3, 4, 5];
+    /// let x0 = 6;
+    /// let y0 = Biquad::IDENTITY.update(&mut xy, x0);
+    /// assert_eq!(y0, x0);
+    /// assert_eq!(xy, [x0, 1, y0, 3, 5]);
+    /// ```
+    ///
+    /// Note: This is only useful for fixed point filters.
+    ///
+    /// ## `N=2` Direct Form 2 transposed
+    ///
+    /// Note: This is only useful for floating point filters.
+    /// Don't use this for fixed point: Quantization happens at each state store operation.
+    /// Ideally the state would be `[T::ACCU; 2]`, but for fixed point that would use the same
+    /// amount of storage as DF1, with no gain in performance and a loss in functionality.
+    /// There are also no guard bits here.
+    ///
+    /// `xy` contains:
+    /// * On entry: `[b1*x1 + b2*x2 - a1*y1 - a2*y2, b2*x1 - a2*y1]`
+    /// * On exit:  `[b1*x0 + b2*x1 - a1*y0 - a2*y1, b2*x0 - a2*y0]`
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let mut xy = [0.0, 1.0];
+    /// let x0 = 3.0;
+    /// let y0 = Biquad::IDENTITY.update(&mut xy, x0);
+    /// assert_eq!(y0, x0);
+    /// assert_eq!(xy, [1.0, 0.0]);
+    /// ```
+    ///
+    /// # Arguments
+    /// * `xy` - Current filter state.
+    /// * `x0` - New input.
+    ///
+    /// # Returns
+    /// The new output `y0 = clamp(b0*x0 + b1*x1 + b2*x2 - a1*y1 - a2*y2 + u, min, max)`
+    pub fn update<const N: usize>(&self, xy: &mut [T; N], x0: T) -> T {
+        match N {
+            // DF1
+            4 => {
+                let s = self.ba[0].as_() * x0.as_()
+                    + self.ba[1].as_() * xy[0].as_()
+                    + self.ba[2].as_() * xy[1].as_()
+                    - self.ba[3].as_() * xy[2].as_()
+                    - self.ba[4].as_() * xy[3].as_();
+                let (y0, _) = self.u.macc(s, self.min, self.max, T::ZERO);
+                xy[1] = xy[0];
+                xy[0] = x0;
+                xy[3] = xy[2];
+                xy[2] = y0;
+                y0
+            }
+            // DF1 with noise shaping for fixed point
+            5 => {
+                let s = self.ba[0].as_() * x0.as_()
+                    + self.ba[1].as_() * xy[0].as_()
+                    + self.ba[2].as_() * xy[1].as_()
+                    - self.ba[3].as_() * xy[2].as_()
+                    - self.ba[4].as_() * xy[3].as_();
+                let (y0, e0) = self.u.macc(s, self.min, self.max, xy[4]);
+                xy[4] = e0;
+                xy[1] = xy[0];
+                xy[0] = x0;
+                xy[3] = xy[2];
+                xy[2] = y0;
+                y0
+            }
+            // DF2T for floating point
+            2 => {
+                let y0 = (xy[0] + self.ba[0].mul_scaled(x0)).clip(self.min, self.max);
+                xy[0] = xy[1] + self.ba[1].mul_scaled(x0) - self.ba[3].mul_scaled(y0);
+                xy[1] = self.u + self.ba[2].mul_scaled(x0) - self.ba[4].mul_scaled(y0);
+                y0
+            }
+            _ => unimplemented!(),
+        }
+    }
+}
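Note on the `update()` doc comment above: the `N=4` Direct Form 1 recurrence and its state shift can be sketched standalone. This is a minimal sketch assuming plain `f64` arithmetic only; the free function `df1_update` and its argument order are hypothetical, and it does not model the crate's `macc`/accumulator path or the fixed point truncation. It mainly shows that the offset `u` and the clamp are applied before `y0` is stored, so both are fed back.

```rust
/// Illustrative floating point DF1 step (hypothetical helper, not part of `idsp`).
///
/// `ba` is `[b0, b1, b2, a1, a2]`, normalized so that `a0 = 1`.
/// `xy` is `[x1, x2, y1, y2]` on entry and `[x0, x1, y0, y1]` on exit.
/// Note: `f64::clamp` panics if `min > max` or if either limit is NaN.
fn df1_update(
    ba: &[f64; 5],
    u: f64,
    min: f64,
    max: f64,
    xy: &mut [f64; 4],
    x0: f64,
) -> f64 {
    // Feed-forward (`b`) terms minus feed-back (`a`) terms.
    let s = ba[0] * x0 + ba[1] * xy[0] + ba[2] * xy[1] - ba[3] * xy[2] - ba[4] * xy[3];
    // Add the summing junction offset, then clamp to the output limits.
    let y0 = (s + u).clamp(min, max);
    // Shift the delay lines; the clamped, offset output is what gets fed back.
    xy[1] = xy[0];
    xy[0] = x0;
    xy[3] = xy[2];
    xy[2] = y0;
    y0
}
```

With `ba = [0, 0, 0, -1, 0]` (the coefficient pattern the `HOLD` doctest checks) and a non-zero `u`, repeated calls with `x0 = 0` ramp the output by `u` per sample, which illustrates the offset being fed back.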
diff --git a/src/iir/coefficients.rs b/src/iir/coefficients.rs
new file mode 100644
index 0000000..a0d096e
--- /dev/null
+++ b/src/iir/coefficients.rs
@@ -0,0 +1,628 @@
+use num_traits::{AsPrimitive, Float, FloatConst};
+use serde::{Deserialize, Serialize};
+
+#[derive(Copy, Clone, Debug, PartialEq, PartialOrd, Serialize, Deserialize)]
+enum Shape<T> {
+    /// Inverse Q, sqrt(2) for critical
+    InverseQ(T),
+    /// Relative bandwidth in octaves
+    Bandwidth(T),
+    /// Slope steepness, 1 for critical
+    Slope(T),
+}
+
+impl<T: Float + FloatConst> Default for Shape<T> {
+    fn default() -> Self {
+        Self::InverseQ(T::SQRT_2())
+    }
+}
+
+/// Standard audio biquad filter builder
+///
+/// <https://www.w3.org/TR/audio-eq-cookbook/>
+#[derive(Copy, Clone, Debug, PartialEq, PartialOrd, Serialize, Deserialize)]
+pub struct Filter<T> {
+    /// Angular critical frequency (in units of the sampling frequency).
+    /// Corner frequency, or 3 dB cutoff frequency.
+    w0: T,
+    /// Passband gain
+    gain: T,
+    /// Shelf gain (only for peaking, lowshelf, highshelf)
+    /// Relative to passband gain
+    shelf: T,
+    /// Transition shape: inverse Q, bandwidth, or shelf slope
+    shape: Shape<T>,
+}
+
+impl<T: Float + FloatConst> Default for Filter<T> {
+    fn default() -> Self {
+        Self {
+            w0: T::zero(),
+            gain: T::one(),
+            shape: Shape::default(),
+            shelf: T::one(),
+        }
+    }
+}
+
+impl<T> Filter<T>
+where
+    T: 'static + Float + FloatConst,
+    f32: AsPrimitive<T>,
+{
+    /// Set critical frequency from absolute units.
+    ///
+    /// # Arguments
+    /// * `critical_frequency`: "Relevant" or "corner" or "center" frequency
+    ///   in the same units as `sample_frequency`
+    /// * `sample_frequency`: The sample frequency in the same units as `critical_frequency`.
+    ///   E.g. both in SI Hertz or `rad/s`.
+    pub fn frequency(&mut self, critical_frequency: T, sample_frequency: T) -> &mut Self {
+        self.critical_frequency(critical_frequency / sample_frequency)
+    }
+
+    /// Set relative critical frequency
+    ///
+    /// # Arguments
+    /// * `f0`: Relative critical frequency in units of the sample frequency.
+    ///   Must be `0 <= f0 <= 0.5`.
+    pub fn critical_frequency(&mut self, f0: T) -> &mut Self {
+        self.angular_critical_frequency(T::TAU() * f0)
+    }
+
+    /// Set relative critical angular frequency
+    ///
+    /// # Arguments
+    /// * `w0`: Relative critical angular frequency.
+    ///   Must be `0 <= w0 <= π`. Defaults to `0.0`.
+    pub fn angular_critical_frequency(&mut self, w0: T) -> &mut Self {
+        self.w0 = w0;
+        self
+    }
+
+    /// Set reference gain
+    ///
+    /// # Arguments
+    /// * `k`: Linear reference gain. Defaults to `1.0`.
+    pub fn gain(&mut self, k: T) -> &mut Self {
+        self.gain = k;
+        self
+    }
+
+    /// Set reference gain in dB
+    ///
+    /// # Arguments
+    /// * `k_db`: Reference gain in dB. Defaults to `0.0`.
+    pub fn gain_db(&mut self, k_db: T) -> &mut Self {
+        self.gain(10.0.as_().powf(k_db / 20.0.as_()))
+    }
+
+    /// Set linear shelf gain
+    ///
+    /// Used only for `peaking`, `highshelf`, `lowshelf` filters.
+    ///
+    /// # Arguments
+    /// * `a`: Linear shelf gain. Defaults to `1.0`.
+    pub fn shelf(&mut self, a: T) -> &mut Self {
+        self.shelf = a;
+        self
+    }
+
+    /// Set shelf gain in dB
+    ///
+    /// Used only for `peaking`, `highshelf`, `lowshelf` filters.
+    ///
+    /// # Arguments
+    /// * `a_db`: Shelf gain in dB. Defaults to `0.0`.
+    pub fn shelf_db(&mut self, a_db: T) -> &mut Self {
+        self.shelf(10.0.as_().powf(a_db / 20.0.as_()))
+    }
+
+    /// Set inverse Q parameter of the filter
+    ///
+    /// The inverse "steepness"/"narrowness" of the filter transition.
+    /// Defaults to `sqrt(2)`, which is as steep as possible without overshoot.
+    ///
+    /// # Arguments
+    /// * `qi`: Inverse Q parameter.
+    pub fn inverse_q(&mut self, qi: T) -> &mut Self {
+        self.shape = Shape::InverseQ(qi);
+        self
+    }
+
+    /// Set Q parameter of the filter
+    ///
+    /// The "steepness"/"narrowness" of the filter transition.
+    /// Defaults to `1/sqrt(2)`, which is as steep as possible without overshoot.
+    ///
+    /// This affects the same parameter as `bandwidth()` and `shelf_slope()`.
+    /// Use only one of them.
+    ///
+    /// # Arguments
+    /// * `q`: Q parameter.
+    pub fn q(&mut self, q: T) -> &mut Self {
+        self.inverse_q(T::one() / q)
+    }
+
+    /// Set the relative bandwidth
+    ///
+    /// This affects the same parameter as `inverse_q()` and `shelf_slope()`.
+    /// Use only one of them.
+    ///
+    /// # Arguments
+    /// * `bw`: Bandwidth in octaves
+    pub fn bandwidth(&mut self, bw: T) -> &mut Self {
+        self.shape = Shape::Bandwidth(bw);
+        self
+    }
+
+    /// Set the shelf slope.
+    ///
+    /// This affects the same parameter as `inverse_q()` and `bandwidth()`.
+    /// Use only one of them.
+    ///
+    /// # Arguments
+    /// * `s`: Shelf slope. A slope of `1.0` is maximally steep without overshoot.
+    pub fn shelf_slope(&mut self, s: T) -> &mut Self {
+        self.shape = Shape::Slope(s);
+        self
+    }
+
+    /// Get inverse Q
+    fn qi(&self) -> T {
+        match self.shape {
+            Shape::InverseQ(qi) => qi,
+            Shape::Bandwidth(bw) => {
+                2.0.as_() * (T::LN_2() / 2.0.as_() * bw * self.w0 / self.w0.sin()).sinh()
+            }
+            Shape::Slope(s) => {
+                ((self.gain + T::one() / self.gain) * (T::one() / s - T::one()) + 2.0.as_()).sqrt()
+            }
+        }
+    }
+
+    /// Get (cos(w0), alpha=sin(w0)/(2*q))
+    fn fcos_alpha(&self) -> (T, T) {
+        let (fsin, fcos) = self.w0.sin_cos();
+        (fcos, 0.5.as_() * fsin * self.qi())
+    }
+
+    /// Low pass filter
+    ///
+    /// Builds second order biquad low pass filter coefficients.
+    ///
+    /// ```
+    /// use idsp::iir::*;
+    /// let ba = Filter::default()
+    ///     .critical_frequency(0.1)
+    ///     .gain(1000.0)
+    ///     .lowpass();
+    /// let iir = Biquad::<i32>::from(&ba);
+    /// let mut xy = [0; 4];
+    /// let x = vec![3, -4, 5, 7, -3, 2];
+    /// let y: Vec<_> = x.iter().map(|x0| iir.update(&mut xy, *x0)).collect();
+    /// assert_eq!(y, [5, 3, 9, 25, 42, 49]);
+    /// ```
+    pub fn lowpass(&self) -> [T; 6] {
+        let (fcos, alpha) = self.fcos_alpha();
+        let b = self.gain * 0.5.as_() * (T::one() - fcos);
+        [
+            b,
+            (2.0).as_() * b,
+            b,
+            T::one() + alpha,
+            (-2.0).as_() * fcos,
+            T::one() - alpha,
+        ]
+    }
+
+    /// High pass filter
+    ///
+    /// Builds second order biquad high pass filter coefficients.
+    ///
+    /// ```
+    /// use idsp::iir::*;
+    /// let ba = Filter::default()
+    ///     .critical_frequency(0.1)
+    ///     .gain(1000.0)
+    ///     .highpass();
+    /// let iir = Biquad::<i32>::from(&ba);
+    /// let mut xy = [0; 4];
+    /// let x = vec![3, -4, 5, 7, -3, 2];
+    /// let y: Vec<_> = x.iter().map(|x0| iir.update(&mut xy, *x0)).collect();
+    /// assert_eq!(y, [5, -9, 11, 12, -1, 17]);
+    /// ```
+    pub fn highpass(&self) -> [T; 6] {
+        let (fcos, alpha) = self.fcos_alpha();
+        let b = self.gain * 0.5.as_() * (T::one() + fcos);
+        [
+            b,
+            (-2.0).as_() * b,
+            b,
+            T::one() + alpha,
+            (-2.0).as_() * fcos,
+            T::one() - alpha,
+        ]
+    }
+
+    /// Band pass
+    ///
+    /// ```
+    /// use idsp::iir::*;
+    /// let ba = Filter::default()
+    ///     .frequency(1000.0, 48e3)
+    ///     .q(5.0)
+    ///     .gain_db(3.0)
+    ///     .bandpass();
+    /// println!("{ba:?}");
+    /// ```
+    pub fn bandpass(&self) -> [T; 6] {
+        let (fcos, alpha) = self.fcos_alpha();
+        let b = self.gain * alpha;
+        [
+            b,
+            T::zero(),
+            -b,
+            T::one() + alpha,
+            (-2.0).as_() * fcos,
+            T::one() - alpha,
+        ]
+    }
+
+    /// A notch filter
+    ///
+    /// Has zero gain at the critical frequency.
+    pub fn notch(&self) -> [T; 6] {
+        let (fcos, alpha) = self.fcos_alpha();
+        let f2 = (-2.0).as_() * fcos;
+        [
+            self.gain,
+            f2 * self.gain,
+            self.gain,
+            T::one() + alpha,
+            f2,
+            T::one() - alpha,
+        ]
+    }
+
+    /// An allpass filter
+    ///
+    /// Has constant `gain` at all frequencies but a variable phase shift.
+    pub fn allpass(&self) -> [T; 6] {
+        let (fcos, alpha) = self.fcos_alpha();
+        let f2 = (-2.0).as_() * fcos;
+        [
+            (T::one() - alpha) * self.gain,
+            f2 * self.gain,
+            (T::one() + alpha) * self.gain,
+            T::one() + alpha,
+            f2,
+            T::one() - alpha,
+        ]
+    }
+
+    /// A peaking/dip filter
+    ///
+    /// Has `gain*shelf_gain` at critical frequency and `gain` elsewhere.
+    pub fn peaking(&self) -> [T; 6] {
+        let (fcos, alpha) = self.fcos_alpha();
+        let s = self.shelf.sqrt();
+        let f2 = (-2.0).as_() * fcos;
+        [
+            (T::one() + alpha * s) * self.gain,
+            f2 * self.gain,
+            (T::one() - alpha * s) * self.gain,
+            T::one() + alpha / s,
+            f2,
+            T::one() - alpha / s,
+        ]
+    }
+
+    /// Low shelf
+    ///
+    /// Approaches `gain*shelf_gain` below critical frequency and `gain` above.
+    ///
+    /// ```
+    /// use idsp::iir::*;
+    /// let ba = Filter::default()
+    ///     .frequency(1000.0, 48e3)
+    ///     .shelf_slope(2.0)
+    ///     .shelf_db(20.0)
+    ///     .lowshelf();
+    /// println!("{ba:?}");
+    /// ```
+    pub fn lowshelf(&self) -> [T; 6] {
+        let (fcos, alpha) = self.fcos_alpha();
+        let s = self.shelf.sqrt();
+        let tsa = 2.0.as_() * s.sqrt() * alpha;
+        let sp1 = s + T::one();
+        let sm1 = s - T::one();
+        [
+            s * self.gain * (sp1 - sm1 * fcos + tsa),
+            2.0.as_() * s * self.gain * (sm1 - sp1 * fcos),
+            s * self.gain * (sp1 - sm1 * fcos - tsa),
+            sp1 + sm1 * fcos + tsa,
+            (-2.0).as_() * (sm1 + sp1 * fcos),
+            sp1 + sm1 * fcos - tsa,
+        ]
+    }
+
+    /// High shelf
+    ///
+    /// Approaches `gain*shelf_gain` above critical frequency and `gain` below.
+    pub fn highshelf(&self) -> [T; 6] {
+        let (fcos, alpha) = self.fcos_alpha();
+        let s = self.shelf.sqrt();
+        let tsa = 2.0.as_() * s.sqrt() * alpha;
+        let sp1 = s + T::one();
+        let sm1 = s - T::one();
+        [
+            s * self.gain * (sp1 + sm1 * fcos + tsa),
+            (-2.0).as_() * s * self.gain * (sm1 + sp1 * fcos),
+            s * self.gain * (sp1 + sm1 * fcos - tsa),
+            sp1 - sm1 * fcos + tsa,
+            2.0.as_() * (sm1 - sp1 * fcos),
+            sp1 - sm1 * fcos - tsa,
+        ]
+    }
+
+    /// I/HO
+    ///
+    /// Notch, integrating below, flat `shelf_gain` above
+    pub fn iho(&self) -> [T; 6] {
+        let (fcos, alpha) = self.fcos_alpha();
+        let fsin = 0.5.as_() * self.w0.sin();
+        let a = (T::one() + fcos) / (2.0.as_() * self.shelf);
+        [
+            self.gain * (T::one() + alpha),
+            (-2.0).as_() * self.gain * fcos,
+            self.gain * (T::one() - alpha),
+            a + fsin,
+            (-2.0).as_() * a,
+            a - fsin,
+        ]
+    }
+}
+
+// TODO
+// SOS cascades:
+// butterworth
+// elliptic
+// chebyshev1/2
+// bessel
+
+#[cfg(test)]
+mod test {
+    use super::*;
+
+    use core::f64;
+    use num_complex::Complex64;
+
+    use crate::iir::*;
+
+    #[test]
+    #[ignore]
+    fn lowpass_noise_shaping() {
+        let ba = Biquad::<i32>::from(
+            &Filter::default()
+                .critical_frequency(1e-5f64)
+                .gain(1e3)
+                .lowpass(),
+        );
+        println!("{:?}", ba);
+        let mut xy = [0; 5];
+        for _ in 0..(1 << 24) {
+            ba.update(&mut xy, 1);
+        }
+        for _ in 0..10 {
+            ba.update(&mut xy, 1);
+            println!("{xy:?}");
+        }
+    }
+
+    fn polyval(p: &[f64], x: Complex64) -> Complex64 {
+        p.iter()
+            .fold(
+                (Complex64::default(), Complex64::new(1.0, 0.0)),
+                |(a, xi), pi| (a + xi * *pi, xi * x),
+            )
+            .0
+    }
+
+    fn freqz(b: &[f64], a: &[f64], f: f64) -> Complex64 {
+        let z = Complex64::new(0.0, -f64::consts::TAU * f).exp();
+        polyval(b, z) / polyval(a, z)
+    }
+
+    #[derive(Copy, Clone, Debug, PartialEq, PartialOrd)]
+    enum Tol {
+        GainDb(f64, f64),
+        GainBelowDb(f64),
+    }
+    impl Tol {
+        fn check(&self, h: Complex64) -> bool {
+            let g = 10.0 * h.norm_sqr().log10();
+            match self {
+                Self::GainDb(want, tol) => (g - want).abs() <= *tol,
+                Self::GainBelowDb(want) => g <= *want,
+            }
+        }
+    }
+
+    fn check_freqz(f: f64, g: Tol, ba: &[f64; 6]) {
+        let h = freqz(&ba[..3], &ba[3..], f);
+        let hp = h.to_polar();
+        assert!(
+            g.check(h),
+            "freq {f}: response {h}={hp:?} does not meet {g:?}"
+        );
+    }
+
+    fn check_transfer(ba: &[f64; 6], fg: &[(f64, Tol)]) {
+        println!("{ba:?}");
+
+        for (f, g) in fg {
+            check_freqz(*f, *g, ba);
+        }
+
+        // Quantize and back
+        let bai = (&Biquad::<i32>::from(ba)).into();
+        println!("{bai:?}");
+
+        for (f, g) in fg {
+            check_freqz(*f, *g, &bai);
+        }
+    }
+
+    #[test]
+    fn lowpass() {
+        check_transfer(
+            &Filter::default()
+                .critical_frequency(0.01)
+                .gain_db(20.0)
+                .lowpass(),
+            &[
+                (1e-3, Tol::GainDb(20.0, 0.01)),
+                (0.01, Tol::GainDb(17.0, 0.02)),
+                (4e-1, Tol::GainBelowDb(-40.0)),
+            ],
+        );
+    }
+
+    #[test]
+    fn highpass() {
+        check_transfer(
+            &Filter::default()
+                .critical_frequency(0.1)
+                .gain_db(-2.0)
+                .highpass(),
+            &[
+                (1e-3, Tol::GainBelowDb(-40.0)),
+                (0.1, Tol::GainDb(-5.0, 0.02)),
+                (4e-1, Tol::GainDb(-2.0, 0.01)),
+            ],
+        );
+    }
+
+    #[test]
+    fn bandpass() {
+        check_transfer(
+            &Filter::default()
+                .critical_frequency(0.02)
+                .bandwidth(2.0)
+                .gain_db(3.0)
+                .bandpass(),
+            &[
+                (1e-4, Tol::GainBelowDb(-35.0)),
+                (0.01, Tol::GainDb(0.0, 0.02)),
+                (0.02, Tol::GainDb(3.0, 0.01)),
+                (0.04, Tol::GainDb(0.0, 0.04)),
+                (4e-1, Tol::GainBelowDb(-25.0)),
+            ],
+        );
+    }
+
+    #[test]
+    fn allpass() {
+        check_transfer(
+            &Filter::default()
+                .critical_frequency(0.02)
+                .gain_db(-10.0)
+                .allpass(),
+            &[
+                (1e-4, Tol::GainDb(-10.0, 0.01)),
+                (0.01, Tol::GainDb(-10.0, 0.01)),
+                (0.02, Tol::GainDb(-10.0, 0.01)),
+                (0.04, Tol::GainDb(-10.0, 0.01)),
+                (4e-1, Tol::GainDb(-10.0, 0.01)),
+            ],
+        );
+    }
+
+    #[test]
+    fn notch() {
+        check_transfer(
+            &Filter::default()
+                .critical_frequency(0.02)
+                .bandwidth(2.0)
+                .notch(),
+            &[
+                (1e-4, Tol::GainDb(0.0, 0.01)),
+                (0.01, Tol::GainDb(-3.0, 0.02)),
+                (0.02, Tol::GainBelowDb(-140.0)),
+                (0.04, Tol::GainDb(-3.0, 0.02)),
+                (4e-1, Tol::GainDb(0.0, 0.01)),
+            ],
+        );
+    }
+
+    #[test]
+    fn peaking() {
+        check_transfer(
+            &Filter::default()
+                .critical_frequency(0.02)
+                .bandwidth(2.0)
+                .gain_db(-10.0)
+                .shelf_db(20.0)
+                .peaking(),
+            &[
+                (1e-4, Tol::GainDb(-10.0, 0.01)),
+                (0.01, Tol::GainDb(0.0, 0.04)),
+                (0.02, Tol::GainDb(10.0, 0.01)),
+                (0.04, Tol::GainDb(0.0, 0.04)),
+                (4e-1, Tol::GainDb(-10.0, 0.05)),
+            ],
+        );
+    }
+
+    #[test]
+    fn highshelf() {
+        check_transfer(
+            &Filter::default()
+                .critical_frequency(0.02)
+                .gain_db(-10.0)
+                .shelf_db(-20.0)
+                .highshelf(),
+            &[
+                (1e-6, Tol::GainDb(-10.0, 0.01)),
+                (1e-4, Tol::GainDb(-10.0, 0.01)),
+                (0.02, Tol::GainDb(-20.0, 0.01)),
+                (4e-1, Tol::GainDb(-30.0, 0.01)),
+            ],
+        );
+    }
+
+    #[test]
+    fn lowshelf() {
+        check_transfer(
+            &Filter::default()
+                .critical_frequency(0.02)
+                .gain_db(-10.0)
+                .shelf_db(-20.0)
+                .lowshelf(),
+            &[
+                (1e-6, Tol::GainDb(-30.0, 0.01)),
+                (1e-4, Tol::GainDb(-30.0, 0.01)),
+                (0.02, Tol::GainDb(-20.0, 0.01)),
+                (4e-1, Tol::GainDb(-10.0, 0.01)),
+            ],
+        );
+    }
+
+    #[test]
+    fn iho() {
+        check_transfer(
+            &Filter::default()
+                .critical_frequency(0.01)
+                .gain_db(-20.0)
+                .shelf_db(10.0)
+                .q(10.)
+                .iho(),
+            &[
+                (1e-5, Tol::GainDb(40.0, 0.01)),
+                (0.01, Tol::GainBelowDb(-40.0)),
+                (4.99e-1, Tol::GainDb(-10.0, 0.01)),
+            ],
+        );
+    }
+}
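Note on `Filter::lowpass()` and the `lowpass` test above: the coefficient formulas and the three gain checks can be reproduced without the crate. This is a sketch under stated assumptions: the free functions `lowpass`, `poly_mag_sqr` and `gain_db` are hypothetical stand-ins written for illustration, they take the relative critical frequency `f0`, linear gain and inverse Q directly, and the complex arithmetic is expanded by hand instead of using `num_complex`. The returned array follows the same `[b0, b1, b2, a0, a1, a2]` layout as the diff, with `a0` not normalized.

```rust
use std::f64::consts::TAU;

/// Audio EQ cookbook low pass coefficients `[b0, b1, b2, a0, a1, a2]`
/// (hypothetical free function mirroring the formulas in `Filter::lowpass()`).
fn lowpass(f0: f64, gain: f64, qi: f64) -> [f64; 6] {
    let (sin, cos) = (TAU * f0).sin_cos();
    // alpha = sin(w0) / (2 * Q), written with the inverse Q `qi`.
    let alpha = 0.5 * sin * qi;
    let b = gain * 0.5 * (1.0 - cos);
    [b, 2.0 * b, b, 1.0 + alpha, -2.0 * cos, 1.0 - alpha]
}

/// Squared magnitude of `p[0] + p[1]*z^-1 + p[2]*z^-2 + ...` at `z = exp(j*w)`.
fn poly_mag_sqr(p: &[f64], w: f64) -> f64 {
    let (mut re, mut im) = (0.0_f64, 0.0_f64);
    for (k, pk) in p.iter().enumerate() {
        // z^-k = cos(k*w) - j*sin(k*w)
        let (s, c) = (w * k as f64).sin_cos();
        re += *pk * c;
        im -= *pk * s;
    }
    re * re + im * im
}

/// Power gain in dB of the biquad `ba` at relative frequency `f`.
fn gain_db(ba: &[f64; 6], f: f64) -> f64 {
    let w = TAU * f;
    10.0 * (poly_mag_sqr(&ba[..3], w) / poly_mag_sqr(&ba[3..], w)).log10()
}
```

For the default inverse Q of `sqrt(2)` the response is about 3 dB below the passband gain at `f0`, which is what the `Tol::GainDb(17.0, 0.02)` entry in the test checks for a 20 dB passband gain.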
diff --git a/src/iir/mod.rs b/src/iir/mod.rs
new file mode 100644
index 0000000..700369d
--- /dev/null
+++ b/src/iir/mod.rs
@@ -0,0 +1,8 @@
+//! IIR filters, coefficients and applications
+
+mod biquad;
+pub use biquad::*;
+mod coefficients;
+pub use coefficients::*;
+mod pid;
+pub use pid::*;
diff --git a/src/iir/pid.rs b/src/iir/pid.rs
new file mode 100644
index 0000000..388ec09
--- /dev/null
+++ b/src/iir/pid.rs
@@ -0,0 +1,281 @@
+use num_traits::{AsPrimitive, Float};
+use serde::{Deserialize, Serialize};
+
+use crate::Coefficient;
+
+/// PID controller builder
+///
+/// Builds `Biquad` from action gains, gain limits, input offset and output limits.
+///
+/// ```
+/// # use idsp::iir::*;
+/// let b: Biquad<f32> = Pid::default()
+///     .period(1e-3)
+///     .gain(Action::Ki, 1e-3)
+///     .gain(Action::Kp, 1.0)
+///     .gain(Action::Kd, 1e2)
+///     .limit(Action::Ki, 1e3)
+///     .limit(Action::Kd, 1e1)
+///     .build()
+///     .unwrap()
+///     .into();
+/// ```
+#[derive(Debug, Clone, Copy, PartialEq, PartialOrd, Serialize, Deserialize)]
+pub struct Pid<T> {
+    period: T,
+    gains: [T; 5],
+    limits: [T; 5],
+}
+
+impl<T: Float> Default for Pid<T> {
+    fn default() -> Self {
+        Self {
+            period: T::one(),
+            gains: [T::zero(); 5],
+            limits: [T::infinity(); 5],
+        }
+    }
+}
+
+/// [`Pid::build()`] errors
+#[derive(Copy, Clone, Debug, PartialEq, Eq, Ord, PartialOrd, Serialize, Deserialize)]
+#[non_exhaustive]
+pub enum PidError {
+    /// The action gains cover more than three successive orders
+    OrderRange,
+}
+
+/// PID action
+///
+/// This enumerates the five possible PID style actions of a [`crate::iir::Biquad`].
+#[derive(Copy, Clone, Debug, PartialEq, Eq, Ord, PartialOrd, Serialize, Deserialize)]
+pub enum Action {
+    /// Double integrating, -40 dB per decade
+    Kii = 0,
+    /// Integrating, -20 dB per decade
+    Ki = 1,
+    /// Proportional
+    Kp = 2,
+    /// Derivative, 20 dB per decade
+    Kd = 3,
+    /// Double derivative, 40 dB per decade
+    Kdd = 4,
+}
+
+impl<T: Float> Pid<T> {
+    /// Sample period
+    ///
+    /// # Arguments
+    /// * `period`: Sample period in some units, e.g. SI seconds
+    pub fn period(&mut self, period: T) -> &mut Self {
+        self.period = period;
+        self
+    }
+
+    /// Gain for a given action
+    ///
+    /// Gain units are `output/input * time.powi(order)` where
+    /// * `output` are output (`y`) units
+    /// * `input` are input (`x`) units
+    /// * `time` are sample period units, e.g. SI seconds
+    /// * `order` is the action order: the frequency exponent
+    ///    (`-1` for integrating, `0` for proportional, etc.)
+    ///
+    /// Note that inverse time units correspond to angular frequency units.
+    /// Gains are accurate in the low frequency limit. Towards Nyquist, the
+    /// frequency response is warped.
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let tau = 1e-3;
+    /// let ki = 1e-4;
+    /// let i: Biquad<f32> = Pid::default()
+    ///     .period(tau)
+    ///     .gain(Action::Ki, ki)
+    ///     .build()
+    ///     .unwrap()
+    ///     .into();
+    /// let x0 = 5.0;
+    /// let y0 = i.update(&mut [0.0; 4], x0);
+    /// assert!((y0 / (x0 * ki / tau) - 1.0).abs() < 2.0 * f32::EPSILON);
+    /// ```
+    ///
+    /// # Arguments
+    /// * `action`: Action to control
+    /// * `gain`: Gain value
+    pub fn gain(&mut self, action: Action, gain: T) -> &mut Self {
+        self.gains[action as usize] = gain;
+        self
+    }
+
+    /// Gain limit for a given action
+    ///
+    /// Gain limit units are `output/input`. See also [`Pid::gain()`].
+    /// Multiple gains and limits may interact and lead to peaking.
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let ki_limit = 1e3;
+    /// let i: Biquad<f32> = Pid::default()
+    ///     .gain(Action::Ki, 8.0)
+    ///     .limit(Action::Ki, ki_limit)
+    ///     .build()
+    ///     .unwrap()
+    ///     .into();
+    /// let mut xy = [0.0; 4];
+    /// let x0 = 5.0;
+    /// for _ in 0..1000 {
+    ///     i.update(&mut xy, x0);
+    /// }
+    /// let y0 = i.update(&mut xy, x0);
+    /// assert!((y0 / (x0 * ki_limit) - 1.0f32).abs() < 1e-3);
+    /// ```
+    ///
+    /// # Arguments
+    /// * `action`: Action to limit in gain
+    /// * `limit`: Gain limit
+    pub fn limit(&mut self, action: Action, limit: T) -> &mut Self {
+        self.limits[action as usize] = limit;
+        self
+    }
+
+    /// Perform checks, compute coefficients and return `Biquad`.
+    ///
+    /// No attempt is made to detect NaNs, non-finite gains, non-positive period,
+    /// zero gain limits, or gain/limit sign mismatches.
+    /// These will consequently result in NaNs/infinities, peaking, or notches in
+    /// the Biquad coefficients.
+    ///
+    /// Gain limits for zero gain actions or for proportional action are ignored.
+    ///
+    /// ```
+    /// # use idsp::iir::*;
+    /// let i: Biquad<f32> = Pid::default().gain(Action::Kp, 3.0).build().unwrap().into();
+    /// assert_eq!(i, Biquad::proportional(3.0));
+    /// ```
+    ///
+    /// # Panics
+    /// Will panic in debug mode on fixed point coefficient overflow.
+    pub fn build<C: Coefficient + AsPrimitive<T>>(&self) -> Result<[C; 5], PidError>
+    where
+        T: AsPrimitive<C>,
+    {
+        const KP: usize = Action::Kp as usize;
+
+        // Determine highest denominator (feedback, `a`) order
+        let low = self
+            .gains
+            .iter()
+            .take(KP)
+            .position(|g| !g.is_zero())
+            .unwrap_or(KP);
+
+        if self.gains.iter().skip(low + 3).any(|g| !g.is_zero()) {
+            return Err(PidError::OrderRange);
+        }
+
+        // Scale gains, compute limits
+        let mut zi = self.period.powi(low as i32 - KP as i32);
+        let mut gl = [[T::zero(); 2]; 3];
+        for (gli, (i, (ggi, lli))) in gl.iter_mut().zip(
+            self.gains
+                .iter()
+                .zip(self.limits.iter())
+                .enumerate()
+                .skip(low),
+        ) {
+            gli[0] = *ggi * zi;
+            gli[1] = if i == KP { T::one() } else { gli[0] / *lli };
+            zi = zi * self.period;
+        }
+        let a0i = T::one() / (gl[0][1] + gl[1][1] + gl[2][1]);
+
+        // Derivative/integration kernels
+        let kernels = [
+            [C::one(), C::zero(), C::zero()],
+            [C::one(), C::zero() - C::one(), C::zero()],
+            [C::one(), C::zero() - C::one() - C::one(), C::one()],
+        ];
+
+        // Coefficients
+        let mut ba = [[C::ZERO; 2]; 3];
+        for (gli, ki) in gl.iter().zip(kernels.iter()) {
+            // Quantize the gains and not the coefficients
+            let (g, l) = (C::quantize(gli[0] * a0i), C::quantize(gli[1] * a0i));
+            for (j, baj) in ba.iter_mut().enumerate() {
+                *baj = [baj[0] + ki[j] * g, baj[1] + ki[j] * l];
+            }
+        }
+
+        Ok([ba[0][0], ba[1][0], ba[2][0], ba[1][1], ba[2][1]])
+    }
+}
+
+#[cfg(test)]
+mod test {
+    use crate::iir::*;
+
+    #[test]
+    fn pid() {
+        let b: Biquad<f32> = Pid::default()
+            .period(1.0)
+            .gain(Action::Ki, 1e-3)
+            .gain(Action::Kp, 1.0)
+            .gain(Action::Kd, 1e2)
+            .limit(Action::Ki, 1e3)
+            .limit(Action::Kd, 1e1)
+            .build()
+            .unwrap()
+            .into();
+        let want = [
+            9.18190826,
+            -18.27272561,
+            9.09090826,
+            -1.90909074,
+            0.90909083,
+        ];
+        for (ba_have, ba_want) in b.ba().iter().zip(want.iter()) {
+            assert!(
+                (ba_have / ba_want - 1.0).abs() < 2.0 * f32::EPSILON,
+                "have {:?} != want {want:?}",
+                b.ba(),
+            );
+        }
+    }
+
+    #[test]
+    fn pid_i32() {
+        let b: Biquad<i32> = Pid::default()
+            .period(1.0)
+            .gain(Action::Ki, 1e-5)
+            .gain(Action::Kp, 1e-2)
+            .gain(Action::Kd, 1e0)
+            .limit(Action::Ki, 1e1)
+            .limit(Action::Kd, 1e-1)
+            .build()
+            .unwrap()
+            .into();
+        println!("{b:?}");
+    }
+
+    #[test]
+    fn units() {
+        let ki = 5e-2;
+        let tau = 3e-3;
+        let b: Biquad<f32> = Pid::default()
+            .period(tau)
+            .gain(Action::Ki, ki)
+            .build()
+            .unwrap()
+            .into();
+        let mut xy = [0.0; 4];
+        for i in 1..10 {
+            let y_have = b.update(&mut xy, 1.0);
+            let y_want = (i as f32) * (ki / tau);
+            assert!(
+                (y_have / y_want - 1.0).abs() < 3.0 * f32::EPSILON,
+                "{i}: have {y_have} != {y_want}"
+            );
+        }
+    }
+}
diff --git a/src/iir_int.rs b/src/iir_int.rs
deleted file mode 100644
index 0dfd476..0000000
--- a/src/iir_int.rs
+++ /dev/null
@@ -1,96 +0,0 @@
-use super::tools::macc_i32;
-use core::f64::consts::PI;
-use serde::{Deserialize, Serialize};
-
-/// Generic vector for integer IIR filter.
-/// This struct is used to hold the x/y input/output data vector or the b/a coefficient
-/// vector.
-pub type Vec5 = [i32; 5];
-
-trait Coeff {
-    /// Lowpass biquad filter using cutoff and sampling frequencies.  Taken from:
-    /// https://webaudio.github.io/Audio-EQ-Cookbook/audio-eq-cookbook.html
-    ///
-    /// # Args
-    /// * `f` - Corner frequency, or 3dB cutoff frequency (in units of sample rate).
-    ///         This is only accurate for low corner frequencies less than ~0.01.
-    /// * `q` - Quality factor (1/sqrt(2) for critical).
-    /// * `k` - DC gain.
-    ///
-    /// # Returns
-    /// 2nd-order IIR filter coefficients in the form [b0,b1,b2,a1,a2]. a0 is set to -1.
-    fn lowpass(f: f64, q: f64, k: f64) -> Self;
-}
-
-impl Coeff for Vec5 {
-    fn lowpass(f: f64, q: f64, k: f64) -> Self {
-        // 3rd order Taylor approximation of sin and cos.
-        let f = f * 2. * PI;
-        let f2 = f * f * 0.5;
-        let fcos = 1. - f2;
-        let fsin = f * (1. - f2 / 3.);
-        let alpha = fsin / (2. * q);
-        // IIR uses Q2.30 fixed point
-        let a0 = (1. + alpha) / (1 << IIR::SHIFT) as f64;
-        let b0 = (k / 2. * (1. - fcos) / a0 + 0.5) as _;
-        let a1 = (2. * fcos / a0 + 0.5) as _;
-        let a2 = ((alpha - 1.) / a0 + 0.5) as _;
-
-        [b0, 2 * b0, b0, a1, a2]
-    }
-}
-
-/// Integer biquad IIR
-///
-/// See `dsp::iir::IIR` for general implementation details.
-/// Offset and limiting disabled to suit lowpass applications.
-/// Coefficient scaling fixed and optimized.
-#[derive(Copy, Clone, Default, Debug, Serialize, Deserialize)]
-pub struct IIR {
-    pub ba: Vec5,
-    pub y_offset: i32,
-    pub y_min: i32,
-    pub y_max: i32,
-}
-
-impl IIR {
-    /// Coefficient fixed point format: signed Q2.30.
-    /// Tailored to low-passes, PI, II etc.
-    pub const SHIFT: u32 = 30;
-
-    /// Feed a new input value into the filter, update the filter state, and
-    /// return the new output. Only the state `xy` is modified.
-    ///
-    /// # Arguments
-    /// * `xy` - Current filter state.
-    /// * `x0` - New input.
-    pub fn update(&self, xy: &mut Vec5, x0: i32) -> i32 {
-        let n = self.ba.len();
-        debug_assert!(xy.len() == n);
-        // `xy` contains       x0 x1 y0 y1 y2
-        // Increment time      x1 x2 y1 y2 y3
-        // Shift               x1 x1 x2 y1 y2
-        // This unrolls better than xy.rotate_right(1)
-        xy.copy_within(0..n - 1, 1);
-        // Store x0            x0 x1 x2 y1 y2
-        xy[0] = x0;
-        // Compute y0 by multiply-accumulate
-        let y0 = macc_i32(self.y_offset, xy, &self.ba, Self::SHIFT);
-        // Limit y0
-        let y0 = y0.max(self.y_min).min(self.y_max);
-        // Store y0            x0 x1 y0 y1 y2
-        xy[n / 2] = y0;
-        y0
-    }
-}
-
-#[cfg(test)]
-mod test {
-    use super::{Coeff, Vec5};
-
-    #[test]
-    fn lowpass_gen() {
-        let ba = Vec5::lowpass(1e-5, 1. / 2f64.sqrt(), 2.);
-        println!("{:?}", ba);
-    }
-}
diff --git a/src/lib.rs b/src/lib.rs
index 34affbf..78a72a7 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -1,7 +1,10 @@
-#![cfg_attr(not(test), no_std)]
+#![cfg_attr(not(any(test, doctest, feature = "std")), no_std)]
+#![doc = include_str!("../README.md")]
+#![deny(rust_2018_compatibility)]
+#![deny(rust_2018_idioms)]
+#![warn(missing_docs)]
+#![forbid(unsafe_code)]
 
-mod tools;
-pub use tools::*;
 mod atan2;
 pub use atan2::*;
 mod accu;
@@ -13,7 +16,6 @@ pub use complex::*;
 mod cossin;
 pub use cossin::*;
 pub mod iir;
-pub mod iir_int;
 mod lockin;
 pub use lockin::*;
 mod lowpass;
@@ -25,6 +27,9 @@ pub use rpll::*;
 mod unwrap;
 pub use unwrap::*;
 pub mod hbf;
+mod num;
+pub use num::*;
+pub mod svf;
 
 #[cfg(test)]
 pub mod testing;
diff --git a/src/lockin.rs b/src/lockin.rs
index e81abc7..0ca8669 100644
--- a/src/lockin.rs
+++ b/src/lockin.rs
@@ -1,5 +1,8 @@
 use super::{Complex, ComplexExt, Filter, MulScaled};
 
+/// Lockin filter
+///
+/// Combines two [`Filter`] and an NCO to perform demodulation
 #[derive(Copy, Clone, Default)]
 pub struct Lockin<T> {
     state: [T; 2],
diff --git a/src/lowpass.rs b/src/lowpass.rs
index 3f1bb7f..03ca250 100644
--- a/src/lowpass.rs
+++ b/src/lowpass.rs
@@ -7,7 +7,6 @@ use crate::Filter;
 ///
 /// The filter will cleanly saturate towards the `i32` range.
 ///
-///
 /// Both filters have been optimized for accuracy, dynamic range, and
 /// speed on Cortex-M7.
 #[derive(Copy, Clone)]
@@ -33,9 +32,13 @@ impl<const N: usize> Filter for Lowpass<N> {
     /// `1 << 16 <= k <= q*(1 << 31)`.
     type Config = [i32; N];
     fn update(&mut self, x: i32, k: &Self::Config) -> i32 {
-        let mut d = x.saturating_sub((self.0[0] >> 32) as i32) as i64 * k[0] as i64;
+        let mut d = x.saturating_sub(self.get()) as i64 * k[0] as i64;
         let y;
-        if N >= 2 {
+        if N == 1 {
+            self.0[0] += d;
+            y = self.get();
+            self.0[0] += d;
+        } else if N == 2 {
             d += (self.0[1] >> 32) * k[1] as i64;
             self.0[1] += d;
             self.0[0] += self.0[1];
@@ -47,15 +50,15 @@ impl<const N: usize> Filter for Lowpass<N> {
             self.0[0] += self.0[1];
             self.0[1] += d;
         } else {
-            self.0[0] += d;
-            y = self.get();
-            self.0[0] += d;
+            unimplemented!()
         }
         y
     }
+
     fn get(&self) -> i32 {
         (self.0[0] >> 32) as i32
     }
+
     fn set(&mut self, x: i32) {
         self.0[0] = (x as i64) << 32;
     }
diff --git a/src/num.rs b/src/num.rs
new file mode 100644
index 0000000..4ab24d5
--- /dev/null
+++ b/src/num.rs
@@ -0,0 +1,155 @@
+use num_traits::{AsPrimitive, Float, Num};
+
+/// Helper trait unifying fixed point and floating point coefficients/samples
+pub trait Coefficient: 'static + Copy + Num + AsPrimitive<Self::ACCU> {
+    /// Multiplicative identity
+    const ONE: Self;
+    /// Negative multiplicative identity, equal to `-Self::ONE`.
+    const NEG_ONE: Self;
+    /// Additive identity
+    const ZERO: Self;
+    /// Lowest value
+    const MIN: Self;
+    /// Highest value
+    const MAX: Self;
+    /// Accumulator type
+    type ACCU: AsPrimitive<Self> + Num;
+
+    /// Add `self` (the offset) to the accumulator `s` with proper scaling,
+    /// clamp the result such that `min <= result <= max`, and return it
+    /// together with the new quantization error. Undefined result if `max < min`.
+    fn macc(self, s: Self::ACCU, min: Self, max: Self, e1: Self) -> (Self, Self);
+
+    /// Clamp `self` to between `min` and `max` (inclusive)
+    ///
+    /// Undefined if `min > max`.
+    fn clip(self, min: Self, max: Self) -> Self;
+
+    /// Multiplication (scaled)
+    fn mul_scaled(self, other: Self) -> Self;
+
+    /// Division (scaled)
+    fn div_scaled(self, other: Self) -> Self;
+
+    /// Scale and quantize a floating point value.
+    fn quantize<C>(value: C) -> Self
+    where
+        Self: AsPrimitive<C>,
+        C: Float + AsPrimitive<Self>;
+    // TODO: range check and Result
+}
+
+macro_rules! impl_float {
+    ($T:ty) => {
+        impl Coefficient for $T {
+            const ONE: Self = 1.0;
+            const NEG_ONE: Self = -1.0;
+            const ZERO: Self = 0.0;
+            const MIN: Self = <$T>::NEG_INFINITY;
+            const MAX: Self = <$T>::INFINITY;
+            type ACCU = Self;
+
+            #[inline]
+            fn macc(self, s: Self::ACCU, min: Self, max: Self, _e1: Self) -> (Self, Self) {
+                ((self + s).clip(min, max), 0.0)
+            }
+
+            #[inline]
+            fn clip(self, min: Self, max: Self) -> Self {
+                // <$T>::clamp() is slow and checks
+                self.max(min).min(max)
+            }
+
+            #[inline]
+            fn div_scaled(self, other: Self) -> Self {
+                self / other
+            }
+
+            #[inline]
+            fn mul_scaled(self, other: Self) -> Self {
+                self * other
+            }
+
+            #[inline]
+            fn quantize<C: Float + AsPrimitive<Self>>(value: C) -> Self {
+                value.as_()
+            }
+        }
+    };
+}
+impl_float!(f32);
+impl_float!(f64);
+
+macro_rules! impl_int {
+    ($T:ty, $U:ty, $A:ty, $Q:literal) => {
+        impl Coefficient for $T {
+            const ONE: Self = 1 << $Q;
+            const NEG_ONE: Self = -1 << $Q;
+            const ZERO: Self = 0;
+            const MIN: Self = <$T>::MIN;
+            const MAX: Self = <$T>::MAX;
+            type ACCU = $A;
+
+            #[inline]
+            fn macc(self, mut s: Self::ACCU, min: Self, max: Self, e1: Self) -> (Self, Self) {
+                const S: usize = core::mem::size_of::<$T>() * 8;
+                // Guard bits
+                const G: usize = S - $Q;
+                // Combine offset (self << $Q) with previous quantization error e1
+                s += (((self >> G) as $A) << S) | (((self << $Q) | e1) as $U as $A);
+                // Ord::clamp() is slow and checks
+                // This clamping truncates the lowest G bits of the value and the limits.
+                debug_assert_eq!(min & ((1 << G) - 1), 0);
+                debug_assert_eq!(max & ((1 << G) - 1), (1 << G) - 1);
+                let y0 = if (s >> S) as $T < (min >> G) {
+                    min
+                } else if (s >> S) as $T > (max >> G) {
+                    max
+                } else {
+                    (s >> $Q) as $T
+                };
+                // Quantization error
+                let e0 = s as $T & ((1 << $Q) - 1);
+                (y0, e0)
+            }
+
+            #[inline]
+            fn clip(self, min: Self, max: Self) -> Self {
+                // Ord::clamp() is slow and checks
+                if self < min {
+                    min
+                } else if self > max {
+                    max
+                } else {
+                    self
+                }
+            }
+
+            #[inline]
+            fn div_scaled(self, other: Self) -> Self {
+                (((self as $A) << $Q) / other as $A) as $T
+            }
+
+            #[inline]
+            fn mul_scaled(self, other: Self) -> Self {
+                (((1 << ($Q - 1)) + self as $A * other as $A) >> $Q) as $T
+            }
+
+            #[inline]
+            fn quantize<C>(value: C) -> Self
+            where
+                Self: AsPrimitive<C>,
+                C: Float + AsPrimitive<Self>,
+            {
+                (value * (1 << $Q).as_()).round().as_()
+            }
+        }
+    };
+}
+// Q2.X chosen to be able to exactly and inclusively represent -2 as `-1 << X + 1`
+// This is necessary to meet a1 = -2
+// It also creates 2 guard bits for clamping in the accumulator, which is often enough.
+impl_int!(i8, u8, i16, 6);
+impl_int!(i16, u16, i32, 14);
+impl_int!(i32, u32, i64, 30);
+impl_int!(i64, u64, i128, 62);
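
The scaled arithmetic added in `src/num.rs` can be tried out in isolation. Below is a minimal standalone sketch, assuming only the `i32` case (`impl_int!(i32, u32, i64, 30)`, i.e. Q2.30 with an `i64` accumulator). The free functions `quantize`, `mul_scaled` and `div_scaled` are hypothetical stand-ins for the trait methods, specialized to `f64` input, and are not part of the crate's API.

```rust
/// Number of fractional bits: Q2.30 in an `i32` (assumption: mirrors the `i32` impl).
const Q: u32 = 30;

/// Scale a float by 2^Q and round to the nearest integer.
/// There is no range check: `as` saturates on overflow.
fn quantize(value: f64) -> i32 {
    (value * (1i64 << Q) as f64).round() as i32
}

/// Scaled multiplication with a half-LSB rounding bias, using an `i64` intermediate.
fn mul_scaled(a: i32, b: i32) -> i32 {
    (((1i64 << (Q - 1)) + a as i64 * b as i64) >> Q) as i32
}

/// Scaled division using an `i64` intermediate.
fn div_scaled(a: i32, b: i32) -> i32 {
    (((a as i64) << Q) / b as i64) as i32
}
```

With this format `1.0` maps to `1 << 30` and `-2.0` maps exactly to `i32::MIN`, which is the property the comment above the `impl_int!` invocations relies on for `a1 = -2`.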
diff --git a/src/pll.rs b/src/pll.rs
index 01cdac6..ee11ebb 100644
--- a/src/pll.rs
+++ b/src/pll.rs
@@ -33,6 +33,8 @@ use serde::{Deserialize, Serialize};
 ///
 /// The extension to I^3,I^2,I behavior to track chirps phase-accurately or to i64 data to
 /// increase resolution for extremely narrowband applications is obvious.
+///
+/// This PLL implements first order noise shaping to reduce quantization errors.
 #[derive(Copy, Clone, Default, Deserialize, Serialize)]
 pub struct PLL {
     // last input phase
diff --git a/src/rpll.rs b/src/rpll.rs
index e307d96..0bf0b0b 100644
--- a/src/rpll.rs
+++ b/src/rpll.rs
@@ -93,7 +93,6 @@ impl RPLL {
 #[cfg(test)]
 mod test {
     use super::RPLL;
-    use ndarray::prelude::*;
     use rand::{prelude::*, rngs::StdRng};
     use std::vec::Vec;
 
@@ -175,14 +174,12 @@ mod test {
             self.run(t_settle);
 
             let (y, f) = self.run(n);
-            let y = Array::from(y);
-            let f = Array::from(f);
             // println!("{:?} {:?}", f, y);
 
-            let fm = f.mean().unwrap();
-            let fs = f.std_axis(Axis(0), 0.).into_scalar();
-            let ym = y.mean().unwrap();
-            let ys = y.std_axis(Axis(0), 0.).into_scalar();
+            let fm = f.iter().copied().sum::<f32>() / f.len() as f32;
+            let fs = (f.iter().map(|f| (*f - fm).powi(2)).sum::<f32>() / f.len() as f32).sqrt();
+            let ym = y.iter().copied().sum::<f32>() / y.len() as f32;
+            let ys = (y.iter().map(|y| (*y - ym).powi(2)).sum::<f32>() / y.len() as f32).sqrt();
 
             println!("f: {:.2e}±{:.2e}; y: {:.2e}±{:.2e}", fm, fs, ym, ys);
 
diff --git a/src/svf.rs b/src/svf.rs
new file mode 100644
index 0000000..817d4cd
--- /dev/null
+++ b/src/svf.rs
@@ -0,0 +1,54 @@
+//! State variable filter
+
+use num_traits::{Float, FloatConst};
+use serde::{Deserialize, Serialize};
+
+/// Second order state variable filter state
+pub struct State<T> {
+    /// Lowpass output
+    pub lp: T,
+    /// Highpass output
+    pub hp: T,
+    /// Bandpass output
+    pub bp: T,
+}
+
+impl<T: Float> State<T> {
+    /// Bandreject (notch) output
+    pub fn br(&self) -> T {
+        self.hp + self.lp
+    }
+}
+
+/// State variable filter
+///
+/// <https://www.earlevel.com/main/2003/03/02/the-digital-state-variable-filter/>
+#[derive(Copy, Clone, Debug, Deserialize, Serialize, PartialEq, PartialOrd)]
+pub struct Svf<T> {
+    f: T,
+    q: T,
+}
+
+impl<T: Float + FloatConst> Svf<T> {
+    /// Set the critical frequency
+    ///
+    /// In units of the sample frequency.
+    pub fn set_frequency(&mut self, f0: T) {
+        self.f = (T::one() + T::one()) * (T::PI() * f0).sin();
+    }
+
+    /// Set the Q parameter
+    pub fn set_q(&mut self, q: T) {
+        self.q = T::one() / q;
+    }
+
+    /// Update the filter
+    ///
+    /// Ingest an input sample and update state correspondingly.
+    /// Selected output(s) are available from [`State`].
+    pub fn update(&self, s: &mut State<T>, x0: T) {
+        s.lp = s.bp * self.f + s.lp;
+        s.hp = x0 - s.lp - s.bp * self.q;
+        s.bp = s.hp * self.f + s.bp;
+    }
+}
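
The update recursion in `src/svf.rs` can be checked with a quick step-response experiment. The following is a rough standalone sketch, assuming `f64` samples and plain functions instead of the generic `Svf<T>`/`State<T>` types. The helpers `coefficients` and `step_response` are hypothetical names introduced only for this illustration.

```rust
use std::f64::consts::PI;

/// Filter state (lowpass, highpass, bandpass), mirroring `State<T>` for `f64`.
#[derive(Default)]
struct State {
    lp: f64,
    hp: f64,
    bp: f64,
}

/// Hypothetical helper: the internal coefficients as computed by
/// `set_frequency` (f0 in units of the sample frequency) and `set_q`.
fn coefficients(f0: f64, q: f64) -> (f64, f64) {
    (2.0 * (PI * f0).sin(), 1.0 / q)
}

/// One filter update, same recursion as `Svf::update`.
fn update(s: &mut State, f: f64, q_inv: f64, x0: f64) {
    s.lp = s.bp * f + s.lp;
    s.hp = x0 - s.lp - s.bp * q_inv;
    s.bp = s.hp * f + s.bp;
}

/// Hypothetical helper: feed `n` samples of a unit step and return the final state.
fn step_response(f0: f64, q: f64, n: usize) -> State {
    let (f, q_inv) = coefficients(f0, q);
    let mut s = State::default();
    for _ in 0..n {
        update(&mut s, f, q_inv, 1.0);
    }
    s
}
```

For a constant input the lowpass output should settle at the input value while the highpass and bandpass outputs decay to zero, so the notch output `hp + lp` also settles at the input value.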
diff --git a/src/tools.rs b/src/tools.rs
deleted file mode 100644
index 90a9627..0000000
--- a/src/tools.rs
+++ /dev/null
@@ -1,54 +0,0 @@
-use core::ops::{Add, Mul, Neg};
-use num_traits::Zero;
-
-pub fn abs<T>(x: T) -> T
-where
-    T: PartialOrd + Zero + Neg<Output = T>,
-{
-    if x >= T::zero() {
-        x
-    } else {
-        -x
-    }
-}
-
-// These are implemented here because core::f32 doesn't have them (yet).
-// They are naive and don't handle inf/nan.
-// `compiler-intrinsics`/llvm should have better (robust, universal, and
-// faster) implementations.
-
-pub fn copysign<T>(x: T, y: T) -> T
-where
-    T: PartialOrd + Zero + Neg<Output = T>,
-{
-    if (x >= T::zero() && y >= T::zero()) || (x <= T::zero() && y <= T::zero()) {
-        x
-    } else {
-        -x
-    }
-}
-
-// Multiply-accumulate vectors `x` and `a`.
-//
-// A.k.a. dot product.
-// Rust/LLVM optimize this nicely.
-pub fn macc<T>(y0: T, x: &[T], a: &[T]) -> T
-where
-    T: Add<Output = T> + Mul<Output = T> + Copy,
-{
-    x.iter()
-        .zip(a)
-        .map(|(x, a)| *x * *a)
-        .fold(y0, |y, xa| y + xa)
-}
-
-pub fn macc_i32(y0: i32, x: &[i32], a: &[i32], shift: u32) -> i32 {
-    // Rounding bias, half up
-    let y0 = ((y0 as i64) << shift) + (1 << (shift - 1));
-    let y = x
-        .iter()
-        .zip(a)
-        .map(|(x, a)| *x as i64 * *a as i64)
-        .fold(y0, |y, xa| y + xa);
-    (y >> shift) as i32
-}