diff --git a/docs/make.jl b/docs/make.jl
index 8dc0b1f02..9515a3f5c 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -8,6 +8,7 @@ makedocs(
   format = Documenter.HTML(assets = ["assets/style.css"], ansicolor=true, prettyurls = get(ENV, "CI", nothing) == "true"),
   sitename = "Krylov.jl",
   pages = ["Home" => "index.md",
+           "Foundations of Krylov methods" => "foundations_krylov.md",
            "API" => "api.md",
            "Krylov methods" => ["Symmetric positive definite linear systems" => "solvers/spd.md",
                                 "Symmetric indefinite linear systems" => "solvers/sid.md",
diff --git a/docs/src/foundations_krylov.md b/docs/src/foundations_krylov.md
new file mode 100644
index 000000000..a5cbaeb9e
--- /dev/null
+++ b/docs/src/foundations_krylov.md
@@ -0,0 +1,49 @@
+## Origin of Krylov methods
+
+__Cayley-Hamilton theorem__:
+If $A$ is a square matrix of size ``n`` and
+```math
+p(X) = \det(XI_n - A) = X^n + p_{n-1} X^{n-1} + \dots + p_1 X + p_0
+```
+is its characteristic polynomial, then
+```math
+p(A) = A^n + p_{n-1} A^{n-1} + \dots + p_1 A + p_0 I_n = 0_n.
+```
+If $A$ is nonsingular, $p_0 \ne 0$ and
+```math
+A^{-1} = -\dfrac{1}{p_0}(A^{n-1} + p_{n-1} A^{n-2} + \dots + p_1 I_n).
+```
+Thus,
+```math
+x^{\star} = A^{-1}b \implies x^{\star} \in \mathcal{K}_n(A, b) = \mathop{\mathrm{Span}} \{b, Ab, \dots, A^{n-1}b\}
+```
+where ``\mathcal{K}_n(A, b)`` is a *Krylov subspace*.
+
+## Principle of Krylov methods
+
+Krylov methods build iteratively a solution ``x_k \in \mathcal{K}_k(A,b)`` of ``Ax = b``.
+
+A process is used to build a basis ``V_k`` of ``\mathcal{K}_k(A, b)``.
+We have the Lanczos process for square symmetric matrices and the Arnoldi process for square unsymmetric matrices.
+The projection of ``A`` into the Krylov subspace has a workable structure.
+The projection is always a tridiagonal matrix with the Lanczos process and an upper Hessenberg matrix with the Arnoldi process regardless the structure of ``A``.
+
+Iterates have the form ``x_k = V_k y_k`` where ``y_k \in \mathbb{R}^k`` is determined by solving a subproblem that uses the projection of ``A``.
+Depending on the subproblem used, the iterates ``x_k`` have different properties, such as monotonically
+decreasing the residual norms ``\|b - A x_k\|`` or error norms ``\|x_k - x^{\star}\|``.
+
+When ``A`` is rectangular, we use the Golub-Kahan process to build orthogonal bases of ``\mathcal{K}_k(A^T A, A^T b)`` and ``\mathcal{K}_k(A A^T, b)``, and the normal equations, to solve linear least-squares and least-norm problems.
+
+## Convergence of Krylov methods
+
+Because the minimal polynomial of a matrix ``A`` is a polynomial of minimal degree ``m`` such that ``q(A) = 0``, it divides all polynomials such that ``r(A) = 0``.
+It notably divides the characteristic polynomial of ``A`` and it has the same roots:
+```math
+A^{-1} = -\dfrac{1}{q_0}(A^{m-1} + q_{m-1} A^{m-2} + \dots + q_1 I_n)
+```
+and
+```math
+x^{\star} = A^{-1}b \implies x^{\star} \in \mathcal{K}_m(A, b).
+```
+To have ``m \ll n``, the number of distinct roots of ``p(A)`` must be small.
+It means that the matrix ``A`` has only a few distinct eigenvalues.