Merge pull request #96 from gtbook/frank_equations
Fix display equations
dellaert authored Dec 18, 2024

2 parents ed7484d + f46420e commit 37105bd
Showing 41 changed files with 1,217 additions and 1,768 deletions.
28 changes: 15 additions & 13 deletions S11_models.ipynb
Original file line number Diff line number Diff line change
@@ -111,8 +111,9 @@
"\n",
"For a drone, we might specify the configuration as $q = (x,y,z,\\phi,\\theta,\\psi)$, in which $x,y,z$ give the position of the origin of the body-attached frame, and the angles $\\phi, \\theta, \\psi$ define the roll, pitch, and yaw angles for the drone’s orientation.\n",
"The configuration of a robot answers the question of where the robot is at a specific instant in time. If we wish instead to describe the motion of a robot, we must consider the configuration to be time varying, and in this case both the configuration and its time derivative (a velocity) are relevant. We often package the configuration and its time derivative into a single vector\n",
"\n",
"$$x(t) = \\left[ \\begin{array}{c} q(t) \\\\ \\dot{q}(t) \\end{array}\\right]$$\n"
"\\begin{equation}\n",
"x(t) = \\left[ \\begin{array}{c} q(t) \\\\ \\dot{q}(t) \\end{array}\\right]\n",
"\\end{equation}"
]
},
{
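To make the packaging of configuration and velocity concrete, here is a minimal NumPy sketch; the drone configuration values are assumptions chosen purely for illustration.

```python
import numpy as np

# Minimal sketch (values assumed for illustration): pack a drone
# configuration q = (x, y, z, phi, theta, psi) and its time derivative
# qdot into a single state vector x = [q; qdot].
q = np.array([1.0, 2.0, 0.5, 0.0, 0.1, -0.2])  # position and roll/pitch/yaw
qdot = np.zeros(6)                             # hovering: zero velocity
x = np.concatenate([q, qdot])                  # 12-dimensional state
print(x.shape)  # (12,)
```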
@@ -148,22 +149,23 @@
"metadata": {},
"source": [
"For robot’s that have continuous configuration or state spaces, we have the choice of whether to represent actions in discrete or continuous time. In the case of discrete time, we typically represent actions at time instant $k$ as $u_k$, and define a corresponding mapping from state at time $k$ to the state at time $k+1$:\n",
"\n",
"$$x_{k+1} = f(x_k, u_k)$$\n",
"\n",
"\\begin{equation}\n",
"x_{k+1} = f(x_k, u_k)\n",
"\\end{equation}\n",
"Here, the function $f(x,u)$ defines the effect of the action on the current state. Typically, for discrete time systems we assume that the time discretization is uniform, $\\Delta t$, and that the time corresponding to time instant $k$ is thus equal to $k \\Delta t$. It should be emphasized that discrete time systems can have continuous state representations, for example, our logistics robot will have $x_k \\in \\mathbb{R}^2$, which takes continuous values in the plane, even though we only note these values at discrete moments in time.\n",
"Note that we have made several notational choices here. First, we use $u$ to denote the action, mainly due to this usage in the control theory community. We use the index $k$ instead of $t$ to denote the discrete time instant, preferring to reserve $t$ for the case of continuous time systems. \n",
"Further, we will use $x$ to denote state for discrete time systems, even if we are interested only in the configuration $q$.\n",
"\n",
"In some cases, continuous time representations are preferred. For example, the drone in Chapter 7 moves in continuous time. For continuous time systems, we typically represent the system dynamics as an ordinary differential equation of the form:\n",
"\n",
"$$ \\dot{x} = f(x(t), u(t))$$\n",
"\n",
"\\begin{equation}\n",
"\\dot{x} = f(x(t), u(t))\n",
"\\end{equation}\n",
"In the case of drones, the system dynamics relate the instantaneous velocity of the drone (both linear and angular velocity) to the input thrusts provided by the propellors. \n",
"Even if we choose to represent actions using continuous time, it is often necessary to discretize time for the purpose of computation\n",
"(e.g., to determine a drone trajectory using nonlinear optimization). In this case, we typically compute a discrete time approximation for time $(k+1) \\Delta t$ by integrating the system dynamics over the relevant time interval:\n",
"\n",
"$$x_{k+1} = x_k + \\int_{k \\Delta t}^{(k+1) \\Delta t} \\dot{x}(t) dt$$\n"
"\\begin{equation}\n",
"x_{k+1} = x_k + \\int_{k \\Delta t}^{(k+1) \\Delta t} \\dot{x}(t) dt\n",
"\\end{equation}"
]
},
{
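The discrete- and continuous-time models above can be sketched in a few lines. The planar point robot below is a hypothetical example (not one from the book): its continuous-time dynamics $\dot{x} = f(x, u)$ are integrated with a single Euler step per interval $\Delta t$, yielding a discrete-time update of the form $x_{k+1} = f(x_k, u_k)$.

```python
import numpy as np

def f(x, u):
    """Hypothetical continuous-time dynamics xdot = f(x, u) for a planar
    point robot: state x = [position; velocity], action u = acceleration."""
    return np.concatenate([x[2:], u])

def step(x_k, u_k, dt=0.1):
    """Discrete-time update x_{k+1} = x_k + integral of xdot over dt,
    approximated here by a single Euler step."""
    return x_k + dt * f(x_k, u_k)

x = np.zeros(4)             # at rest at the origin
u = np.array([0.1, 0.0])    # constant acceleration along the first axis
for k in range(10):
    x = step(x, u)
print(x)  # position and velocity after one second
```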
@@ -178,9 +180,9 @@
"> Actions allow the robot to affect the world. Sensors allow the robot to perceive the world.\n",
"\n",
"A robot's sensors provide information that can be used to infer things about the world, about the robot, and about the robot's location in the world. In general, an abstract sensor model can be written as\n",
"\n",
"$$z = h(x) $$\n",
"\n",
"\\begin{equation}\n",
"z = h(x) \n",
"\\end{equation}\n",
"in which the measurement is denoted by $z$ and the function $h(\\cdot)$ maps from the current state to a sensor value. The form of $h$ depends, of course, on the sensor. For the trash sorting robot of Chapter 2, one sensor measures the electrical conductivity of an object, and merely returns “true” or “false.” Another sensor returns the real-valued weight of the object in kilograms. These sensors are simple to implement, fairly robust, and require physical contact with the object of interest. "
]
},
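As a small illustration of the abstract sensor model $z = h(x)$, here is a sketch of the two trash-sorter sensors just mentioned; representing the state as a dictionary, and the field names, are assumptions made for this example only.

```python
def h_conductivity(x):
    """Hypothetical binary conductivity sensor: true iff the item conducts."""
    return x["category"] in {"cans", "scrap metal"}

def h_scale(x):
    """Hypothetical scale: returns the item's weight in kilograms."""
    return x["weight_kg"]

item = {"category": "cans", "weight_kg": 0.02}
print(h_conductivity(item), h_scale(item))  # True 0.02
```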
30 changes: 10 additions & 20 deletions S13_math.ipynb
Original file line number Diff line number Diff line change
@@ -125,11 +125,9 @@
"*Optimization* is the process of finding extremal values of performance criteria, all of which, in this book, will be expressed as scalar-valued functions of a finite number of inputs. In some cases, we search for the single, scalar input that will minimize a cost function, such as when choosing the best action for a trash sorting robot in Chapter 2. In other cases, we might search for a sequence of input commands that will yield an ideal system trajectory, such as for drone flight in Chapter 7. In still other cases, we might try to find millions of weights for a neural network to minimize recognition error of our computer vision system, such as the Convolutional Neural Nets (CNNs) of Chapter 5. \n",
"\n",
"In general, we can express such problems as\n",
"\n",
"$$\n",
"\\begin{equation}\n",
"\\max_x f(x)\n",
"$$\n",
"\n",
"\\end{equation}\n",
"in which $x$ is called the optimization variable (or variables in the case where $x$ is a vector) and $f(\\cdot)$ is called the objective function. For this formulation, when maximizing, $f(\\cdot)$ can be thought of as a reward. We could have framed the problem as a minimization, $\\min_x f(x)$, in which case $f(\\cdot)$ should be thought of as a cost to be minimized. It is easy to convert between these two forms (e.g., simply multiply the objective function by $-1$), but it is often helpful to express problems specifically as either minimizing cost or maximizing reward, based on the semantics of the problem."
]
},
@@ -138,32 +136,24 @@
"metadata": {},
"source": [
"Many optimization problems can be solved using the method of gradient descent. Such methods construct a sequence of estimates, $x^1, x^2, \\dots$ until a minimal value of cost is found. The incremental update rule for the estimates is given by\n",
"\n",
"$$ \n",
"\\begin{equation}\n",
"x^{k+1} = x^k + \\alpha \\nabla f(x)\n",
"$$\n",
"\n",
"\\end{equation}\n",
"in which $\\alpha$ is a step-size parameter.\n",
"In some cases, the gradient $\\nabla f(x) $ can be computed in closed form, while for more complex functions it may be necessary to use numerical approximations of the gradient.\n",
"When working with neural networks, the cost function can be written as a sum\n",
"\n",
"$$\n",
"\\begin{equation}\n",
"f(x,S) = \\sum_{s\\in S} f_k(x; s)\n",
"$$\n",
"\n",
"\\end{equation}\n",
"Here, $x$ denotes the weights assigned to the connections in the network, and $s$ denotes a specific example in the data set $S$. Since differentiation is linear, the gradient of this functional can be expressed as \n",
"\n",
"$$\n",
"\\begin{equation}\n",
"\\nabla_x f(x,S) = \\sum_{s\\in S} \\nabla_x f_k(x; s)\n",
"$$\n",
"\n",
"\\end{equation}\n",
"\n",
"If the data set is very large, computing $|S|$ gradients will be prohibitive. The method of stochastic gradient descent deals with this problem by randomly selecting a few samples from the data set, $S' \\subset S$, and using the approximation\n",
"\n",
"$$\n",
"\\begin{equation}\n",
"\\nabla_x f(x,S) \\approx \\sum_{s \\in S'} \\nabla_x f_k(x; s)\n",
"$$\n",
"\n",
"\\end{equation}\n",
"We use stochastic gradient descent in chapter 5 to optimize the weights in a deep neural network.\n",
"\n",
"Quite often in robotics, the optimization variables can be written as $x_1, x_2, \\dots x_n$, in which the subscripts denote discrete instants in time. In this case, there are typically well-defined relationships between each $x_k$ and $x_{k+1}$. This is true, for example, when a robot moves through the world, at each step $k$ executing command $u_k$ and collecting data $z_k$. Estimating the state trajectory $x_1, \\dots, x_n$ can be formulated as an optimization problem in which the value of $u_k$ acts as a kind of constraint on the relationship between $x_k$ and $x_{k+1}$, as we will see in Chapter 4 when we solve the localization problem. Similarly, if we wish to optimize the trajectory of a drone (as in Chapter 7), the optimization problem begins by finding a sequence of states $x_1, \\dots, x_n$ that maximize performance criteria. In this case, in order to ensure smooth flight, $x_k$ and $x_{k+1}$ should not be too far apart. For problems of this sort, when there are specific relationships between the optimization variables, and especially when they enjoy this kind of sequential structure, we can solve the optimization using factor graphs, which are extremely computationally efficient when the graph that encodes variable interdependence is sparse. "
43 changes: 21 additions & 22 deletions S21_sorter_state.ipynb
Original file line number Diff line number Diff line change
@@ -170,14 +170,15 @@
}
},
"source": [
"```{index} sample space",
"```",
"```{index} sample space\n",
"```\n",
"The starting point for reasoning with uncertainty is to define the set of outcomes that might occur.\n",
"The set of all possible outcomes is called the **sample space**, often denoted by $\\Omega.$\n",
"In our example, when an item of trash arrives on the conveyor belt,\n",
"there are five possible outcomes,\n",
"\n",
"$\\Omega = \\{ \\rm{cardboard, paper, cans, scrap \\; metal, bottle}\\}.$"
"\\begin{equation}\n",
"\\Omega = \\{ \\rm{cardboard, paper, cans, scrap \\; metal, bottle}\\}.\n",
"\\end{equation}"
]
},
{
@@ -189,18 +190,18 @@
}
},
"source": [
"```{index} event, probability distribution",
"```",
"```{index} event, probability distribution\n",
"```\n",
"## Probability Distributions\n",
"A subset of the sample space $\\Omega$ is called an **event**. A **probability distribution**, $P$, assigns a probability $0 \\leq P(A) \\leq 1$ to each event $A \\subseteq \\Omega$, with $P(\\emptyset) = 0$ and $P(\\Omega)=1$. \n",
"In addition, for disjoint events, $A_i \\cap A_j = \\emptyset$, we have\n",
"$P(A_i \\cup A_j) = P(A_i) + P(A_j)$.\n",
"Using this property, it is a simple matter to compute the probability for any $A \\subseteq \\Omega$\n",
"if we are provided with the probabilities of the individual outcomes.\n",
"Further, since $P(\\Omega)=1$, it follows immediately that \n",
"\n",
"$$P(\\Omega) = \\sum_{\\omega \\in \\Omega} P(\\{\\omega\\}) = 1$$\n",
"\n",
"\\begin{equation}\n",
"P(\\Omega) = \\sum_{\\omega \\in \\Omega} P(\\{\\omega\\}) = 1\n",
"\\end{equation}\n",
"i.e., that the probabilities of the individual outcomes sum to unity.\n",
"As a slight abuse of notation, for singleton events, we will often write $P(\\omega)$ rather than $P(\\{\\omega\\})$\n",
"to simplify notation."
@@ -262,9 +263,9 @@
"It is common to assume that outcomes occur in proportion to their probability (there are a number of\n",
"technical conditions that underlie this assumption, such as the condition that outcomes are independent,\n",
"but we will not address these here). Thus, from the above observed frequencies, we might estimate that the probability of seeing a piece of cardboard in the work cell is given by\n",
"\n",
"$$P(\\mathrm{cardboard}) \\approx 200/1000 = 0.2$$\n",
"\n",
"\\begin{equation}\n",
"P(\\mathrm{cardboard}) \\approx 200/1000 = 0.2\n",
"\\end{equation}\n",
"Using the same logic, we can do the same for all categories, yielding:\n",
"\n",
"| *Category (C)* | *P(C)* |\n",
@@ -285,8 +286,8 @@
}
},
"source": [
"```{index} prior",
"```",
"```{index} prior\n",
"```\n",
"We call this type of probabilistic knowledge about the state of the world, in the absence of any other information, a **prior**, because it represents our belief *before* any evidence (e.g., sensor data) has been acquired.\n",
"\n",
"\n",
@@ -616,20 +617,18 @@
"is sufficient to know that a discrete random variable takes a value from a countable set,\n",
"each of which is assigned a probability value.\n",
"For a random variable $X$, the CDF for $X$ is denoted by $F_X$, and is defined as\n",
"\n",
"$$\n",
"\\begin{equation}\n",
"F_X(\\alpha) = P(X \\leq \\alpha)\n",
"$$\n",
"\\end{equation}\n",
"It follows immediately that $0 \\leq F_X(\\alpha) \\leq 1$,\n",
"since $F_X(\\alpha)$ is itself a probability.\n",
"In the case of discrete random variables,\n",
"say $X \\in \\{ x_0, \\dots x_{n-1}\\}$, we can compute the CDF\n",
"$F_X(\\alpha)$ by summing the probabilities assigned\n",
"to all $x_i \\leq \\alpha$\n",
"\n",
"$$\n",
"\\begin{equation}\n",
"F_X(\\alpha) = \\sum_{x_i \\leq \\alpha} P(x_i) = \\sum_{i=0}^{k-1} P(x_i)\n",
"$$\n",
"\\end{equation}\n",
"in which the rightmost summation follows if we choose $k$ such that $x_{k-1} \\leq \\alpha < x_k$.\n",
"The terminology *Cumulative Distribution Function* is due to the fact that $F_X(\\alpha)$\n",
"is the accumulated probability assigned to all outcomes less than or equal to $\\alpha$,\n",
@@ -938,8 +937,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"```{index} decision tree",
"```",
"```{index} decision tree\n",
"```\n",
"As you can see, this is a PMF on the variable with id $42$, and it indeed has probabilities (that add up to one) for values `0..2`. Internally, GTSAM *actually* represents a PMF as a small **decision tree**, which you can reveal using `show`:"
]
},
65 changes: 32 additions & 33 deletions S22_sorter_actions.ipynb
Original file line number Diff line number Diff line change
@@ -161,7 +161,7 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
@@ -244,7 +244,7 @@
"nop 1 1 1 1 1"
]
},
"execution_count": 4,
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
@@ -382,12 +382,13 @@
"$X \\in \\{ x_1, \\dots , x_n \\}$.\n",
"The **expected value** of $X$, which we denote by $E[X]$ is defined\n",
"by\n",
"\n",
"$$E[X] = \\sum_{i=1}^n x_i p_X(x_i)$$\n",
"\n",
"\\begin{equation}\n",
"E[X] = \\sum_{i=1}^n x_i p_X(x_i)\n",
"\\end{equation}\n",
"For the example above, the expected value of cost, $E[X]$, is given by\n",
"\n",
"$E[X] = (0 \\times 0.2) + (0 \\times 0.3) + (5 \\times 0.25) + (10 \\times 0.2) + (3 \\times 0.05) = 3.4$\n",
"\\begin{equation}\n",
"E[X] = (0 \\times 0.2) + (0 \\times 0.3) + (5 \\times 0.25) + (10 \\times 0.2) + (3 \\times 0.05) = 3.4\n",
"\\end{equation}\n",
"\n",
"Note that we never really *expect* to see the cost $3.4$. The term *expected value* is a technical term,\n",
"defined by the equation above. The expected value is related to what we would\n",
@@ -409,7 +410,7 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"outputs": [
{
@@ -433,10 +434,10 @@
"</div>"
],
"text/plain": [
"<gtbook.display.pretty at 0x1042263a0>"
"<gtbook.display.pretty at 0x10f987550>"
]
},
"execution_count": 5,
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
@@ -449,7 +450,7 @@
},
{
"cell_type": "code",
"execution_count": 6,
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
@@ -464,7 +465,7 @@
"array([3.2, 0.6, 3.4, 1. ])"
]
},
"execution_count": 6,
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
@@ -501,7 +502,7 @@
},
{
"cell_type": "code",
"execution_count": 7,
"execution_count": 6,
"metadata": {},
"outputs": [
{
@@ -532,7 +533,7 @@
},
{
"cell_type": "code",
"execution_count": 8,
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
@@ -566,16 +567,17 @@
" For a discrete random variable $X$ with $\\Omega = \\{ x_1 \\dots x_n\\}$,\n",
" $E[X]$ (also called the mean, and often denoted by $\\mu$)\n",
" can be computed as above\n",
"\n",
"$$\\mu = E[X] = \\sum_{i=1}^n x_i p_X(x_i)$$\n",
"\\begin{equation}\n",
"\\mu = E[X] = \\sum_{i=1}^n x_i p_X(x_i)\n",
"\\end{equation}\n",
"\n",
"The variance of a *random variable*,\n",
"typically denoted by $\\sigma^2$, is merely the expected value of the squared difference between\n",
"the random variable $X$ and the mean. The variance is also a property of probability distributions, and it can\n",
"be computed as\n",
"\n",
"$$\\sigma^2 = E[(X-\\mu)^2] = \\sum_{i=1}^n p_X(x_i) (x_i - \\mu)^2 $$\n",
"\n",
"\\begin{equation}\n",
"\\sigma^2 = E[(X-\\mu)^2] = \\sum_{i=1}^n p_X(x_i) (x_i - \\mu)^2\n",
"\\end{equation}\n",
"Note that the expressions for $\\mu$ and $\\sigma^2$ depend only on the probability distribution (in this case,\n",
"the pmf $p_X$) and the values taken by the random variable $X$."
]
@@ -589,13 +591,13 @@
"A **statistic** is any function of data (including the identity function). \n",
"Consider a set of measurements $\\{ z_1, \\dots z_N \\}$.\n",
"The average of these values, often denoted by $\\bar{z}$, is a statistic, and it can be computed as\n",
" \n",
"$$\\bar{z} = \\frac{1}{N} \\sum_{i=1}^{N} z_i$$\n",
"\n",
"\\begin{equation}\n",
"\\bar{z} = \\frac{1}{N} \\sum_{i=1}^{N} z_i\n",
"\\end{equation}\n",
"Likewise, the variance of the *data*, often denoted by $\\hat{\\sigma}^2$ can be computed as\n",
"\n",
"$$\\hat{\\sigma}^2 = \\frac{1}{N-1} \\sum_{i=1}^{N} (z_i- \\bar{z})^2$$\n",
" \n",
"\\begin{equation}\n",
"\\hat{\\sigma}^2 = \\frac{1}{N-1} \\sum_{i=1}^{N} (z_i- \\bar{z})^2\n",
"\\end{equation}\n",
"Note that the definitions of $\\bar{z}$ and $\\hat{\\sigma}^2$ depend *only* on the data itself.\n",
"\n",
"Certain similarities are immediately obvious between these two sets of definitions.\n",
@@ -609,9 +611,9 @@
"This property can be written formally as the **weak law of large numbers**, which\n",
"states that for any $\\epsilon > 0$, if $\\bar{z}_N$ denotes the average of a data set of size $N$,\n",
"then\n",
"\n",
"$$\\lim_{N \\rightarrow \\infty} P( \\mid \\bar{z}_N - \\mu \\mid < \\epsilon ) = 1 $$\n",
"\n",
"\\begin{equation}\n",
"\\lim_{N \\rightarrow \\infty} P( \\mid \\bar{z}_N - \\mu \\mid < \\epsilon ) = 1\n",
"\\end{equation}\n",
"i.e., the average of $N$ data points will become arbitrarily close to $\\mu$ as $N$ becomes\n",
"large. This occurs with probability one, a nuance that we will not discuss here.\n",
"This is one explanation for why simulation by sampling works, and why the results tend\n",
@@ -642,11 +644,8 @@
"name": "S22_sorter_actions.ipynb",
"provenance": []
},
"interpreter": {
"hash": "c6e4e9f98eb68ad3b7c296f83d20e6de614cb42e90992a65aa266555a3137d0d"
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "gtbook",
"language": "python",
"name": "python3"
},
@@ -660,7 +659,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
"version": "3.9.19"
},
"latex_metadata": {
"affiliation": "Georgia Institute of Technology",
