diff --git a/TODO_Review.md b/TODO_Review.md index d7fb077..32ed9f7 100644 --- a/TODO_Review.md +++ b/TODO_Review.md @@ -18,7 +18,7 @@ - [ ] Grammarly verification - [x] \include{chapters/preface-en} - [x] \include{chapters/preface} - - [ ] \include{chapters/whatSlam} + - [x] \include{chapters/whatSlam} - [ ] \include{chapters/rigidBody} - [ ] \include{chapters/lieGroup} - [ ] \include{chapters/cameraModel} diff --git a/chapters/cameraModel.tex b/chapters/cameraModel.tex index fe996c3..9bc484c 100644 --- a/chapters/cameraModel.tex +++ b/chapters/cameraModel.tex @@ -282,16 +282,16 @@ \subsubsection{Install OpenCV} Since we use a newer version of OpenCV, we must install it from the source code. First, you can adjust some compilation options to match the programming environment (for example, whether you need GPU acceleration, etc.); furthermore, from the source code installation we can use some additional functions. OpenCV currently maintains two major versions, divided into OpenCV 2.4 series and OpenCV 3 series \footnote{In 2020 we can also use version 4.0 or higher.}. This book uses the OpenCV \textbf {3} series. -Because the OpenCV project is relatively large, it will not be placed under 3rdparty in this book. Readers can download it from ~ \url{http://opencv.org/downloads.html}~ and select the OpenCV for Linux version. You will get a compressed package like opencv-3.1.0.zip. Unzip it to any directory, we found that OpenCV is also a cmake project. +Because the OpenCV project is relatively large, it will not be placed under 3rdparty in this book. Readers can download it from ~ \url{http://opencv.org/downloads.html}~ and select the OpenCV for Linux version. You will get a compressed package like opencv-3.1.0.zip. Unzip it to any directory, and you will find that OpenCV is also a CMake project. Before compiling, first install the dependencies of OpenCV: \begin{lstlisting}[language=sh, caption=Terminal input:] sudo apt-get install build-essential libgtk2.0-dev libvtk5-dev libjpeg-dev libtiff4-dev libjasper-dev libopenexr-dev libtbb-dev \end{lstlisting} -In fact, OpenCV has many dependencies, and the lack of certain compilation items will affect some of its functions (but we will not use all the functions). OpenCV will check whether the dependencies will be installed during cmake and adjust its own functions. If you have a GPU on your computer and the relevant dependencies are installed, OpenCV will also enable GPU acceleration. But for this book, the above dependencies are sufficient. +In fact, OpenCV has many dependencies, and the lack of certain optional dependencies will affect some of its functions (but we will not use all the functions). OpenCV checks whether the dependencies are installed during the CMake step and adjusts its own functions accordingly. If you have a GPU on your computer and the relevant dependencies are installed, OpenCV will also enable GPU acceleration. But for this book, the above dependencies are sufficient. -Subsequent compilation and installation are the same as ordinary cmake projects. After make, please call ``sudo make install'' to install OpenCV on your machine (instead of just compiling it). Depending on the machine configuration, this compilation process may take from 20 minutes to an hour. If your CPU is powerful, you can use commands like ``make -j4'' to call multiple threads to compile (the parameter after -j is the number of threads used). After installation, OpenCV is stored in the /usr/local directory by default.
You can look for the installation location of OpenCV header files and library files to see where they are. In addition, if you have installed the OpenCV 2 series before, it is recommended that you install OpenCV 3 elsewhere (think about how this should be done). +Subsequent compilation and installation are the same as ordinary CMake projects. After make, please call ``sudo make install'' to install OpenCV on your machine (instead of just compiling it). Depending on the machine configuration, this compilation process may take from 20 minutes to an hour. If your CPU is powerful, you can use commands like ``make -j4'' to call multiple threads to compile (the parameter after -j is the number of threads used). After installation, OpenCV is stored in the /usr/local directory by default. You can look for the installation location of OpenCV header files and library files to see where they are. In addition, if you have installed the OpenCV 2 series before, it is recommended that you install OpenCV 3 elsewhere (think about how this should be done). \subsection{Basic OpenCV Images Operations} Now let's go through the basic image operations in OpenCV from a simple example. @@ -370,7 +370,7 @@ \subsection{Basic OpenCV Images Operations} } \end{lstlisting} -In this example, we demonstrated the following operations: image reading, displaying, pixel vising, copying, assignment, etc. When compiling the program, you need to add the OpenCV header file in your CMakeLists.txt, and then link the program to the OpenVC's library. At the same time, due to the use of C ++ 11 standards (such as nullptr and chrono), you also need to set up the c++ standard in compiler flag: +In this example, we demonstrated the following operations: image reading, displaying, pixel visiting, copying, assignment, etc. When compiling the program, you need to add the OpenCV header directories in your ``CMakeLists.txt'' and then link the program to OpenCV's libraries. At the same time, since the code uses the C++11 standard (such as nullptr and chrono), you also need to set the C++ standard in the compiler flags: \begin{lstlisting}[language=Python,caption=slambook/ch5/imageBasics/CMakeLists.txt] # use c++11 standard @@ -397,7 +397,7 @@ \subsection{Basic OpenCV Images Operations} \item In lines 10 \textasciitilde 18, we use the cv::imread function to read the image and display the image and basic information. \item In lines 35 \textasciitilde 46, we iterate over all pixels in the image and calculate the time spent in the entire loop. Please note that the pixel visiting method is not unique, and the method given by the example is not the most efficient way. OpenCV provides an iterator for cv::Mat; you can traverse the pixels of the image through the iterator. Alternatively, cv::Mat::data provides a raw pointer to the beginning of the image data; you can directly calculate the offset through this pointer and then get the actual memory location of the pixel. The method used in the example is meant to help the reader understand the structure of the image. -On the author's machine (virtual machine), it takes about 12.74ms to traverse this image. You can compare the speed on your machine. However, we are using the default debug mode of cmake, which is much slower than the release mode. +On the author's machine (virtual machine), it takes about 12.74ms to traverse this image. You can compare the speed on your machine. However, we are using the default debug mode of CMake, which is much slower than the release mode.
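To make the pointer-based access mentioned above concrete, here is a minimal sketch of traversing an 8-bit image row by row through cv::Mat::ptr. It is our own illustration rather than a listing from the book's repository, and the command-line image path is an assumption:

\begin{lstlisting}[language=c++,caption={A hedged sketch of pointer-based pixel traversal (illustrative only)}]
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    // assume an 8-bit (CV_8U) image path is passed on the command line
    cv::Mat image = cv::imread(argv[1]);
    if (image.data == nullptr) return 1;
    long sum = 0;  // accumulate the values so the loop has an observable effect
    for (int y = 0; y < image.rows; y++) {
        // ptr<unsigned char>(y) returns the address of the first byte of row y
        unsigned char *row_ptr = image.ptr<unsigned char>(y);
        for (int x = 0; x < image.cols * image.channels(); x++) {
            sum += row_ptr[x];  // visit every channel of every pixel
        }
    }
    return sum >= 0 ? 0 : 1;
}
\end{lstlisting}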
\item OpenCV provides many functions for manipulating images, we will not list them one by one, otherwise this book will become an OpenCV operation manual. The example shows the most common things like image reading and displaying, as well as the deep copy function in cv::Mat. During the programming process, readers will also encounter operations such as image rotation and interpolation. At this time, you should refer to the corresponding documentation of the function to understand their principles and usage. \end{enumerate} @@ -519,7 +519,7 @@ \subsection{RGB-D Vision} \label{sec:join-point-cloud} Finally, we demonstrate an example of RGB-D vision. The convenience of RGB-D cameras is that they can obtain pixel depth information through physical methods. If the internal and external parameters of the camera are known, we can calculate the position of any pixel in the world coordinate system, thereby creating a point cloud map. Now let's demonstrate how to do it. -We have prepared 5 pairs of images, located in the slambook2/ch5/rgbd folder. There are 5 RGB images from 1.png to 5.png under the color/ directory, and 5 corresponding depth images under the depth/. At the same time, the pose.txt file gives the camera poses of the 5 images (in the form of $ \mathbf{T}_\mathrm{wc} $). The form of the pose record is the same as before, with the translation vector plus a rotation quaternion: +We have prepared 5 pairs of images, located in the slambook2/ch5/rgbd folder. There are 5 RGB images from 1.png to 5.png under the color/ directory, and 5 corresponding depth images under the depth/. At the same time, the ``pose.txt'' file gives the camera poses of the 5 images (in the form of $ \mathbf{T}_\mathrm{wc} $). The form of the pose record is the same as before, with the translation vector plus a rotation quaternion: \[ [x, y, z, q_x, q_y, q_z, q_w], \] diff --git a/chapters/lieGroup.tex b/chapters/lieGroup.tex index aa058dd..d0e3027 100644 --- a/chapters/lieGroup.tex +++ b/chapters/lieGroup.tex @@ -478,7 +478,7 @@ \subsection{Derivative on $\mathrm{SE}(3)$} \section{Practice: Sophus} \subsection{Basic Usage of Sophus} -We have introduced the basic knowledge of Lie algebra, and now it is time to consolidate what we have learned through practical exercises. Let's discuss how to manipulate Lie algebra in a program. In Lecture 3, we saw that Eigen provided geometry modules, but did not provide support for Lie algebra. A better Lie algebra library is the Sophus library maintained by Strasdat (\url{https://github.com/strasdat/Sophus})\footnote{Sophus Lie first proposed the Lie algebra. The library is named after him.}. The Sophus library supports $\mathrm{SO}(3)$ and $\mathrm{SE}(3)$, which are mainly discussed in this chapter. In addition, it also contains two-dimensional motion $\mathrm{SO}(2), \mathrm{SE} (2) $ and the similar transformation of $\mathrm{Sim}(3)$. It is developed directly on top of Eigen and we don't need to install additional dependencies. Readers can get Sophus directly from GitHub, or the Sophus source code is also available in our book's code directory ``slambook2/3rdparty''. For historical reasons, earlier versions of Sophus only provided double-precision Lie group/Lie algebra classes. Subsequent versions have been rewritten as template classes, so that different precision of Lie group/Lie algebra can be used in the Sophus from the template class, but at the same time it increases the difficulty of use. 
In the second edition of this book, we use the Sophus library of \textbf{with templates}. The Sophus provided in the 3rdparty of this book is the \textbf{template} version, which should have been copied to your computer when you downloaded the code for this book. Sophus itself is also a cmake project. Presumably you already know how to compile the cmake project, so I won't go into details here. The Sophus library only needs to be compiled, no need to install it. +We have introduced the basic knowledge of Lie algebra, and now it is time to consolidate what we have learned through practical exercises. Let's discuss how to manipulate Lie algebra in a program. In Lecture 3, we saw that Eigen provided geometry modules, but did not provide support for Lie algebra. A better Lie algebra library is the Sophus library maintained by Strasdat (\url{https://github.com/strasdat/Sophus})\footnote{Sophus Lie first proposed the Lie algebra. The library is named after him.}. The Sophus library supports $\mathrm{SO}(3)$ and $\mathrm{SE}(3)$, which are mainly discussed in this chapter. In addition, it also contains two-dimensional motion $\mathrm{SO}(2), \mathrm{SE} (2) $ and the similarity transformation $\mathrm{Sim}(3)$. It is developed directly on top of Eigen and we don't need to install additional dependencies. Readers can get Sophus directly from GitHub, or the Sophus source code is also available in our book's code directory ``slambook2/3rdparty''. For historical reasons, earlier versions of Sophus only provided double-precision Lie group/Lie algebra classes. Subsequent versions have been rewritten as template classes, so that Lie group/Lie algebra classes of different precision can be used, but at the same time it increases the difficulty of use. In the second edition of this book, we use the Sophus library \textbf{with templates}. The Sophus provided in the 3rdparty of this book is the \textbf{template} version, which should have been copied to your computer when you downloaded the code for this book. Sophus itself is also a CMake project. Presumably you already know how to compile a CMake project, so I won't go into details here. The Sophus library only needs to be compiled; there is no need to install it. Let's demonstrate the SO(3) and SE(3) operations in the Sophus library: @@ -545,17 +545,17 @@ \subsection{Basic Usage of Sophus} } \end{lstlisting} -The demo is divided into two parts. The first half introduces the operation on $\mathrm{SO}(3)$, and the second half is $\mathrm{SE}(3)$. We demonstrate how to construct $\mathrm{SO}(3), \mathrm{SE}(3)$ objects, exponentially, logarithmically map them, and update the lie group elements when we know the update amount. If the reader has a good understanding of the content of this lecture, then this program should not be difficult for you. In order to compile it, add the following lines to CMakeLists.txt: +The demo is divided into two parts. The first half introduces the operations on $\mathrm{SO}(3)$, and the second half is about $\mathrm{SE}(3)$. We demonstrate how to construct $\mathrm{SO}(3), \mathrm{SE}(3)$ objects, apply the exponential and logarithmic maps to them, and update the Lie group elements when we know the update amount. If the reader has a good understanding of the content of this lecture, then this program should not be difficult for you.
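Before turning to the build configuration, the following condensed sketch shows the SO(3) half of what the demo described above does. It is our own restatement rather than the full useSophus.cpp from the repository, and the chosen rotation angle and perturbation are arbitrary:

\begin{lstlisting}[language=c++,caption={A condensed sketch of the SO(3) operations (illustrative only)}]
#include <cmath>
#include <iostream>
#include <Eigen/Core>
#include <Eigen/Geometry>
#include <sophus/so3.hpp>

int main() {
    // a 90 degree rotation about the z-axis, expressed as a rotation matrix
    Eigen::Matrix3d R =
        Eigen::AngleAxisd(M_PI / 2, Eigen::Vector3d(0, 0, 1)).toRotationMatrix();
    Sophus::SO3d SO3_R(R);                           // construct SO(3) from the rotation matrix

    Eigen::Vector3d so3 = SO3_R.log();               // logarithmic map: so(3) vector
    Sophus::SO3d SO3_exp = Sophus::SO3d::exp(so3);   // exponential map back to the group

    // left-multiply a small perturbation to update the Lie group element
    Eigen::Vector3d update_so3(1e-4, 0, 0);
    Sophus::SO3d SO3_updated = Sophus::SO3d::exp(update_so3) * SO3_R;

    std::cout << "so3 = " << so3.transpose() << std::endl;
    std::cout << "exp(log(R)) = \n" << SO3_exp.matrix() << std::endl;
    std::cout << "updated R = \n" << SO3_updated.matrix() << std::endl;
    return 0;
}
\end{lstlisting}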
In order to compile it, add the following lines to ``CMakeLists.txt'': \begin{lstlisting}[caption=slambook/ch4/useSophus/CMakeLists.txt] -# we use find_package to make cmake find sophus +# we use find_package to make CMake find sophus find_package( Sophus REQUIRED ) include_directories( ${Sophus_INCLUDE_DIRS} ) # sohpus is header only add_executable( useSophus useSophus.cpp ) \end{lstlisting} -The find\_package is a command provided by cmake to find the header and library files of a library. If cmake can find it, it will provide the variables for the directory where the header and library files are located. In the example of Sophus, it is Sophus\_INCLUDE\_DIRS. The template-based Sophus library, like Eigen, contains only header files and no source files. Based on them, we can introduce the Sophus library into our own cmake project. Readers are asked to see the output of this program on their own, which is consistent with our previous derivation. +The \textit{find\_package} command is provided by CMake to find the header and library files of a library. If CMake can find the library, it will provide variables pointing to the directories where the header and library files are located. In the example of Sophus, this variable is Sophus\_INCLUDE\_DIRS. The template-based Sophus library, like Eigen, contains only header files and no source files. Based on these variables, we can introduce the Sophus library into our own CMake project. Readers are asked to check the output of this program on their own, which is consistent with our previous derivation. \subsection{Example: Evaluating the trajectory} In practical engineering, we often need to evaluate the difference between the estimated trajectory of an algorithm and the real trajectory to assess the accuracy of the algorithm. The real (or ground-truth) trajectory is often obtained by some higher precision systems, and the estimated one is calculated by the algorithm to be evaluated. In the last lecture we demonstrated how to display a trajectory stored in a file. In this section we will consider how to calculate the error of two trajectories. Consider an estimated trajectory $\mathbf{T}_{\mathrm{esti}, i}$ and the real trajectory $\mathbf{T}_{\mathrm{gt},i}$, where $i=1,\cdots, N$; then we can define some error indicators to describe the difference between them. @@ -579,7 +579,7 @@ \subsection{Example: Evaluating the trajectory} \mathrm{RPE}_{\mathrm{trans}} = \sqrt{ \frac{1}{N-\Delta t} \sum_{i=1}^{N-\Delta t} \| \mathrm{trans} \left( \left(\mathbf{T}_{\mathrm{gt},i}^{-1} \mathbf{T}_{\mathrm{gt},i+\Delta t} \right)^{-1} \left(\mathbf{T}_{\mathrm{esti},i}^{-1} \mathbf{T}_{\mathrm{esti},i+\Delta t}\right)\right) \|_2^2}. \end{equation} -This part of the calculation is easy to implement with the Sophus library. Below we demonstrate the calculation of the absolute trajectory error. In this example, we have two trajectories: groundtruth.txt and estimated.txt. The following code will read the two trajectories, calculate the error, and display it in a 3D window. For the sake of brevity, the code for the trajectory plotting has been omitted, as we have done similar work in the previous section. +This part of the calculation is easy to implement with the Sophus library. Below we demonstrate the calculation of the absolute trajectory error. In this example, we have two trajectories: ``groundtruth.txt'' and ``estimated.txt''. The following code will read the two trajectories, calculate the error, and display it in a 3D window.
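As a pocket-sized illustration of the absolute trajectory error before the full listing, here is a sketch of just the error accumulation. It is our own code rather than the trajectoryError.cpp shown below, and it assumes the two trajectories are already loaded and aligned index by index:

\begin{lstlisting}[language=c++,caption={A minimal sketch of the ATE accumulation (illustrative only)}]
#include <cmath>
#include <vector>
#include <Eigen/Core>
#include <sophus/se3.hpp>

// poses stored with Eigen's aligned allocator, since SE3d contains fixed-size Eigen members
using TrajectoryType = std::vector<Sophus::SE3d, Eigen::aligned_allocator<Sophus::SE3d>>;

// RMSE over the se(3) logarithm of the relative pose, i.e. the absolute trajectory error
double ComputeATE(const TrajectoryType &groundtruth, const TrajectoryType &estimated) {
    double sum = 0;
    for (size_t i = 0; i < estimated.size(); i++) {
        // log() maps the relative pose T_gt^{-1} * T_esti to its 6-vector in se(3)
        double error = (groundtruth[i].inverse() * estimated[i]).log().norm();
        sum += error * error;
    }
    return std::sqrt(sum / double(estimated.size()));
}
\end{lstlisting}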
For the sake of brevity, the code for the trajectory plotting has been omitted, as we have done similar work in the previous section. \begin{lstlisting}[language=c++,caption=slambook/ch4/example/trajectoryError.cpp] #include @@ -744,5 +744,5 @@ \section*{Exercises} \end{array}} \right]. \end{equation} \item Follow the derivation of the left perturbation and derive the derivatives of $\mathrm{SO}(3)$ and $\mathrm{SE}(3)$ under the right perturbation. - \item Search how cmake's find\_package works. What optional parameters does it have? What are the prerequisites for cmake to find a library? + \item Search how CMake's \textit{find\_package} works. What optional parameters does it have? What are the prerequisites for CMake to find a library? \end{enumerate} \ No newline at end of file diff --git a/chapters/mapping.tex b/chapters/mapping.tex index 1ef0dc6..dcd87e5 100644 --- a/chapters/mapping.tex +++ b/chapters/mapping.tex @@ -858,7 +858,7 @@ \subsection{Octree Maps} With the log-odds probability, we can update the whole octree map according to the RGB-D data. Suppose we observe a pixel with depth $d$ in an RGB-D image. This tells us one thing: we \textbf{observed an occupied point at the spatial location corresponding to that depth, and the line segment from the camera's optical center to this point should contain no objects} (otherwise the point would be occluded). With this information, the octree map can be updated nicely, and moving structures can be handled as well. \subsection{Practice: Octree Maps} -Below we demonstrate the octomap mapping process through a program. First, please install the octomap library: \url{https://github.com/OctoMap/octomap}. The octomap library mainly contains the octomap map itself and octovis (a visualization program), both of which are cmake projects. Its main dependency is doxygen, which can be installed with the following command: +Below we demonstrate the octomap mapping process through a program. First, please install the octomap library: \url{https://github.com/OctoMap/octomap}. The octomap library mainly contains the octomap map itself and octovis (a visualization program), both of which are CMake projects. Its main dependency is doxygen, which can be installed with the following command: \begin{lstlisting} sudo apt-get install doxygen \end{lstlisting} diff --git a/chapters/preface-en.tex b/chapters/preface-en.tex index 187e2f6..78c4aab 100644 --- a/chapters/preface-en.tex +++ b/chapters/preface-en.tex @@ -1,5 +1,5 @@ \chapter*{Preface for English Version} -A lot of friends at Github asked me about this English version. I'm really sorry it takes so long to do the translation, and I'm glad to make it publicly available to help the readers. I encountered some issues with math equation on the web pages. Since the book is originally written in LaTeX, I'm going to release the LaTeX source along with the compiled pdf. You can directly access the pdf version for the English book, and probably the publishing house is going to help me do the paper version. +A lot of friends on GitHub asked me about this English version. I'm really sorry it has taken so long to do the translation, and I'm glad to make it publicly available to help the readers. I encountered some issues with the math equations on the web pages. Since the book is originally written in LaTeX, I'm going to release the LaTeX source along with the compiled pdf. You can directly access the pdf version of the English book, and the publishing house will probably help me with the paper version. As I'm not a native English speaker, the translation work is basically based on Google translation and some modifications afterward. If you think the quality of translation can be improved, and you are willing to do this, please contact me or open an issue on GitHub. Any help will be welcome! diff --git a/chapters/preface.tex b/chapters/preface.tex index 01de4d1..549ecff 100644 --- a/chapters/preface.tex +++ b/chapters/preface.tex @@ -6,25 +6,25 @@ \section{What is this book about?} So, what is SLAM? -SLAM stands for \textbf{S}imultaneous \textbf{L}ocalization \textbf{a}nd \textbf{M}apping.
It usually refers to a robot or a moving rigid body, equipped with a specific \textbf{sensor}, estimates its own \textbf{motion} and builds a \textbf{model} (certain kinds of description) of the surrounding environment, without a \textit{priori} information\cite{Davison2007}. If the sensor referred here is mainly a camera, it is called ``\textbf{Visual SLAM}''. +SLAM stands for \textbf{S}imultaneous \textbf{L}ocalization \textbf{a}nd \textbf{M}apping. It usually refers to a robot or a moving rigid body that, equipped with a specific \textbf{sensor}, estimates its own \textbf{motion} and builds a \textbf{model} (certain kinds of description) of the surrounding environment, without \textit{a priori} information\cite{Davison2007}. If the sensor referred to here is mainly a camera, it is called \textbf{Visual SLAM}. -Visual SLAM is the subject of this book. We deliberately put a long definition into one single sentence, so that the readers can have a clear concept. First of all, SLAM aims at solving the ``positioning'' and ``map building'' issues at the same time. In other words, it is a problem of how to estimate the location of a sensor itself, while estimating the model of the environment. So how to achieve it? This requires a good understanding of sensor information. A sensor can observe the external world in a certain form, but the specific approaches for utilizing such observations are usually different. And, why is this problem worth spending an entire book to discuss? Simply because it is difficult, especially if we want to do SLAM in \textbf{real time} and \textbf{without any a priory knowledge}. When we talk about visual SLAM, we need to estimate the trajectory and map based on a set of continuous images (which form a video). +Visual SLAM is the subject of this book. We deliberately put a long definition into one single sentence, so that the readers can have a clear concept. First of all, SLAM aims at solving the \textit{positioning} and \textit{map building} issues at the same time. In other words, it is the problem of how to estimate the location of the sensor itself while estimating the model of the environment. So how can this be achieved? It requires a good understanding of sensor information. A sensor can observe the external world in a certain form, but the specific approaches for utilizing such observations are usually different. And why is this problem worth an entire book of discussion? Simply because it is difficult, especially if we want to do SLAM in \textbf{real time} and \textbf{without any a priori knowledge}. When we talk about visual SLAM, we need to estimate the trajectory and map based on a set of continuous images (which form a video). This seems to be quite intuitive. When we human beings enter an unfamiliar environment, aren't we doing exactly the same thing? So, the question is whether we can write programs and make computers do so. At the birth of computer vision, people imagined that one day computers could act like humans, watching and observing the world, and understanding the surrounding environment. The ability to explore unknown areas is a wonderful and romantic dream, attracting numerous researchers striving on this problem day and night~\cite{Hartley2003}. We thought that this would not be that difficult, but the progress turned out to be not as smooth as expected. Flowers, trees, insects, birds, and animals, are recorded so differently in computers: they are simply matrices consisting of numbers.
To make computers understand the contents of images is as difficult as making us humans understand those blocks of numbers. We didn't even know how we understand images, nor do we know how to make computers do so. However, after decades of struggling, we finally started to see signs of success - through Artificial Intelligence (AI) and Machine Learning (ML) technologies, which gradually enable computers to identify objects, faces, voices, texts, although in a way (probabilistic modeling) that is still so different from us. On the other hand, after nearly three decades of development in SLAM, our cameras begin to capture their movements and know their positions, although there is still a huge gap between the capability of computers and humans. Researchers have successfully built a variety of real-time SLAM systems. Some of them can efficiently track their locations, and others can even do three-dimensional reconstruction in real-time. -This is really difficult, but we have made remarkable progress. What's more exciting is that, in recent years, we have seen the emergence of a large number of SLAM-related applications. The sensor location could be very useful in many areas: indoor sweeping machines and mobile robots, self-driving cars, Unmanned Aerial Vehicles (UAVs) in the air, Virtual Reality (VR), and Augmented Reality (AR). SLAM is so important. Without it, the sweeping machine cannot maneuver in a room autonomously, but wandering blindly instead; domestic robots can not follow instructions to reach a certain room accurately; Virtual Reality will always be limited within a prepared space. If none of these innovations could be seen in real life, what a pity it would be. +This is really difficult, but we have made remarkable progress. What's more exciting is that, in recent years, we have seen the emergence of a large number of SLAM-related applications. The sensor location could be very useful in many areas: indoor sweeping machines and mobile robots, self-driving cars, Unmanned Aerial Vehicles (UAVs), Virtual Reality (VR), and Augmented Reality (AR). SLAM is so important. Without it, the sweeping machine cannot maneuver in a room autonomously, but wandering blindly instead; domestic robots can not follow instructions to reach a certain room accurately; Virtual Reality will always be limited within a prepared space. If none of these innovations could be seen in real life, what a pity it would be. Today's researchers and developers are increasingly aware of the importance of SLAM technology. SLAM has over 30 years of research history, and it has been a hot topic in both robotics and computer vision communities. Since the 21st century, visual SLAM technology has undergone a significant change and breakthrough in both theory and practice and is gradually moving from laboratories into the real-world. At the same time, we regretfully find that, at least in the Chinese language, SLAM-related papers and books are still very scarce, making many beginners of this area unable to get started smoothly. Although the theoretical framework of SLAM has basically become mature, to implement a complete SLAM system is still very challenging and requires a high level of technical expertise. Researchers new to the area have to spend a long time learning a significant amount of scattered knowledge and often have to go through several detours to get close to the real core. This book systematically explains the visual SLAM technology. We hope that it will (at least in part) fill the current gap. 
We will detail SLAM's theoretical background, system architecture, and the various mainstream modules. At the same time, we place great emphasis on practice: all the important algorithms introduced in this book will be provided with runnable code that can be tested by yourself so that readers can reach a deeper understanding. Visual SLAM, after all, is a technology for real-applications. Although the mathematical theory can be beautiful, if you are not able to convert it into lines of code, it will be like a castle in the air, which brings little practical impact. We believe that practice verifies true knowledge, and practice tests true passion. Only after getting your hands dirty with the algorithms, you can truly understand SLAM and claim that you have fallen in love with SLAM research. -Since its inception in 1986~\cite{Smith1986}, SLAM has been a hot research topic in robotics. It is very difficult to give a complete introduction to all the algorithms and their variants in the SLAM history, and we consider it as unnecessary as well. This book will be firstly introducing the background knowledge, such as projective geometry, computer vision, state estimation theory, Lie Group and Lie algebra, etc. On top of that, we will be showing the trunk of the SLAM tree, and omitting those complicated and oddly-shaped leaves. We think this is effective. If the reader can master the essence of the trunk, they have already gained the ability to explore the details of the research frontier. So we aim to help SLAM beginners quickly grow into qualified researchers and developers. On the other hand, even if you are already an experienced SLAM researcher, this book may still reveal areas that you are unfamiliar with, and may provide you with new insights. +Since its inception in 1986~\cite{Smith1986}, SLAM has been a hot research topic in robotics. It is very difficult to give a complete introduction to all the algorithms and their variants in the SLAM history, and we consider it as unnecessary as well. This book will first introduce the background knowledge, such as projective geometry, computer vision, state estimation theory, Lie Group and Lie algebra, etc. On top of that, we will be showing the trunk of the SLAM tree, and omitting those complicated and oddly-shaped leaves. We think this is effective. If the reader can master the essence of the trunk, they have already gained the ability to explore the details of the research frontier. So we aim to help SLAM beginners quickly grow into qualified researchers and developers. On the other hand, even if you are already an experienced SLAM researcher, this book may still reveal areas that you are unfamiliar with, and may provide you with new insights. There have already been a few SLAM-related books around, such as ``Probabilistic Robotics''~\cite{Thrun2005}, ``Multiple View Geometry in Computer Vision''~\cite{Hartley2003}, ``State Estimation for Robotics: A Matrix-Lie-Group Approach''\cite{Barfoot2016}, etc. They provide rich content, comprehensive discussions and rigorous derivations, and therefore are the most popular textbooks among SLAM researchers. However, there are two important issues: Firstly, the purpose of these books is often to introduce the fundamental mathematical theory, with SLAM being only one of its applications. Therefore, they cannot be considered as specifically visual SLAM focused. Secondly, they place great emphasis on mathematical theory but are relatively weak in programming. 
This makes readers still fumbling when trying to apply the knowledge they learn from the books. Our belief is: only after coding, debugging, and tweaking algorithms and parameters with his own hands, one can claim a real understanding of a problem. -In this book, we will be introducing the history, theory, algorithms, and research status in SLAM, and explaining a complete SLAM system by decomposing it into several modules: visual odometry, back-end optimization, map building, and loop closure detection. We will be accompanying the readers' step by step to implement the core algorithms of each module, explore why they are effective, under what situations they are ill-conditioned, and guide them through running the code on their machines. You will be exposed to the critical mathematical theory and programming knowledge and will use various libraries including Eigen, OpenCV, PCL, g2o, and Ceres, and master their use in the Linux operating system. +In this book, we will be introducing the history, theory, algorithms, and research status in SLAM, and explaining a complete SLAM system by decomposing it into several modules: \textit{visual odometry}, \textit{back-end optimization}, \textit{map building}, and \textit{loop closure detection}. We will be accompanying the readers' step by step to implement the core algorithms of each module, explore why they are effective, under what situations they are ill-conditioned, and guide them through running the code on their machines. You will be exposed to the critical mathematical theory and programming knowledge and will use various libraries including \textit{Eigen}, \textit{OpenCV}, \textit{PCL}, \textit{g2o}, and \textit{Ceres}, and master their use in the Linux operating system. Well, enough talking, wish you a pleasant journey! @@ -68,7 +68,7 @@ \section{Source code} {\hfill\url{https://github.com/gaoxiang12/slambook2}\hfill} -Note the slambook2 refers to the second version in which I added a lot of extra experiments. +Note the slambook2 refers to the second version in which we added a lot of extra experiments. It is strongly recommended that readers download them for viewing at any time. The code is divided by chapters, for example, the contents of the 7th lecture will be placed in folder ``ch7''. In addition, some of the small libraries used in the book can be found in the ``3rd party'' folder as compressed packages. For large and medium-sized libraries like OpenCV, we will introduce their installation methods when they first appear. If you have any questions regarding the code, click the ``Issues'' button on GitHub to submit. If there is indeed a problem with the code, we will make changes in a timely manner. Even if your understanding is biased, we will still reply as much as possible. If you are not accustomed to using Git, you can also click the button on the right which contains the word ``download'' to download a zipped file to your local drive. @@ -117,7 +117,7 @@ \section{Style} \item Due to typographical reasons, the code shown in the book may be slightly different from the code hosted on GitHub. In that case please use the code on GitHub. - \item For each of the libraries we use, it will be explained in details when first appearing, but not repeated in the follow-up. Therefore, it is recommended that readers read this book in order. + \item For each of the libraries we use, it will be explained in detail when first appearing, but not repeated in the follow-up. Therefore, it is recommended that readers read this book in order. 
\item An abstract will be presented at the beginning of each lecture. A summary and some exercises will be given at the end. The cited references are listed at the end of the book. @@ -131,7 +131,7 @@ \section{Style} \end{enumerate} \section{Acknowledgments} -The online English version of this book is currently publicly available and open-source. The Chinese version +The online English version of this book is currently publicly available and open-source. % \textcolor{red}{The Chinese version} \section{Exercises (self-test questions)} \begin{enumerate} @@ -153,4 +153,4 @@ \section{Exercises (self-test questions)} \item *Spend an hour learning Vim; you will be using it sooner or later. You can type ``vimtutor'' into a terminal and read through its contents. We do not require you to operate it very skillfully, as long as you can use it to edit the code in the process of learning this book. Do not waste time on its plugins for now, and do not try to turn Vim into an IDE; we will only use it for text editing in this book. -\end{enumerate} \ No newline at end of file +\end{enumerate} diff --git a/chapters/rigidBody.tex b/chapters/rigidBody.tex index 989a861..bb838eb 100644 --- a/chapters/rigidBody.tex +++ b/chapters/rigidBody.tex @@ -347,7 +347,7 @@ \section{Practice: Using Eigen} } \end{lstlisting} -This example demonstrates the basic operations and operations of the Eigen matrix. To compile it, you need to specify the header file directory of Eigen in CMakeLists.txt: +This example demonstrates the basic operations of Eigen matrices. To compile it, you need to specify the header file directory of Eigen in ``CMakeLists.txt'': \begin{lstlisting}[caption=slambook2/ch3/useEigen/CMakeLists.txt] # Add header file include_directories( "/usr/include/eigen3" ) @@ -834,7 +834,7 @@ \subsection{Data demonstration of the Eigen geometry module} \item Perspective transformation ( $ 4 \times 4 $ ): Eigen::Projective3d. \end{itemize} -This program can be compiled by referring to the corresponding CMakeLists in the code. In this program, I demonstrate how to use the rotation matrix, rotation vectors (AngleAxis), Euler angles, and quaternions in Eigen. We use these rotations to rotate a vector $ \mathbf {v} $ and find that the result is the same. At the same time, it also demonstrates how to convert these expressions in the program. Readers who want to learn more about Eigen's geometry modules can refer to \url {http://eigen.tuxfamily.org/dox/group__TutorialGeometry.html}. +This program can be compiled by referring to the corresponding ``CMakeLists.txt'' in the code. In this program, we demonstrate how to use the rotation matrix, rotation vectors (AngleAxis), Euler angles, and quaternions in Eigen. We use these rotations to rotate a vector $ \mathbf {v} $ and find that the result is the same. At the same time, it also demonstrates how to convert between these expressions in the program. Readers who want to learn more about Eigen's geometry modules can refer to \url {http://eigen.tuxfamily.org/dox/group__TutorialGeometry.html}. Note that the \textbf {program code has some subtle differences from the mathematical representation}. For example, by operator overloading in C++, quaternions and three-dimensional vectors can directly be multiplied, but mathematically, the vector needs to be converted into an imaginary quaternion, as we discussed in the last section, and then quaternion multiplication is used for calculation. The same applies to the transformation matrix multiplying with a three-dimensional vector.
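To see this difference concretely, here is a tiny sketch comparing the overloaded operator with the explicit quaternion product. It is our own example rather than part of the chapter's listing, and the rotation angle and vector are arbitrary:

\begin{lstlisting}[language=c++,caption={Quaternion rotation via operator overloading vs. the explicit product (illustrative only)}]
#include <cmath>
#include <iostream>
#include <Eigen/Core>
#include <Eigen/Geometry>

int main() {
    // a rotation of 45 degrees about the z-axis, and a vector to rotate
    Eigen::Quaterniond q(Eigen::AngleAxisd(M_PI / 4, Eigen::Vector3d(0, 0, 1)));
    Eigen::Vector3d v(1, 0, 0);

    // operator overloading lets us write the rotation directly
    Eigen::Vector3d v1 = q * v;

    // mathematically this is q p q^{-1}, with p = [0, v] an imaginary quaternion
    Eigen::Quaterniond p(0, v.x(), v.y(), v.z());
    Eigen::Quaterniond r = q * p * q.inverse();
    Eigen::Vector3d v2 = r.vec();   // the imaginary part is the rotated vector

    std::cout << (v1 - v2).norm() << std::endl;   // prints (numerically) zero
    return 0;
}
\end{lstlisting}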
In general, the usage in the program is more flexible than the mathematical formula. @@ -996,7 +996,7 @@ \subsection{Displaying Camera Pose} \label{fig:visualizeGeometry} \end{figure} -In addition to displaying the trajectory, we can also display the pose of the camera in the 3D window. In slambook2/ch3/visualizeGeometry, we visualize various expressions of camera poses (see \autoref{fig:visualizeGeometry}). When the reader uses the mouse to move the camera, the box on the left side will display the rotation matrix, translation, Euler angle and quaternion of the camera pose in real time. You can see how the data changes. According to our experience, it is hard to infer the exact rotation from quaternions or matrices. However, although the rotation matrix or transformation matrix is not intuitive, it is not difficult to visually display them. This program uses the Pangolin library as a 3D display library. Please refer to Readme.txt to compile the program. +In addition to displaying the trajectory, we can also display the pose of the camera in the 3D window. In slambook2/ch3/visualizeGeometry, we visualize various expressions of camera poses (see \autoref{fig:visualizeGeometry}). When the reader uses the mouse to move the camera, the box on the left side will display the rotation matrix, translation, Euler angle and quaternion of the camera pose in real time. You can see how the data changes. According to our experience, it is hard to infer the exact rotation from quaternions or matrices. However, although the rotation matrix or transformation matrix is not intuitive, it is not difficult to visually display them. This program uses the Pangolin library as a 3D display library. Please refer to ``Readme.txt'' to compile the program. \section*{Exercises} \begin{enumerate} diff --git a/chapters/whatSlam.tex b/chapters/whatSlam.tex index 1319137..0207a56 100644 --- a/chapters/whatSlam.tex +++ b/chapters/whatSlam.tex @@ -33,7 +33,7 @@ \section{Meet ``Little Carrot''} \begin{enumerate} \item Where am I? - It's about \emph{localization}. - \item What is the surrounding environment like? -It's about \emph{map building}. + \item What is the surrounding environment like? - It's about \emph{map building}. \end{enumerate} \emph{Localization} and \emph{map building}, can be seen as the perception in both inward and outward directions. As a completely autonomous robot, Little Carrot need not only to understand its own \emph{state} (i.e.\ the location), but also the external \emph{environment} (i.e.\ the map). Of course, there are many different approaches to solve these two problems. For example, we can lay guiding rails on the floor of the room, or paste a lot of artificial markers such as QR code pictures on the wall, or mount radio positioning devices on the table. If you are outdoor, you can also install a GNSS receiver (like the one in a cell phone or a car) on the head of Little Carrot. With these devices, can we claim that the positioning problem has been resolved? Let's categorize these sensors (see Fig.~\ref{fig:sensors}) into two classes. @@ -45,13 +45,13 @@ \section{Meet ``Little Carrot''} \label{fig:sensors} \end{figure} -The first class are \emph{non-intrusive} sensors which are completely self-contained inside a robot, such as wheel encoders, cameras, laser scanners, etc. They do not assume a cooperative environment around the robot. The other class are \emph{intrusive} sensors depending on a prepared environment, such as the above-mentioned guiding rails, QR codes, etc. 
Intrusive sensors can usually locate a robot directly, solving the positioning problem in a simple and effective manner. However, since they require changes on the environment, the scope of usage is often limited to a certain degree. For example, if there is no GPS signal, or guiding rails cannot be laid, what should we do in those cases? +The first class is \emph{non-intrusive} sensors which are completely self-contained inside a robot, such as wheel encoders, cameras, laser scanners, etc. They do not assume a cooperative environment around the robot. The other class is \emph{intrusive} sensors depending on a prepared environment, such as the above-mentioned guiding rails, QR codes, etc. Intrusive sensors can usually locate a robot directly, solving the positioning problem in a simple and effective manner. However, since they require changes in the environment, the scope of usage is often limited to a certain degree. For example, if there is no GPS signal or guiding rails cannot be laid, what should we do in those cases? -We can see that the intrusive sensors place certain \emph{constraints} to the external environment. A localization system based on them can only function properly when those constraints are met in the real world. Otherwise, the localization approach cannot be carried out anymore, like GPS positioning system normally doesn't work well in indoor environments. Therefore, although this type of sensor is simple and reliable, they do not work as a general solution. In contrast, non-intrusive sensors, such as laser scanners, cameras, wheel encoders, Inertial Measurement Units (IMUs), etc., can only observe indirect physical quantities rather than the direct locations. For example, a wheel encoder measures the wheel rotation angle, an IMU measures the angular velocity and the acceleration, a camera or a laser scanner observe the external environment in a certain form like point-clouds and images. We have to apply algorithms to infer positions from these indirect observations. While this sounds like a roundabout tactic, the more obvious benefit is that it does not make any demands on the environment, making it possible for this localization framework to be applied to an unknown environment. Therefore, they are called as self-localization in many research area. +We can see that the intrusive sensors place certain \emph{constraints} to the external environment. A localization system based on them can only function properly when those constraints are met in the real world. Otherwise, the localization approach cannot be carried out anymore, like GPS positioning system normally doesn't work well in indoor environments. Therefore, although this type of sensor is simple and reliable, it does not work as a general solution. In contrast, non-intrusive sensors, such as laser scanners, cameras, wheel encoders, Inertial Measurement Units (IMUs), etc., can only observe indirect physical quantities rather than the direct location. For example, a wheel encoder measures the wheel rotation angle, an IMU measures the angular velocity and the acceleration, a camera or a laser scanner observes the external environment in a certain form like point-clouds and images. We have to apply algorithms to infer positions from these indirect observations. While this sounds like a roundabout tactic, the more obvious benefit is that it does not make any demands on the environment, making it possible for this localization framework to be applied to an unknown environment. 
Therefore, they are often called \textit{self-localization} methods in many research areas. -Looking back at the SLAM definitions discussed earlier, we emphasized an \emph{unknown environment} in SLAM problems. In theory, we should not presume which environment the Little Carrot will be used (but in reality we will have a rough range, such as indoor or outdoor), which means that we can not assume that the external sensors like GPS can work smoothly. Therefore, the use of portable non-intrusive sensors to achieve SLAM is our main focus. In particular, when talking about visual SLAM, we generally refer to the using of \emph{cameras} to solve the localization and map building problems. +Looking back at the SLAM definitions discussed earlier, we emphasized an \emph{unknown environment} in SLAM problems. In theory, we should not presume the environment in which the Little Carrot will be used (but in reality we will have a rough range, such as indoor or outdoor), which means that we cannot assume that external sensors like GPS can work smoothly. Therefore, the use of portable non-intrusive sensors to achieve SLAM is our main focus. In particular, when talking about visual SLAM, we generally refer to the use of \emph{cameras} to solve the localization and map building problems. -Visual SLAM is the main subject of this book, so we are particularly interested in what the Little Carrot's eyes can do. The cameras used in SLAM are different from the commonly seen Single Lens Reflex (SLR) cameras. It is often much simpler and does not carry an expensive lens. It shoots at the surrounding environment at a certain rate, forming a continuous video stream. An ordinary camera can capture images at 30 frames per second, while high-speed cameras can do faster. The camera can be roughly divided into three categories: Monocular, Stereo and RGB-D, as shown by the following figure~\ref{fig:cameras}. Intuitively, a monocular camera has only one camera, a stereo camera has two. The principle of an RGB-D camera is more complex, in addition to being able to collect color images, it can also measure the distance of the scene from the camera for each pixel. RGB-D cameras usually carry multiple cameras, and may adopt a variety of different working principles. In the fifth lecture, we will detail their working principles, and readers just need an intuitive impression for now. In addition, there are also specialty and emerging camera types that can be applied to SLAM, such as panorama camera~\cite{Pretto2011}, event camera~\cite{Rueckauer2016}. Although they are occasionally seen in SLAM applications, so far they have not become mainstream. From the appearance we can infer that Little Carrot seems to carry a stereo camera. +Visual SLAM is the main subject of this book, so we are particularly interested in what the Little Carrot's eyes can do. The cameras used in SLAM are different from the commonly seen Single Lens Reflex (SLR) cameras. They are often much simpler and do not carry expensive lenses. They shoot the surrounding environment at a certain rate, forming a continuous video stream. An ordinary camera can capture images at 30 frames per second, while high-speed cameras can shoot faster. Cameras can be roughly divided into three categories: Monocular, Stereo, and RGB-D, as shown by the following figure~\ref{fig:cameras}. Intuitively, a monocular camera has only one camera, while a stereo camera has two.
The principle of an RGB-D camera is more complex: in addition to collecting color images, it can also measure the distance of the scene from the camera for each pixel. RGB-D devices usually carry multiple cameras and may adopt a variety of different working principles. In the fifth lecture, we will detail their working principles, and readers just need an intuitive impression for now. In addition, there are also some special and emerging camera types that can be applied to SLAM, such as panorama cameras~\cite{Pretto2011} and event cameras~\cite{Rueckauer2016}. Although they are occasionally seen in SLAM applications, so far they have not become mainstream. From the appearance, we can infer that Little Carrot seems to carry a stereo camera. \begin{figure} \centering @@ -60,13 +60,13 @@ \section{Meet ``Little Carrot''} \label{fig:cameras} \end{figure} -Now, let's take a look at the pros and cons of using different type of camera for SLAM\@. +Now, let's take a look at the pros and cons of using different types of cameras for SLAM\@. \subsubsection{Monocular Camera} -The SLAM system that uses only one camera is called Monocular SLAM. This sensor structure is particularly simple, and the cost is particularly low, therefore the monocular SLAM has been very attractive to researchers. You must have seen the output data of a monocular camera: photo. Yes, as a photo, what are its characteristics? +The SLAM system that uses only one camera is called Monocular SLAM. This sensor structure is particularly simple, and the cost is particularly low; therefore, monocular SLAM has been very attractive to researchers. You must have seen the output data of a monocular camera: photos. So, what are the characteristics of a photo? -A photo is essentially a \emph{projection} of a scene onto a camera's imaging plane. It reflects a three-dimensional world in a two-dimensional form. Obviously, there is one dimension lost during this projection process, which is the so-called depth (or distance). In a monocular case, we can not obtain the \emph{distance} between objects in the scene and the camera by using a single image. Later we will see that this distance is actually critical for SLAM. Because we human have seen a large number of images, we formed a natural sense of distances for most scenes, and this can help us determine the distance relationship among the objects in the image. For example, we can recognize objects in the image and correlate them with their approximate size obtained from daily experience. The close objects will occlude the distant objects; the sun, the moon and other celestial objects are infinitely far away; an object will have shadow if it is under sunlight. This common sense can help us determine the distance of objects, but there are also certain cases that confuse us, and we can no longer determine the distance and true size of an object. The following figure~\ref{fig:why-depth} is shown as an example. In this image, we can not determine whether the figures are real people or small toys purely based on the image itself. Unless we change our view angle, explore the three-dimensional structure of the scene. In other words, from a single image, we can not determine the true size of an object. It may be a big but far away object, but it may also be a close but small object. They may appear to be the same size in an image due to the perspective projection effect. +A photo is essentially a \emph{projection} of a scene onto a camera's imaging plane.
It reflects a three-dimensional world in a two-dimensional form. Evidently, there is one dimension lost during this projection process, which is the so-called \textit{depth} (or distance). In a monocular case, we cannot obtain the \emph{distance} between objects in the scene and the camera by using a single image. Later, we will see that this distance is actually critical for SLAM. Because we humans have seen a large number of images, we have formed a natural sense of distances for most scenes, and this can help us determine the distance relationship among the objects in the image. For example, we can recognize objects in the image and correlate them with their approximate size obtained from daily experience. Close objects occlude distant objects; the sun, the moon, and other celestial objects are infinitely far away; an object casts a shadow if it is under sunlight. This common sense can help us determine the distance of objects, but there are also certain cases that confuse us, and we can no longer determine the distance and true size of an object. The following figure~\ref{fig:why-depth} is shown as an example. In this image, we cannot determine whether the figures are real people or small toys purely based on the image itself, unless we change our view angle and explore the three-dimensional structure of the scene. In other words, from a single image, we cannot determine the true size of an object. It may be a big but faraway object, or it may be a close but small object. They may appear to be the same size in an image due to the perspective projection effect. \begin{figure} \centering @@ -75,14 +75,14 @@ \subsubsection{Monocular Camera} \label{fig:why-depth} \end{figure} -Since the image taken by a monocular camera is just a 2D projection of the 3D space, if we want to recover the 3D structure, we have to change the camera's view angle. Monocular SLAM adopts the same principle. We move the camera and estimate its own \emph{motion}, as well as the distances and sizes of the objects in the scene, namely the \emph{structure} of the scene. So how should we estimate these movements and structures? From the everyday experience we know that if a camera moves to the right, the objects in the image will move to the left which gives us an inspiration of inferring motion. On the other hand, we also know that closer objects move faster, while distant objects move slower. Thus, when the camera moves, the movement of these objects on the image forms pixel disparity. Through calculating the disparity, we can quantitatively determine which objects are far away and which objects are close. +Since the image taken by a monocular camera is just a 2D projection of the 3D space, if we want to recover the 3D structure, we have to change the camera's view angle. Monocular SLAM adopts the same principle. We move the camera and estimate its own \emph{motion}, as well as the distances and sizes of the objects in the scene, namely the \emph{structure} of the scene. So how should we estimate these movements and structures? From everyday experience, we know that if a camera moves to the right, the objects in the image move to the left, which gives us an inspiration for inferring motion. On the other hand, we also know that closer objects move faster, while distant objects move slower. Thus, when the camera moves, the movement of these objects on the image forms pixel \textit{disparity}.
By calculating the disparity, we can quantitatively determine which objects are far away and which objects are close. However, even if we know which objects are near and which are far, they are still only relative values. For example, when we are watching a movie, we can tell which objects in the movie scene are bigger than the others, but we cannot determine the \emph{real size} of those objects -- are the buildings real high-rise buildings or just models on a table? Is it a real monster that destroys a building, or just an actor wearing special clothing? Intuitively, if the camera's movement and the scene size are doubled at the same time, monocular cameras see the same. Likewise, multiplying this size by any factor, we will still get the same picture. This demonstrates that the trajectory and map obtained from monocular SLAM estimation will differ from the actual trajectory and map by a factor, which is just the so-called \emph{scale} \footnote{Mathematical reason will be explained in the visual odometry chapter.}. Since monocular SLAM cannot determine this real scale purely based on images, this is also called the \emph{scale ambiguity}. In monocular SLAM, depth can only be calculated with translational movement, and the real scale cannot be determined. These two things could cause significant trouble when applying monocular SLAM to real-world applications. The fundamental cause is that depth cannot be determined from a single image. So, in order to obtain real-scaled depth, we start to use stereo and RGB-D cameras. \subsubsection{Stereo Camera and RGB-D Camera} -The purpose of using stereo and RGB-D cameras is to measure the distance between objects and the camera, to overcome the shortcomings of monocular cameras that distances are unknown. Once distances are known, the 3D structure of a scene can be recovered from a single frame, and also eliminates the scale ambiguity. Although both stereo and RGB-D cameras are able to measure the distance, their principles are not the same. A stereo camera consists of two synchronized monocular cameras, displaced with a known distance, namely the \emph{baseline}. Because the physical distance of the baseline is know, we are able to calculate the 3D position of each pixel, in a way that is very similar to our human eyes. We can estimate the distances of the objects based on the differences between the images from left and right eye, and we can try to do the same on computers (see Fig.~\ref{fig:stereo}). We can also extend stereo camera to multi-camera systems if needed, but basically there is no much difference. +The purpose of using stereo and RGB-D cameras is to measure the distance between objects and the camera, to overcome the shortcoming of monocular cameras that distances are unknown. Once distances are known, the 3D structure of a scene can be recovered from a single frame, and the scale ambiguity is also eliminated. Although both stereo and RGB-D cameras are able to measure the distance, their principles are not the same. A stereo camera consists of two synchronized monocular cameras, displaced with a known distance, namely the \emph{baseline}. Because the physical distance of the baseline is known, we are able to calculate the 3D position of each pixel, in a way that is very similar to our human eyes. We can estimate the distances of the objects based on the differences between the images from the left and right eyes, and we can try to do the same on computers (see Fig.~\ref{fig:stereo}).
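As a preview of the quantitative relation behind this idea (stated here from standard stereo geometry rather than taken from the passage above), the depth $z$ of a pixel follows from the focal length $f$ (in pixels), the baseline $b$, and the disparity $d$:
\begin{equation}
z = \frac{f b}{d},
\end{equation}
so a longer baseline, or a finer disparity resolution, allows farther depths to be resolved.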
We can also extend the stereo camera to a multi-camera system if needed, but basically there is not much difference.
\begin{figure}
\centering
@@ -92,9 +92,9 @@ \subsubsection{Stereo Camera and RGB-D Camera}
\end{figure}
-Stereo cameras usually require significant amount of computational power to (unreliably) estimate depth for each pixel. This is really clumsy compared to human beings. The depth range measured by a stereo camera is related to the baseline length. The longer a baseline is, the farther it can measure. So stereo cameras mounted on autonomous vehicles are usually quite big. Depth estimation for stereo cameras is achieved by comparing images from the left and right cameras, and does not rely on other sensing equipment. Thus stereo cameras can be applied both indoor and outdoor. The disadvantage of stereo cameras or multi-camera systems is that the configuration and calibration process is complicated, and their depth range and accuracy are limited by baseline length and camera resolution. Moreover, stereo matching and disparity calculation also consumes much computational resource, and usually requires GPU or FPGA to accelerate in order to generate real-time depth maps. Therefore, in most of the state-of-the-art algorithms, computational cost is still one of the major problems of stereo cameras.
+Stereo cameras usually require a significant amount of computational power to (unreliably) estimate depth for each pixel. This is really clumsy compared to human beings. The depth range measured by a stereo camera is related to the baseline length. The longer the baseline is, the farther it can measure. So stereo cameras mounted on autonomous vehicles are usually quite big. Depth estimation for stereo cameras is achieved by comparing images from the left and right cameras and does not rely on other sensing equipment. Thus stereo cameras can be applied both indoors and outdoors. The disadvantage of stereo cameras or multi-camera systems is that the configuration and calibration process is complicated, and their depth range and accuracy are limited by the baseline length and camera resolution. Moreover, stereo matching and disparity calculation consume many computational resources and usually require a GPU or FPGA to accelerate in order to generate real-time depth maps. Therefore, in most state-of-the-art algorithms, the computational cost is still one of the major problems of stereo cameras.
-Depth camera (also known as RGB-D camera, RGB-D will be used in this book) is a type of new cameras rising since 2010.
+The depth camera (also known as the RGB-D camera; we will use RGB-D in this book) is a new type of camera that has been rising since 2010.
Similar to laser scanners, RGB-D cameras adopt infrared structured light or time-of-flight (ToF) principles and measure the distance between objects and the camera by actively emitting light to the object and receiving the returned light. This part is not solved by software as in a stereo camera, but by physical sensors, so it can save many computational resources compared to stereo cameras (see Fig.~\ref{fig:RGBD}). Common RGB-D cameras include Kinect/Kinect V2, Xtion Pro Live, RealSense, etc. However, most RGB-D cameras still suffer from issues including a narrow measurement range, noisy data, a small field of view, susceptibility to sunlight interference, and the inability to measure transparent materials. For SLAM purposes, RGB-D cameras are mainly used in indoor environments and are not suitable for outdoor applications.
\begin{figure}
\centering
\includegraphics[width=0.8\textwidth]{./resources/whatIsSLAM/rgbd.pdf}
@@ -102,9 +102,9 @@ \subsubsection{Stereo Camera and RGB-D Camera}
\label{fig:RGBD}
\end{figure}
-We have discussed the common types of cameras, and we believe you should have gained an intuitive understanding of them. Now, imagine a camera is moving in a scene, we will get a series of continuously changing images \footnote{You can try to use your phone to record a video clip.}. The goal of visual SLAM is to localize and build a map using these images. This is not as simple task as you would think. It is not a single algorithm that continuously output positions and map information as long as we feed it with input data. SLAM requires a good algorithm framework, and after decades of hard work by researchers, the framework has been matured in recent years.
+We have discussed the common types of cameras, and we believe you should have gained an intuitive understanding of them. Now, imagine a camera moving in a scene: we will get a series of continuously changing images \footnote{You can try to use your phone to record a video clip.}. The goal of visual SLAM is to localize and build a map using these images. This is not as simple a task as you might think. It is not a single algorithm that continuously outputs positions and map information as long as we feed it with input data. SLAM requires a good algorithm framework, and after decades of hard work by researchers, the framework has matured in recent years.
-\section{The Classic Visual SLAM Framework}
+\section{Classic Visual SLAM Framework}
Let's take a look at the classic visual SLAM framework, shown in the following figure~\ref{fig:workflow}:
@@ -117,20 +117,20 @@ \section{The Classic Visual SLAM Framework}
A typical visual SLAM work-flow includes the following steps:
\begin{enumerate}
-\item{Sensor data acquisition}. In visual SLAM, this mainly refers to for acquisition and preprocessing for camera images. For a mobile robot, this will also include the acquisition and synchronization with motor encoders, IMU sensors, etc.
+\item{Sensor data acquisition}. In visual SLAM, this mainly refers to the acquisition and preprocessing of camera images. For a mobile robot, this will also include the acquisition and synchronization of data from motor encoders, IMU sensors, etc.
\item{Visual Odometry (VO)}. The task of VO is to estimate the camera movement between adjacent frames (ego-motion), as well as to generate a rough local map. VO is also known as the \emph{Front End}.
\item {Backend filtering/optimization}.
The back end receives camera poses at different time stamps from VO, as well as results from loop closing, and applies optimization to generate a fully optimized trajectory and map. Because it is connected after the VO, it is also known as the \emph{Back End}.
\item {Loop Closing}. Loop closing determines whether the robot has returned to its previous position in order to reduce the accumulated drift. If a loop is detected, it will provide information to the back end for further optimization.
-\item {Reconstruction}. It constructs a task specific map based on the estimated camera trajectory.
+\item {Reconstruction}. It constructs a task-specific map based on the estimated camera trajectory.
\end{enumerate}
-The classic visual SLAM framework is the result of more than a decade's research endeavor. The framework itself and the algorithms have been basically finalized and have been provided as basic functions in several public vision and robotics libraries. Relying on these algorithms, we are able to build visual SLAM systems performing real-time localization and mapping in static environments. Therefore, a rough conclusion can be reached that if the working environment is limited to static and rigid with stable lighting conditions and no human interference, visual SLAM problem is basically solved~\cite{Cadena2016}.
+The classic visual SLAM framework is the result of more than a decade of research efforts. The framework itself and the algorithms have been basically finalized and have been provided as basic functions in several public vision and robotics libraries. Relying on these algorithms, we are able to build visual SLAM systems performing real-time localization and mapping in static environments. Therefore, a rough conclusion can be reached: if the working environment is limited to static, rigid scenes with stable lighting conditions and no human interference, the visual SLAM problem is basically solved~\cite{Cadena2016}.
-The readers may have not fully understood the concepts of the above-mentioned modules yet, so we will detail the functionality of each module in the following sections. However, an deeper understanding of their working principles requires certain mathematical knowledge which will be expanded in the second part of this book. For now, an intuitive and qualitative understanding of each module is good enough.
+The readers may not have fully understood the concepts of the above-mentioned modules yet, so we will detail the functionality of each module in the following sections. However, a deeper understanding of their working principles requires certain mathematical knowledge, which will be expanded in the second part of this book. For now, an intuitive and qualitative understanding of each module is good enough.
\subsubsection{Visual Odometry}
-The visual odometry is concerned with the movement of a camera between \emph{adjacent image frames}, and the simplest case is of course the motion between two successive images. For example, when we see the images in Fig.~\ref{fig:cameramotion}, we will naturally tell that the right image should be the result of the left image after a rotation to the left with a certain angle (it will be easier if we have a video input). Let's consider this question: how do we know the motion is ``turning left''? Humans have long been accustomed to using our eyes to explore the world, and estimating our own positions, but this intuition is often difficult to explain, especially in natural language.
When we see these images, we will naturally think that, ok, the bar is close to us, the walls and the blackboard are farther away. When the camera turns to left, the closer part of the bar started to appear, and the cabinet on the right side started to move out of our sight. With this information, we conclude that the camera should be be rotating to the left.
+The visual odometry is concerned with the movement of a camera between \emph{adjacent image frames}, and the simplest case is of course the motion between two successive images. For example, when we see the images in Fig.~\ref{fig:cameramotion}, we will naturally tell that the right image should be the result of the left image after a rotation to the left by a certain angle (it will be easier if we have a video input). Let's consider this question: how do we know the motion is ``turning left''? We humans have long been accustomed to using our eyes to explore the world and estimate our own positions, but this intuition is often difficult to explain, especially in natural language. When we see these images, we will naturally think that, OK, the bar is close to us, and the walls and the blackboard are farther away. When the camera turns to the left, the closer part of the bar starts to appear, and the cabinet on the right side starts to move out of our sight. With this information, we conclude that the camera should be rotating to the left.
\begin{figure}
\centering
@@ -139,15 +139,15 @@ \subsubsection{Visual Odometry}
\label{fig:cameramotion}
\end{figure}
-But if we go a step further: can we determine how much the camera has rotated or translated, in units of degrees or centimeters? It is still difficult for us to give an quantitative answer. Because our intuition is not good at calculating numbers. But for a computer, movements have to be described with such numbers. So we will ask: how should a computer determine a camera's motion only based on images?
+But if we go a step further: can we determine how much the camera has rotated or translated, in units of degrees or centimeters? It is still difficult for us to give a quantitative answer, because our intuition is not good at calculating exact numbers. But for a computer, movements have to be described with such numbers. So we will ask: how should a computer determine a camera's motion based only on images?
As mentioned earlier, in the field of computer vision, a task that seems natural to a human can be very challenging for a computer. Images are nothing but numerical matrices in computers. A computer has no idea what these matrices mean (this is the problem that machine learning is also trying to solve). In visual SLAM, we can only see blocks of pixels, knowing that they are the results of projections by spatial points onto the camera's imaging plane. In order to quantify a camera's movement, we must first \emph{understand the geometric relationship between a camera and the spatial points}.
-Some background knowledge is needed to clarify this geometric relationship and the realization of VO methods. Here we only want to convey an intuitive concept. For now, you just need to take away that VO is able to estimate camera motions from images of adjacent frames and restore the 3D structures of the scene. It is named as an ``odometry'', because similar to an actual wheel odometry which only calculates the ego-motion at neighboring moments, and does not estimate a global map or a absolute pose. In this regard, VO is like a species with only a short memory.
+Some background knowledge is needed to clarify this geometric relationship and the realization of VO methods. Here we only want to convey an intuitive concept. For now, you just need to take away that VO is able to estimate camera motions from images of adjacent frames and restore the 3D structure of the scene. It is called an ``odometry'' because, similar to an actual wheel odometer, it only calculates the ego-motion at neighboring moments and does not estimate a global map or an absolute pose. In this regard, VO is like a species with only a short memory.
-Now, assuming that we have a visual odometry, we are able to estimate camera movements between every two successive frames. If we connect the adjacent movements, this constitutes the movement of the robot trajectory, and therefore addresses the positioning problem. On the other hand, we can calculate the 3D position for each pixel according to the camera position at each time step, and they will form an map. Up to here, it seems with an VO, the SLAM problem is already solved. Or, is it?
+Now, assuming that we have visual odometry, we are able to estimate camera movements between every two successive frames. If we connect the adjacent movements, this constitutes the robot's trajectory and therefore addresses the positioning problem. On the other hand, we can calculate the 3D position for each pixel according to the camera position at each time step, and they will form a map. Up to here, it seems that with a VO, the SLAM problem is already solved. Or, is it?
-Visual odometry is indeed an key technology to solving visual SLAM problem. We will be spending a great part to explain it in details. However, using only a VO to estimate trajectories will inevitably cause \emph{accumulative drift}. This is due to the fact that the visual odometry (in the simplest case) only estimates the movement between two frames. We know that each estimate is accompanied by a certain error, and because the way odometry works, errors from previous moments will be carried forward to the following moments, resulting in inaccurate estimation after a period of time (see Fig.~\ref{fig:loopclosure}). For example, the robot first turns left 90$^\circ$ and then turns right 90$^\circ$. Due to error, we estimate the first 90$^\circ$ as 89$^\circ$, which is possible to happen in real-world applications. Then we will be embarrassed to find that after the right turn, the estimated position of the robot will not return to the origin. What's worse, even the following estimates are perfectly estimated, they will always be carrying this 1$^\circ$ error compared to the true trajectory.
+Visual odometry is indeed a key technology for solving the visual SLAM problem. We will spend a large part of this book explaining it in detail. However, using only a VO to estimate trajectories will inevitably cause \emph{accumulative drift}. This is due to the fact that the visual odometry (in the simplest case) only estimates the movement between two frames. We know that each estimate is accompanied by a certain error, and because of the way odometry works, errors from previous moments will be carried forward to the following moments, resulting in inaccurate estimation after a period of time (see Fig.~\ref{fig:loopclosure}). For example, the robot first turns left 90$^\circ$ and then turns right 90$^\circ$. Due to error, we estimate the first 90$^\circ$ as 89$^\circ$, which can easily happen in real-world applications. Then we will be embarrassed to find that after the right turn, the estimated position of the robot will not return to the origin. What's worse, even if the following estimates are perfect, they will always carry this 1$^\circ$ error compared to the true trajectory.
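To see how quickly such a small error accumulates, here is a tiny, self-contained toy program (not from the book's code; all names and numbers are made up for illustration) in which a single 1$^\circ$ heading error is never corrected while the robot keeps driving straight:
\begin{lstlisting}[language=c++,caption=A toy drift illustration (not from the book's code)]
#include <cmath>
#include <iostream>

// Toy illustration (not from the book's code): a single 1-degree heading error
// made early in the trajectory is never corrected, so the integrated position
// drifts away from the truth as the robot keeps moving straight ahead.
int main() {
    const double kPi = 3.14159265358979323846;
    const double heading_error = 1.0 * kPi / 180.0; // the one uncorrected mistake [rad]
    const double step_length = 0.1;                 // forward motion per step [m]
    const int steps = 100;                          // 10 m of straight driving

    double x_true = 0.0, y_true = 0.0;              // ground-truth position (heading 0)
    double x_est = 0.0, y_est = 0.0;                // estimated position (heading off by 1 degree)

    for (int k = 0; k < steps; ++k) {
        x_true += step_length;                            // the robot really drives straight
        x_est += step_length * std::cos(heading_error);   // but we integrate with the wrong heading
        y_est += step_length * std::sin(heading_error);
    }

    std::cout << "position error after " << steps * step_length << " m: "
              << std::hypot(x_est - x_true, y_est - y_true) << " m" << std::endl;
    return 0;
}
\end{lstlisting}
Even this single small mistake leaves an error of roughly $0.17$ meters after only $10$ meters of travel, and in a real visual odometry pipeline fresh errors are added at every frame on top of the old ones.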
\begin{figure}
\centering
@@ -160,16 +160,16 @@ \subsubsection{Visual Odometry}
\subsubsection{ Back-end Optimization}
-Generally speaking, the back-end optimization mainly refers to the process of dealing with the \emph{noise} in SLAM systems. We wish that all the sensor data is accurate, but in reality, even the most expensive sensors still have certain amount of noise. Cheap sensors usually have larger measurement errors, while that of expensive ones may be small. Moreover, performance of many sensors are affected by changes in magnetic field, temperature, etc. Therefore, in addition to solving the problem of estimating camera movements from images, we also care about how much noise this estimation contains, how these noise is carried forward from the last time step to the next, and how confident we have on the current estimation. So the problem that back-end optimization solves can be summarized as: to estimate the state of the entire system from noisy input data and calculate how uncertain these estimations are. The state here includes both the robot's own trajectory and the environment map.
+Generally speaking, the back-end optimization mainly refers to the process of dealing with the \emph{noise} in SLAM systems. We wish that all the sensor data were accurate, but in reality, even the most expensive sensors still have a certain amount of noise. Cheap sensors usually have larger measurement errors, while those of expensive ones may be smaller. Moreover, the performance of many sensors is affected by changes in the magnetic field, temperature, etc. Therefore, in addition to solving the problem of estimating camera movements from images, we also care about how much noise this estimation contains, how it is carried forward from the last time step to the next, and how confident we are in the current estimation. So the problem that back-end optimization solves can be summarized as: to estimate the state of the entire system from noisy input data and to calculate how uncertain these estimations are. The state here includes both the robot's own trajectory and the environment map.
-In contrast, the visual odometry part is usually referred to as the \emph{front end}. In a SLAM framework, the front end provides data to be optimized by the back end, as well as the initial values. Because the back end is responsible for the overall optimization, we only care about the data itself instead of where it comes from. In other words, we only have numbers and matricies in backend without those beautiful images. In visual SLAM, the front end is more relevant to \emph{computer vision} topics, such as image feature extraction and matching, while the backend is relevant to \emph{state estimation} research area.
+In contrast, the visual odometry part is usually referred to as the \emph{front end}. In a SLAM framework, the front end provides data to be optimized by the back end, as well as the initial values. Because the back end is responsible for the overall optimization, we only care about the data itself instead of where it comes from. In other words, we only have numbers and matrices in the backend, without those beautiful images.
In visual SLAM, the front end is more relevant to \emph{computer vision} topics, such as image feature extraction and matching, while the backend is more relevant to the \emph{state estimation} research area.
-Historically, the back-end optimization part has been equivalent to ``SLAM research'' for a long time. In the early days, SLAM problem was described as a state estimation problem, which is exactly what the back-end optimization tries to solve. In the earliest papers on SLAM, researchers at that time called it ``estimation of spatial uncertainty''~\cite{Smith1986, Smith1990}. Although sounds a little obscure, it does reflect the nature of the SLAM problem: \emph{the estimation of the uncertainty of the self-movement and the surrounding environment}. In order to solve the SLAM problem, we need state estimation theory to express the uncertainty of localization and map construction, and then use filters or nonlinear optimization to estimate the mean and uncertainty (covariance) of the states. The details of state estimation and non-linear optimization will be explained in chapter 6, 10 and 11.
+Historically, the back-end optimization part has been equivalent to ``SLAM research'' for a long time. In the early days, the SLAM problem was described as a state estimation problem, which is exactly what the back-end optimization tries to solve. In the earliest papers on SLAM, researchers at that time called it ``estimation of spatial uncertainty''~\cite{Smith1986, Smith1990}. Although it sounds a little obscure, it does reflect the nature of the SLAM problem: \emph{the estimation of the uncertainty of the self-movement and the surrounding environment}. In order to solve the SLAM problem, we need state estimation theory to express the uncertainty of localization and map construction, and then use filters or nonlinear optimization to estimate the mean and uncertainty (covariance) of the states. The details of state estimation and non-linear optimization will be explained in chapters 6, 10, and 11.
\subsubsection{Loop Closing}
Loop Closing, also known as \emph{Loop Closure Detection}, mainly addresses the drifting problem of position estimation in SLAM. So how can it be solved? Assume that a robot has returned to its origin after a period of movement, but the estimated position does not return to the origin due to drift. How can we correct it? Imagine that if there is some way to let the robot know that it has returned to the origin, we can then ``pull'' the estimated locations to the origin to eliminate the drift, which is exactly what loop closing does.
-Loop closing has close relationship with both localization and map building. In fact, the main purpose of building a map is to enable a robot to know the places it has been to. In order to achieve loop closing, we need to let the robot has the ability to identify the scenes it has visited before. There are different alternatives to achieve this goal. For example, as we mentioned earlier, we can set a marker at where the robot starts, such as a QR code. If the sign was seen again, we know that the robot has returned to the origin. However, the marker is essentially an intrusive sensor which sets additional constraints to the application environment. We prefer the robot can use its non-intrusive sensors, e.g.\ the image itself, to complete this task. A possible approach would be to detect similarities between images. This is inspired by us humans. When we see two similar images, it is easy to identify that they are taken from the same place.
If the loop closing is successful, accumulative error can be significantly reduced. Therefore, visual loop detection is essentially an algorithm for calculating similarities of images. Note that the loop closing problem also exists in laser based SLAM, but here the rich information contained in images can remarkably reduce the difficulty of making a correct loop detection.
+Loop closing has a close relationship with both localization and map building. In fact, the main purpose of building a map is to enable a robot to know the places it has been to. In order to achieve loop closing, we need to give the robot the ability to identify the scenes it has visited before. There are different alternatives to achieve this goal. For example, as we mentioned earlier, we can set a marker where the robot starts, such as a QR code. If the sign is seen again, we know that the robot has returned to the origin. However, the marker is essentially an intrusive sensor that sets additional constraints on the application environment. We prefer that the robot uses its non-intrusive sensors, e.g.\ the image itself, to complete this task. A possible approach would be to detect similarities between images. This is inspired by how we humans behave. When we see two similar images, it is easy to identify that they are taken from the same place. If the loop closing is successful, the accumulative error can be significantly reduced. Therefore, visual loop detection is essentially an algorithm for calculating the similarities of images. Note that the loop closing problem also exists in laser-based SLAM, but here the rich information contained in images can remarkably reduce the difficulty of making a correct loop detection.
After a loop is detected, we will tell the back-end optimization algorithm that, OK, ``A and B are the same point''. Then, based on this new information, the trajectory and the map will be adjusted to match the loop detection result. In this way, if we have sufficient and reliable loop detection, we can eliminate cumulative errors, and get globally consistent trajectories and maps.
@@ -183,20 +183,20 @@ \subsubsection{Mapping}
\label{fig:mapping}
\end{figure}
-Let's take the domestic cleaning robots as an example. Since they basically move on the ground, a two-dimensional map with marks for open areas and obstacles, built by a single-line laser scanner, would be sufficient for navigation for them. And for a camera, we need at least a three-dimensional map for its 6 degrees of freedom movement. Sometimes, we want a smooth and beautiful reconstruction result, not just a set of points, but also with texture of triangular faces. And at other times, we do not care about the map, just need to know things like ``point A and point B are connected, while point B and point C are not'', which is a topological way to understand the environement. Sometimes maps may not even be needed, for instance, a level-3 autonomous driving car can make a lane-following driving only knowing its relative motion with the lanes.
+Let's take the domestic cleaning robots as an example. Since they basically move on the ground, a two-dimensional map with marks for open areas and obstacles, built by a single-line laser scanner, would be sufficient for their navigation. And for a camera, we need at least a three-dimensional map for its 6 degrees-of-freedom movement. Sometimes, we want a smooth and beautiful reconstruction result, not just a set of points, but also textured triangular faces.
And at other times, we do not care about the map; we just need to know things like ``point A and point B are connected, while point B and point C are not'', which is a topological way to understand the environment. Sometimes maps may not even be needed; for instance, a level-3 autonomous car can perform lane following by only knowing its relative motion with respect to the lanes.
-For maps, we have various ideas and demands. So compared to the previously mentioned VO, loop closure detection and back-end optimization, map building does not have a certain algorithm. A collection of spatial points can be called a map, a beautiful 3D model is also a map, so is a picture of a city, a village, railways, and rivers. The form of the map depends on the application of SLAM. In general, they can be divided into to categories: \emph{metrical map} and \emph{topological map}.
+For maps, we have various ideas and demands. So compared to the previously mentioned VO, loop closure detection, and back-end optimization, map building does not have a fixed algorithm. A collection of spatial points can be called a map; a beautiful 3D model is also a map; so is a picture of a city, a village, railways, and rivers. The form of the map depends on the application of SLAM. In general, they can be divided into two categories: the \emph{metrical map} and the \emph{topological map}.
\paragraph{Metric Map}
-Metrical maps emphasize the exact metrical locations of the objects in maps. They are usually classified as either sparse or dense. Sparse metric maps store the scene into a compact form, and do not express all the objects. For example, we can construct a sparse map by selecting representative landmarks such as the lanes and traffic signs, and ignore other parts. In contrast, dense metrical maps focus on modeling all the things that are seen. For localization, a sparse map would be enough, while for navigation, a dense map is usually needed (otherwise we may hit a wall between two landmarks). A dense map usually consists of a number of small pieces at a certain resolution. It can be small grids for 2D metric maps, or small voxels for 3D maps. For example, in a grid map, a grid may have three states: occupied, idle, and unknown, to express whether there is an object. When a spatial location is queried, the map can give the information about whether the location can be passed through. This type of maps can be used for a variety of navigation algorithms, such as A$^*$, D$^*$\footnote{ See \url{https://en.wikipedia.org/wiki/A*_search_algorithm}.}, etc., and thus attracts the attention of robotics researchers. But we can also see that all the grid status are store in the map, and thus being storage expensive. There are also some open issues in building a metrical map, for example, in large-scale metrical maps, a little bit of steering error may cause the walls of two rooms to overlap with each other, and thus making the map ineffective.
+Metrical maps emphasize the exact metrical locations of the objects in maps. They are usually classified as either sparse or dense. Sparse metric maps store the scene in a compact form and do not express all the objects. For example, we can construct a sparse map by selecting representative landmarks such as lanes and traffic signs, and ignoring other parts. In contrast, dense metrical maps focus on modeling all the things that are seen. For localization, a sparse map would be enough, while for navigation, a dense map is usually needed (otherwise we may hit a wall between two landmarks). A dense map usually consists of a number of small pieces at a certain resolution: small grids for 2D metric maps or small voxels for 3D maps. For example, in a grid map, a cell may have three states: occupied, free, and unknown, to express whether there is an object. When a spatial location is queried, the map can tell whether the location can be passed through.
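As a concrete, entirely hypothetical sketch of such a grid map (the class and function names below are invented for illustration and are not part of the book's code), a dense 2D occupancy grid can be stored as one flat array of cell states and queried by converting metric coordinates into cell indices:
\begin{lstlisting}[language=c++,caption=A hypothetical occupancy grid sketch (not from the book's code)]
#include <cstddef>
#include <vector>

// Hypothetical sketch of a dense 2D occupancy grid: every cell stores one of
// three states, and metric coordinates are converted to cell indices on query.
enum class CellState { kUnknown, kFree, kOccupied };

class OccupancyGrid2D {
public:
    OccupancyGrid2D(std::size_t width, std::size_t height, double resolution)
        : width_(width), height_(height), resolution_(resolution),
          cells_(width * height, CellState::kUnknown) {}

    // Mark the cell containing the metric point (x, y), given in meters.
    void SetState(double x, double y, CellState s) { cells_[Index(x, y)] = s; }

    // A planner such as A* would ask this question before expanding a cell.
    bool IsTraversable(double x, double y) const {
        return cells_[Index(x, y)] == CellState::kFree;
    }

private:
    std::size_t Index(double x, double y) const {
        const std::size_t col = static_cast<std::size_t>(x / resolution_);
        const std::size_t row = static_cast<std::size_t>(y / resolution_);
        return row * width_ + col;  // no bounds checking in this sketch
    }

    std::size_t width_, height_;
    double resolution_;              // cell edge length in meters
    std::vector<CellState> cells_;   // dense storage: one state per cell
};

int main() {
    OccupancyGrid2D grid(100, 100, 0.05);         // a 5 m x 5 m area at 5 cm resolution
    grid.SetState(1.0, 2.0, CellState::kFree);
    return grid.IsTraversable(1.0, 2.0) ? 0 : 1;  // query a metric location
}
\end{lstlisting}
Keeping one state per cell, as in the \texttt{cells\_} vector above, is exactly what makes dense metrical maps convenient for planning but expensive to store, as discussed next.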
This type of map can be used for a variety of navigation algorithms, such as A$^*$, D$^*$\footnote{ See \url{https://en.wikipedia.org/wiki/A*_search_algorithm}.}, etc., and thus attracts the attention of robotics researchers. But we can also see that all the grid statuses are stored in the map, which makes it storage-expensive. There are also some open issues in building a metrical map; for example, in large-scale metrical maps, a little bit of steering error may cause the walls of two rooms to overlap with each other, making the map ineffective.
\paragraph{Topological Map}
-Compared to the accurate metrical maps, topological maps emphasize the relationships among map elements. A topological map is a graph composed of nodes and edges, only considering the connectivity between nodes. For instance, we only care about that point A and point B are connected, regardless how we could travel from point A to point B. It relaxes the requirements on precise locations of a map by removing map details, and is therefore a more compact expression. However, topological maps are not good at representing maps with complex structures. Questions such as how to split a map to form nodes and edges, and how to use a topological map for navigation and path planning, are still open problems to be studied.
+Compared to the accurate metrical maps, topological maps emphasize the relationships among map elements. A topological map is a graph composed of nodes and edges, only considering the connectivity between nodes. For instance, we only care that point A and point B are connected, regardless of how we could travel from point A to point B. It relaxes the requirements on the precise locations of a map by removing map details, and is, therefore, a more compact expression. However, topological maps are not good at representing maps with complex structures. Questions such as how to split a map to form nodes and edges, and how to use a topological map for navigation and path planning, are still open problems to be studied.
\section{Mathematical Formulation of SLAM Problems}
-Through the previous introduction, readers should have gained an intuitive understanding of the modules in a SLAM system and the main functionality of each module. However, we cannot write runable programs only based on intuitive impressions. We want to rise it to a rational and rigorous level, that is, using mathematical symbols to formulate a SLAM process. We will be using variables and formulas, but please rest assured that we will try our best to keep it clear enough.
+Through the previous introduction, readers should have gained an intuitive understanding of the modules in a SLAM system and the main functionality of each module. However, we cannot write runnable programs based only on intuitive impressions. We want to raise the discussion to a rational and rigorous level, that is, to use mathematical symbols to formulate the SLAM process. We will be using variables and formulas, but please rest assured that we will try our best to keep it clear enough.
-Assuming that our Little Carrot is moving in an unknown environment, carrying some sensors.
How can this be described in mathematical language? First, since sensors usually collect data at different some time points, we are only concerned with the locations and map at these moments. This turns a continuous process into discrete time steps, say $1, \cdots, k$, at which data sampling happens. We use $\mathbf{x}$ to indicate positions of Little Carrot. So the positions at different time steps can be written as $\mathbf{x}_1,\cdots,\mathbf{x}_k$, which constitute the trajectory of Little Carrot. In terms of the map, we assume that the map is made up of a number of \emph{landmarks}, and at each time step, the sensors can see a part of the landmarks and record their observations. Assume there are total $N$ landmarks in the map, and we will use $\mathbf{y}_1, \cdots, \mathbf{y}_N$ to denote them.
+Assume that our Little Carrot is moving in an unknown environment, carrying some sensors. How can this be described in mathematical language? First, since sensors usually collect data at different time points, we are only concerned with the locations and the map at these moments. This turns a continuous process into discrete time steps, say $1, \cdots, k$, at which data sampling happens. We use $\mathbf{x}$ to indicate the positions of Little Carrot, so the positions at different time steps can be written as $\mathbf{x}_1,\cdots,\mathbf{x}_k$, which constitute the trajectory of Little Carrot. In terms of the map, we assume that the map is made up of a number of \emph{landmarks}, and at each time step, the sensors can see a part of the landmarks and record their observations. Assume there are a total of $N$ landmarks in the map, and we will use $\mathbf{y}_1, \cdots, \mathbf{y}_N$ to denote them.
With such a setting, the process that ``Little Carrot moves in the environment with sensors'' basically has two parts:
@@ -205,13 +205,13 @@ \section{Mathematical Formulation of SLAM Problems}
\item What are the sensor \emph{observations}? Assuming that the Little Carrot detects a certain landmark, let's say $\mathbf{y}_j$ at position $\mathbf{x}_k$, we need to describe this event in mathematical language.
\end{enumerate}
-Let's first take a look at motion. Typically, we may send some motion message to the robots like ``turn 15 degree to left''. These messages or orders will be finally carried out by the controller, but probably in may different ways. Sometimes we control the position of robots, but acceleration or angular velocity would always be reasonable alternates. However, no matter what the controller is, we can use a universal and abstract mathematical model to describe it:
+Let's first take a look at motion. Typically, we may send some motion messages to the robot, like ``turn 15 degrees to the left''. These messages or orders will finally be carried out by the controller, but probably in many different ways. Sometimes we control the position of the robot directly, but acceleration or angular velocity commands are also reasonable alternatives. However, no matter what the controller is, we can use a universal and abstract mathematical model to describe it:
\begin{equation}
{\mathbf{x}_k} = f\left( {{\mathbf{x}_{k - 1}},{\mathbf{u}_k}, \mathbf{w}_k} \right),
\end{equation}
-where $\mathbf{u}_k$ is the input orders, and $\mathbf{w}_k$ is noise. Note that we use a general $f(\cdot)$ to describe the process, instead of specifying the exact form of $f$. This allows the function to represent any motion input, rather than being limited to a particular one, and thus becoming a general equation. We call it the \emph{motion equation}.
+where $\mathbf{u}_k$ is the input command, and $\mathbf{w}_k$ is the noise. Note that we use a general $f(\cdot)$ to describe the process instead of specifying the exact form of $f$. This allows the function to represent any motion input rather than being limited to a particular one, making it a general equation. We call it the \emph{motion equation}.
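As a purely illustrative special case (this particular parameterization is chosen here only for illustration and is not required by the model): if the robot moves in a plane and its pose is written as $\mathbf{x}_k = {[x, y, \theta]}_k^T$, while the command $\mathbf{u}_k$ is the pose increment between times $k-1$ and $k$ expressed in the same coordinates, then the motion equation reduces to the simple additive form
\begin{equation}
	\mathbf{x}_k = \mathbf{x}_{k-1} + \mathbf{u}_k + \mathbf{w}_k,
\end{equation}
whereas motion models driven by, say, velocities or accelerations generally make $f$ nonlinear in the state.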
-The presence of noise turns this model into a stochastic model. In other words, even if we give the order like ``move forward one meter'', it does not mean that our robot really advances one meter. If all the instructions are accurate, there is no need to \emph{estimate} anything. In fact, the robot may only advance by, say, 0.9 meters, and at another moment, it moves by 1.1 meters. Thus, the noise during each movement is random. If we ignore this noise, the position determined only by the command may be a hundred miles away from the actual position after several minutes.
+The presence of noise turns this model into a stochastic model. In other words, even if we give an order such as ``move forward one meter'', it does not mean that our robot really advances one meter. If all the instructions were accurate, there would be no need to \emph{estimate} anything. In fact, the robot may only advance by, say, 0.9 meters, and at another moment, it moves by 1.1 meters. Thus, the noise during each movement is random. If we ignore this noise, the position determined only by the commands may be a hundred miles away from the actual position after several minutes.
Corresponding to the motion equation, there is also an \emph{observation equation}. The observation equation describes the process that the Little Carrot sees a landmark point $\mathbf{y}_j$ at $\mathbf{x}_k$ and generates an observation data $\mathbf{z}_{k,j}$. Likewise, we will describe this relationship with an abstract function $h(\cdot)$:
\begin{equation}
@@ -248,7 +248,7 @@ \section{Mathematical Formulation of SLAM Problems}
\end{array} \right] + \mathbf{v}_{k, j}.
\end{equation}
-When considering about visual SLAM, the sensor is a camera, then the observation equation is a process like ``getting the pixels in the image of the landmarks.'' This process involves a description of the camera model, which will be covered in detail in Chapter 5, which is skipped here.
+When considering visual SLAM, the sensor is a camera, and the observation equation becomes a process like ``getting the pixels in the image of the landmarks''. This process involves a description of the camera model, which will be covered in detail in Chapter 5, so we skip it here.
Obviously, it can be seen that the two equations have different parameterized forms for different sensors. If we maintain versatility and take them into a common abstract form, then the SLAM process can be summarized into two basic equations:
\begin{equation}
@@ -273,9 +273,9 @@ \subsection{Installing Linux}
Our program is based on C++ programs on Linux. During the experiment, we will use several open-source libraries. Most libraries are only supported in Linux, while configuration on Windows is relatively (or quite) cumbersome. Therefore, we have to assume that you already have a basic knowledge of Linux (see the exercises in the previous lecture), including using basic commands to understand how the software is installed. Of course, you don't have to know how to develop C++ programs under Linux, which is exactly what we want to talk about below.
-Let's start from installing the experimental environment required for this book. As a book for beginners, we use Ubuntu as a development environment. Ubuntu and its variances have enjoyed a good reputation as a novice user in all major Linux distributions. Ubuntu is an open-source operating system. Its system and software can be downloaded freely on the official website (\url{http://ubuntu.com}), which provides detailed instructions on how to install it. At the same time, Tsinghua University, China Science and Technology University and other major universities in China have also provided Ubuntu software mirrors, making the software installation very convenient (probably there are also mirror websites in your country).
+Let's start by installing the experimental environment required for this book. As a book for beginners, we use Ubuntu as the development environment. Ubuntu and its variants have enjoyed a good reputation with novice users among all the major Linux distributions. Ubuntu is an open-source operating system. Its system and software can be downloaded freely from the official website (\url{http://ubuntu.com}), which provides detailed instructions on how to install it. At the same time, Tsinghua University, China Science and Technology University, and other major universities in China have also provided Ubuntu software mirrors, making the software installation very convenient (probably there are also mirror websites in your country).
-The first version of this book uses Ubuntu 14.04 as the default development environment. In the second edition, we updated the default version to the newer \textbf{Ubuntu 18.04} (\autoref{fig:ubuntu1804}) for later research. If you want to change the styles, then Ubuntu Kylin, Debian, Deepin and Linux Mint are also good choices. I promise that all the code in the book has been well tested under Ubuntu 18.04, but if you choose a different distribution, I am not sure if you will encounter some minor problems. You may need to spend some time solving small issues (but you can also take them as opportunities to exercise yourself). In general, Ubuntu's support for various libraries is relatively complete, and the software is also very rich.
+The first version of this book used Ubuntu 14.04 as the default development environment. In the second edition, we updated the default version to the newer \textbf{Ubuntu 18.04} (\autoref{fig:ubuntu1804}) for later research. If you want a different style, then Ubuntu Kylin, Debian, Deepin, and Linux Mint are also good choices. I promise that all the code in the book has been well tested under Ubuntu 18.04, but if you choose a different distribution, I am not sure whether you will encounter some minor problems. You may need to spend some time solving small issues (but you can also take them as opportunities to exercise yourself). In general, Ubuntu's support for various libraries is relatively complete, and the software is also very rich.
Although we don't limit which Linux distribution you use, in the explanation \textbf{we will use Ubuntu 18.04 as an example} and mainly use Ubuntu commands (such as apt-get), so with other versions of Ubuntu there will be no obvious differences in what follows. In general, the migration of programs between Linux distributions is not very difficult. But if you want to use the programs in this book under Windows or OS X, you need to have some porting experience.
\begin{figure}[!ht]
\centering
@@ -317,7 +317,7 @@ \subsection{Hello SLAM}
\end{lstlisting}
If there are other errors, please check again if the program you just entered is correct.
-Just now this compile command compiles the text file helloSLAM.cpp into an executable program. We check the current directory and find that there is an additional a.out file, and it has executed permissions (the colors in the terminal are different, should be green in default settings). We can enter ./a.out to run the program \footnote{Don't type the first \%. }:
+This compile command just compiled the text file ``helloSLAM.cpp'' into an executable program. We check the current directory and find that there is an additional a.out file, and it has execute permissions (its color in the terminal is different, green in the default settings). We can enter ./a.out to run the program \footnote{Don't type the first \%. }:
\begin{lstlisting}[language=sh,caption=terminal input:]
% ./a.out
@@ -326,25 +326,25 @@ \subsection{Hello SLAM}
As we thought, this program outputs ``Hello SLAM!'', telling us that it is running correctly.
-Please review what we did before. In this example, we used the editor to enter the source code for helloSLAM.cpp, then called the g++ compiler to compile it and get the executable. By default, g++ compiles the source file into a program of the name a.out (it is a bit weird, but acceptable). If we like, we can also specify the file name of this output. This is an extremely simple example, we actually \textbf{use a lot of hidden default parameters, almost omitting all intermediate steps}, in order to give the reader a simple impression (although you may not have realized it). Below we will use CMake to compile this program.
+Please review what we did before. In this example, we used the editor to enter the source code for ``helloSLAM.cpp'', then called the g++ compiler to compile it and get the executable. By default, g++ compiles the source file into a program named a.out (it is a bit weird but acceptable). If we like, we can also specify the file name of this output. This is an extremely simple example; we actually \textbf{use a lot of hidden default parameters, almost omitting all intermediate steps}, in order to give the reader a simple impression (although you may not have realized it). Below we will use CMake to compile this program.
-\subsection{Use cmake}
-Theoretically, any C++ program can be compiled with g++. But when the program size is getting bigger and bigger, a project may have many folders and source files, and the compiled commands will be longer and longer. Usually, a small C++ project may contain more than a dozen classes, and there are complex dependencies between these classes. Some of them are compiled into executables, and some are compiled into libraries. If we only rely on the g++ command, we need to enter a lot of commands, and the whole compilation process will become very cumbersome. Therefore, for C++ projects, using some engineering management tools is more efficient.
In history, engineers used \textbf{makefile} to compile automatically, but the cmake to be discussed below is more convenient than it. And cmake is widely used in engineering, we will see that most of the libraries mentioned later use cmake to manage the source code.
+\subsection{Use CMake}
+Theoretically, any C++ program can be compiled with g++. But as a program grows bigger and bigger, a project may have many folders and source files, and the compile commands become longer and longer. Usually, a small C++ project may contain more than a dozen classes, and there are complex dependencies between these classes. Some of them are compiled into executables, and some are compiled into libraries. If we only rely on the g++ command, we need to enter a lot of commands, and the whole compilation process becomes very cumbersome. Therefore, for C++ projects, using an engineering management tool is more efficient. Historically, engineers used \textbf{makefile}s for automated compilation, but CMake, to be discussed below, is more convenient. CMake is also widely used in engineering; we will see that most of the libraries mentioned later use CMake to manage their source code.
-In a CMake project, we will use the cmake command to generate a makefile, and then use the make command to compile the entire project based on the contents of the makefile. The reader may not know what a makefile is, but it doesn't matter, we will learn by example. Still taking the above helloSLAM.cpp as an example, this time we are not using g++ directly, but using CMake to build a project and then compiling it. Create a new CMakeLists.txt file in slambook2/ch2/ with the following contents:
+In a CMake project, we will use the \textit{cmake} command to generate a makefile, and then use the make command to compile the entire project based on the contents of the makefile. The reader may not know what a makefile is, but it doesn't matter; we will learn by example. Still taking the above ``helloSLAM.cpp'' as an example, this time we are not using g++ directly, but using CMake to build the project and then compiling it. Create a new ``CMakeLists.txt'' file in ``slambook2/ch2/'' with the following contents:
\begin{lstlisting}[language=Python,caption=slambook2/ch2/CMakeLists.txt]
cmake_minimum_required( VERSION 2.8 )
project( HelloSLAM )
add_executable( helloSLAM helloSLAM.cpp )
\end{lstlisting}
-The CMakeLists.txt file is used to tell cmake what we want to do with the files in this directory. The contents of the CMakeLists.txt file need to follow the cmake syntax. In this example, we demonstrate the most basic project: specifying a project name and an executable program. According to the comments, the reader should understand what each sentence does.
+The ``CMakeLists.txt'' file is used to tell CMake what we want to do with the files in this directory. The contents of the ``CMakeLists.txt'' file need to follow the CMake syntax. In this example, we demonstrate the most basic project: specifying a project name and an executable program. The reader should understand what each line does.
-Now, in the current directory (slambook2/ch2/), call cmake to compile the project: \footnote{Note that there's a dot at the end of the command, please don't forget it, which means using cmake in the current directory.
}:
+Now, in the current directory (slambook2/ch2/), call \textit{cmake} to compile the project\footnote{Note that there's a dot at the end of the command; please don't forget it. It means running CMake in the current directory.}:
\begin{lstlisting}[language=sh,caption=Terminal input]
cmake .
\end{lstlisting}
-cmake will output some compilation information, and then generate some intermediate files in the current directory, the most important of which is the makefile\footnote{Makefile is an automated compilation script, the reader can now understand it as a system automatically generated compiler instructions, without taking care of its content. }. Since MakeFile is automatically generated, we don't have to modify it. Now, compile the project with the make command.
+\textit{cmake} will output some compilation information and then generate some intermediate files in the current directory, the most important of which is the makefile\footnote{A makefile is an automated compilation script; the reader can for now think of it as a set of automatically generated compiler instructions, without caring about its content. }. Since the makefile is automatically generated, we don't have to modify it. Now, compile the project with the make command.
\begin{lstlisting}[language=sh,caption=Terminal input]
% make
Scanning dependencies of target helloSLAM
@@ -352,26 +352,26 @@ \subsection{Use cmake}
Linking CXX executable helloSLAM
[100%] Built target helloSLAM
\end{lstlisting}
-The compiler will show a process percent during compilation. We then get the declared executable \textbf{helloSLAM} in our CMakeLists.txt if the compilation is successful. Just type:
+The compiler will show a progress percentage during compilation. If the compilation is successful, we get the executable \textbf{helloSLAM} declared in our ``CMakeLists.txt''. Just type:
\begin{lstlisting}[language=sh,caption=Terminal Input]
% ./helloSLAM
Hello SLAM!
\end{lstlisting}
-to run it. Because we didn't modify the source code, we got the same result as before. Please think about the difference between this practice and the previous use of the g++ compiler. This time we used the cmake-make process. The cmake process handles the relationship between the project files, and the making process actually calls g++ to compile the program. By calling this cmake-make process, we have a good management for the project: \textbf{from inputting a string of g++ commands to maintaining several relatively intuitive CMakeLists.txt files}, which will obviously reduce the difficulty of maintaining the entire project. For example, if you want to add another executable file, just add a line ``add\_executable'' in CMakeLists.txt, and the subsequent steps are unchanged. Cmake will help us resolve code dependencies without having to type in a bunch of g++ commands.
+to run it. Because we didn't modify the source code, we get the same result as before. Please think about the difference between this practice and the previous direct use of the g++ compiler. This time we used the \textit{cmake-make} workflow: the \textit{cmake} step handles the relationships between the project files, and the \textit{make} step actually calls g++ to compile the program. With this cmake-make workflow, we have better management of the project: \textbf{from inputting a string of g++ commands to maintaining several relatively intuitive ``CMakeLists.txt'' files}, which drastically reduces the difficulty of maintaining the entire project.
For example, if you want to add another executable file, just add another ``add\_executable'' line in CMakeLists.txt, and the subsequent steps are unchanged. CMake will help us resolve code dependencies without having to type in a bunch of g++ commands.
-The only thing that is dissatisfied with this process is that the intermediate files generated by cmake are still in our code files. When we want to release the code, we don't want to publish these intermediate files together. At this time, we still need to delete them one by one, which is very inconvenient. A better approach is to have these intermediate files in an intermediate directory. After the compilation is successful, we will delete the intermediate directory. Therefore, the more common practice of compiling cmake projects is as follows:
+The only dissatisfying thing about this process is that the intermediate files generated by CMake are still mixed with our code files. When we want to release the code, we don't want to publish these intermediate files together. At that point, we would still need to delete them one by one, which is very inconvenient. A better approach is to put these intermediate files into an intermediate directory and, after the compilation is successful, simply delete that directory. Therefore, the more common practice for compiling CMake projects is as follows:
\begin{lstlisting}[language=sh,caption=Terminal input]
mkdir build
cd build
cmake ..
make
\end{lstlisting}
-We created a new intermediate folder ``build'', and then entered the build folder, using the cmake .. command to compile the previous folder, which is the folder where the code is located. In this way, the intermediate files generated by cmake will be in the ``build'' folder, separate from the source code. When publishing the source code, we just delete the build folder. Please try to compile the code in ch2 in this way, and then call the generated executable (please remember to delete the intermediate file generated in the last section).
+We created a new intermediate folder ``build'' and then entered it, using the ``\textit{cmake ..}'' command to compile the parent folder, which is the folder where the code is located. In this way, the intermediate files generated by CMake stay in the ``build'' folder, separate from the source code. When publishing the source code, we just delete the build folder. Please try to compile the code in ch2 in this way, and then call the generated executable (please remember to delete the intermediate files generated in the last section).
\subsection{Use Libraries}
In a C++ project, not all code is compiled into executables. Only files containing a main function generate executable programs. For other code, we just want to package it up for other programs to call. Such a package is called a \textbf{library}.
-A library is often just a collection of many algorithms and programs, and we will be exposed to many libraries in later exercises.
For example, the OpenCV library provides many computer vision-related algorithms, while the Eigen library provides matrix algebra calculations. Therefore, we need to learn how to use CMake to generate libraries and how to use the functions in a library. Now let's demonstrate how to write a library yourself. Write the following ``libHelloSLAM.cpp'' file:
\begin{lstlisting}[language=c++,caption=slambook2/ch2/libHelloSLAM.cpp]
#include
@@ -382,11 +382,11 @@ \subsection{Use Libraries}
cout << "Hello SLAM" << endl;
}
\end{lstlisting}
-This library provides a ``printHello'' function that will output a message. But it doesn't have the main function, which means there are no executables in this library. We add the following to CMakeLists.txt:
+This library provides a ``printHello'' function that will output a message. But it doesn't have a main function, which means there are no executables in this library. We add the following to ``CMakeLists.txt'':
\begin{lstlisting}[language=sh,caption=slambook2/ch2/CMakeLists.txt]
add_library( hello libHelloSLAM.cpp )
\end{lstlisting}
-This line tells cmake that we want to compile this file into a library called ``hello''. Then, as above, compile the entire project using cmake:
+This line tells CMake that we want to compile this file into a library called ``hello''. Then, as above, compile the entire project using \textit{cmake}:
\begin{lstlisting}[language=sh,caption=Terminal input]
cd build
cmake ..
@@ -394,14 +394,14 @@ \subsection{Use Libraries}
\end{lstlisting}
At this point, a ``libhello.a'' file is generated in the build folder, which is the library we declared.
-In Linux, the library files are divided into \textbf{static library} and \textbf{shared library}. Static libraries have a .a extension and shared libraries end with .so. All libraries are collections of functions that are packaged. The difference is that \textbf{a static library will generate a copy each time it is called, and the shared library has only one copy}, which saves space. If you want to generate a shared library instead of a static library, just use the following statement:
+In Linux, library files are divided into \textbf{static libraries} and \textbf{shared libraries}. Static libraries have a ``.a'' extension and shared libraries end with ``.so''. All libraries are packaged collections of functions. The difference is that \textbf{a static library is copied into every program that uses it, while a shared library has only one copy that all of its users share}, which saves space. If you want to generate a shared library instead of a static library, just use the following statement:
\begin{lstlisting}[language=sh,caption=slambook2/ch2/CMakeLists.txt]
add_library( hello_shared SHARED libHelloSLAM.cpp )
\end{lstlisting}
Then we will get a libhello\_shared.so.
-The library file is a compressed package with compiled binary functions. However, if there is only a .a or .so library file, then we don't know what the function is and how to call it.
+A library file is a package of compiled binary functions. However, if there is only a ``.a'' or ``.so'' library file, then we don't know what the functions are or how to call them.
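Out of curiosity, you can peek at what actually went into the compiled library with standard Linux tools; for example (a quick sketch, assuming the ``build'' folder used above):
\begin{lstlisting}[language=sh,caption=Terminal input]
# list the symbols stored in the static library (-C demangles C++ names)
nm -C build/libhello.a
\end{lstlisting}
This confirms that a symbol such as printHello is inside, but a bare list of symbols is hardly a convenient interface for using the library.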
In order for others (or ourselves) to use this library, we need to provide a \textbf{header file} to indicate what is in the library. Therefore, for the user of the library, \textbf{you can call this library as long as you have the header and the library files}. Write the header file for ``libhello'' below.
\begin{lstlisting}[language=c++,caption=slambook2/ch2/libHelloSLAM.h]
#ifndef LIBHELLOSLAM_H_
@@ -425,15 +425,15 @@ \subsection{Use Libraries}
}
\end{lstlisting}
-Then, declare an executable in CMakeLists.txt and \textbf{link} it to the library:
+Then, declare an executable in ``CMakeLists.txt'' and \textbf{link} it to the library:
\begin{lstlisting}[caption=slambook2/ch2/CMakeLists.txt]
add_executable( useHello useHello.cpp )
target_link_libraries( useHello hello_shared )
\end{lstlisting}
-Through these two lines of statements, the useHello program can successfully use the code in the hello\_shared library. This small example demonstrates how to generate and call a library. Please note that for libraries provided by others, we can also call them in the same way and integrate them into our own programs.
+With these two lines, the ``useHello'' program can successfully use the code in the hello\_shared library. This small example demonstrates how to generate and call a library. Please note that libraries provided by others can be called in exactly the same way and integrated into our own programs.
-In addition to the features already demonstrated, cmake has many more syntax and options. Of course we can not list all of them here. In fact, cmake is very similar to a normal programming language, with variables and conditional control statements, so you can learn cmake just like learning programming. The exercises contain some reading materials for cmake, which can be read by interested readers. Now, a brief review of what we did before:
+In addition to the features already demonstrated, \textit{cmake} offers far more syntax and options than we can list here. In fact, \textit{cmake} is very similar to a normal programming language, with variables and conditional control statements, so you can learn \textit{cmake} just like learning programming. The exercises point to some CMake reading materials for interested readers. Now, a brief review of what we did before:
\begin{enumerate}
\item First, the program code consists of a header file and a source file.
@@ -441,18 +441,18 @@ \subsection{Use Libraries}
\item If the executable wants to call a function in the library file, it needs to refer to the header file provided by the library to understand the format of the call. Also, link the executable to the library file.
\end{enumerate}
-These steps should be simple and clear, but you may encounter some problems in the actual operation. For example, what happens if the executable references a library function but we forget to link the library? Try removing the link command in CMakeLists.txt and see what happens. Can you understand the error message reported by cmake?
+These steps should be simple and clear, but you may encounter some problems in practice. For example, what happens if the executable references a library function but we forget to link the library? Try removing the link command in ``CMakeLists.txt'' and see what happens. Can you understand the error message reported during the build?
\subsection{Use IDE}
Finally, let's talk about how to use Integrated Development Environments (IDEs).
The previous programming can be done with a simple text editor. However, you may need to jump between files to query the declaration and implementation of a function. This can be a little annoying when there are too many files. An IDE provides developers with many convenient functions such as code jumping, completion, and breakpoint debugging. Therefore, we recommend that the reader choose an IDE for development.
-There are many kinds of IDEs under Linux. Although there are still some gaps with the best IDE (I mean Visual Studio in Windows), there are several supported C++ developments, such as Eclipse, Qt Creator, Code::Blocks, Clion, Visual Studio Code, and so on. Again, we don't force readers to use a particular IDE, but only give our advice. We are using KDevelop and Clion (see \autoref{fig:kdevelop} and \autoref{fig:clion})\footnote{However, the recent Visual Studio Code is getting better and better. It's free. It's very popular among developers. You may have a try. }. KDevelop is a free software located in Ubuntu's software repository, meaning you can install it with apt-get; Clion is a paid software, but you can use the student mailbox for free for one year. Both are good C++ development environments, the advantages are listed below:
+There are many kinds of IDEs under Linux. Although they still lag behind the best IDE (I mean Visual Studio on Windows), several of them support C++ development well, such as Eclipse, Qt Creator, Code::Blocks, Clion, Visual Studio Code, and so on. Again, we don't force readers to use a particular IDE but only give our advice. We are using KDevelop and Clion (see \autoref{fig:kdevelop} and \autoref{fig:clion})\footnote{However, the recent Visual Studio Code is getting better and better. It's free. It's very popular among developers. You may have a try. }. KDevelop is free software available in Ubuntu's software repository, meaning you can install it with apt-get; Clion is paid software, but you can use a student email address to get it for free for one year. Both are good C++ development environments; their advantages are listed below:
\begin{enumerate}
- \item Support cmake projects.
+ \item Support CMake projects.
\item Support C++ well (including C++11 and later standards), with highlighting, jumping, and completion functions; can automatically format the code.
\item Makes it easy to see individual files and directory trees.
- \item Has one-click compilation, breakpoint debugging and other functions.
+ \item Has one-click compilation, breakpoint debugging, and other functions.
\end{enumerate}
\begin{figure}[!ht]
\centering
@@ -465,28 +465,28 @@ \subsection{Use IDE}
Below we take a little bit of space to introduce KDevelop and Clion.
\subsubsection{Use KDE}
-Kdevelop natively supports the cmake project. To do this, after creating CMakeLists.txt in the terminal, open CMakeLists.txt with ``Project $\rightarrow$Open/Import Project'' in KDevelop. The software will ask you a few questions, and by default create a build folder to help you call the cmake and make commands.
+KDevelop natively supports CMake projects. To do this, after creating ``CMakeLists.txt'' in the terminal, open ``CMakeLists.txt'' with ``Project $\rightarrow$Open/Import Project'' in KDevelop. The software will ask you a few questions and by default creates a build folder to help you call the cmake and make commands.
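In other words, KDevelop simply automates the steps we typed by hand earlier; behind the scenes it runs roughly the following in the build folder it created (a sketch of the idea, not an exact transcript of what the IDE executes):
\begin{lstlisting}[language=sh,caption=Terminal input]
# roughly what the IDE does for us when building a CMake project
cd build
cmake ..
make
\end{lstlisting}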
These can be done automatically by pressing the shortcut key F8. The lower part of \autoref{fig:kdevelop} shows the compilation information. We leave the task of getting familiar with the IDE to the reader. If you are transferring from Windows, you will find its interface similar to Visual C++ or Visual Studio. Please use KDevelop to open the previous project and compile it to see what information it outputs. I believe you will find it more convenient than working in the terminal.
Next, let's show how to debug in the IDE. Most students who program under Windows will have experience with breakpoint debugging in Visual Studio. However, in Linux, the default debugging tool gdb only provides a text interface, which is not convenient for novices. Some IDEs provide breakpoint debugging (the bottom layer is still gdb), and KDevelop is one of them. To use KDevelop's breakpoint debugging feature, you need to do the following:
\begin{enumerate}
- \item Set the project to Debug compilation mode in CMakeLists.txt, and don't use optimization options (not used by default).
+ \item Set the project to Debug compilation mode in ``CMakeLists.txt'', and don't use optimization options (not used by default).
\item Tell KDevelop which program you want to run. If there are parameters, also configure its parameters and working directory.
\item Enter the breakpoint debugging interface, where you can single-step the program and inspect the values of intermediate variables.
\end{enumerate}
%\clearpage
-The first step is to set the compilation mode by adding the following command to CMakeLists.txt:
+The first step is to set the compilation mode by adding the following command to ``CMakeLists.txt'':
\begin{lstlisting}[caption=slambook2/ch2/CMakeLists.txt]
Set( CMAKE_BUILD_TYPE "Debug" )
\end{lstlisting}
-Cmake has some compilation-related built-in variables that give you more detailed control over the compilation process. For the compilation type, there is usually a Debug mode for debugging and a Release mode for publishing. In Debug mode, the program runs slower, but breakpoint debugging is possible, and you can see the values of the variables; while Release mode is faster, but there is probably no debugging information. We set the program to Debug mode and place the breakpoint. Next, tell KDevelop which program you want to launch.
+CMake has some compilation-related built-in variables that give you more detailed control over the compilation process. For the compilation type, there is usually a Debug mode for debugging and a Release mode for publishing. In Debug mode, the program runs slower, but breakpoint debugging is possible and you can see the values of the variables; Release mode is faster but usually carries no debugging information. We set the program to Debug mode and place the breakpoint. Next, tell KDevelop which program you want to launch.
-In the second step, open ``Run $\rightarrow$Configure Launcher'' and click on ``Add New $\rightarrow$ Application'' on the left. In this step, our task is to tell KDevelop which program to launch. As shown in \autoref{fig:launchConfigure}, you can either select a cmake project target (that is, the executable we built with the add\_executable directive) or point to a binary file. The second approach is recommended, and in our experience, this is less of a problem.
+In the second step, open ``Run $\rightarrow$Configure Launcher'' and click on ``Add New $\rightarrow$ Application'' on the left. Here, our task is to tell KDevelop which program to launch.
As shown in \autoref{fig:launchConfigure}, you can either select a CMake project target (that is, the executable we built with the add\_executable directive) or point to a binary file. The second approach is recommended; in our experience it causes fewer problems.
\begin{figure}[!ht]
\centering
@@ -497,7 +497,7 @@ \subsubsection{Use KDE}
In the second column, you can set the program's parameters and working directory. Sometimes programs have runtime parameters that are passed in as arguments to the main function. If there are none, leave it blank, and the same goes for the working directory. After configuring these two items, click the ``OK'' button to save the configuration.
-In just these steps we have configured an application startup item. For each startup item, we can click the ``Execute'' button to start the program directly, or click the ``Debug'' button to debug it. Readers can try to click the ``Execute'' button to see the results of the output. Now, to debug this program, click on the left side of the printHello line and add a breakpoint. Then, click on the ``Debug'' button and the program will wait at the breakpoint, as shown by \autoref{fig:debug}.
+With just these steps we have configured an application startup item. For each startup item, we can click the ``Execute'' button to start the program directly, or click the ``Debug'' button to debug it. Readers can try clicking the ``Execute'' button to see the program's output. Now, to debug this program, click on the left side of the ``printHello'' line to add a breakpoint. Then, click on the ``Debug'' button and the program will wait at the breakpoint, as shown in \autoref{fig:debug}.
\begin{figure}[!htp]
\centering
@@ -518,11 +518,11 @@ \subsubsection{Use Clion}
\label{fig:clion}
\end{figure}
-Clion is more complete than KDevelop, but it requires a user account, and the memory/CPU requirements for the host will be higher. \footnote{CLion is abnormally slow in the version after 2018. It is recommended that you use the release version around 2017. }. In Clion, you can also open a CMakeLists.txt or specify a directory. Clion will complete the cmake-make process for you. Its running interface is shown in \autoref{fig:clion}.
+Clion is more complete than KDevelop, but it requires a user account, and its memory/CPU requirements for the host are higher\footnote{CLion is abnormally slow in the versions released after 2018. It is recommended that you use a release from around 2017. }. In Clion, you can also open a ``CMakeLists.txt'' or specify a directory, and Clion will complete the \textit{cmake-make} process for you. Its running interface is shown in \autoref{fig:clion}.
Similarly, after opening Clion, you can select the programs you want to run or debug in the upper right corner of the interface, and adjust their startup parameters and working directory. Click the small beetle button in this column to start the breakpoint debugging mode. Clion also has several convenient features, such as automatically creating classes, changing functions, and automatically adjusting the coding style. Please try it.
-Ok, if you are already familiar with the use of the IDE, then the second chapter will stop here. You may already feel that I have talked too much, so in the following practice section, we will not introduce things like how to create a new build folder, call the cmake and make commands to compile the program. I believe that readers should master these simple steps.
+OK, if you are already familiar with the use of the IDE, then the second chapter will stop here. You may already feel that I have talked too much, so in the following practice section, we will not repeat routine steps like creating a new build folder or calling the \textit{cmake} and \textit{make} commands to compile the program. I believe that readers have already mastered these simple steps.
Similarly, since most of the third-party libraries used in this book are cmake projects, you will become more and more familiar with this compilation process. Next, we will start the formal chapters and introduce some related mathematics.
\section*{Exercises}
\begin{enumerate}
@@ -532,9 +532,9 @@ \section*{Exercises}
\item Use the build folder to compile your CMake project, then try it in KDevelop.
\item Deliberately add some syntax errors to the code to see what information the build will generate. Can you read the error message of g++?
\item If you forgot to link the library to the executable, will the compiler report an error? What kind of errors are reported?
- \item[\optional] Read ``cmake practice'' (or other cmake materials) to learn about the grammars of cmake.
+ \item[\optional] Read ``CMake Practice'' (or other materials) to learn about the grammar of CMake.
\item[\optional] Improve the hello SLAM problem, make it a small library, and install it on your local hard drive. Then, create a new project, use find\_package to find the library, and call it.
- \item[\optional] Read other cmake instructional materials, such as \url{https://github.com/TheErk/CMake-tutorial}.
+ \item[\optional] Read other CMake instructional materials, such as \url{https://github.com/TheErk/CMake-tutorial}.
\item Find the official website of KDevelop and see what other features it has. Are you using it?
\item If you learned Vim in the last lecture, please try KDevelop's/Clion's Vim editing function.
\end{enumerate}
\ No newline at end of file
diff --git a/slambook-en.pdf b/slambook-en.pdf
index 0154053..c904aa2 100644
Binary files a/slambook-en.pdf and b/slambook-en.pdf differ
diff --git a/slambook-en_reviewed.pdf b/slambook-en_reviewed.pdf
index faafe3a..7fba635 100644
Binary files a/slambook-en_reviewed.pdf and b/slambook-en_reviewed.pdf differ