From da2ac91256d627813e1ae0b2bbf076981ddfa3fd Mon Sep 17 00:00:00 2001 From: vasu Date: Wed, 20 Mar 2024 17:12:43 +0530 Subject: [PATCH 1/9] rendering fixes-v1 --- .../3d_measurements_stereo_vision.mdx | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx index 78461b583..89e7ffacf 100644 --- a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx +++ b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx @@ -67,11 +67,11 @@ With the above configuration in place, we have the below equations which map a p 2. \\(v\_right = f\_y * \frac{y}{z} + O\_y\\) Different symbols used in above equations are defined below: -* \\(u_left\\), \\(v_left\\) refer to pixel coordinates of point P in the left image -* \\(u_right\\), \\(v_right\\) refer to pixel coordinates of point P in the right image -* \\(f_X\\) refers to the focal length (in pixels) in x direction and \\(f_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (/ optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. +* \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image +* \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image +* \\(f\_X\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (/ optical center of the lens) to the image plane. 
However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. * x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) -* \\(0_X\\) and \\(0_y\\) refer to pixel coordinates of the principal point +* \\(0\_X\\) and \\(0\_y\\) refer to pixel coordinates of the principal point * b is called the baseline and refers to the distance between the left and right cameras. Same units are used for both b and x,y,z coordinates (any unit like cm, feet, etc can be used) We have 4 equations above and 3 unknowns - x, y and z coordinates of a 3D point P. Intrinsic camera parameters - focal lengths and principal point are assumed to be known. Equations 1.2 and 2.2 indicate that the v coordinate value in the left and right images is the same. From 0c5b22327b106e50ea54cf7e2d0f0290961a0993 Mon Sep 17 00:00:00 2001 From: vasu Date: Wed, 20 Mar 2024 20:15:57 +0530 Subject: [PATCH 2/9] rendering fixes -v2 --- .../3d_measurements_stereo_vision.mdx | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx index 89e7ffacf..6ad39282c 100644 --- a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx +++ b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx @@ -60,23 +60,27 @@ Figure 2: Image formation using 2 cameras With the above configuration in place, we have the below equations which map a point in 3D to the image plane in 2D. 1. Left camera + 1. \\(u\_left = f\_x * \frac{x}{z} + O\_x\\) 2. \\(v\_left = f\_y * \frac{y}{z} + O\_y\\) + 2. Right camera + 1. \\(u\_right = f\_x * \frac{x-b}{z} + O\_x\\) 2. 
\\(v\_right = f\_y * \frac{y}{z} + O\_y\\) Different symbols used in above equations are defined below: + * \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image * \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image -* \\(f\_X\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (/ optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. +* \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. * x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) -* \\(0\_X\\) and \\(0\_y\\) refer to pixel coordinates of the principal point +* \\(0\_x\\) and \\(0\_y\\) refer to pixel coordinates of the principal point * b is called the baseline and refers to the distance between the left and right cameras. Same units are used for both b and x,y,z coordinates (any unit like cm, feet, etc can be used) We have 4 equations above and 3 unknowns - x, y and z coordinates of a 3D point P. Intrinsic camera parameters - focal lengths and principal point are assumed to be known. Equations 1.2 and 2.2 indicate that the v coordinate value in the left and right images is the same. -3. \\(v\_left = v\_right\\) +3. \\(v\_left = v\_right\\) Using equations 1.1, 1.2 and 2.1 we can derive the x,y,z coordinates of point P. 
@@ -139,7 +143,7 @@ Annotated Right Image ### 3D Coordinate Calculations Twelve points are selected in the scene, and their (u,v) values in the left and right images are tabulated below. Using equations 4, 5, and 6, (x,y,z) coordinates for these points are also calculated and tabulated below. X and Y coordinates concerning the left camera, and the origin is at the left camera's pinhole (or optical center of the lens). Therefore, 3D points left and above the pinhole have negative X and Y values, respectively. -| point | \\(u_left\\) | \\(v_left\\) | \\(u_right\\) | \\(v_right\\) | depth/z(cm) | \\(x_wrt_left\\)| \\(y_wrt_left\\) | +| point | \\(u\_left\\) | \\(v\_left\\) | \\(u\_right\\) | \\(v\_right\\) | depth/z(cm) | \\(x\_wrt\_left\\)| \\(y\_wrt\_left\\) | |:--------:|:---------:|:---------:|:----------:|:----------:|:--------------:|:-----------------:|:-----------------:| | pt1 | 138 | 219 | 102 | 219 | 94.36 | -33.51 | -5.53 | | pt2 | 264 | 216 | 234 | 217 | 113.23 | -8.72 | -7.38 | From 5b92275949780135dbefa2abba94848429cbbd5a Mon Sep 17 00:00:00 2001 From: vasu Date: Wed, 20 Mar 2024 20:26:29 +0530 Subject: [PATCH 3/9] rendering test -v3 --- .../3d_measurements_stereo_vision.mdx | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx index 6ad39282c..9fae74969 100644 --- a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx +++ b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx @@ -60,18 +60,21 @@ Figure 2: Image formation using 2 cameras With the above configuration in place, we have the below equations which map a point in 3D to the image plane in 2D. 1. Left camera - 1. \\(u\_left = f\_x * \frac{x}{z} + O\_x\\) 2. 
\\(v\_left = f\_y * \frac{y}{z} + O\_y\\) + 3. \\(u\_left = f\_x * \frac{x}{z} + O\_x)\\ + 4. \\(u\_left = f\_x * \frac{x}{z} + O\_x)\\ 2. Right camera - 1. \\(u\_right = f\_x * \frac{x-b}{z} + O\_x\\) 2. \\(v\_right = f\_y * \frac{y}{z} + O\_y\\) Different symbols used in above equations are defined below: -* \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image +* \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image + +* \\(u\_left\\), \\(v\_left)\\ refer DELME to pixel coordinates of point P in the left image + * \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image * \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. 
* x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) From 5dea5c1b4ea21af293d99be9401926253674cac4 Mon Sep 17 00:00:00 2001 From: vasu Date: Wed, 20 Mar 2024 20:38:16 +0530 Subject: [PATCH 4/9] rendering more testing -v4 --- .../3d_measurements_stereo_vision.mdx | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx index 9fae74969..5d8269ef3 100644 --- a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx +++ b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx @@ -59,22 +59,16 @@ Figure 2: Image formation using 2 cameras With the above configuration in place, we have the below equations which map a point in 3D to the image plane in 2D. -1. Left camera - 1. \\(u\_left = f\_x * \frac{x}{z} + O\_x\\) - 2. \\(v\_left = f\_y * \frac{y}{z} + O\_y\\) - 3. \\(u\_left = f\_x * \frac{x}{z} + O\_x)\\ - 4. \\(u\_left = f\_x * \frac{x}{z} + O\_x)\\ - +1. Left camera + 1. \\(u\_left = f\_x * \frac{x}{z} + O\_x\\) + 2. \\(v\_left = f\_y * \frac{y}{z} + O\_y\\) + 2. Right camera - 1. \\(u\_right = f\_x * \frac{x-b}{z} + O\_x\\) - 2. \\(v\_right = f\_y * \frac{y}{z} + O\_y\\) + 1. \\(u\_right = f\_x * \frac{x-b}{z} + O\_x\\) + 2. 
\\(v\_right = f\_y * \frac{y}{z} + O\_y\\) Different symbols used in above equations are defined below: - * \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image - -* \\(u\_left\\), \\(v\_left)\\ refer DELME to pixel coordinates of point P in the left image - * \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image * \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. * x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) From ccacff6ba47df1883ee429cbad38f653e78342e1 Mon Sep 17 00:00:00 2001 From: vasu Date: Wed, 20 Mar 2024 20:46:53 +0530 Subject: [PATCH 5/9] rendering updates -v5 --- .../3d_measurements_stereo_vision.mdx | 32 +++++++++---------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx index 5d8269ef3..85e1f4253 100644 --- a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx +++ b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx @@ -60,30 +60,30 @@ Figure 2: Image formation using 2 cameras With the above configuration in place, we have the below equations which map a point in 3D to the image plane in 2D. 1. Left camera - 1. \\(u\_left = f\_x * \frac{x}{z} + O\_x\\) - 2. \\(v\_left = f\_y * \frac{y}{z} + O\_y\\) + 1. \\(u\_left = f\_x * \frac{x}{z} + O\_x\\) + 2. 
\\(v\_left = f\_y * \frac{y}{z} + O\_y\\) 2. Right camera 1. \\(u\_right = f\_x * \frac{x-b}{z} + O\_x\\) 2. \\(v\_right = f\_y * \frac{y}{z} + O\_y\\) -Different symbols used in above equations are defined below: -* \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image -* \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image -* \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. -* x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) -* \\(0\_x\\) and \\(0\_y\\) refer to pixel coordinates of the principal point -* b is called the baseline and refers to the distance between the left and right cameras. Same units are used for both b and x,y,z coordinates (any unit like cm, feet, etc can be used) - +Different symbols used in above equations are defined below: +* \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image +* \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image +* \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. 
+* x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) +* \\(0\_x\\) and \\(0\_y\\) refer to pixel coordinates of the principal point +* b is called the baseline and refers to the distance between the left and right cameras. Same units are used for both b and x,y,z coordinates (any unit like cm, feet, etc can be used) + We have 4 equations above and 3 unknowns - x, y and z coordinates of a 3D point P. Intrinsic camera parameters - focal lengths and principal point are assumed to be known. Equations 1.2 and 2.2 indicate that the v coordinate value in the left and right images is the same. -3. \\(v\_left = v\_right\\) - -Using equations 1.1, 1.2 and 2.1 we can derive the x,y,z coordinates of point P. +3. \\(v\_left = v\_right\\) -4. \\(x = \frac{b * (u\_left - O\_x)}{u\_left - u\_right}\\) -5. \\(y = \frac{b * f\_x * (v\_left - O\_y)}{ f\_y * (u\_left - u\_right)}\\) -6. \\(z = \frac{b * f\_x}{u\_left - u\_right}\\) +Using equations 1.1, 1.2 and 2.1 we can derive the x,y,z coordinates of point P. + +4. \\(x = \frac{b * (u\_left - O\_x)}{u\_left - u\_right}\\) +5. \\(y = \frac{b * f\_x * (v\_left - O\_y)}{ f\_y * (u\_left - u\_right)}\\) +6. \\(z = \frac{b * f\_x}{u\_left - u\_right}\\) Note that the x and y values above concern the left camera since the origin of the coordinate system is aligned with the left camera. The above equations show that we can find 3D coordinates of a point P using its 2 images captured from 2 different camera locations. z value is also referred to as the depth value. Using this technique, we can find the depth values for different pixels within an image and their real-world x and y coordinates. We can also find real-world distances between different points in an image. 
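The projection equations (1.1–2.2) and the triangulation equations (4–6) that these patches reformat can be sanity-checked with a short round-trip script: project a 3D point into both cameras, then recover it from the pixel coordinates. The intrinsics and baseline below are illustrative placeholder values, not the chapter's actual OAK-D Lite calibration.

```python
# Round-trip check of the stereo equations: project a 3D point through two
# pinhole cameras (eqs. 1.1-2.2), then triangulate it back (eqs. 4-6).
# All parameter values here are made up for illustration.

f_x, f_y = 450.0, 450.0   # focal lengths in pixels (assumed)
O_x, O_y = 320.0, 240.0   # principal point in pixels (assumed)
b = 7.5                   # baseline in cm (assumed)

def project(x, y, z):
    """Map a 3D point (left-camera frame) to left/right pixel coordinates."""
    u_left = f_x * x / z + O_x           # eq. 1.1
    v_left = f_y * y / z + O_y           # eq. 1.2
    u_right = f_x * (x - b) / z + O_x    # eq. 2.1
    v_right = f_y * y / z + O_y          # eq. 2.2 (equals v_left, eq. 3)
    return u_left, v_left, u_right, v_right

def triangulate(u_left, v_left, u_right):
    """Recover (x, y, z) from pixel coordinates using eqs. 4-6."""
    disparity = u_left - u_right
    x = b * (u_left - O_x) / disparity                 # eq. 4
    y = b * f_x * (v_left - O_y) / (f_y * disparity)   # eq. 5
    z = b * f_x / disparity                            # eq. 6
    return x, y, z

point = (-33.5, -5.5, 94.0)  # some 3D point in cm, left-camera frame
u_l, v_l, u_r, v_r = project(*point)
recovered = triangulate(u_l, v_l, u_r)
assert all(abs(a - r) < 1e-9 for a, r in zip(point, recovered))
```

The round-trip recovers the point exactly because equations 4–6 are the algebraic inverse of 1.1, 1.2, and 2.1; equation 2.2 only confirms `v_left == v_right` and adds no new information.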
From dbfdb4d5ac01d2ccbd9bc4d87ec1c8a038850e8e Mon Sep 17 00:00:00 2001 From: vasu Date: Wed, 20 Mar 2024 20:54:49 +0530 Subject: [PATCH 6/9] rendering updates -v6 --- .../3d_measurements_stereo_vision.mdx | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx index 85e1f4253..96bfa2356 100644 --- a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx +++ b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx @@ -60,20 +60,20 @@ Figure 2: Image formation using 2 cameras With the above configuration in place, we have the below equations which map a point in 3D to the image plane in 2D. 1. Left camera - 1. \\(u\_left = f\_x * \frac{x}{z} + O\_x\\) - 2. \\(v\_left = f\_y * \frac{y}{z} + O\_y\\) + 1. \\(u\_left = f\_x * \frac{x}{z} + O\_x\\) + 2. \\(v\_left = f\_y * \frac{y}{z} + O\_y\\) -2. Right camera +2. Right camera 1. \\(u\_right = f\_x * \frac{x-b}{z} + O\_x\\) 2. \\(v\_right = f\_y * \frac{y}{z} + O\_y\\) Different symbols used in above equations are defined below: -* \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image -* \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image -* \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. 
-* x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) -* \\(0\_x\\) and \\(0\_y\\) refer to pixel coordinates of the principal point -* b is called the baseline and refers to the distance between the left and right cameras. Same units are used for both b and x,y,z coordinates (any unit like cm, feet, etc can be used) +* \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image +* \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image +* \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. +* x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) +* \\(0\_x\\) and \\(0\_y\\) refer to pixel coordinates of the principal point +* b is called the baseline and refers to the distance between the left and right cameras. Same units are used for both b and x,y,z coordinates (any unit like cm, feet, etc can be used) We have 4 equations above and 3 unknowns - x, y and z coordinates of a 3D point P. Intrinsic camera parameters - focal lengths and principal point are assumed to be known. Equations 1.2 and 2.2 indicate that the v coordinate value in the left and right images is the same. @@ -81,9 +81,9 @@ We have 4 equations above and 3 unknowns - x, y and z coordinates of a 3D point Using equations 1.1, 1.2 and 2.1 we can derive the x,y,z coordinates of point P. -4. \\(x = \frac{b * (u\_left - O\_x)}{u\_left - u\_right}\\) -5. \\(y = \frac{b * f\_x * (v\_left - O\_y)}{ f\_y * (u\_left - u\_right)}\\) -6. \\(z = \frac{b * f\_x}{u\_left - u\_right}\\) +4. 
\\(x = \frac{b * (u\_left - O\_x)}{u\_left - u\_right}\\) +5. \\(y = \frac{b * f\_x * (v\_left - O\_y)}{ f\_y * (u\_left - u\_right)}\\) +6. \\(z = \frac{b * f\_x}{u\_left - u\_right}\\) Note that the x and y values above concern the left camera since the origin of the coordinate system is aligned with the left camera. The above equations show that we can find 3D coordinates of a point P using its 2 images captured from 2 different camera locations. z value is also referred to as the depth value. Using this technique, we can find the depth values for different pixels within an image and their real-world x and y coordinates. We can also find real-world distances between different points in an image. From 1da7915625ddb620a6fe44640921f9eceb907c9c Mon Sep 17 00:00:00 2001 From: vasu Date: Wed, 20 Mar 2024 21:13:41 +0530 Subject: [PATCH 7/9] rendering updates v7 --- .../3d_measurements_stereo_vision.mdx | 32 +++++++++---------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx index 96bfa2356..bb109ddbe 100644 --- a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx +++ b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx @@ -59,21 +59,21 @@ Figure 2: Image formation using 2 cameras With the above configuration in place, we have the below equations which map a point in 3D to the image plane in 2D. -1. Left camera +1. Left camera 1. \\(u\_left = f\_x * \frac{x}{z} + O\_x\\) - 2. \\(v\_left = f\_y * \frac{y}{z} + O\_y\\) + 2. \\(v\_left = f\_y * \frac{y}{z} + O\_y\\) -2. Right camera - 1. \\(u\_right = f\_x * \frac{x-b}{z} + O\_x\\) - 2. \\(v\_right = f\_y * \frac{y}{z} + O\_y\\) +2. Right camera + 1. \\(u\_right = f\_x * \frac{x-b}{z} + O\_x\\) + 2. 
\\(v\_right = f\_y * \frac{y}{z} + O\_y\\) Different symbols used in above equations are defined below: -* \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image -* \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image -* \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. -* x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) -* \\(0\_x\\) and \\(0\_y\\) refer to pixel coordinates of the principal point -* b is called the baseline and refers to the distance between the left and right cameras. Same units are used for both b and x,y,z coordinates (any unit like cm, feet, etc can be used) +* \\(u\_left\\), \\(v\_left\\) refer to pixel coordinates of point P in the left image +* \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image +* \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. +* x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) +* \\O\_x\\) and \\(O\_y\\) refer to pixel coordinates of the principal point +* b is called the baseline and refers to the distance between the left and right cameras. 
Same units are used for both b and x,y,z coordinates (any unit like cm, feet, etc can be used) We have 4 equations above and 3 unknowns - x, y and z coordinates of a 3D point P. Intrinsic camera parameters - focal lengths and principal point are assumed to be known. Equations 1.2 and 2.2 indicate that the v coordinate value in the left and right images is the same. @@ -81,9 +81,9 @@ We have 4 equations above and 3 unknowns - x, y and z coordinates of a 3D point Using equations 1.1, 1.2 and 2.1 we can derive the x,y,z coordinates of point P. -4. \\(x = \frac{b * (u\_left - O\_x)}{u\_left - u\_right}\\) +4. \\(x = \frac{b * (u\_left - O\_x)}{u\_left - u\_right}\\) 5. \\(y = \frac{b * f\_x * (v\_left - O\_y)}{ f\_y * (u\_left - u\_right)}\\) -6. \\(z = \frac{b * f\_x}{u\_left - u\_right}\\) +6. \\(z = \frac{b * f\_x}{u\_left - u\_right}\\) Note that the x and y values above concern the left camera since the origin of the coordinate system is aligned with the left camera. The above equations show that we can find 3D coordinates of a point P using its 2 images captured from 2 different camera locations. z value is also referred to as the depth value. Using this technique, we can find the depth values for different pixels within an image and their real-world x and y coordinates. We can also find real-world distances between different points in an image. @@ -106,7 +106,7 @@ Raw Right Image ![Raw Stacked Left and Right Images ](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/unrectified_stacked_frames.jpg?download=true) Raw Stacked Left and Right Images -Let's focus on a single point - the top left corner of the laptop. As per equation 3 above,\\(v\_left = v\_right\\) for the same point in the left and right images. However, notice that the red line, which is at a constant v value, touches the top-left corner of the laptop in the left image but misses this point by a few pixels in the right image. 
There are two main reasons for this discrepancy: +Let's focus on a single point - the top left corner of the laptop. As per equation 3 above, \\(v\_left = v\_right\\) for the same point in the left and right images. However, notice that the red line, which is at a constant v value, touches the top-left corner of the laptop in the left image but misses this point by a few pixels in the right image. There are two main reasons for this discrepancy: * The intrinsic parameters for the left and right cameras are different. The principal point for the left camera is at (319.13, 233.86), whereas it is (298.85, 245.52) for the right camera. The focal length for the left camera is 450.9, whereas it is 452.9 for the right camera. The values of fx are equal to fy for both the left and right cameras. These intrinsic parameters were read from the device using it's python API and could be different for different OAK-D Lite devices. * Left and right camera orientations differ slightly from the geometry of the simplified solution detailed above. @@ -123,7 +123,7 @@ Rectified Right Image Rectified and Stacked Left and Right Images ![Rectified and Stacked Left and Right Images](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/rectified_stacked_frames.jpg?download=true) -Let's also overlap the rectified left and right images to see the difference. We can see that the v values for different points remain mostly constant in the left and right images. However, the u values change, and this difference in the u values helps us find the depth information for different points in the scene, as shown in Equation 6 above. This difference in 'u' values \\(u\_left - u\_right\\) is called disparity, and we can notice that the disparity for points near the camera is greater compared to points further away. Depth z and disparity \\(u\_left - u\_right\\) are inversely proportional, as shown in equation 6. 
+Let's also overlap the rectified left and right images to see the difference. We can see that the v values for different points remain mostly constant in the left and right images. However, the u values change, and this difference in the u values helps us find the depth information for different points in the scene, as shown in Equation 6 above. This difference in 'u' values \\(u\_left - u\_right\\) is called disparity, and we can notice that the disparity for points near the camera is greater compared to points further away. Depth z and disparity \\(u\_left - u\_right\\) are inversely proportional, as shown in equation 6. Rectified and Overlapped Left and Right Images ![Rectified and Overlapped Left and Right Images](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/rectified_overlapping_frames.jpg?download=true) @@ -168,7 +168,7 @@ We can also compute 3D distances between different points using their (x,y,z) va | d5(9-10) | 16.9 | 16.7 | 1.2 | | d6(9-11) | 23.8 | 24 | 0.83 | -Calculated Dimension Results +Calculated Dimension Results ![Calculated Dimension Results] (https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/calculated_dim_results.png?download=true) ## Conclusion From 7ecdec0d483ec1d68c7cff669ff69f14ef04f311 Mon Sep 17 00:00:00 2001 From: vasu Date: Wed, 20 Mar 2024 21:30:02 +0530 Subject: [PATCH 8/9] rendering corrected -v8 --- .../3d_measurements_stereo_vision.mdx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx index bb109ddbe..6a889a562 100644 --- a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx +++ b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and 
Reconstruction/3d_measurements_stereo_vision.mdx @@ -72,7 +72,7 @@ Different symbols used in above equations are defined below: * \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image * \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. * x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) -* \\O\_x\\) and \\(O\_y\\) refer to pixel coordinates of the principal point +* \\(O\_x\\) and \\(O\_y\\) refer to pixel coordinates of the principal point * b is called the baseline and refers to the distance between the left and right cameras. Same units are used for both b and x,y,z coordinates (any unit like cm, feet, etc can be used) We have 4 equations above and 3 unknowns - x, y and z coordinates of a 3D point P. Intrinsic camera parameters - focal lengths and principal point are assumed to be known. Equations 1.2 and 2.2 indicate that the v coordinate value in the left and right images is the same. @@ -169,7 +169,7 @@ We can also compute 3D distances between different points using their (x,y,z) va | d6(9-11) | 23.8 | 24 | 0.83 | Calculated Dimension Results -![Calculated Dimension Results] (https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/calculated_dim_results.png?download=true) +![Calculated Dimension Results](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/3d_stereo_vision_images/calculated_dim_results.png?download=true) ## Conclusion 1. 
In summary, we learned how stereo vision works, the equations used to find the real-world coordinates (x, y, z) of a point P given its two images captured from different viewpoints, and compared theoretical values with experimental results. From 28ebdf433b652f2776b09f08ed0776a555d5aed5 Mon Sep 17 00:00:00 2001 From: vasu Date: Wed, 20 Mar 2024 22:03:41 +0530 Subject: [PATCH 9/9] rendering update -v9 --- .../3d_measurements_stereo_vision.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx index 6a889a562..ac287651b 100644 --- a/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx +++ b/chapters/en/Unit 8 - 3D Vision, Scene Rendering and Reconstruction/3d_measurements_stereo_vision.mdx @@ -72,7 +72,7 @@ Different symbols used in above equations are defined below: * \\(u\_right\\), \\(v\_right\\) refer to pixel coordinates of point P in the right image * \\(f\_x\\) refers to the focal length (in pixels) in x direction and \\(f\_y\\) refers to the focal length (in pixels) in y direction. Actually, there is only 1 focal length for a camera which is the distance between the pinhole (optical center of the lens) to the image plane. However, pixels may be rectangular and not perfect squares, resulting in different fx and fy values when we represent f in terms of pixels. * x,y,z are 3D coordinates of the point P (any unit like cm, feet, etc can be used) -* \\(O\_x\\) and \\(O\_y\\) refer to pixel coordinates of the principal point +* \\(O\_x\\) and \\(O\_y\\) refer to pixel coordinates of the principal point * b is called the baseline and refers to the distance between the left and right cameras. 
Same units are used for both b and x,y,z coordinates (any unit like cm, feet, etc can be used) We have 4 equations above and 3 unknowns - x, y and z coordinates of a 3D point P. Intrinsic camera parameters - focal lengths and principal point are assumed to be known. Equations 1.2 and 2.2 indicate that the v coordinate value in the left and right images is the same.
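The table of triangulated points and the 3D-distance results that these patches clean up follow directly from equation 6 and plain Euclidean distance; a minimal sketch, using the first two (x, y, z) rows from the chapter's table and placeholder calibration values rather than the real OAK-D Lite parameters:

```python
import math

# Illustrative check of the inverse depth-disparity relation (eq. 6) and of
# the 3D distances tabulated in the chapter. f_x and b are placeholders,
# not the chapter's actual calibration.
f_x, b = 450.0, 7.5   # focal length (pixels) and baseline (cm), assumed

def depth(disparity):
    """Eq. 6: depth z shrinks as disparity u_left - u_right grows."""
    return b * f_x / disparity

assert depth(36.0) < depth(30.0)  # larger disparity -> nearer point

def distance_3d(p, q):
    """Euclidean distance between two triangulated (x, y, z) points."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(p, q)))

pt1 = (-33.51, -5.53, 94.36)   # (x, y, z) in cm, from the chapter's table
pt2 = (-8.72, -7.38, 113.23)
print(round(distance_3d(pt1, pt2), 2))
```

Since all three coordinates come out of the triangulation in the baseline's units, the resulting distances are directly in centimeters with no extra scale factor.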