diff --git a/docs/index.rst b/docs/index.rst
index 32f766417..0fee4f3fd 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -18,6 +18,7 @@ Kaolin library is part of a larger suite of tools for 3D deep learning research.
    notes/checkpoints
    notes/diff_render
    notes/spc_summary
+   notes/camera_summary
 
 .. toctree::
    :titlesonly:
diff --git a/docs/notes/camera_summary.rst b/docs/notes/camera_summary.rst
new file mode 100644
index 000000000..11d1367c4
--- /dev/null
+++ b/docs/notes/camera_summary.rst
@@ -0,0 +1,237 @@
+Camera summary
+**************
+
+.. _camera_summary:
+
+Camera class
+============
+
+.. _camera_class:
+
+:class:`kaolin.render.camera.Camera` is a one-stop class for all camera-related differentiable / non-differentiable transformations.
+Camera objects are represented by *batched* instances of 2 submodules:
+
+    - :ref:`CameraExtrinsics `: The extrinsics properties of the camera (position, orientation).
+      These are usually embedded in the view matrix, used to transform vertices from world space to camera space.
+    - :ref:`CameraIntrinsics `: The intrinsics properties of the lens
+      (such as field of view / focal length in the case of pinhole cameras).
+      Intrinsics parameters vary between different lens types,
+      and therefore multiple CameraIntrinsics subclasses exist,
+      to support different types of cameras: pinhole / perspective, orthographic, fisheye, and so forth.
+      For pinhole and orthographic lenses, the intrinsics are embedded in a projection matrix.
+      The intrinsics module can be used to transform vertices from camera space to Normalized Device Coordinates.
+
+.. note::
+    To avoid tedious invocation of camera functions through
+    ``camera.extrinsics.someop()`` and ``camera.intrinsics.someop()``, kaolin overrides the ``__get_attributes__``
+    function to forward any function calls of ``camera.someop()`` to
+    the appropriate extrinsics / intrinsics submodule.
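The forwarding behaviour described in the note above can be illustrated with a minimal sketch. It assumes the forwarding is built on Python's standard ``__getattr__`` hook; the classes ``Extrinsics``, ``Intrinsics`` and ``ForwardingCamera`` are hypothetical stand-ins written for this example and are not kaolin's implementation.

```python
class Extrinsics:
    """Hypothetical stand-in for an extrinsics submodule."""

    def cam_pos(self):
        return (0.0, 0.0, 1.0)


class Intrinsics:
    """Hypothetical stand-in for an intrinsics submodule."""

    def zoom(self, amount):
        return f"zoomed by {amount}"


class ForwardingCamera:
    """Forwards unknown attribute lookups to its two submodules."""

    def __init__(self, extrinsics, intrinsics):
        self.extrinsics = extrinsics
        self.intrinsics = intrinsics

    def __getattr__(self, name):
        # __getattr__ is only invoked when normal attribute lookup fails.
        # Submodules are read through __dict__ to avoid infinite recursion.
        for key in ("extrinsics", "intrinsics"):
            submodule = self.__dict__.get(key)
            if submodule is not None and hasattr(submodule, name):
                return getattr(submodule, name)
        raise AttributeError(name)


camera = ForwardingCamera(Extrinsics(), Intrinsics())
position = camera.cam_pos()    # resolved on the extrinsics stand-in
zoom_result = camera.zoom(2)   # resolved on the intrinsics stand-in
```

With this pattern, callers never need to know which submodule owns a given operation; a name found on neither submodule still raises ``AttributeError`` as usual.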
+
+The entire pipeline of transformations can be summarized as (ignoring homogeneous coordinates)::
+
+    World Space                                             Camera View Space
+         V          ---CameraExtrinsics.transform()--->             V'            ---CameraIntrinsics.transform()---
+    Shape~(B, 3)               (view matrix)                   Shape~(B, 3)                                        |
+                                                                                                                   |
+                                                                              (linear lens: projection matrix)     |
+                                                                                    + homogeneous -> 3D            |
+                                                                                                                   V
+                                                                                  Normalized Device Coordinates (NDC)
+                                                                                             Shape~(B, 3)
+    When using view / projection matrices, conversion to homogeneous coordinates is required.
+    Alternatively, the `transform()` function takes care of such projections under the hood when needed.
+
+How to apply transformations with kaolin's Camera:
+    1. Linear camera types, such as the commonly used pinhole camera,
+       support the :func:`view_projection_matrix()` method.
+       The returned matrix can be used to transform vertices through pytorch's matrix multiplication, or even be
+       passed to shaders as a uniform.
+    2. All Cameras are guaranteed to support a general :func:`transform()` function
+       which maps coordinates from world space to Normalized Device Coordinates space.
+       For some lens types which perform non-linear transformations,
+       the :func:`view_projection_matrix()` is undefined.
+       Therefore the camera transformation must be applied through
+       a dedicated function. For linear cameras,
+       :func:`transform()` may use matrices under the hood.
+    3. Camera parameters may also be queried directly.
+       This is useful when implementing code that is aware of camera parameters, such as ray tracers.
+How to control kaolin's Camera:
+    - :class:`CameraExtrinsics` is packed with useful methods for controlling the camera position and orientation:
+      :func:`translate() `,
+      :func:`rotate() `,
+      :func:`move_forward() `,
+      :func:`move_up() `,
+      :func:`move_right() `,
+      :func:`cam_pos() `,
+      :func:`cam_up() `,
+      :func:`cam_forward() `,
+      :func:`cam_right() `.
+    - :class:`CameraIntrinsics` exposes a lens :func:`zoom() `
+      operation. The exact functionality depends on the camera type.
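The world space to NDC pipeline summarized in the diagram above can be sketched in plain Python, without kaolin. This is an illustration under stated assumptions, not kaolin's code: the helpers ``matmul``, ``matvec``, ``perspective`` and ``world_to_ndc`` are hypothetical, and the projection matrix follows the common OpenGL-style perspective convention, which may differ from the one kaolin uses.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    """Multiply a matrix by a column vector."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]


def perspective(fov_y, aspect, near, far):
    """OpenGL-style perspective projection matrix (assumed convention)."""
    f = 1.0 / math.tan(fov_y / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


# View matrix [R | t] for a camera at (0, 0, 5) with identity rotation: t = -R p.
view = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, -5.0],
    [0.0, 0.0, 0.0, 1.0],
]
proj = perspective(math.radians(90.0), 1.0, 1.0, 10.0)
view_proj = matmul(proj, view)  # a single matrix usable for all vertices


def world_to_ndc(point):
    # Lift to homogeneous coordinates, transform, then divide by w.
    clip = matvec(view_proj, [point[0], point[1], point[2], 1.0])
    return [c / clip[3] for c in clip[:3]]


ndc = world_to_ndc([1.0, 1.0, 3.0])
```

The explicit lift to four components and the final division by ``w`` are the homogeneous-coordinate steps that the diagram omits and that a general ``transform()`` function hides from the caller.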
+How to optimize the Camera parameters:
+    - Both :class:`CameraExtrinsics` and :class:`CameraIntrinsics` maintain
+      :class:`torch.Tensor` buffers of parameters which support pytorch differentiable operations.
+    - Setting ``camera.requires_grad_(True)`` will turn on the optimization mode.
+    - The :func:`gradient_mask` function can be used to mask out gradients of specific Camera parameters.
+
+    .. note::
+        :class:`CameraExtrinsics` supports multiple representations of camera parameters
+        (see: :func:`switch_backend `).
+        Specific representations are better fit for optimization
+        (e.g.: they maintain an orthogonal view matrix).
+        Kaolin will automatically switch to using those representations when gradient flow is enabled.
+        For non-differentiable uses, the default representation may provide better
+        speed and numerical accuracy.
+
+Other useful camera properties:
+    - Cameras follow pytorch in part, and support arbitrary ``dtype`` and ``device`` types through the
+      :func:`to()`, :func:`cpu()`, :func:`cuda()`, :func:`half()`, :func:`float()`, :func:`double()`
+      methods and :func:`dtype`, :func:`device` properties.
+    - :class:`CameraExtrinsics` and :class:`CameraIntrinsics` individually support the :func:`requires_grad`
+      property.
+    - Cameras implement :func:`torch.allclose` for comparing camera parameters under controlled numerical accuracy.
+      The operator ``==`` is reserved for comparison by reference.
+    - Cameras support batching, either through construction, or through the :func:`cat()` method.
+
+    .. note::
+        Since kaolin's cameras are batched, the view/projection matrices are of shapes :math:`(\text{num_cameras}, 4, 4)`,
+        and some operations, such as :func:`transform()`, may return values of shape :math:`(\text{num_cameras}, \text{num_vectors}, 3)`.
+
+Concluding remarks on coordinate systems and other confusing conventions:
+    - kaolin's Cameras assume column major matrices, for example, the inverse view matrix (cam2world) is defined as:
+
+      .. math::
+          \begin{bmatrix}
+              r1 & u1 & f1 & px \\
+              r2 & u2 & f2 & py \\
+              r3 & u3 & f3 & pz \\
+              0 & 0 & 0 & 1
+          \end{bmatrix}
+
+      This sometimes causes confusion as the view matrix (world2cam) uses a transposed 3x3 submatrix component,
+      which despite this transposition is still column major (observed through the last `t` column):
+
+      .. math::
+          \begin{bmatrix}
+              r1 & r2 & r3 & tx \\
+              u1 & u2 & u3 & ty \\
+              f1 & f2 & f3 & tz \\
+              0 & 0 & 0 & 1
+          \end{bmatrix}
+
+    - kaolin's cameras do not assume any specific coordinate system for the camera axes. By default, the
+      right-handed cartesian coordinate system is used. Other coordinate systems are supported through
+      :func:`change_coordinate_system() `
+      and the ``coordinates.py`` module::
+
+            Y
+            ^
+            |
+            |---------> X
+           /
+          Z
+
+    - kaolin's NDC space is assumed to be left-handed (depth goes inwards to the screen).
+
+      The default range of values is [-1, 1].
+
+CameraExtrinsics class
+======================
+
+.. _camera_extrinsics_class:
+
+    :class:`kaolin.render.camera.CameraExtrinsics` holds the extrinsics parameters of a camera: position and orientation in space.
+
+    This class maintains the view matrix of the camera, used to transform points from world coordinates
+    to camera / eye / view space coordinates.
+
+    The view matrix maintained by this class is column-major, and can be described by the 4x4 block matrix:
+
+    .. math::
+
+        \begin{bmatrix}
+            R & t \\
+            0 & 1
+        \end{bmatrix}
+
+    where **R** is a 3x3 rotation matrix and **t** is a 3x1 translation vector for the orientation and position
+    respectively.
+
+    This class is batched and may hold information from multiple cameras.
+
+    :class:`CameraExtrinsics` relies on a dynamic representation backend to manage the tradeoff between various choices
+    such as speed, or support for differentiable rigid transformations.
+    Parameters are stored as a single tensor of shape :math:`(\text{num_cameras}, K)`,
+    where K is a representation specific number of parameters.
+    Transformations and matrices returned by this class support differentiable torch operations,
+    which in turn may update the extrinsic parameters of the camera::
+
+                                 convert_to_mat
+            Backend                  ---- >            Extrinsics
+            Representation R                           View Matrix M
+            Shape (num_cameras, K),                    Shape (num_cameras, 4, 4)
+                                     < ----
+                                 convert_from_mat
+
+    .. note::
+
+        Unless specified manually with :func:`switch_backend`,
+        kaolin will choose the optimal representation backend depending on the status of ``requires_grad``.
+    .. note::
+
+        Users should be aware of, but not concerned about, the conversion from internal representations to view matrices.
+        kaolin performs these conversions where and if needed.
+
+    Supported backends:
+
+        - **"matrix_se3"**\: A flattened view matrix representation, containing the full information of
+          special euclidean transformations (translations and rotations).
+          This representation is quickly converted to a view matrix, but differentiable ops may cause
+          the view matrix to learn an incorrect, non-orthogonal transformation.
+        - **"matrix_6dof_rotation"**\: A compact representation with 6 degrees of freedom, ensuring the view matrix
+          remains orthogonal under optimizations. The conversion to matrix requires a single Gram-Schmidt step.
+
+    .. seealso::
+
+        `On the Continuity of Rotation Representations in Neural Networks, Zhou et al. 2019
+        `_
+
+    Unless stated explicitly, the definition of the camera coordinate system used by this class is up to the
+    choice of the user.
+    Practitioners should be mindful of conventions when pairing the view matrix managed by this class with a projection
+    matrix.
+
+CameraIntrinsics class
+======================
+
+.. _camera_intrinsics_class:
+
+    :class:`kaolin.render.camera.CameraIntrinsics` holds the intrinsics parameters of a camera:
+    how it should project from camera space to normalized screen / clip space.
+
+    The intrinsics are determined by the camera type, meaning parameters may differ according to the lens structure.
+    Typical computer graphics systems commonly assume the intrinsics of a pinhole camera (see: :class:`PinholeIntrinsics` class).
+    One implication is that some camera types do not use a linear projection (e.g. fisheye lenses).
+
+    There are therefore numerous ways to use CameraIntrinsics subclasses:
+
+        1. Access intrinsics parameters directly.
+           This may typically benefit use cases such as ray generators.
+        2. The :func:`transform()` method is supported by all CameraIntrinsics subclasses,
+           for both linear and non-linear transformations, to project vectors from camera space to normalized screen space.
+           This method is implemented using differentiable pytorch operations.
+        3. Certain CameraIntrinsics subclasses which perform linear projections may expose the transformation matrix
+           via dedicated methods.
+           For example, :class:`PinholeIntrinsics` exposes a :func:`projection_matrix()` method.
+           This may typically be useful for rasterization based rendering pipelines (e.g. OpenGL vertex shaders).
+
+    This class is batched and may hold information from multiple cameras.
+    Parameters are stored as a single tensor of shape :math:`(\text{num_cameras}, K)` where K is the number of
+    intrinsic parameters.
+
+    Currently there are two subclasses of intrinsics: :class:`kaolin.render.camera.OrthographicIntrinsics` and
+    :class:`kaolin.render.camera.PinholeIntrinsics`.
+
+API Documentation:
+------------------
+
+* Check all the camera classes and functions at the :ref:`API documentation`.
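The "single Gram-Schmidt step" mentioned for the ``matrix_6dof_rotation`` backend can be sketched as follows. This follows the general construction from Zhou et al. 2019 and is only an illustration: the function ``rotation_from_6d``, the ordering of the six parameters, and the choice to return the basis vectors as rows are assumptions made for this example, not kaolin's actual layout.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def rotation_from_6d(params):
    """Map 6 unconstrained numbers to three orthonormal basis vectors."""
    a1, a2 = params[:3], params[3:]
    b1 = normalize(a1)
    # Gram-Schmidt: remove the component of a2 along b1, then normalize.
    along = dot(b1, a2)
    b2 = normalize([x - along * y for x, y in zip(a2, b1)])
    # The third axis is fully determined by the first two.
    b3 = cross(b1, b2)
    return [b1, b2, b3]


rot = rotation_from_6d([1.0, 0.2, 0.0, 0.1, 1.0, 0.3])
```

Because any six input values (with linearly independent halves) produce an orthonormal basis, gradient updates on the six parameters cannot drive the result away from a valid rotation, which is the property the backend description relies on.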
+
diff --git a/docs/notes/tutorial_index.rst b/docs/notes/tutorial_index.rst
index 75d6b391d..187e462b4 100644
--- a/docs/notes/tutorial_index.rst
+++ b/docs/notes/tutorial_index.rst
@@ -82,7 +82,7 @@ Simple Recipes
     * `spc_trilinear_interp.py `_: computing trilinear interpolation of a point cloud on an SPC
 * Visualization:
     * `visualize_main.py `_: using Timelapse API to write mock 3D checkpoints
-    * `fast_mesh_sampling.py _`: Using CachedDataset to preprocess a ShapeNet dataset we can sample point clouds efficiently at runtime
+    * `fast_mesh_sampling.py `_: Using CachedDataset to preprocess a ShapeNet dataset we can sample point clouds efficiently at runtime
 * Camera:
     * `cameras_differentiable.py `_: optimize a camera position
     * `camera_transforms.py `_: using :func:`Camera.transform()` function
diff --git a/kaolin/metrics/trianglemesh.py b/kaolin/metrics/trianglemesh.py
index c215b7415..9588c4164 100644
--- a/kaolin/metrics/trianglemesh.py
+++ b/kaolin/metrics/trianglemesh.py
@@ -18,10 +18,16 @@ from ..ops.mesh import uniform_laplacian
 
 
 def point_to_mesh_distance(pointclouds, face_vertices):
-    r"""Computes the distances from pointclouds to meshes (represented by vertices and faces.)
+    r"""Computes the distances from pointclouds to meshes (represented by vertices and faces).
+
     For each point in the pointcloud, it finds the nearest triangle
     in the mesh, and calculated its distance to that triangle.
 
+    .. note::
+
+        The calculated distance is the squared euclidean distance.
+
+
     Type 0 indicates the distance is from a point on the surface of the triangle.
 
     Type 1 to 3 indicates the distance is from a point to a vertices.
@@ -33,7 +39,7 @@ def point_to_mesh_distance(pointclouds, face_vertices):
             pointclouds, of shape :math:`(\text{batch_size}, \text{num_points}, 3)`.
         face_vertices (torch.Tensor):
             vertices of each face of meshes,
-            of shape :math:`(\text{batch_size}, \text{num_faces}, 3, 3})`.
+            of shape :math:`(\text{batch_size}, \text{num_faces}, 3, 3)`.
 
     Returns:
         (torch.Tensor, torch.LongTensor, torch.IntTensor):
@@ -147,7 +153,7 @@ def _unbatched_naive_point_to_mesh_distance(points, face_vertices):
 
     Args:
         points (torch.Tensor): of shape (num_points, 3).
-        faces_vertices (torch.LongTensor): of shape (num_faces, 3, 3).
+        face_vertices (torch.LongTensor): of shape (num_faces, 3, 3).
 
     Returns:
         (torch.Tensor, torch.LongTensor, torch.IntTensor):
diff --git a/kaolin/ops/conversions/tetmesh.py b/kaolin/ops/conversions/tetmesh.py
index 6ef9ac08c..528330668 100644
--- a/kaolin/ops/conversions/tetmesh.py
+++ b/kaolin/ops/conversions/tetmesh.py
@@ -121,8 +121,8 @@ def marching_tetrahedra(vertices, tets, sdf, return_tet_idx=False):
     Args:
         vertices (torch.tensor): batched vertices of tetrahedral meshes, of shape
                                  :math:`(\text{batch_size}, \text{num_vertices}, 3)`.
-        faces (torch.tensor): unbatched tetrahedral mesh topology, of shape
-                              :math:`(\text{num_tetrahedrons}, 4)`.
+        tets (torch.tensor): unbatched tetrahedral mesh topology, of shape
+                             :math:`(\text{num_tetrahedrons}, 4)`.
         sdf (torch.tensor): batched SDFs which specify the SDF value of each vertex, of shape
                             :math:`(\text{batch_size}, \text{num_vertices})`.
         return_tet_idx (optional, bool): if True, return index of tetrahedron
diff --git a/kaolin/ops/spc/spc.py b/kaolin/ops/spc/spc.py
index 476f69d4b..977996fbb 100644
--- a/kaolin/ops/spc/spc.py
+++ b/kaolin/ops/spc/spc.py
@@ -268,7 +268,9 @@ def unbatched_query(octree, exsum, query_coords, level, with_parents=False):
                              to only a single level (default: False).
 
     Returns:
-        pidx (torch.LongTensor): The indices into the point hierarchy of shape :math:`(\text{num_query})`.
+        pidx (torch.LongTensor):
+
+            The indices into the point hierarchy of shape :math:`(\text{num_query})`.
             If with_parents is True, then the shape will be :math:`(\text{num_query, level+1})`.
 
     Examples:
diff --git a/setup.py b/setup.py
index a4f09319d..ad56aacce 100644
--- a/setup.py
+++ b/setup.py
@@ -29,7 +29,7 @@
     )
 else:
     import torch
-    torch_ver = parse_version(torch.__version__)
+    torch_ver = parse_version(parse_version(torch.__version__).base_version)
     if (torch_ver < parse_version(TORCH_MIN_VER) or
         torch_ver > parse_version(TORCH_MAX_VER)):
         if IGNORE_TORCH_VER:
@@ -178,9 +178,12 @@ def write_version_file():
 
 def get_requirements():
     requirements = []
-    requirements.append('scipy>=1.2.0,<=1.7.2')
     requirements.append('Pillow>=8.0.0')
     requirements.append('tqdm>=4.51.0')
+    if sys.version_info < (3, 8):
+        requirements.append('scipy>=1.2.0,<=1.7.3')
+    else:
+        requirements.append('scipy>=1.2.0')
     if sys.version_info >= (3, 10):
         warnings.warn("usd-core is not compatible with python_version >= 3.10 "
                       "and won't be installed, please use supported python_version "