
Is the gradient of the NCC cost function missing the coefficient of 1/n? #1184

Open · will-zzy opened this issue Dec 6, 2024 · 6 comments

will-zzy commented Dec 6, 2024

Describe the bug
Thank you for your work! I have recently been studying the implementation of surface evolution and hope to port it to CUDA. However, when I derived the gradient of the NCC cost function, I noticed that the dZNCC at libs/MVS/SceneRefine.cpp line 868 is missing a coefficient (although this does not affect the optimization, as it can be absorbed into the optimization step size).

```cpp
const Real dZNCC((Real)imageA(r,c)*invSqrtVAVB - (Real)imageB(r,c)*ZNCCinvVB + imageMeanB(r,c)*ZNCCinvVB - imageMeanA(r,c)*invSqrtVAVB);
```

Nonetheless, from a rigorous perspective, shouldn't dZNCC be multiplied by a factor of 1/n, similar to what is done in line 888?
Thank You!

Screenshots: [image attached]
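The missing factor can be checked numerically. Below is a minimal NumPy sketch (my own, not the OpenMVS code) that computes the ZNCC score and its per-pixel gradient with the 1/n factor included, so it can be compared against finite differences:

```python
import numpy as np

def zncc(A, B):
    """Zero-mean normalized cross-correlation of two equal-size patches."""
    a = A - A.mean()
    b = B - B.mean()
    return (a * b).mean() / np.sqrt((a * a).mean() * (b * b).mean())

def zncc_grad_B(A, B):
    """d ZNCC / d B(r,c) for every pixel; note the leading 1/n factor."""
    n = A.size
    a = A - A.mean()
    b = B - B.mean()
    va = (a * a).mean()
    vb = (b * b).mean()
    z = (a * b).mean() / np.sqrt(va * vb)
    # same bracketed expression as the dZNCC line above, times 1/n
    return (a / np.sqrt(va * vb) - z * b / vb) / n
```

Perturbing a single pixel of B by a small epsilon and comparing the change in zncc(A, B) against the corresponding entry of zncc_grad_B confirms that the analytic gradient only matches when the 1/n factor is present.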

cdcseacave (Owner) commented

Hi @will-zzy, and thank you for spotting this. Indeed, it seems that the division by n is missing. I'm very curious about your work here. Can you elaborate on your goals and the project you are working on? I'm asking because I'm also working on implementing this code in CUDA, and maybe we can join forces.

will-zzy (Author) commented Dec 27, 2024

@cdcseacave
Thank you for your reply! My goal is to combine surface evolution with Gaussian splatting. Since surface evolution based on functional optimization can be easily adapted to Gaussian splatting (if we use mesh geodesics as the metric, the splatting method becomes equivalent to mapping a “Gaussian texture” on the mesh onto the image), this may lead to a method that achieves high-quality geometric reconstruction and real-time realistic rendering.

Therefore, my first goal is to implement the relevant mesh Euler operations on the GPU. It seems there is already a fairly mature approach in graphics, for example: "High-performance simplification of triangular surfaces using a GPU." This paper maintains a GPU version of CGAL's data structures.

My second goal is to incorporate the gradient of the geodesic-based differentiable rendering loss into the gradient flow of the NCC cost. I still need to study the derivations for this. Some references include "Differentiable Rendering using RGBXY Derivatives and Optimal Transport", "Discrete Geodesic Graph (DGG) for Computing Geodesic Distances on Polyhedral Surfaces", and "Geometry Field Splatting with Gaussian Surfels".

cdcseacave (Owner) commented

I like what you are trying to achieve, but I do not see clearly how to connect a mesh to 3DGS. If we represent a mesh as connected triangles, it is very hard and unintuitive to change its topology, which means you would need to introduce the mesh very late in the optimization, when the surface is already very close to the true shape, and find a very accurate way of extracting that surface from 3DGS (to capture the true topology). I think it would be better to use some other way of representing the surface, maybe implicitly like an SDF; however, in that case I do not see clearly how to connect 3DGS to an SDF.

will-zzy (Author) commented Dec 27, 2024

@cdcseacave
In fact, my idea is not about 3D Gaussian ellipsoids. It is more similar to Quadratic Gaussian Splatting (QGS), where 2D Gaussian distributions are defined on surfaces:
[image: QGS_demo_convex]

Imagine that this quadratic paraboloid is made up of triangular meshes. As the triangular meshes change, the 2D Gaussian distributions defined on the surfaces will produce similar effects. Of course, connected triangular faces have higher degrees of freedom and should show more detail.

For example, on the two triangular faces below, a 2D Gaussian ellipse spans across two connected triangles. If the centroid and covariance of the 2D Gaussian distribution are predefined, the Gaussian weight at any point on the triangular mesh can be determined based on the mesh's geodesic distance.
[image: demo_mesh_splatting]
Therefore, this definition method does not require changing the mesh's topology. It is more like 2D Gaussian distributions drifting across the surface formed by the triangular mesh.
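As a toy illustration of this weighting scheme (my own sketch, not code from any of the cited papers), suppose a geodesic solver such as a DGG-style method already provides the 2D coordinates of a surface point after unfolding the mesh along the geodesic into the Gaussian's tangent frame; the weight is then just an ordinary 2D Gaussian evaluation in that frame:

```python
import numpy as np

def geodesic_gaussian_weight(u, mu, Sigma):
    # u:     2D coords of a surface point after unfolding the mesh along
    #        the geodesic into the Gaussian's tangent frame (assumed given)
    # mu:    2D centroid of the Gaussian in that frame
    # Sigma: 2x2 covariance of the Gaussian
    d = np.asarray(u, dtype=float) - np.asarray(mu, dtype=float)
    return float(np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)))
```

Because the distance is measured along the surface rather than in the ambient space, the weight falls off consistently even when the Gaussian spans several faces.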

The original idea of QGS is based on the following theory: for models optimized using photometric consistency errors, we want a representation method where the more realistic the rendering, the higher the quality of geometric reconstruction. If this holds true, we can use various methods to improve rendering quality to enhance geometric reconstruction accuracy.

The contrapositive of this theory is: the further the expressed geometric structure deviates from the real surface, the worse the rendering quality should be. This shows that if a representation method's rendering results are highly sensitive to geometry, then it has a stronger ability to express that "more realistic rendering leads to higher geometric reconstruction quality." Therefore, we are committed to rendering in surfel form. For 3D Gaussian ellipsoids, their strong texture-fitting ability means they can render realistic images even on inaccurate geometry, making it inefficient to optimize the geometric structure using photometric-consistency errors.

On the other hand, the key to Gaussian splatting methods rendering realistic images lies in the volumetric rendering formula, not the shape of the primitives. The volumetric rendering formula depends on the relative positions of the primitives, not their absolute positions, i.e., $C = \sum_i c_i \alpha_i T_i$ with transmittance $T_i = \prod_{j<i} (1 - \alpha_j)$. This means that even if all 2D Gaussian primitives are compressed onto a single plane, realistic images can still be rendered after sufficient optimization.
Additionally, surfel representations can use exact volumetric rendering formulas instead of approximations, as demonstrated in paper "Geometry Field Splatting with Gaussian Surfels".
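The compositing formula depends only on the front-to-back ordering of the primitives along the ray, which a short sketch (my own, per color channel) makes explicit:

```python
def composite(colors, alphas):
    """Front-to-back alpha compositing: C = sum_i c_i * alpha_i * T_i,
    with transmittance T_i = prod_{j<i} (1 - alpha_j).
    colors/alphas are listed in front-to-back order along the ray."""
    C = 0.0
    T = 1.0  # transmittance accumulated so far
    for c, a in zip(colors, alphas):
        C += c * a * T
        T *= 1.0 - a
    return C
```

Note that nothing in the loop uses the primitives' positions, only their order and opacities, which is the point being made above.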

cdcseacave (Owner) commented

Thank you for the details; however, I think I am still missing a link in this chain. I am familiar with the 3DGS space, including QGS, and I'm also pursuing a way to accurately model the surface with 3DGS. 3DGS was a first approach; we can think of 2DGS as a specialization (simplification) of QGS, PGSR as well, etc. But all have in common that they do not model the surface directly, but parametric splats: continuous functions that together form a continuous space. Even QGS uses a TSDF to model the surface after training and extract the mesh.

I still do not understand what the building blocks you propose are, because if they are triangles, I do not see how they are topology agnostic. As soon as you have two connected triangles like above, that is a fixed topology.

will-zzy (Author) commented

@cdcseacave
Thank you for the additional discussion. Indeed, almost all GS methods do not explicitly model a topologically connected triangular mesh. However, in methods like PGSR and MVG-splatting, we can verify that their face elements, once converged, lie on the real surface because the homography-based warp NCC takes effect.

I apologize for not clearly stating the prerequisites. My understanding is that you are concerned about the topology among the primitives, given that existing GS methods typically optimize freely in continuous 3D space, causing most primitives to converge onto or near the actual surface without modeling any topology. During optimization and densification, these primitives do not explicitly interact with each other (e.g., through topology).

In contrast, our primitives differ from most methods. They are not triangular patches or triangles embedded with Gaussians (as in GaMeS), but rather a set of Gaussian distributions defined on a manifold mesh with a fixed topology.

My idea is as follows: we start with an initial mesh $M_0$ (just like in mesh refinement). It need not be highly accurate; it could be derived from a sparse point cloud or from a large model. This mesh has a fixed topology and is assumed to be a manifold surface, without requiring it to be watertight. Topology changes can only occur through mesh refinement operations like decimation or subdivision. (Of course, a sparse point cloud cannot form a complete scene mesh; during optimization, we may add new connected components through mesh growth to gradually model the entire surface.)

We then initialize the centroids $\mu$ and covariances $\Sigma$ of 2D Gaussian distributions on $M_0$. These 2D Gaussians are not bound to a single triangular face, nor is their scale tied to face size (a constraint sometimes seen in editing-based work such as GaMeS or “Mesh-based Gaussian Splatting for Real-time Large-scale Deformation”). Each 2D Gaussian can span multiple triangular faces through geodesic distances.

During optimization, the Gaussian centroids are allowed to move freely across different triangular faces. Because we use geodesic distance as the metric, the Gaussian distributions are completely defined on the manifold mesh. Thus, they do not affect the mesh’s topology but instead rely on the mesh (with its fixed topology at any given time) to fit the surface texture.
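A minimal sketch of how such a centroid might take one optimization step (my own illustration, not from any cited work): store the centroid as barycentric coordinates on its current face and update them; a full implementation would walk across the shared edge into the neighbouring face when a coordinate reaches zero, whereas this toy version simply clamps.

```python
import numpy as np

def step_centroid(bary, grad, lr=0.1):
    # bary: barycentric coords of the Gaussian centroid on its current face
    # grad: gradient of the loss w.r.t. those barycentric coords
    b = np.asarray(bary, dtype=float) - lr * np.asarray(grad, dtype=float)
    b = np.clip(b, 0.0, None)  # toy clamp; a real version crosses the edge here
    return b / b.sum()         # renormalize to stay on the face
```

The mesh connectivity is never modified by such a step; only the Gaussian's position on the (fixed-topology) surface changes.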

Thank you again for your insights. If there is anything imprecise in my idea, please let me know.
