diff --git a/source/3D Location Encoder/NeRF.md b/source/3D Location Encoder/NeRF.md
index 35c0ca96..694dc6b1 100644
--- a/source/3D Location Encoder/NeRF.md
+++ b/source/3D Location Encoder/NeRF.md
@@ -36,6 +36,15 @@ The `NERFSpatialRelationLocationEncoder` is designed to compute spatial embeddin
+Encoded \( x \) = \(\bigoplus_{i=0}^{L-1}\) \([ \sin(2^i \pi x), \cos(2^i \pi x)]\)
+
+Encoded \( y \) = \(\bigoplus_{i=0}^{L-1}\) \([ \sin(2^i \pi y), \cos(2^i \pi y)]\)
+
+Encoded \( z \) = \(\bigoplus_{i=0}^{L-1}\) \([ \sin(2^i \pi z), \cos(2^i \pi z)]\)
+
+Where ⊕ denotes concatenation of vectors.
+
+
 ### Configuration Parameters
 - **coord_dim**: Dimensionality of the space being encoded (e.g., 2D, 3D).
 - **frequency_num**: Number of different sinusoidal frequencies used to encode spatial differences.
diff --git a/source/3D Location Encoder/xyz.md b/source/3D Location Encoder/xyz.md
index 120d1859..85cefffb 100644
--- a/source/3D Location Encoder/xyz.md
+++ b/source/3D Location Encoder/xyz.md
@@ -46,9 +46,18 @@ Processes a batch of coordinates and converts them into spatial relation embeddi
 - **Formulas:**
 - Convert latitude `lat` and longitude `lon` coordinates into radians.
 - Calculate `x, y, z` coordinates using the following equations:
--
+
+ $$x = \cos(lat) \times \cos(lon)$$
+ $$y = \cos(lat) \times \sin(lon)$$
+ $$z = \sin(lat)$$
+
+ Where:
+
+ - *lat* is the latitude coordinate in radians.
+ - *lon* is the longitude coordinate in radians.
+ - *x*, *y*, *z* are the resulting Cartesian coordinates.
+
 - Concatenate `x, y, z` coordinates to form the high-dimensional vector representation.
 - **Returns:**
diff --git a/source/Basic Concepts/Single point location encoder.md b/source/Basic Concepts/Single point location encoder.md
index 7bfff295..181838d4 100644
--- a/source/Basic Concepts/Single point location encoder.md
+++ b/source/Basic Concepts/Single point location encoder.md
@@ -9,6 +9,10 @@ output:
+$$
+Enc(\mathbf{x}) = \mathbf{NN}(PE(\mathbf{x}))
+$$
+
 ## EncoderMultiLayerFeedForwardNN()
 `NN(⋅) : ℝ^W -> ℝ^d` is a learnable neural network component which maps the input position embedding `PE(x) ∈ ℝ^W` into the location embedding `Enc(x) ∈ ℝ^d`. A common practice is to define `NN(⋅)` as a multi-layer perceptron, while Mac Aodha et al. (2019) adopted a more complex `NN(⋅)` which includes an initial fully connected layer, followed by a series of residual blocks. The purpose of `NN(⋅)` is to provide a learnable component for the location encoder, which captures the complex interaction between input locations and target labels.
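
The formulas added in this change compose into one pipeline, `Enc(x) = NN(PE(x))`. The following is a minimal pure-Python sketch of that composition, not the library's implementation. The names `latlon_to_xyz`, `nerf_encode`, `TinyMLP`, and `encode_location` are hypothetical, and so is the choice to feed the Cartesian `x, y, z` values into the sinusoidal encoding in that order. The network is a bias-free, one-hidden-layer stand-in with random, untrained weights, used only to show the shapes `ℝ^W -> ℝ^d`.

```python
import math
import random


def latlon_to_xyz(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to Cartesian (x, y, z) on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.cos(lon),
        math.cos(lat) * math.sin(lon),
        math.sin(lat),
    )


def nerf_encode(coords, num_freqs):
    """For each coordinate c, concatenate [sin(2^i*pi*c), cos(2^i*pi*c)] for i = 0..L-1."""
    out = []
    for c in coords:
        for i in range(num_freqs):
            angle = (2 ** i) * math.pi * c
            out.extend([math.sin(angle), math.cos(angle)])
    return out  # length W = len(coords) * 2 * num_freqs


class TinyMLP:
    """Hypothetical stand-in for NN(.): one hidden ReLU layer, no biases, random weights."""

    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def __call__(self, v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


def encode_location(lat_deg, lon_deg, nn, num_freqs):
    """Enc(x) = NN(PE(x)), with PE built from the xyz conversion plus sinusoidal encoding."""
    return nn(nerf_encode(latlon_to_xyz(lat_deg, lon_deg), num_freqs))


if __name__ == "__main__":
    L = 4                      # number of frequencies (assumed value)
    W = 3 * 2 * L              # position-embedding width for 3 coordinates
    nn = TinyMLP(in_dim=W, hidden_dim=16, out_dim=8)
    embedding = encode_location(37.77, -122.42, nn, num_freqs=L)
    print(len(embedding))      # embedding dimension d
```

With three coordinates and `L` frequencies, the position embedding has width `W = 6L`, which is the input size `NN(⋅)` has to accept. A real encoder would train the weights, and could replace `TinyMLP` with the residual-block design the text attributes to Mac Aodha et al. (2019).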