Q. Define the following terms
- Scalars
- Vectors
- Matrices
- Tensors
Answer
- Scalars: A scalar is just a single number, e.g., $1, 2, 3$.
- Vectors: A vector is an array of numbers. It is like identifying points in space, with each element giving the coordinate along a different axis.
$$ \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} $$
- Matrices: A matrix is a 2D array of numbers. For example,
$$ \mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} $$
- Tensors: An array of numbers arranged on a regular grid with a variable number of axes is known as a tensor.
For example,
$$ \mathcal{T} = \begin{bmatrix} \begin{bmatrix} t_{111} & t_{112} \\ t_{121} & t_{122} \end{bmatrix}, & \begin{bmatrix} t_{211} & t_{212} \\ t_{221} & t_{222} \end{bmatrix} \end{bmatrix} $$
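To make the distinction concrete, here is a minimal NumPy sketch (variable names are illustrative) showing that these four objects differ only in their number of axes:

```python
import numpy as np

scalar = np.float64(3.0)                # 0 axes: a single number
vector = np.array([1.0, 2.0, 3.0])      # 1 axis:  shape (3,)
matrix = np.array([[1.0, 2.0],
                   [3.0, 4.0]])         # 2 axes:  shape (2, 2)
tensor = np.arange(8).reshape(2, 2, 2)  # 3 axes:  shape (2, 2, 2)

for obj in (scalar, vector, matrix, tensor):
    print(np.ndim(obj), np.shape(obj))
```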
Q. What is broadcasting in matrices in the context of deep learning?
Answer
In deep learning we allow the addition of a matrix and a vector, yielding another matrix:
$$ \mathbf{C} = \mathbf{A} + \mathbf{b}, \quad \text{where } C_{i,j} = A_{i,j} + b_j $$
The vector $\mathbf{b}$ is implicitly copied to each row of $\mathbf{A}$ before the addition. This implicit copying of $\mathbf{b}$ to many locations is called broadcasting.
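A minimal NumPy sketch of this behavior:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])    # shape (2, 3)
b = np.array([10, 20, 30])   # shape (3,)

C = A + b                    # b is implicitly copied to each row of A
print(C)
# [[11 22 33]
#  [14 25 36]]
```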
Q. Dot product
- What’s the geometric interpretation of the dot product of two vectors?
- Given a vector $u$, find a vector $v$ of unit length such that the dot product of $u$ and $v$ is maximum.
Answer
- Let $\vec{A} = \langle a_1, \dots, a_k \rangle$ and $\vec{B} = \langle b_1, \dots, b_k \rangle$ be $k$-dimensional vectors. The dot product of $\vec{A}$ and $\vec{B}$, denoted $\vec{A} \cdot \vec{B}$, is a number, defined as follows:
$$ \vec{A} \cdot \vec{B} = \sum_{i=1}^{k} a_i b_i $$
The dot product has the following geometric interpretation:
$$ \vec{A} \cdot \vec{B} = |\vec{A}| \, |\vec{B}| \cos{\theta} $$
where $\theta$ is the angle between the two vectors. Geometrically, the dot product measures how much the two vectors point in the same direction: it equals the length of the projection of $\vec{A}$ onto $\vec{B}$, multiplied by the length of $\vec{B}$.
- To find a vector $v$ of unit length such that the dot product of $u$ and $v$ is maximum, we want to maximize the expression for the dot product $u \cdot v$. According to the formula for the dot product:
$$ u \cdot v = |u| \, |v| \cos{\theta} $$
where:
- $|u|$ is the magnitude of $u$,
- $|v|$ is the magnitude of $v$ (which is 1 in this case because $v$ is a unit vector),
- $\theta$ is the angle between $u$ and $v$.
To maximize $u \cdot v$, we need $\cos{\theta} = 1$, which occurs when $\theta = 0^\circ$, i.e., when $v$ points in the same direction as $u$.
Thus, vector $v$ should be a unit vector in the direction of $u$. This can be achieved by normalizing $u$, i.e., dividing $u$ by its magnitude. If $u$ is represented as $u = (u_1, u_2, \dots, u_n)$, then the unit vector $v$ in the direction of $u$ is:
$$ v = \frac{u}{|u|} $$
This vector $ v $ will have a unit length and the dot product $ u \cdot v $ will be maximum, equal to the magnitude of $ u $ (since $ u \cdot v = |u| \cdot 1 \cdot \cos(0^\circ) = |u| $).
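A small NumPy check of this result: normalizing $u$ gives the maximizing unit vector, and the dot product equals $|u|$.

```python
import numpy as np

u = np.array([3.0, 4.0])
v = u / np.linalg.norm(u)  # unit vector in the direction of u

print(np.linalg.norm(v))   # 1.0  (v has unit length)
print(np.dot(u, v))        # 5.0  (the maximum possible value)
print(np.linalg.norm(u))   # 5.0  (equals |u|, as derived above)
```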
Q. Outer product
- Given two vectors $a = [3, 2, 1]$ and $b = [-1, 0, 1]$, calculate the outer product $a^T b$.
- Give an example of how the outer product can be useful in ML.
Answer
- The resultant product will be a $3 \times 3$ matrix, which can be given as follows:
$$ \left[\begin{matrix} -3 & 0 & 3 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{matrix}\right] $$
- One useful application of the outer product in machine learning is in the computation of covariance matrices, where the outer product is used to calculate the covariance of different feature vectors. For instance, the covariance matrix can be estimated as the average outer product of the centered data vectors (i.e., data vectors from which the mean has been subtracted). This is crucial for algorithms that rely on data distribution, such as Principal Component Analysis (PCA) and many types of clustering algorithms.
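Both parts can be illustrated with a NumPy sketch: `np.outer` for the calculation, and a covariance estimate formed as an average of outer products of mean-centered samples (the random data is illustrative; the check assumes `np.cov`'s default normalization by $N-1$).

```python
import numpy as np

a = np.array([3, 2, 1])
b = np.array([-1, 0, 1])

print(np.outer(a, b))
# [[-3  0  3]
#  [-2  0  2]
#  [-1  0  1]]

# Covariance as an average of outer products of mean-centered samples:
X = np.random.randn(100, 3)                        # 100 samples, 3 features
Xc = X - X.mean(axis=0)
cov = sum(np.outer(x, x) for x in Xc) / (len(Xc) - 1)
print(np.allclose(cov, np.cov(X, rowvar=False)))   # True
```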
Q. What does it mean for two vectors to be linearly independent?
Answer
Two vectors are said to be linearly independent if no vector in the set can be written as a linear combination of the others. In simpler terms, neither of the vectors can be expressed as a scalar multiple or a combination involving scalar multiples of the other vector.
For two vectors $u$ and $v$, the condition for linear independence is
$$ c_1 u + c_2 v = 0 \implies c_1 = c_2 = 0 $$
Q. Given two sets of vectors
Answer
Q. How can we inspect if two vectors are orthogonal?
Answer
Two vectors are orthogonal to each other if their dot product is zero:
$$ u \cdot v = 0 $$
Q. How to check if two vectors are orthonormal?
Answer
Two vectors are orthonormal to each other if they are orthogonal and both have unit norm, i.e., $u \cdot v = 0$ and $\|u\| = \|v\| = 1$.
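A small NumPy sketch of both checks (the example vectors are illustrative):

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

orthogonal = np.isclose(np.dot(u, v), 0.0)
unit_norms = np.isclose(np.linalg.norm(u), 1.0) and np.isclose(np.linalg.norm(v), 1.0)
print(orthogonal, orthogonal and unit_norms)   # True True
```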
Q. Given $n$ vectors, each of $d$ dimensions, what is the dimension of their span?
Answer
$$ \text{Dimension of the span} = \min(n, d) $$
Strictly speaking, $\min(n, d)$ is an upper bound: the span attains this dimension only when the vectors are linearly independent (up to the limit imposed by the ambient dimension $d$).
Q. Norms and metrics
- What's a norm? What are the $L_0$, $L_1$, $L_2$, and $L_p$ norms?
- How do norms and metrics differ? Given a norm, make a metric. Given a metric, can we make a norm?
Answer
- A norm on a vector space is a function that assigns a non-negative length or size to each vector, with only the zero vector assigned a length of zero. Norms are denoted by $\|\cdot\|$ and must satisfy the following properties for any vectors $x, y$ and any scalar $a$:
  - Non-negativity: $\|x\| \geq 0$, and $\|x\| = 0$ if and only if $x = 0$.
  - Scalar multiplication (homogeneity): $\|a \cdot x\| = |a| \cdot \|x\|$.
  - Triangle inequality: $\|x + y\| \leq \|x\| + \|y\|$.
Different types of norms can be defined on vector spaces:
- $L_0$ norm (not a true norm): It counts the number of non-zero entries in a vector. It does not satisfy the homogeneity property (scalar multiplication), which is why it's technically not a norm.
- $L_1$ norm: It is defined as $\|x\|_1 = \sum |x_i|$, summing the absolute values of the entries of the vector.
- $L_2$ norm (Euclidean norm): It is defined as $\|x\|_2 = \sqrt{\sum x_i^2}$, which corresponds to the usual geometric length of a vector.
- $L_p$ norm: It generalizes the $L_1$ and $L_2$ norms and is defined as $\|x\|_p = (\sum |x_i|^p)^{1/p}$ for $1 \leq p < \infty$.
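These norms can be computed numerically; a quick NumPy sketch (using `np.count_nonzero` for the $L_0$ count, since it is not a true norm):

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0])

print(np.count_nonzero(x))        # L0 "norm": 2
print(np.linalg.norm(x, 1))       # L1 norm: 7.0
print(np.linalg.norm(x, 2))       # L2 norm: 5.0
print(np.linalg.norm(x, np.inf))  # L-infinity norm: 4.0
```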
- Norm vs Metric
- A norm provides a way to measure the length of vectors in vector spaces.
- A metric is a more general function that defines a distance between any two elements in a set, satisfying:
  - Non-negativity: $d(x, y) \geq 0$, and $d(x, y) = 0$ if and only if $x = y$.
  - Symmetry: $d(x, y) = d(y, x)$.
  - Triangle inequality: $d(x, z) \leq d(x, y) + d(y, z)$.
Given a norm, make a metric
If you have a norm $\|\cdot\|$, you can always define a metric as
$$ d(x, y) = \|x - y\| $$
which satisfies all the metric properties.
Given a metric, can we make a norm?
Not all metrics come from norms. To derive a norm from a metric $d$ on a vector space, the metric must satisfy two additional conditions:
- Translation invariance: $d(x+z, y+z) = d(x, y)$ for all $x, y, z$.
- Homogeneity: $d(\alpha x, \alpha y) = |\alpha| d(x, y)$ for all scalars $\alpha$.
If a metric satisfies these conditions, it can be associated with a norm defined by $\|x\| = d(x, 0)$.
Q. Explain the transpose operation on matrices?
Answer
The transpose of a matrix is the mirror image of the matrix across a diagonal line, called the main diagonal. We denote the transpose of a matrix $\mathbf{A}$ as $\mathbf{A}^T$.
It is defined as:
$$ (\mathbf{A}^T)_{i, j} = \mathbf{A}_{j, i} $$
Q. Define the condition under which we can multiply two matrices?
Answer
Suppose we have two matrices $\mathbf{A}$ and $\mathbf{B}$. We can multiply them only if the number of columns of $\mathbf{A}$ equals the number of rows of $\mathbf{B}$.
If the shape of $\mathbf{A}$ is $m \times n$ and the shape of $\mathbf{B}$ is $n \times p$, then the product $\mathbf{C}$ has shape $m \times p$:
$$ \mathbf{C_{m \times p}} = \mathbf{A_{m \times n}}\mathbf{B_{n \times p}} $$
Where,
$$ C_{i, j} = \sum_k A_{i, k}B_{k, j} $$
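A quick NumPy check of the shape rule:

```python
import numpy as np

A = np.ones((2, 3))   # shape m x n with m=2, n=3
B = np.ones((3, 4))   # shape n x p with n=3, p=4

C = A @ B             # valid: columns of A == rows of B
print(C.shape)        # (2, 4), i.e., m x p

# A @ np.ones((4, 2)) would raise a ValueError: inner dimensions differ.
```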
Q. What is the Hadamard product?
Answer
The Hadamard product, also known as the element-wise product, is an operation that takes two matrices of the same dimensions and produces another matrix of the same dimensions, where each element is the product of the corresponding elements of the input matrices.
Mathematically, if $\mathbf{A}$ and $\mathbf{B}$ are two matrices of the same dimensions, their Hadamard product $\mathbf{C} = \mathbf{A} \circ \mathbf{B}$ is given by:
$$ \mathbf{C} = \begin{bmatrix} a_{11} \cdot b_{11} & a_{12} \cdot b_{12} & \cdots \\ a_{21} \cdot b_{21} & a_{22} \cdot b_{22} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} $$
where $ c_{ij} = a_{ij} \cdot b_{ij} $.
Q. Where do we use Hadamard product?
Answer
The Hadamard product is commonly used in various applications such as signal processing, neural networks, and other fields where element-wise operations are needed.
Q. How is the Hadamard product different from the dot product?
Answer
The Hadamard product and dot product are distinct operations:
- Operation:
  - Hadamard Product: An element-wise multiplication of two matrices of the same size, resulting in a matrix of the same dimensions.
  - Dot Product: Involves multiplying corresponding elements of vectors or matrices and summing the results; for matrices, it refers to matrix multiplication.
- Output Dimensions:
  - Hadamard Product: Output has the same dimensions as the input matrices.
  - Dot Product: The output matrix dimensions depend on the inner dimensions of the inputs (e.g., multiplying an $m \times n$ matrix by an $n \times p$ matrix results in an $m \times p$ matrix).
- Applications:
  - Hadamard Product: Used in element-wise operations in deep learning and image processing.
  - Dot Product: Used in vector projections, transformations, and solving linear systems.
Example:
For matrices $\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $\mathbf{B} = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$:
- Hadamard Product:
$$ \mathbf{A} \circ \mathbf{B} = \begin{bmatrix} 5 & 12 \\ 21 & 32 \end{bmatrix} $$
- Dot Product (Matrix Multiplication):
$$ \mathbf{A} \cdot \mathbf{B} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} $$
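In NumPy these correspond to the `*` and `@` operators:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)   # Hadamard product: [[ 5 12], [21 32]]
print(A @ B)   # matrix product:   [[19 22], [43 50]]
```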
Q. Write the properties of matrix product operations?
Answer
- Matrix multiplication is distributive
$$ \mathbf{A}(\mathbf{B} + \mathbf{C}) = \mathbf{A}\mathbf{B} + \mathbf{A}\mathbf{C} $$
- Matrix multiplication is associative
$$ \mathbf{A}(\mathbf{B}\mathbf{C}) = (\mathbf{A}\mathbf{B})\mathbf{C} $$
- Matrix multiplication is not commutative
$$ \mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A} $$
Q. Is the dot product between two vectors commutative?
Answer
Yes. Since the dot product is a scalar, and a scalar is equal to its own transpose, we have:
$$ \mathbf{x}^T \mathbf{y} = (\mathbf{x}^T \mathbf{y})^T = \mathbf{y}^T \mathbf{x} $$
Q. What is an identity matrix?
Answer
An identity matrix is a matrix that does not change any vector when we multiply that vector by that matrix.
$$ \forall \mathbf{x} \in \mathbb{R}^n, \; I_n \mathbf{x} = \mathbf{x} $$
The structure of the identity matrix is simple: all the entries along the main diagonal are $1$, while all other entries are $0$.
Q. Define matrix inverse?
Answer
The matrix inverse of $\mathbf{A}$, denoted $\mathbf{A}^{-1}$, is the matrix such that:
$$ \mathbf{A^{-1}}\mathbf{A} = \mathbf{I_n} $$
Q. How to check if a matrix is a singular matrix?
Answer
Key characteristics of a singular matrix:
- $\mathbf{A}$ should be a square matrix.
- Determinant: $\det(\mathbf{A}) = 0$.
- Linear Dependence: At least one row or column is redundant.
- Non-Invertibility: The matrix cannot be inverted, meaning there is no matrix $\mathbf{A}^{-1}$ such that $\mathbf{A} \mathbf{A}^{-1} = I$, where $I$ is the identity matrix.
Q. Under what conditions does the inverse of a matrix exist?
Answer
The inverse of a matrix $\mathbf{A}$ exists if and only if the following conditions hold:
- Square Matrix: $\mathbf{A}$ is $n \times n$.
- $\det(\mathbf{A}) \neq 0$: The determinant is non-zero.
- Full Rank: All rows/columns are linearly independent.
- No Zero Eigenvalues: No eigenvalues are zero.
Q. What is the rank of a matrix and how do we determine it?
Answer
The rank of a matrix is defined as the maximum number of linearly independent rows or columns in the matrix. It represents the dimension of the row space or column space of the matrix.
We can determine rank via performing row operations to transform the matrix into row echelon form (REF). The rank is the number of non-zero rows in this form.
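In practice, the rank is usually computed numerically rather than by hand; NumPy's `matrix_rank` does this via the SVD:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [2, 4, 6],   # = 2 x first row, so linearly dependent
              [1, 0, 1]])
print(np.linalg.matrix_rank(A))   # 2
```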
Q. What is a full rank matrix?
Answer
If the rank is equal to the smallest dimension of the matrix (i.e., the number of rows or columns), the matrix is said to have full rank.
Q. How to check if matrix is full rank?
Answer
For square matrices, compute the determinant. If that is non-zero, the matrix is of full rank.
If the matrix is not square, compute its rank (e.g., via row reduction); the matrix is full rank if the rank equals the smaller of its number of rows and columns.
Q. Why do we say that matrices are linear transformations?
Answer
Matrices are considered linear transformations because they map vectors from one space to another while preserving the operations of vector addition and scalar multiplication, which are the core properties of linearity.
Here is the proof for the same:
Every matrix transformation is a linear transformation
Suppose that $T(\mathbf{x}) = A\mathbf{x}$ for some matrix $A$, and let $\mathbf{u}, \mathbf{v}$ be vectors and $c, d$ scalars. Then:
$$ T(c \mathbf{u} + d \mathbf{v}) = A(c \mathbf{u} + d \mathbf{v}) $$
$$ = c A \mathbf{u} + d A \mathbf{v} $$
$$ = c T(\mathbf{u}) + d T(\mathbf{v}) $$
$$ \text{As } T(c \mathbf{u} + d \mathbf{v}) = c T(\mathbf{u}) + d T(\mathbf{v}), \text{ the transformation } T \text{ must be linear.} $$
Q. Do all matrices have an inverse? Is the inverse of a matrix always unique?
Answer
Not all matrices have an inverse; a matrix must be square and non-singular to have its inverse.
If a matrix $\mathbf{A}$ had two inverses $\mathbf{B}$ and $\mathbf{C}$, then:
$$ \mathbf{A} \mathbf{B} = \mathbf{I} \quad \text{and} \quad \mathbf{A} \mathbf{C} = \mathbf{I} $$
Since inverses are two-sided, we also have $\mathbf{B}\mathbf{A} = \mathbf{I}$. Then $\mathbf{B} = \mathbf{B}\mathbf{I} = \mathbf{B}(\mathbf{A}\mathbf{C}) = (\mathbf{B}\mathbf{A})\mathbf{C} = \mathbf{I}\mathbf{C}$, so:
$$ \mathbf{B} = \mathbf{C} $$
Thus, the inverse is unique.
Q. What does the norm of a vector represent?
Answer
The norm of a vector represents its length or magnitude, i.e., the distance from the origin to the point identified by the vector.
Q. Explain Euclidean norm?
Answer
It is the $L_2$ norm, i.e., the $L_p$ norm with $p = 2$, and it measures the ordinary geometric length of a vector:
$$ \|x\|_2 = \Big(\sum_{i} |x_i|^{2}\Big)^{\frac{1}{2}} $$
It is also common to measure the size of a vector using the squared $L_2$ norm, which can be computed simply as $\mathbf{x}^T \mathbf{x}$.
Q. When should we use the $L_1$ norm over the squared $L_2$ norm?
Answer
The squared $L_2$ norm is mathematically convenient, but it increases very slowly near the origin. In applications where it is important to discriminate between elements that are exactly zero and elements that are small but non-zero, the $L_1$ norm is preferred: it grows at the same rate everywhere, increasing by $\epsilon$ whenever an element moves away from $0$ by $\epsilon$.
Q. What do the max norm and unit norm depict?
Answer
Max Norm
The max norm simplifies to the absolute value of the element with the largest magnitude in the vector.
It is denoted by $\| \mathbf{x} \|_{\infty}$ and is defined as:
$$ \| \mathbf{x} \|_{\infty} = \max_i |x_i| $$
Unit Norm
A vector with unit norm is a vector whose $L_2$ norm is equal to $1$:
$$ \|x\|_{2} = 1 $$
Q. How to measure size of a matrix?
Answer
We can use the Frobenius norm, which is analogous to the $L_2$ norm of a vector:
$$ \| \mathbf{A} \|_{F} = \sqrt{\sum_{i,j} A_{i,j}^2} $$
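In NumPy, `np.linalg.norm` computes this directly; the trace identity stated later ($\|A\|_F = \sqrt{Tr(AA^T)}$) gives the same value:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(np.linalg.norm(A, 'fro'))    # 5.477... = sqrt(1 + 4 + 9 + 16)
print(np.sqrt(np.trace(A @ A.T)))  # same value, via the trace identity
```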
Q. Can you write the dot product of two vectors in terms of their norms?
Answer
Yes. The dot product of two vectors $\mathbf{x}$ and $\mathbf{y}$ is:
$$ \mathbf{x} \cdot \mathbf{y} = \mathbf{x}^T \mathbf{y} $$
This can also be expressed as:
$$ \mathbf{x}^T \mathbf{y} = |\mathbf{x}|_2 |\mathbf{y}|_2 \cos{\theta} $$
where $\theta$ is the angle between $\mathbf{x}$ and $\mathbf{y}$.
Q. Define following type of matrices:
- Diagonal Matrix
- Symmetric Matrix
- Orthogonal Matrix
Answer
Diagonal Matrix
A matrix $\mathbf{D}$ is diagonal if all of its off-diagonal entries are zero, i.e., $D_{i,j} = 0$ for all $i \neq j$.
Symmetric Matrix
A symmetric matrix is any matrix that is equal to its own transpose.
$$ \mathbf{A} = \mathbf{A^T} $$
Orthogonal Matrix
An orthogonal matrix is a square matrix whose rows are mutually orthonormal and whose columns are also mutually orthonormal.
$$ \mathbf{A^T}\mathbf{A} = \mathbf{A}\mathbf{A^T} = \mathbf{I} $$
Which implies that,
$$ \mathbf{A^{-1}} = \mathbf{A^T} $$
Q. What is Eigen decomposition?
Answer
Eigen decomposition is a matrix factorization method where a matrix is decomposed into its eigenvalues and corresponding eigenvectors.
Q. Define eigenvector and eigenvalues?
Answer
An eigenvector of a square matrix $\mathbf{A}$ is a non-zero vector $\mathbf{v}$ such that multiplication by $\mathbf{A}$ alters only the scale of $\mathbf{v}$:
$$ \mathbf{A}\mathbf{v} = \lambda \mathbf{v} $$
In this equation, the scalar $\lambda$ is known as the eigenvalue corresponding to the eigenvector $\mathbf{v}$.
Q. Express the eigendecomposition of a square matrix $\mathbf{A}$?
Answer
Suppose the matrix $\mathbf{A}$ has $n$ linearly independent eigenvectors $v^{(1)}, \dots, v^{(n)}$ with corresponding eigenvalues $\lambda_1, \dots, \lambda_n$.
Let's define a matrix $\mathbf{V}$ with one eigenvector per column, and a vector $\mathbf{\lambda}$ of all the eigenvalues:
$$ \mathbf{V} = [v^{(1)},..,v^{(n)}] $$
$$ \mathbf{\lambda} = [\lambda_1,..,\lambda_n]^T $$
The eigendecomposition of $\mathbf{A}$ is then given by:
$$ \mathbf{A} = \mathbf{V}\text{diag}(\mathbf{\lambda})\mathbf{V}^{-1} $$
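A minimal sketch with `np.linalg.eig`, reconstructing $\mathbf{A} = \mathbf{V}\,\text{diag}(\mathbf{\lambda})\,\mathbf{V}^{-1}$ (the example matrix is illustrative):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigvals, V = np.linalg.eig(A)   # columns of V are the eigenvectors
A_rebuilt = V @ np.diag(eigvals) @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))   # True
```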
Q. What is the significance of eigendecomposition?
Answer
The eigendecomposition of a matrix tells us many useful facts about the matrix:
- The matrix is singular if and only if any of the eigenvalues are zero.
- The determinant of the matrix $\mathbf{A}$ equals the product of its eigenvalues.
- The trace of the matrix $\mathbf{A}$ equals the sum of its eigenvalues.
- If the eigenvalues of $\mathbf{A}$ are $\lambda_{i}$ and $\mathbf{A}$ is non-singular, then the eigenvalues of $\mathbf{A^{-1}}$ are simply $\frac{1}{\lambda_{i}}$.
- The eigenvectors of $\mathbf{A^{-1}}$ are the same as the eigenvectors of $\mathbf{A}$.
Q. Do non-square matrices have eigenvalues?
Answer
Eigenvalues and eigenvectors are only defined for square matrices, so a non-square matrix has no eigenvalues. For non-square matrices, the analogous factorization is the singular value decomposition (SVD), which exists for every real matrix.
Q. What is the benefit of using Singular Value Decomposition(SVD) over eigenvalue decomposition?
Answer
- Eigenvalue decomposition focuses on diagonalizing a square matrix, which may not always be possible.
- SVD provides an orthogonal factorization that works for any matrix and is often used to analyze the matrix structure, solve least squares problems, and more.
Q. What is Singular Value Decomposition (SVD)?
Answer
Singular Value Decomposition (SVD) is a technique used to factorize a matrix into its constituent singular vectors and singular values.
For a given real matrix $\mathbf{A}$ of shape $m \times n$, the SVD is expressed as:
$$ \mathbf{A} = \mathbf{U} \mathbf{D} \mathbf{V}^T $$
where:
- $\mathbf{U}$ is an $m \times m$ orthogonal matrix whose columns are the left singular vectors.
- $\mathbf{D}$ is an $m \times n$ diagonal matrix containing the singular values on its diagonal. It may not be square.
- $\mathbf{V}$ is an $n \times n$ orthogonal matrix whose columns are the right singular vectors.
SVD provides a way to decompose a matrix into components that reveal important properties and facilitate various applications in data analysis, signal processing, and more.
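A minimal sketch with `np.linalg.svd`; note that $\mathbf{D}$ must be rebuilt as an $m \times n$ matrix, since NumPy returns only the vector of singular values:

```python
import numpy as np

A = np.random.randn(4, 3)           # any real m x n matrix (illustrative)

U, s, Vt = np.linalg.svd(A)         # s is the vector of singular values
D = np.zeros_like(A)                # rebuild D as an m x n diagonal matrix
D[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ D @ Vt))   # True
```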
Q. What is the relationship between eigen-value decomposition and SVD?
Answer
We can interpret the SVD of $\mathbf{A}$ in terms of the eigendecompositions of $\mathbf{AA^T}$ and $\mathbf{A^{T}A}$:
- The left singular vectors of $\mathbf{A}$ are the eigenvectors of $\mathbf{AA^T}$.
- The right singular vectors of $\mathbf{A}$ are the eigenvectors of $\mathbf{A^{T}A}$.
- The non-zero singular values of $\mathbf{A}$ are the square roots of the non-zero eigenvalues of $\mathbf{AA^T}$ (true for $\mathbf{A^{T}A}$ too).
Q. What is trace of a matrix?
Answer
The trace operator gives the sum of all the diagonal entries of a matrix.
$$ Tr(A) = \sum_i{A_{i, i}} $$
Q. Write the main properties of the trace operator?
Answer
- The Frobenius norm of a matrix can be expressed in terms of the trace operator:
$$ ||A||_{F} = \sqrt{Tr(AA^T)} $$
- The trace operator is invariant to the transpose operation:
$$ Tr(A) = Tr(A^T) $$
- Invariance to cyclic permutations
$$ Tr(AB) = Tr(BA) $$
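A quick numerical verification of all three properties (the random matrices are illustrative):

```python
import numpy as np

A = np.random.randn(3, 3)
B = np.random.randn(3, 3)

print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A @ A.T))))  # True
print(np.isclose(np.trace(A), np.trace(A.T)))                            # True
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))                      # True
```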
Q. What does the determinant of a matrix represent?
Answer
The determinant of a square matrix, denoted $\det(\mathbf{A})$, is a scalar value equal to the product of all the eigenvalues of the matrix. It measures how much multiplication by the matrix expands or contracts space.
Q. What does the absolute value of the determinant depict?
Answer
The absolute value of the determinant of a matrix provides a measure of the scale factor by which the matrix expands or contracts space.
- $|\text{det}(\mathbf{A})| = 0$: Space is contracted completely and the transformation results in a loss of dimensionality.
- $|\text{det}(\mathbf{A})| = 1$: Volume remains unchanged.
- $|\text{det}(\mathbf{A})| > 1$: Volume gets enlarged.
- $0 < |\text{det}(\mathbf{A})| < 1$: Volume gets compressed.
Q. What happens to the determinant of a matrix if we multiply one of its rows by a scalar $t \in \mathbb{R}$?
Answer
If you multiply one of the rows of a matrix $\mathbf{A}$ by a scalar $t$, the determinant is scaled by $t$:
- Let $\mathbf{A}$ be an $n \times n$ matrix.
- If you multiply one row of $\mathbf{A}$ by a scalar $t$, the new matrix $\mathbf{A'}$ will have a determinant given by:
$$ \text{det}(\mathbf{A}') = t \cdot \text{det}(\mathbf{A}) $$
This property reflects that the determinant is a multilinear function of the rows of the matrix. Hence, multiplying a row by a scalar scales the determinant by that scalar.
Q. A $4 \times 4$ matrix has eigenvalues $3, 3, 2, -1$. What are its trace and determinant?
Answer
- Trace: The trace of a matrix is the sum of its eigenvalues. Therefore, for this matrix:
$$ \text{Trace} = 3 + 3 + 2 + (-1) = 7 $$
- Determinant: The determinant of a matrix is the product of its eigenvalues. Thus, for this matrix:
$$ \text{Determinant} = 3 \times 3 \times 2 \times (-1) = -18 $$
Q. Given the following matrix:
$$ \begin{bmatrix} 1 & 4 & -2 \\ -1 & 3 & 2 \\ 3 & 5 & -6 \end{bmatrix} $$
Without explicitly using the equation for calculating determinants, what can we say about this matrix’s determinant?
Answer
We can write the above matrix into its row echelon form:
- Add the first row to the second row to make the entry in the second row, first column zero:
$$ R2 \rightarrow R2 + R1 $$
$$ \begin{bmatrix} 1 & 4 & -2 \\ 0 & 7 & 0 \\ 3 & 5 & -6 \end{bmatrix} $$
- Subtract $3$ times the first row from the third row to make the entry in the third row, first column zero:
$$ R3 \rightarrow R3 - 3 \times R1 $$
$$ \begin{bmatrix} 1 & 4 & -2 \\ 0 & 7 & 0 \\ 0 & -7 & 0 \end{bmatrix} $$
- Divide the second row by $7$ to normalize the leading coefficient:
$$ R2 \rightarrow \frac{1}{7} \times R2 $$
$$ \begin{bmatrix} 1 & 4 & -2 \\ 0 & 1 & 0 \\ 0 & -7 & 0 \end{bmatrix} $$
- Add $7$ times the second row to the third row to make the entry in the third row, second column zero:
$$ R3 \rightarrow R3 + 7 \times R2 $$
$$ \begin{bmatrix} 1 & 4 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$
- Subtract $4$ times the second row from the first row to make the entry in the first row, second column zero:
$$ R1 \rightarrow R1 - 4 \times R2 $$
$$ \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} $$
The presence of a row of zeros in the row echelon form indicates that the matrix is singular, meaning its determinant is zero.
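A quick numerical check of the same conclusion, using the matrix above:

```python
import numpy as np

A = np.array([[ 1, 4, -2],
              [-1, 3,  2],
              [ 3, 5, -6]])

print(np.isclose(np.linalg.det(A), 0.0))   # True: the matrix is singular
print(np.linalg.matrix_rank(A))            # 2: REF has a row of zeros
```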
Q. What's the difference between the covariance matrix $A^T A$ and the Gram matrix $A A^T$, for a matrix $A \in \mathbb{R}^{m \times n}$?
Answer
- Dimensions:
  - $A^T A$ is $n \times n$ (columns of $A$).
  - $A A^T$ is $m \times m$ (rows of $A$).
- Focus:
  - $A^T A$ focuses on the relationships between columns.
  - $A A^T$ focuses on the relationships between rows.
- Applications:
  - $A^T A$ is used to understand the variance and covariance of data columns.
  - $A A^T$ is used to understand the similarity and inner product of data rows.
Q. Given $A \in \mathbb{R}^{n \times m}$ and $b \in \mathbb{R}^n$:
- Find $x$ such that $Ax = b$.
- When does this have a unique solution?
- Why is it that when $A$ has more columns than rows, $Ax = b$ has multiple solutions?
- Given a matrix $A$ with no inverse, how would you solve the equation $Ax = b$? What is the pseudo-inverse and how do you calculate it?
Answer
- To find $x$ such that $A x = b$, where $A \in \mathbb{R}^{n \times m}$ and $b \in \mathbb{R}^n$, you generally need to solve a linear system. The method used depends on the properties of $A$:
  - If $A$ is square (i.e., $n = m$) and invertible, you can find $x$ directly using:
$$ x = A^{-1} b $$
  - If $A$ is not square or not invertible, we may use other methods such as:
    - Gaussian Elimination: Useful for finding solutions and performing row reductions.
    - Least Squares Solution: If $A$ has more rows than columns ($n > m$) and the system has no exact solution, find the least squares solution.
    - Pseudo-Inverse: When $A$ is not invertible or not square, the Moore-Penrose pseudo-inverse is used.
- The linear system $A x = b$ has a unique solution if:
  - The matrix $A$ is square ($n = m$) and invertible, meaning $\text{det}(A) \neq 0$. In this case, the matrix $A$ has full rank, and the solution is given by $x = A^{-1} b$.
  - For non-square matrices, the system is consistent and has a unique solution if:
    - The matrix $A$ has full column rank (if $m \leq n$) and $b$ is in the column space of $A$.
    - In the case of an over-determined system (more rows than columns), $A$ should have full column rank.
- When $A$ has more columns than rows ($m > n$), the rank of $A$ is at most $n < m$, so the null space of $A$ is non-trivial and the system has free variables. If $A x = b$ is consistent, it therefore has infinitely many solutions, which form an affine subspace of $\mathbb{R}^m$: each solution can be expressed as $x = x_0 + \text{null}(A)$, where $x_0$ is a particular solution and $\text{null}(A)$ represents the null space of $A$.
- When $A$ has no inverse, we can still solve $A x = b$ (exactly if the system is consistent, or in the least-squares sense otherwise) using the Moore-Penrose pseudo-inverse $A^{+}$, taking $x = A^{+} b$. It can be calculated from the SVD $A = U D V^T$ as $A^{+} = V D^{+} U^T$, where $D^{+}$ is obtained by taking the reciprocal of the non-zero singular values of $D$ and transposing the result. A sketch of these last two approaches follows.
Q. Given a very large symmetric matrix