
Commit c7fcad6

update linear model documentation
1 parent bc50efa commit c7fcad6

File tree

6 files changed: +91 -68 lines changed

docs/numpy_ml.linear_models.rst

Lines changed: 30 additions & 30 deletions
@@ -53,8 +53,8 @@ In particular, the ridge model is the same as the OLS model:
 
     \mathbf{y} = \mathbf{bX} + \mathbf{\epsilon}
 
-where :math:`\epsilon \sim \mathcal{N}(0, \sigma^2 I)`, except now the error
-for the model is calculated as
+where :math:`\epsilon \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})`,
+except now the error for the model is calculated as
 
 .. math::
 
@@ -66,9 +66,9 @@ the adjusted normal equation:
 .. math::
 
     \hat{\mathbf{b}}_{Ridge} =
-        (\mathbf{X}^\top \mathbf{X} + \alpha I)^{-1} \mathbf{X}^\top \mathbf{y}
 
-where :math:`(\mathbf{X}^\top \mathbf{X} + \alpha I)^{-1}
+        (\mathbf{X}^\top \mathbf{X} + \alpha \mathbf{I})^{-1} \mathbf{X}^\top \mathbf{y}
+
+where :math:`(\mathbf{X}^\top \mathbf{X} + \alpha \mathbf{I})^{-1}
 \mathbf{X}^\top` is the pseudoinverse / Moore-Penrose inverse adjusted for
 the `L2` penalty on the model coefficients.
 
@@ -81,7 +81,7 @@ the `L2` penalty on the model coefficients.
 <h2>Bayesian Linear Regression</h2>
 
 In its general form, Bayesian linear regression extends the simple linear
-regression model by introducing priors on model parameters b and/or the
+regression model by introducing priors on model parameters *b* and/or the
 error variance :math:`\sigma^2`.
 
 The introduction of a prior allows us to quantify the uncertainty in our
@@ -98,7 +98,7 @@ data :math:`X^*` with the posterior predictive distribution:
 
 .. math::
 
-    p(y^* \mid X^*, X, Y) = \int_{b} p(y^* \mid X^*, b) p(b \mid X, y) db
+    p(y^* \mid X^*, X, Y) = \int_{b} p(y^* \mid X^*, b) p(b \mid X, y) \ \text{d}b
 
 Depending on the choice of prior it may be impossible to compute an
 analytic form for the posterior / posterior predictive distribution. In
@@ -116,11 +116,11 @@ prior on `b` is Gaussian. A common parameterization is:
 
 .. math::
 
-    b | \sigma, b_V \sim \mathcal{N}(b_{mean}, \sigma^2 b_V)
+    b | \sigma, V \sim \mathcal{N}(\mu, \sigma^2 V)
 
-where :math:`b_{mean}`, :math:`\sigma` and :math:`b_V` are hyperparameters. Ridge
-regression is a special case of this model where :math:`b_{mean}` = 0,
-:math:`\sigma` = 1 and :math:`b_V = I` (ie., the prior on `b` is a zero-mean,
+where :math:`\mu`, :math:`\sigma` and :math:`V` are hyperparameters. Ridge
+regression is a special case of this model where :math:`\mu = 0`,
+:math:`\sigma = 1` and :math:`V = I` (i.e., the prior on *b* is a zero-mean,
 unit covariance Gaussian).
 
 Due to the conjugacy of the above prior with the Gaussian likelihood, there
@@ -129,22 +129,22 @@ parameters:
 
 .. math::
 
-    A &= (b_V^{-1} + X^\top X)^{-1} \\
-    \mu_b &= A b_V^{-1} b_{mean} + A X^\top y \\
-    \text{cov}_b &= \sigma^2 A \\
+    A &= (V^{-1} + X^\top X)^{-1} \\
+    \mu_b &= A V^{-1} \mu + A X^\top y \\
+    \Sigma_b &= \sigma^2 A \\
 
 The model posterior is then
 
 .. math::
 
-    b \mid X, y \sim \mathcal{N}(\mu_b, \text{cov}_b)
+    b \mid X, y \sim \mathcal{N}(\mu_b, \Sigma_b)
 
 We can also compute a closed-form solution for the posterior predictive distribution as
 well:
 
 .. math::
 
-    y^* \mid X^*, X, Y \sim \mathcal{N}(X^* \mu_b, \ \ X^* \text{cov}_b X^{* \top} + I)
+    y^* \mid X^*, X, Y \sim \mathcal{N}(X^* \mu_b, \ \ X^* \Sigma_b X^{* \top} + I)
 
 where :math:`X^*` is the matrix of new data we wish to predict, and :math:`y^*`
 are the predicted targets for those data.
@@ -160,7 +160,7 @@ are the predicted targets for those data.
 
 --------------------------------
 
-If *both* b and the error variance :math:`\sigma^2` are unknown, the
+If *both* *b* and the error variance :math:`\sigma^2` are unknown, the
 conjugate prior for the Gaussian likelihood is the Normal-Gamma
 distribution (univariate likelihood) or the Normal-Inverse-Wishart
 distribution (multivariate likelihood).
@@ -169,22 +169,22 @@ distribution (multivariate likelihood).
 
 .. math::
 
-    b, \sigma^2 &\sim \text{NG}(b_{mean}, b_{V}, \alpha, \beta) \\
+    b, \sigma^2 &\sim \text{NG}(\mu, V, \alpha, \beta) \\
     \sigma^2 &\sim \text{InverseGamma}(\alpha, \beta) \\
-    b \mid \sigma^2 &\sim \mathcal{N}(b_{mean}, \sigma^2 b_{V})
+    b \mid \sigma^2 &\sim \mathcal{N}(\mu, \sigma^2 V)
 
-where :math:`\alpha, \beta, b_{V}`, and :math:`b_{mean}` are
-parameters of the prior.
+where :math:`\alpha, \beta, V`, and :math:`\mu` are parameters of the
+prior.
 
 **Multivariate**
 
 .. math::
 
-    b, \Sigma &\sim \mathcal{NIW}(b_{mean}, \lambda, \Psi, \rho) \\
+    b, \Sigma &\sim \mathcal{NIW}(\mu, \lambda, \Psi, \rho) \\
     \Sigma &\sim \mathcal{W}^{-1}(\Psi, \rho) \\
-    b \mid \Sigma &\sim \mathcal{N}(b_{mean}, \frac{1}{\lambda} \Sigma)
+    b \mid \Sigma &\sim \mathcal{N}(\mu, \frac{1}{\lambda} \Sigma)
 
-where :math:`b_{mean}, \lambda, \Psi`, and :math:`\rho` are
+where :math:`\mu, \lambda, \Psi`, and :math:`\rho` are
 parameters of the prior.
 
 
@@ -194,30 +194,30 @@ parameters:
 
 .. math::
 
-    B &= y - X b_{mean} \\
+    B &= y - X \mu \\
     \text{shape} &= N + \alpha \\
-    \text{scale} &= \frac{1}{\text{shape}} (\alpha \beta + B^\top (X b_V X^\top + I)^{-1} B) \\
+    \text{scale} &= \frac{1}{\text{shape}} (\alpha \beta + B^\top (X V X^\top + I)^{-1} B) \\
 
 where
 
 .. math::
 
     \sigma^2 \mid X, y &\sim \text{InverseGamma}(\text{shape}, \text{scale}) \\
-    A &= (b_V^{-1} + X^\top X)^{-1} \\
-    \mu_b &= A b_V^{-1} b_{mean} + A X^\top y \\
-    \text{cov}_b &= \sigma^2 A
+    A &= (V^{-1} + X^\top X)^{-1} \\
+    \mu_b &= A V^{-1} \mu + A X^\top y \\
+    \Sigma_b &= \sigma^2 A
 
 The model posterior is then
 
 .. math::
 
-    b | X, y, \sigma^2 \sim \mathcal{N}(\mu_b, \text{cov}_b)
+    b | X, y, \sigma^2 \sim \mathcal{N}(\mu_b, \Sigma_b)
 
 We can also compute a closed-form solution for the posterior predictive distribution:
 
 .. math::
 
-    y^* \mid X^*, X, Y \sim \mathcal{N}(X^* \mu_b, \ X^* \text{cov}_b X^{* \top} + I)
+    y^* \mid X^*, X, Y \sim \mathcal{N}(X^* \mu_b, \ X^* \Sigma_b X^{* \top} + I)
 
 **Models**
 
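As a quick sanity check on the renamed symbols, the known-variance posterior update above translates directly into NumPy. This is a minimal sketch with a hypothetical helper name (`posterior_update`), not part of the numpy-ml API:

```python
import numpy as np

def posterior_update(X, y, mu, V, sigma=1.0):
    """Posterior mean and covariance of b | X, y under a N(mu, sigma^2 V) prior."""
    V_inv = np.linalg.inv(V)
    A = np.linalg.inv(V_inv + X.T @ X)   # A = (V^{-1} + X^T X)^{-1}
    mu_b = A @ V_inv @ mu + A @ X.T @ y  # posterior mean
    Sigma_b = (sigma ** 2) * A           # posterior covariance
    return mu_b, Sigma_b

# Ridge regression is the special case mu = 0, sigma = 1, V = I.
X = np.random.randn(20, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(20)
mu_b, Sigma_b = posterior_update(X, y, mu=np.zeros(3), V=np.eye(3))
```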

numpy_ml/linear_models/bayesian_regression.py

Lines changed: 8 additions & 3 deletions
@@ -54,7 +54,7 @@ def __init__(self, alpha=1, beta=2, mu=0, V=None, fit_intercept=True):
         posterior_predictive : dict or None
             Frozen random variable for the posterior predictive distribution,
             :math:`P(y \mid X)`. This value is only set following a call to
-            :meth:`numpy_ml.linear_models.BayesianLinearRegressionUnknownVariance.predict`.
+            :meth:`predict <numpy_ml.linear_models.BayesianLinearRegressionUnknownVariance.predict>`.
         """  # noqa: E501
         # this is a placeholder until we know the dimensions of X
         V = 1.0 if V is None else V
@@ -90,7 +90,11 @@ def fit(self, X, y):
         y : :py:class:`ndarray <numpy.ndarray>` of shape `(N, K)`
             The targets for each of the `N` examples in `X`, where each target
             has dimension `K`.
-        """
+
+        Returns
+        -------
+        self : :class:`BayesianLinearRegressionUnknownVariance<numpy_ml.linear_models.BayesianLinearRegressionUnknownVariance>` instance
+        """  # noqa: E501
         # convert X to a design matrix if we're fitting an intercept
         if self.fit_intercept:
             X = np.c_[np.ones(X.shape[0]), X]
@@ -130,6 +134,7 @@ def fit(self, X, y):
             "sigma**2": stats.distributions.invgamma(a=shape, scale=scale),
             "b | sigma**2": stats.multivariate_normal(mean=mu, cov=cov),
         }
+        return self
 
     def predict(self, X):
         """
@@ -206,7 +211,7 @@ def __init__(self, mu=0, sigma=1, V=None, fit_intercept=True):
         posterior_predictive : dict or None
             Frozen random variable for the posterior predictive distribution,
             :math:`P(y \mid X)`. This value is only set following a call to
-            :meth:`numpy_ml.linear_models.BayesianLinearRegressionKnownVariance.predict`.
+            :meth:`predict <numpy_ml.linear_models.BayesianLinearRegressionKnownVariance.predict>`.
         """  # noqa: E501
         # this is a placeholder until we know the dimensions of X
         V = 1.0 if V is None else V
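Since `fit` now returns `self`, fitting and prediction can be chained. A hedged usage sketch (random data, default hyperparameters; the import path follows the cross-references above):

```python
import numpy as np
from numpy_ml.linear_models import BayesianLinearRegressionUnknownVariance

X = np.random.randn(50, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + np.random.randn(50)

# `fit` returning `self` is what makes this one-liner possible
y_pred = BayesianLinearRegressionUnknownVariance().fit(X, y).predict(X)
```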

numpy_ml/linear_models/glm.py

Lines changed: 14 additions & 10 deletions
@@ -53,10 +53,10 @@ def __init__(self, link, fit_intercept=True, tol=1e-5, max_iter=100):
 
         Notes
         -----
-        The generalized linear model (GLM) [a]_ [b]_ assumes that each target/dependent
+        The generalized linear model (GLM) [7]_ [8]_ assumes that each target/dependent
         variable :math:`y_i` in target vector :math:`\mathbf{y} = (y_1, \ldots,
         y_n)`, has been drawn independently from a pre-specified distribution
-        in the exponential family [e]_ with unknown mean :math:`\mu_i`. The GLM
+        in the exponential family [11]_ with unknown mean :math:`\mu_i`. The GLM
         models a (one-to-one, continuous, differentiable) function, *g*, of
         this mean value as a linear combination of the model parameters
         :math:`\mathbf{b}` and observed covariates, :math:`\mathbf{x}_i`:
@@ -79,22 +79,22 @@ def __init__(self, link, fit_intercept=True, tol=1e-5, max_iter=100):
             "Binomial", "Logit", ":math:`g(x) = \log(x) - \log(n - x)`"
             "Poisson", "Log", ":math:`g(x) = \log(x)`"
 
-        An iteratively re-weighted least squares (IRLS) algorithm [c]_ can be
+        An iteratively re-weighted least squares (IRLS) algorithm [9]_ can be
         employed to find the maximum likelihood estimate for the model
         parameters :math:`\beta` in any instance of the generalized linear
-        model. IRLS is equivalent to Fisher scoring [d]_, which itself is
+        model. IRLS is equivalent to Fisher scoring [10]_, which itself is
         a slight modification of classic Newton-Raphson for finding the zeros
         of the first derivative of the model log-likelihood.
 
         References
         ----------
-        .. [a] Nelder, J., & Wedderburn, R. (1972). Generalized linear
+        .. [7] Nelder, J., & Wedderburn, R. (1972). Generalized linear
            models. *Journal of the Royal Statistical Society, Series A
            (General), 135(3)*: 370–384.
-        .. [b] https://en.wikipedia.org/wiki/Generalized_linear_model
-        .. [c] https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
-        .. [d] https://en.wikipedia.org/wiki/Scoring_algorithm
-        .. [e] https://en.wikipedia.org/wiki/Exponential_family
+        .. [8] https://en.wikipedia.org/wiki/Generalized_linear_model
+        .. [9] https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares
+        .. [10] https://en.wikipedia.org/wiki/Scoring_algorithm
+        .. [11] https://en.wikipedia.org/wiki/Exponential_family
 
         Parameters
         ----------
@@ -136,7 +136,11 @@ def fit(self, X, y):
             A dataset consisting of `N` examples, each of dimension `M`.
         y : :py:class:`ndarray <numpy.ndarray>` of shape `(N,)`
             The targets for each of the `N` examples in `X`.
-        """
+
+        Returns
+        -------
+        self : :class:`GeneralizedLinearModel <numpy_ml.linear_models.GeneralizedLinearModel>` instance
+        """  # noqa: E501
         y = np.squeeze(y)
         assert y.ndim == 1
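For readers following the renumbered references, the IRLS iteration the docstring describes can be sketched in a few lines of NumPy. This is illustrative only, with a hypothetical helper (`irls_poisson`) hard-coding a Poisson GLM with log link; the library's `GeneralizedLinearModel` handles links and convergence generically:

```python
import numpy as np

def irls_poisson(X, y, tol=1e-5, max_iter=100):
    """Fisher scoring / IRLS for a Poisson GLM with log link."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta            # linear predictor
        mu = np.exp(eta)          # inverse link
        W = np.diag(mu)           # working weights: Var(y_i) = mu_i
        z = eta + (y - mu) / mu   # working response
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ z)
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta

X = np.c_[np.ones(100), np.random.randn(100)]
y = np.random.poisson(np.exp(X @ np.array([0.5, 0.2])))
beta_hat = irls_poisson(X, y)
```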

numpy_ml/linear_models/linear_regression.py

Lines changed: 14 additions & 7 deletions
@@ -18,7 +18,7 @@ def __init__(self, fit_intercept=True):
 
             y_i = \beta^\top \mathbf{x}_i + \epsilon_i
 
-        In this equation :math:`epsilon_i \sim \mathcal{N}(0, \sigma^2_i)` is
+        In this equation :math:`\epsilon_i \sim \mathcal{N}(0, \sigma^2_i)` is
         the error term associated with example :math:`i`, and
         :math:`\sigma^2_i` is the variance of the corresponding example.
 
@@ -111,11 +111,15 @@ def update(self, X, y, weights=None):
             with larger weights exert greater influence on model fit. When
             `y` is a vector (i.e., `K = 1`), weights should be set to the
             reciporical of the variance for each measurement (i.e., :math:`w_i
-            = 1/sigma^2_i`). When `K > 1`, it is assumed that all columns of
+            = 1/\sigma^2_i`). When `K > 1`, it is assumed that all columns of
             `y` share the same weight :math:`w_i`. If None, examples are
             weighted equally, resulting in the standard linear least squares
             update. Default is None.
-        """
+
+        Returns
+        -------
+        self : :class:`LinearRegression <numpy_ml.linear_models.LinearRegression>` instance
+        """  # noqa: E501
         if not self._is_fit:
             raise RuntimeError("You must call the `fit` method before calling `update`")
 
@@ -166,7 +170,7 @@ def _update2D(self, X, y, W):
         beta += S_inv @ X.T @ (y - X @ beta)
 
     def fit(self, X, y, weights=None):
-        """
+        r"""
         Fit regression coefficients via maximum likelihood.
 
         Parameters
@@ -181,11 +185,15 @@ def fit(self, X, y, weights=None):
             with larger weights exert greater influence on model fit. When
             `y` is a vector (i.e., `K = 1`), weights should be set to the
             reciporical of the variance for each measurement (i.e., :math:`w_i
-            = 1/sigma^2_i`). When `K > 1`, it is assumed that all columns of
+            = 1/\sigma^2_i`). When `K > 1`, it is assumed that all columns of
             `y` share the same weight :math:`w_i`. If None, examples are
             weighted equally, resulting in the standard linear least squares
             update. Default is None.
-        """
+
+        Returns
+        -------
+        self : :class:`LinearRegression <numpy_ml.linear_models.LinearRegression>` instance
+        """  # noqa: E501
         N = X.shape[0]
 
         weights = np.ones(N) if weights is None else np.atleast_1d(weights)
@@ -226,4 +234,3 @@ def predict(self, X):
         if self.fit_intercept:
             X = np.c_[np.ones(X.shape[0]), X]
         return X @ self.beta
-        # return np.dot(X, self.beta)
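The `1/\sigma^2_i` weighting fixed in the docstrings corresponds to the standard weighted least-squares estimator. A standalone sketch with a hypothetical helper (`wls_fit`), mirroring the documented behavior of `weights=None`:

```python
import numpy as np

def wls_fit(X, y, weights=None):
    """Solve beta = (X^T W X)^{-1} X^T W y with W = diag(weights)."""
    N = X.shape[0]
    w = np.ones(N) if weights is None else np.atleast_1d(weights)
    W = np.diag(w)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

X = np.c_[np.ones(5), np.arange(5.0)]
y = np.array([0.9, 2.1, 3.0, 4.2, 4.9])
beta = wls_fit(X, y, weights=1 / np.array([0.1, 0.1, 0.1, 0.4, 0.4]))  # w_i = 1/sigma_i^2
```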

numpy_ml/linear_models/naive_bayes.py

Lines changed: 15 additions & 13 deletions
@@ -82,16 +82,17 @@ def fit(self, X, y):
 
         Notes
         -----
-        The model parameters are stored in the :py:attr:`parameters` attribute.
+        The model parameters are stored in the :py:attr:`parameters
+        <numpy_ml.linear_models.GaussianNBClassifier.parameters>` attribute.
         The following keys are present:
 
-        mean: :py:class:`ndarray <numpy.ndarray>` of shape `(K, M)`
-            Feature means for each of the `K` label classes
-        sigma: :py:class:`ndarray <numpy.ndarray>` of shape `(K, M)`
-            Feature variances for each of the `K` label classes
-        prior : :py:class:`ndarray <numpy.ndarray>` of shape `(K,)`
-            Prior probability of each of the `K` label classes, estimated
-            empirically from the training data
+        "mean": :py:class:`ndarray <numpy.ndarray>` of shape `(K, M)`
+            Feature means for each of the `K` label classes
+        "sigma": :py:class:`ndarray <numpy.ndarray>` of shape `(K, M)`
+            Feature variances for each of the `K` label classes
+        "prior": :py:class:`ndarray <numpy.ndarray>` of shape `(K,)`
+            Prior probability of each of the `K` label classes, estimated
+            empirically from the training data
 
         Parameters
         ----------
102103
103104
Returns
104105
-------
105-
self: object
106-
"""
106+
self : :class:`GaussianNBClassifier <numpy_ml.linear_models.GaussianNBClassifier>` instance
107+
""" # noqa: E501
107108
P = self.parameters
108109
H = self.hyperparameters
109110

@@ -165,7 +166,7 @@ def _log_posterior(self, X):
     def _log_class_posterior(self, X, class_idx):
         r"""
         Compute the (unnormalized) log posterior for the label at index
-        `class_idx` in :py:attr:`labels`.
+        `class_idx` in :py:attr:`labels <numpy_ml.linear_models.GaussianNBClassifier.labels>`.
 
         Notes
         -----
@@ -199,8 +200,9 @@ def _log_class_posterior(self, X, class_idx):
         -------
         log_class_posterior : :py:class:`ndarray <numpy.ndarray>` of shape `(N,)`
             Unnormalized log probability of the label at index `class_idx`
-            in :py:attr:`labels` for each example in `X`
-        """
+            in :py:attr:`labels <numpy_ml.linear_models.GaussianNBClassifier.labels>`
+            for each example in `X`
+        """  # noqa: E501
         P = self.parameters
         mu = P["mean"][class_idx]
         prior = P["prior"][class_idx]
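The unnormalized log class posterior that `_log_class_posterior` documents is a per-feature Gaussian log likelihood plus the class log prior. A minimal sketch using the "mean", "sigma" (variance), and "prior" parameter keys described above, with a hypothetical helper name:

```python
import numpy as np

def log_class_posterior(X, mean_k, sigma_k, prior_k):
    """Unnormalized log P(class k | x) under a Gaussian naive Bayes model."""
    # sum of independent Gaussian log densities over the M features;
    # sigma_k holds per-feature *variances*, matching the "sigma" key above
    log_lik = -0.5 * np.sum(
        np.log(2 * np.pi * sigma_k) + (X - mean_k) ** 2 / sigma_k, axis=1
    )
    return log_lik + np.log(prior_k)

X = np.random.randn(4, 3)
scores = log_class_posterior(X, mean_k=np.zeros(3), sigma_k=np.ones(3), prior_k=0.5)
```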
