Clean up show methods and documentation (#412)

dmbates · palday · web-flow · commit 048718dd13b1 · 2020-10-07T16:34:21.000-05:00
* Extend model-fit criteria section in show. Closes #411 (not entirely, but the other change can wait)
* Clean up docs

Co-authored-by: Phillip Alday <palday@users.noreply.github.com>
diff --git a/docs/src/GaussHermite.md b/docs/src/GaussHermite.md
@@ -1,6 +1,6 @@
 # Normalized Gauss-Hermite Quadrature
 
-[*Gaussian Quadrature rules*](https://en.wikipedia.org/wiki/Gaussian_quadrature) provide sets of `x` values, called *abscissae*, and weights, `w`, to approximate an integral with respect to a *weight function*, $g(x)$.
+[*Gaussian Quadrature rules*](https://en.wikipedia.org/wiki/Gaussian_quadrature) provide sets of `x` values, called *abscissae*, and corresponding weights, `w`, to approximate an integral with respect to a *weight function*, $g(x)$.
 For a `k`th order rule the approximation is
 ```math
 \int f(x)g(x)\,dx \approx \sum_{i=1}^k w_i f(x_i)
@@ -93,7 +93,7 @@ A *binary response* is a "Yes"/"No" type of answer.
 For example, in a 1989 fertility survey of women in Bangladesh (reported in [Huq, N. M. and Cleland, J., 1990](https://www.popline.org/node/371841)) one response of interest was whether the woman used artificial contraception.
 Several covariates were recorded including the woman's age (centered at the mean), the number of live children the woman has had (in 4 categories: 0, 1, 2, and 3 or more), whether she lived in an urban setting, and the district in which she lived.
 The version of the data used here is that used in review of multilevel modeling software conducted by the Center for Multilevel Modelling, currently at University of Bristol (http://www.bristol.ac.uk/cmm/learning/mmsoftware/data-rev.html).
-These data are available as the `Contraception` data frame in the test data for the `MixedModels` package.
+These data are available as the `:contra` dataset.
 ```@example Main
 contra = DataFrame(MixedModels.dataset(:contra))
 describe(contra)
@@ -109,8 +109,7 @@ shows that the proportion of women using artificial contraception is approximate
 A model with fixed-effects for age, age squared, number of live children and urban location and with random effects for district, is fit as
 ```@example Main
 const form1 = @formula use ~ 1 + age + abs2(age) + livch + urban + (1|dist);
-m1 = fit!(GeneralizedLinearMixedModel(form1, contra,
-    Bernoulli()), fast=true)
+m1 = fit(MixedModel, form1, contra, Bernoulli(), fast=true)
 ```
 
 For a model such as `m1`, which has a single, scalar random-effects term, the unscaled conditional density of the spherical random effects variable, $\mathcal{U}$,
@@ -125,7 +124,7 @@ To use Gauss-Hermite quadrature the contributions of each of the $u_i,\;i=1,\dot
 ```@example Main
 const devc0 = map!(abs2, m1.devc0, m1.u[1]);  # start with uᵢ²
 const devresid = m1.resp.devresid;   # n-dimensional vector of deviance residuals
-const refs = first(m1.LMM.reterms).refs;  # n-dimensional vector of indices in 1:q
+const refs = only(m1.LMM.reterms).refs;  # n-dimensional vector of indices in 1:q
 for (dr, i) in zip(devresid, refs)
     devc0[i] += dr
 end
@@ -141,7 +140,7 @@ freqtable(contra, :dist)'
 
 Because the first district has one of the largest sample sizes and the third district has the smallest sample size, these two will be used for illustration.
 For a range of $u$ values, evaluate the individual components of the deviance and store them in a matrix.
-```@setup Main
+```@example Main
 const devc = m1.devc;
 const xvals = -5.0:2.0^(-4):5.0;
 const uv = vec(m1.u[1]);
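Reviewer aid for the `GaussHermite.md` hunks above: a minimal sketch of how a normalized rule can be computed. It assumes, as the page title suggests, that the weight function g(x) is the standard normal density. It uses the Golub-Welsch eigenvalue method, which is not necessarily what MixedModels does internally. `ghnorm_sketch` is a hypothetical name, not a package function.

```julia
using LinearAlgebra

# Hypothetical helper: k-point Gauss-Hermite rule whose weight function is assumed
# to be the standard normal density, computed with the Golub-Welsch algorithm.
function ghnorm_sketch(k::Integer)
    k ≥ 1 || throw(ArgumentError("k must be positive"))
    # Jacobi matrix of the probabilists' Hermite polynomials:
    # zero diagonal, off-diagonal entries √1, √2, …, √(k-1).
    J = SymTridiagonal(zeros(k), sqrt.(1.0:(k - 1)))
    ev = eigen(J)
    # Abscissae are the eigenvalues; weights are the squared first components of
    # the normalized eigenvectors (they sum to 1 because g integrates to 1).
    return (z = ev.values, w = abs2.(ev.vectors[1, :]))
end
```

With such a rule, the integral of f(x)g(x) is approximated by `sum(w .* f.(z))`. For example, a 3-point rule integrates x² and x⁴ against the standard normal density exactly, because a k-point rule is exact for polynomials of degree up to 2k - 1.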
diff --git a/docs/src/constructors.md b/docs/src/constructors.md
@@ -136,7 +136,7 @@ describe(pastes)
 fm4 = fit(MixedModel, @formula(strength ~ 1 + (1|sample) + (1|batch)), pastes)
 ```
 
-An alternative syntax with a solidus (the "`/`" character) separating grouping factors, read "`cask` nested within `batch`", fits the same model.
+An alternative syntax with a solidus (the "`/`" character) separating grouping factors, read "`cask` nested within `batch`", fits the same model. (`sample` is just an explicitly stored version of `batch & cask`.)
 ```@example Main
 fit(MixedModel, @formula(strength ~ 1 + (1|batch/cask)), pastes)
 ```
@@ -160,17 +160,14 @@ end
 ### Simplifying the random effect correlation structure
 
 MixedEffects.jl estimates not only the *variance* of the effects for each random effect level, but also the *correlation* between the random effects for different predictors.
-So, for the model of the *sleepstudy* data above, one of the parameters that is estimated is the correlation between each subject's random intercept (i.e., their baseline reaction time) and slope (i.e., their particular change in reaction time over days of sleep deprivation).
+So, for the model of the *sleepstudy* data above, one of the parameters that is estimated is the correlation between each subject's random intercept (i.e., their baseline reaction time) and slope (i.e., their particular change in reaction time per day of sleep deprivation).
 In some cases, you may wish to simplify the random effects structure by removing these correlation parameters.
 This often arises when there are many random effects you want to estimate (as is common in psychological experiments with many conditions and covariates), since the number of random effects parameters increases as the square of the number of predictors, making these models difficult to estimate from limited data.
 
 The special syntax `zerocorr` can be applied to individual random effects terms inside the `@formula`:
 ```@example Main
 fm2zerocorr_fm = fit(MixedModel, @formula(reaction ~ 1 + days + zerocorr(1 + days|subj)), sleepstudy)
 ```
-```@setup Main
-    all(fm2zerocorr == fm2zerocorr_fm)
-```
 
 Alternatively, correlations between parameters can be removed by including them as separate random effects terms:
 ```@example Main
@@ -189,6 +186,7 @@ Separating the `1` and `days` random effects into separate terms removes the cor
 fit(MixedModel, @formula(reaction ~ 1 + days + (1|subj) + (days|subj)), sleepstudy,
     contrasts = Dict(:days => DummyCoding()))
 ```
+(Notice that the variance component for `days: 1` is estimated as zero, so the correlations for this component are undefined and are shown as `NaN` (not a number).)
 
 An alternative is to force all the levels of `days` as indicators using `fulldummy` encoding.
 ```@docs
@@ -234,10 +232,10 @@ The canonical link, which is `LogitLink` for the `Bernoulli` distribution, is us
 Note that, in keeping with convention in the [`GLM` package](https://github.com/JuliaStats/GLM.jl), the distribution family for a binary (i.e. 0/1) response is the `Bernoulli` distribution.
 The `Binomial` distribution is only used when the response is the fraction of trials returning a positive, in which case the number of trials must be specified as the case weights.
 
-### Optional arguments to fit!
+### Optional arguments to fit
 
 An alternative approach is to create the `GeneralizedLinearMixedModel` object then call `fit!` on it.
-In this form optional arguments `fast` and/or `nAGQ` can be passed to the optimization process.
+The optional arguments `fast` and/or `nAGQ` can be passed to the optimization process via either `fit` or `fit!` (i.e., these optimization settings are neither used nor recognized when constructing the model).
 
 As the name implies, `fast=true`, provides a faster but somewhat less accurate fit.
 These fits may suffice for model comparisons.
@@ -344,7 +342,7 @@ coefnames(fm1)
 ```
 ```@example Main
 fixef(fm1)
-fixefnames
+fixefnames(fm1)
 ```
 
 An alternative extractor for the fixed-effects coefficient is the `β` property.
@@ -445,6 +443,15 @@ These are sometimes called the *best linear unbiased predictors* or [`BLUPs`](ht
 
 At a superficial level these can be considered as the "estimates" of the random effects, with a bit of hand waving, but pursuing this analogy too far usually results in confusion.
 
+To obtain tables associating the values of the conditional modes with the levels of the grouping factor, use
+```@docs
+raneftables
+```
+as in
+```@example Main
+DataFrame(only(raneftables(fm1)))
+```
+
 The corresponding conditional variances are returned by
 ```@docs
 condVar
@@ -521,4 +528,3 @@ fm4r = fit(MixedModel, @formula(diameter ~ 1+(1|plate)+(1|sample)),
 sum(leverage(fm4r))
 ```
 
-
diff --git a/docs/src/optimization.md b/docs/src/optimization.md
@@ -98,7 +98,7 @@ In the types of `LinearMixedModel` available through the `MixedModels` package,
 
 For the simple example
 ```@example Main
-using DataFrames, MixedModels
+using BenchmarkTools, DataFrames, MixedModels
 ```
 ```@example Main
 dyestuff = MixedModels.dataset(:dyestuff)
@@ -145,8 +145,7 @@ MixedModels.getθ(t21)
 
 Random-effects terms in the model formula that have the same grouping factor are amalgamated into a single `ReMat` object.
 ```@example Main
-fm3 = fit!(LinearMixedModel(@formula(reaction ~ 1+days+(1|subj) + (0+days|subj)),
-    sleepstudy))
+fm3 = fit(MixedModel, @formula(reaction ~ 1+days+(1|subj) + (0+days|subj)), sleepstudy)
 t31 = first(fm3.reterms);
 Int.(t31)
 ```
@@ -170,17 +169,7 @@ Note that the first `ReMat` in `fm4.terms` corresponds to grouping factor `G` ev
 
 ### Progress of the optimization
 
-An optional named argument, `verbose=true`, in the call to `fit` for a `LinearMixedModel` causes printing of the objective and the $\theta$ parameter at each evaluation during the optimization.
-```@example Main
-fit(MixedModel,
-    @formula(yield ~ 1 + (1|batch)),
-    dyestuff,
-    verbose=true);
-fit(MixedModel,
-    @formula(reaction ~ 1 + days + (1+days|subj)),
-    sleepstudy,
-    verbose=true);
-```
+An optional named argument, `verbose=true`, in the call to `fit` for a `LinearMixedModel` causes printing of the objective and the $\theta$ parameter at each evaluation during the optimization. (Not illustrated here.)
 
 A shorter summary of the optimization process is always available as an
 ```@docs
@@ -333,7 +322,7 @@ mdl.b # conditional modes of b
 ```
 
 ```@example Main
-fit!(mdl, fast=true, verbose=true);
+fit!(mdl, fast=true);
 ```
 
 The optimization process is summarized by
@@ -344,15 +333,13 @@ mdl.LMM.optsum
 
 As one would hope, given the name of the option, this fit is comparatively fast.
 ```@example Main
-@time(fit!(GeneralizedLinearMixedModel(vaform,
-    verbagg, Bernoulli()), fast=true))
+@btime fit(MixedModel, vaform, verbagg, Bernoulli(), fast=true)
 ```
 
 The alternative algorithm is to use PIRLS to find the conditional mode of the random effects, given $\beta$ and $\theta$ and then use the general nonlinear optimizer to fit with respect to both $\beta$ and $\theta$.
-Because it is slower to incorporate the $\beta$ parameters in the general nonlinear optimization, the fast fit is performed first and used to determine starting estimates for the more general optimization.
 
 ```@example Main
-@time mdl1 = fit(MixedModel, vaform, verbagg, Bernoulli())
+mdl1 = @btime fit(MixedModel, vaform, verbagg, Bernoulli())
 ```
 
 This fit provided slightly better results (Laplace approximation to the deviance of 8151.400 versus 8151.583) but took 6 times as long.
diff --git a/docs/src/rankdeficiency.md b/docs/src/rankdeficiency.md
@@ -42,7 +42,7 @@ The same holds for the associated [`fixefnames`](@ref) and [`coefnames`](@ref).
 In MixedModels.jl, we use standard numerical techniques to detect rank deficiency.
 We currently offer no guarantees as to which exactly of the standard techniques (pivoted QR decomposition, pivoted Cholesky decomposition, etc.) will be used.
 This choice should be viewed as an implementation detail.
-Similarly, we offer no guarentees as to which of columns will be treated as redundant.
+Similarly, we offer no guarantees as to which columns will be treated as redundant.
 This choice may vary between releases and even between platforms (both in broad strokes of "Linux" vs. "Windows" and at the level of which BLAS options are loaded on a given processor architecture) for the same release.
 In other words, *you should not rely on the order of the pivoted columns being consistent!* when you switch to a different computer or a different operating system.
 If consistency in the pivoted columns is important to you, then you should instead determine your rank ahead of time and remove extraneous columns / predictors from your model specification.
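Reviewer aid for the rank-deficiency text above, which recommends fixing the set of predictors ahead of time. The sketch below finds a maximal set of linearly independent columns of a model matrix with a column-pivoted QR decomposition, one of the standard techniques the page mentions. It is not the package's own check. It assumes Julia 1.7 or later (for `ColumnNorm`), and `independent_columns_sketch` is a hypothetical name. Which columns it keeps can itself depend on pivoting, so the point is to run it once and then drop the remaining columns from the formula explicitly.

```julia
using LinearAlgebra

# Hypothetical helper: sorted indices of a maximal set of linearly independent
# columns of X, using a column-pivoted QR and a tolerance relative to the
# largest diagonal entry of R.
function independent_columns_sketch(X::AbstractMatrix{<:Real};
                                    rtol = sqrt(eps(float(eltype(X)))))
    F = qr(float.(X), ColumnNorm())
    d = abs.(diag(F.R))
    isempty(d) && return Int[]
    r = count(>(rtol * first(d)), d)   # numerical rank
    return sort(F.p[1:r])              # original indices of the retained columns
end
```

For example, if the third column of a matrix is the sum of the first two, the helper keeps two of the three columns. Which two it keeps is not something to rely on.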
diff --git a/src/generalizedlinearmixedmodel.jl b/src/generalizedlinearmixedmodel.jl
@@ -624,7 +624,14 @@ function Base.show(io::IO, m::GeneralizedLinearMixedModel)
     println(io, "  ", m.LMM.formula)
     println(io, "  Distribution: ", Distribution(m.resp))
     println(io, "  Link: ", GLM.Link(m.resp), "\n")
-    println(io, "  Deviance: ", Ryu.writefixed(deviance(m, nAGQ), 4))
+    println(io)
+    nums = Ryu.writefixed.([loglikelihood(m), deviance(m), aic(m), aicc(m), bic(m)], 4)
+    fieldwd = max(maximum(textwidth.(nums)) + 1, 11)
+    for label in [" logLik", " deviance", "AIC", "AICc", "BIC"]
+        print(io, rpad(lpad(label, (fieldwd + textwidth(label)) >> 1), fieldwd))
+    end
+    println(io)
+    print.(Ref(io), lpad.(nums, fieldwd))
     println(io)
 
     show(io, VarCorr(m))
diff --git a/src/linearmixedmodel.jl b/src/linearmixedmodel.jl
@@ -787,9 +787,9 @@ function Base.show(io::IO, m::LinearMixedModel)
     if REML
         println(io, " REML criterion at convergence: ", oo)
     else
-        nums = Ryu.writefixed.([-oo / 2, oo, aic(m), bic(m)], 4)
+        nums = Ryu.writefixed.([-oo / 2, oo, aic(m), aicc(m), bic(m)], 4)
         fieldwd = max(maximum(textwidth.(nums)) + 1, 11)
-        for label in [" logLik", "-2 logLik", "AIC", "BIC"]
+        for label in ["  logLik", "-2 logLik", "AIC", "AICc", "BIC"]
             print(io, rpad(lpad(label, (fieldwd + textwidth(label)) >> 1), fieldwd))
         end
         println(io)
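Reviewer aid for the two `show` hunks: both now report AICc and lay out the criteria in fixed-width columns with centered labels. Below is a minimal, self-contained sketch of the textbook definitions and of the same layout. The helper names are hypothetical, not MixedModels code. `Printf` stands in for `Ryu.writefixed`. The assumption is that ℓ is the log-likelihood, k the number of estimated parameters (`dof`) and n the number of observations (`nobs`). These are the standard formulas and, as far as I know, what StatsBase's `aic`, `aicc` and `bic` compute.

```julia
using Printf

# Hypothetical helper: the usual model-fit criteria from ℓ, k and n,
# returned in the column order used by the LMM header.
function fitcriteria_sketch(ℓ::Real, k::Integer, n::Integer)
    n > k + 1 || throw(ArgumentError("AICc requires n > k + 1"))
    m2ll = -2ℓ                                # -2 logLik
    aic = m2ll + 2k
    aicc = aic + 2k * (k + 1) / (n - k - 1)   # small-sample correction
    bic = m2ll + k * log(n)
    return [ℓ, m2ll, aic, aicc, bic]
end

# Hypothetical helper mirroring the diff's layout: each label is centered in a
# field of width `fieldwd` (at least 11), and each number is right-justified below it.
function criteria_table_sketch(io::IO, labels, vals)
    nums = [@sprintf("%.4f", v) for v in vals]
    fieldwd = max(maximum(textwidth.(nums)) + 1, 11)
    for label in labels
        print(io, rpad(lpad(label, (fieldwd + textwidth(label)) >> 1), fieldwd))
    end
    println(io)
    for s in nums
        print(io, lpad(s, fieldwd))
    end
    println(io)
    return fieldwd
end
```

Usage would be along the lines of `criteria_table_sketch(stdout, ["  logLik", "-2 logLik", "AIC", "AICc", "BIC"], fitcriteria_sketch(ℓ, k, n))`. The GLMM header prints `deviance(m)` in its second column, which need not equal -2 logLik for every distribution. The sketch covers only the LMM-style column.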