\documentclass[]{article}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Ligatures=TeX,Scale=MatchLowercase}
\fi
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
% use microtype if available
\IfFileExists{microtype.sty}{%
\usepackage{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\hypersetup{unicode=true,
pdftitle={Direct Coefficient Recentering and Rescaling},
pdfauthor={Doug Hemken},
pdfborder={0 0 0},
breaklinks=true}
\urlstyle{same} % don't use monospace font for urls
\usepackage{graphicx,grffile}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{0}
% Redefines (sub)paragraphs to behave more like sections
\ifx\paragraph\undefined\else
\let\oldparagraph\paragraph
\renewcommand{\paragraph}[1]{\oldparagraph{#1}\mbox{}}
\fi
\ifx\subparagraph\undefined\else
\let\oldsubparagraph\subparagraph
\renewcommand{\subparagraph}[1]{\oldsubparagraph{#1}\mbox{}}
\fi
%%% Use protect on footnotes to avoid problems with footnotes in titles
\let\rmarkdownfootnote\footnote%
\def\footnote{\protect\rmarkdownfootnote}
%%% Change title format to be more compact
\usepackage{titling}
% Create subtitle command for use in maketitle
\newcommand{\subtitle}[1]{
\posttitle{
\begin{center}\large#1\end{center}
}
}
\setlength{\droptitle}{-2em}
\title{Direct Coefficient Recentering and Rescaling}
\pretitle{\vspace{\droptitle}\centering\huge}
\posttitle{\par}
\author{Doug Hemken}
\preauthor{\centering\large\emph}
\postauthor{\par}
\predate{\centering\large\emph}
\postdate{\par}
\date{2018-12-04}
\begin{document}
\maketitle
{
\setcounter{tocdepth}{1}
\tableofcontents
}
\hypertarget{introduction}{%
\section{Introduction}\label{introduction}}
For analysts working with linear models, recentering and rescaling the
variables under analysis is such a routine task it hardly garners
attention. In fields where there are no natural, physical units of
measurement - education and psychology, to name two - it is common
practice to refer to standardized units of measure. It is not uncommon
to see analysts fit and report the same model in both the original and
standardized units of measurement.
This is a classic problem with a classic solution, widely implemented in
statistical software, and with well-recognized limitations.
\hypertarget{the-problem-higher-order-models}{%
\section{The Problem: Higher Order
Models}\label{the-problem-higher-order-models}}
For additive models - models with intercepts and slopes of single
variables to polynomial degree one - the analyst can directly transform
the coefficients in the model via a classic formula that appears in most
textbooks. Consider, for example, the regression model
\[y = \beta_0 + \beta_1x_1 + \beta_2x_2\] where \(x_1\), \(x_2\), and
\(y\) are all continuous variables. If we transform the data so that all
the variables are centered, the transformed coefficients for this model
are given by \(\beta_0^\delta=0\) and \(\beta_i^\delta=\beta_i\) for
\(i\ge 1\). If we further transform the data so that all variables are
standardized, the transformed coefficients for this model are given by
\(\beta_0^z=0\) and \[\beta_i^z=\frac{\sigma_{x_i}}{\sigma_y}\beta_i\]
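For concreteness, here is a minimal base-R sketch of this classic
formula applied to the slopes of an additive model (the particular
variables are chosen only for illustration):
\begin{verbatim}
fit <- lm(mpg ~ wt + disp, data=mtcars)
# classic standardized slopes: beta_i * sd(x_i) / sd(y)
coef(fit)[-1] * c(sd(mtcars$wt), sd(mtcars$disp)) / sd(mtcars$mpg)
\end{verbatim}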
However, once interaction terms and higher order polynomial terms appear
in a model, using the classic formula requires recentering and rescaling
higher order terms with the means and standard deviations of the higher
order data vectors, the (Hadamard, element-wise) product of the lower
order data vectors, independent of the rescaling of lower order terms.
Consider, for example, the regression model
\[y = \beta_0 + \beta_1x_1 + \beta_2x_2+ \beta_3x_1x_2\] The classic
formula transforms \(\beta_0\), \(\beta_1\), and \(\beta_2\) as before.
For \(\beta_3\) we use the standard deviation of the product term
\(\sigma_{x_1x_2} (\neq \sigma_{x_1}\sigma_{x_2})\) as the numerator
rescaling factor. This implies that \(x_1x_2\) would have been
recentered with \(\mu_{x_1x_2}(\neq \mu_{x_1}\mu_{x_2})\), although we
do not need to actually calculate this. These are \textbf{\emph{not}}
the coefficients we would get if the data were first transformed, and
the model re-estimated.
This produces coefficients that are difficult to interpret because terms
involving the very same variables are on different scales. While this
can be useful for some purposes, such as calculating predicted values,
residuals, and goodness of fit, standard practice where the coefficients
are to be interpreted is to recalculate the data, then refit the model.
Refitting the model to recalculated data has the advantage that software
also produces a variance-covariance matrix appropriate to the
transformed coefficients, and sets up the software for post-estimation
operations with the transformed model.
Available software (SAS, Stata, SPSS) calculates standardized
coefficients directly using the classic approach, perhaps as a legacy of
the sweep operations of the 1970s. R has at least 3 packages -
QuantPsyc, lm.beta, and lsr - that implement this classic formula.
It seems to be little appreciated that the coefficients for recentered
and rescaled (including standardized) models can be easily calculated
directly.
\hypertarget{direct-transformation}{%
\section{Direct Transformation}\label{direct-transformation}}
\hypertarget{one-variable}{%
\subsection{One Variable}\label{one-variable}}
Transforming the coefficients and the variance-covariance matrix of a
linear model with a single continuous outcome and a single continuous
predictor is straightforward. Consider the simple model
\[y = b_0 + b_1x\] or in the usual matrix form \[Y=X\beta\]
If we wish to recenter our model in terms of \(x_\delta=x-\mu_x\), an
arbitrarily recentered \(x\), we can do so without calculating the
\(x_\delta\). We use a linear transformation \(C\) to map the vector of
coefficients \(\beta\) to a vector of centered coefficients,
\(\beta^\delta\). \[\beta^\delta=C\beta\] where \(C\) takes the form
\[C=\begin{bmatrix}1 & \mu \\ 0 & 1 \end{bmatrix}\] Rescaling in terms
of \(x_z=x_\delta/\sigma\) can be done directly with the linear
transformation \[S=\begin{bmatrix}1 & 0 \\ 0 & \sigma \end{bmatrix}\] To
standardize the coefficients, we recenter, then rescale. In one step
this is
\[Z = S \times C =\begin{bmatrix}1 & \mu \\ 0 & \sigma \end{bmatrix}\]
To fully standardize this model, the final step is to standardize \(y\),
\(y_z=(y-\mu_y)/\sigma_y\). This requires adjusting \(\beta_0^z\) by
\(\mu_y\), which in this simple case leaves \(\beta_0^z=0\), and
dividing \(\beta^z\) by \(\sigma_y\). More complicated models also require
these two final operations for full standardization. It will simplify
further discussion to drop consideration of this.
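Before setting this aside, a base-R sketch of these two final
operations for the simple model, checked against refitting on
standardized data:
\begin{verbatim}
fit <- lm(mpg ~ wt, data=mtcars)
Z <- matrix(c(1, 0, mean(mtcars$wt), sd(mtcars$wt)), ncol=2)
bz <- Z %*% coef(fit)
bz[1] <- bz[1] - mean(mtcars$mpg)  # adjust the intercept by mu_y
bz <- bz/sd(mtcars$mpg)            # divide by sigma_y
cbind(bz, coef(lm(scale(mpg) ~ scale(wt), data=mtcars)))
\end{verbatim}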
It is worth noting that ``recentering'' and ``rescaling'' may be done
with any arbitrary constants, although it is perhaps most often done
with a sample mean and sample standard deviation. However, this same
approach would hold for converting a model where \(x\) is expressed, for
example, in degrees Fahrenheit to one expressed in degrees Centigrade.
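As a small numeric sketch of that temperature conversion (the
coefficients here are hypothetical, chosen only for illustration): with
\(x\) in degrees Fahrenheit and \(x_c=(x-32)/1.8\), we recenter with
\(\mu=32\) and rescale with \(\sigma=1.8\).
\begin{verbatim}
Z <- matrix(c(1, 0, 32, 1.8), ncol=2)  # [1, mu; 0, sigma]
b.fahrenheit <- c(10, 0.5)  # hypothetical intercept and slope, F scale
Z %*% b.fahrenheit          # intercept 26, slope 0.9 on the C scale
\end{verbatim}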
\hypertarget{example-one-variable-recentering}{%
\subsubsection{Example: One Variable
Recentering}\label{example-one-variable-recentering}}
\begin{verbatim}
example <- lm(mpg ~ wt, data=mtcars)
\end{verbatim}
Here the coefficients can be recentered as if the \(x\) variable
\texttt{wt} were recentered to the sample mean.
\begin{verbatim}
C <- matrix(c(1,0,mean(mtcars$wt),1), ncol=2)
C%*%coef(example)
\end{verbatim}
\begin{verbatim}
[,1]
[1,] 20.09
[2,] -5.34
\end{verbatim}
We can check that this agrees with recentering the data, then refitting
the model.
\begin{verbatim}
wtcentered <- mtcars$wt - mean(mtcars$wt)
check <- lm(mpg ~ wtcentered, data=mtcars)
cbind(C%*%coef(example),coef(check))
\end{verbatim}
\begin{verbatim}
[,1] [,2]
(Intercept) 20.09 20.09
wtcentered -5.34 -5.34
\end{verbatim}
We can also use the same recentering matrix to transform the
variance-covariance matrix of the coefficients.
\begin{verbatim}
C%*%vcov(example)%*%t(C)
\end{verbatim}
\begin{verbatim}
[,1] [,2]
[1,] 0.29 0.000
[2,] 0.00 0.313
\end{verbatim}
\begin{verbatim}
vcov(check)
\end{verbatim}
\begin{verbatim}
(Intercept) wtcentered
(Intercept) 0.29 0.000
wtcentered 0.00 0.313
\end{verbatim}
\begin{verbatim}
# check equality
norm(C%*%vcov(example)%*%t(C)-vcov(check), "F")
\end{verbatim}
\begin{verbatim}
[1] 4.48e-16
\end{verbatim}
A change of basis for the column space of \(X\) induces a change of
basis for the column space of the coefficient vector, and a change of
basis for the row and column space of the variance-covariance matrix.
\(C\), \(S\), and \(Z\) are change of basis transformations.
\hypertarget{two-variables}{%
\subsection{Two Variables}\label{two-variables}}
Now consider a model with two continuous independent variables and an
interaction term, so the columns of \(X\) are
\(\begin{bmatrix} 1_n &x_1 &x_2 &x_1x_2 \end{bmatrix}\). We compose the
coefficient change of basis from the two simple transformations as a
direct product. Denote
\[C_1=\begin{bmatrix}1 & \mu_1 \\ 0 & 1 \end{bmatrix}\]
\[C_2=\begin{bmatrix}1 & \mu_2 \\ 0 & 1 \end{bmatrix}\] Then
\[C = C_2 \otimes C_1 = \begin{bmatrix} 1 & \mu_1 &\mu_2 &\mu_2\mu_1 \\
0 &1 &0 &\mu_2 \\ 0 &0 &1 &\mu_1 \\ 0 &0 &0 &1 \end{bmatrix}\] The
classic standardization formula, as noted above, recenters the product
term with a different constant (\(\mu_{x_1x_2}\) rather than
\(\mu_2\mu_1\)) when adjusting the coefficients. Note here that
recentering will always leave the highest order term unchanged.
While this transformation is simple to construct in theory, in practice
attention must be given to the column ordering: to use \(C\) we must
recognize that the column space is ordered
\(\begin{bmatrix} 1_n &x_1 &x_2 &x_1x_2 \end{bmatrix}\), so the order
must match that of the coefficient vector and the variance-covariance
matrix, perhaps through permutation.
While in general \(C_1 \otimes C_2 \neq C_2 \otimes C_1\), for any such
operation there exists a permutation, \(P\), such that
\(C_1 \otimes C_2 = P^T(C_2 \otimes C_1)P\). As long as we include
simple recentering and rescaling matrices for every variable used in our
coefficient terms, up to a final permutation their order does not
matter.
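A minimal base-R sketch of this construction for the model in the
example below, recentering at the sample means (the
\texttt{recentering.matrix()} helper used in the example builds the
same kind of matrix, with labeled rows and columns):
\begin{verbatim}
mu1 <- mean(mtcars$wt); mu2 <- mean(mtcars$disp)
C1 <- matrix(c(1, 0, mu1, 1), ncol=2)
C2 <- matrix(c(1, 0, mu2, 1), ncol=2)
kronecker(C2, C1)  # columns ordered (Intercept), x1, x2, x1:x2
\end{verbatim}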
\hypertarget{example-two-variable-recentering}{%
\subsubsection{Example: Two Variable
Recentering}\label{example-two-variable-recentering}}
In order to build a recentering matrix, then, we need to collect a
labelled vector of recentering constants, and an ordered vector of
coefficient terms.
\begin{verbatim}
source("stdParm functions.r")
ex2 <- lm(mpg ~ wt*disp, data=mtcars) # the base model
x.means <- colMeans(mtcars[,c("wt","disp")]) # recentering constants (means)
b.terms <- names(coef(ex2)) # coefficients/terms
C <- recentering.matrix(x.means, b.terms)
C
\end{verbatim}
\begin{verbatim}
(Intercept) wt disp wt:disp
(Intercept) 1 3.22 231 742.29
wt 0 1.00 0 230.72
disp 0 0.00 1 3.22
wt:disp 0 0.00 0 1.00
\end{verbatim}
This, then, is what we use to produce recentered coefficients, and the
accompanying variance-covariance matrix.
\begin{verbatim}
C %*% coef(ex2)
\end{verbatim}
\begin{verbatim}
[,1]
(Intercept) 18.8695
wt -3.7950
disp -0.0187
wt:disp 0.0117
\end{verbatim}
\begin{verbatim}
C %*% vcov(ex2) %*% t(C)
\end{verbatim}
As in the one-variable example, the result agrees (up to floating-point
error) with the variance-covariance matrix obtained by recentering the
data and refitting the model.
\hypertarget{example-two-variable-rescaling}{%
\subsection{Example: Two Variable
Rescaling}\label{example-two-variable-rescaling}}
Rescaling is just as easy. Here,
\[S_1=\begin{bmatrix}1 &0 \\ 0 &\sigma_1 \end{bmatrix}\]
\[S_2=\begin{bmatrix}1 &0 \\ 0 &\sigma_2 \end{bmatrix}\] Then
\[S = S_2 \otimes S_1 = \begin{bmatrix} 1 &0 &0 &0 \\
0 &\sigma_1 &0 &0 \\ 0 &0 &\sigma_2 &0 \\ 0 &0 &0 &\sigma_2\sigma_1 \end{bmatrix}\]
While the classic standardization formula takes a similar diagonal form,
the rescaling constants for higher order terms (\(>1\)) are different.
Rescaling can be useful independent of recentering, for example, to
rescale from United States customary units to SI units where the zero of
each scale remains unchanged.
\begin{verbatim}
# 1000 lbs to kilograms, and cubic inches to liters
x.scales <- c(1/453.592, 61.024)
names(x.scales) <- c("wt", "disp")
S <- recentering.matrix(x.scales, b.terms, type="scale")
S
\end{verbatim}
\begin{verbatim}
(Intercept) wt disp wt:disp
(Intercept) 1 0.0000 0 0.000
wt 0 0.0022 0 0.000
disp 0 0.0000 61 0.000
wt:disp 0 0.0000 0 0.135
\end{verbatim}
Check the coefficients:
\begin{verbatim}
wtkg <- mtcars$wt*453.592 # 1000 lbs to kg
displ <- mtcars$disp/61.024 # cu.in. to liters
ex3 <- lm(mpg~wtkg*displ, data=mtcars)
ex3coefs <- cbind(S %*% coef(ex2),coef(ex3))
colnames(ex3coefs) <- c("Direct","Data Trans.")
ex3coefs
\end{verbatim}
\begin{verbatim}
Direct Data Trans.
(Intercept) 44.08200 44.08200
wt -0.01432 -0.01432
disp -3.43920 -3.43920
wt:disp 0.00157 0.00157
\end{verbatim}
Compare the variance-covariance matrices:
\begin{verbatim}
norm(S %*% vcov(ex2) %*% S-vcov(ex3), "F")
\end{verbatim}
\begin{verbatim}
[1] 7.26e-15
\end{verbatim}
\hypertarget{interaction-terms-factorial-models-and-direct-products}{%
\section{Interaction Terms, Factorial Models, and Direct
Products}\label{interaction-terms-factorial-models-and-direct-products}}
To understand the use of the direct product here, we need to consider
the relationship between the \emph{data} space and the \emph{parameter}
space of a model. In all of the models considered here, the column space
of the parameters is a subset of the outer product of the column space
of the data, including a column for the intercept term.
The data are modeled in an outer product of the data column space. In
transforming the coefficients, we need the same vector space for both
columns and rows - the Kronecker operator provides this twofold outer
product very neatly.
We may consider a model that includes only the mean of the response as a
zero-order model, sometimes called an intercept-only model. A model with
means of several categories, parameterized as a mean and offsets to that
mean, is a model with multiple intercepts (only), and is also a
zero-order model. Classical ANOVA models are zero-order models.
A model of a response with a continuous variable includes both an
intercept and a slope. This is a first-order model, with a zero-order
term and a first-order term. Adding categorical variables to the model
adds more intercepts, or zero-order terms. Adding continuous variables
adds more slopes, or first-order terms. Such additive models are all
first-order models.
An interaction term is formed as the product of two variables. A product
of categorical variables adds intercepts to the model. The interaction
of a categorical variable and a continuous variable adds slopes to the
model. In either case, the order of the model remains the same. But the
interaction formed from the product of two continuous variables adds a
second-order term to a model, a curvature.
A factorial model is formed by adding all the products of all the zero-
and first-order terms, in all combinations. If we think of the terms in
a model as its column space, then any linear model resides in a subspace
of the factorial column space. The columns of any linear model are a
subset of the columns in a full-factorial model.
Finally, transformations of the parameters are measured within the
parameter space. But transformations that reflect changes in the data
space are an outer product. Transformations from one basis for
parameters to another use this outer product for both the column space
and row space. Kronecker operations give us just this result.
\hypertarget{less-than-full-factorial-models}{%
\section{Less-Than-Full Factorial
Models}\label{less-than-full-factorial-models}}
A less-than-full factorial model can be thought of as an outer product
with some terms zeroed out.
In practice, many models with interaction or polynomial terms are not
full factorial models. Deriving the correct recentering and rescaling
matrices means recognizing that some of the elements of \(\beta\) are
\(0\). Doing so allows us to work with reduced coefficient vectors and
reduced transformation matrices.
\hypertarget{dropping-higher-order-terms}{%
\subsection{Dropping Higher Order
Terms}\label{dropping-higher-order-terms}}
Consider again the additive model
\[y = \beta_0 + \beta_1x_1 + \beta_2x_2\]
The recentering matrix for \(x_1\) and \(x_2\) is as given above,
however for \(\beta\) we have
\[\begin{bmatrix}\beta_0^\delta \\ \beta_1^\delta \\ \beta_2^\delta \\ \beta_3^\delta \end{bmatrix}=
\begin{bmatrix} 1 & \mu_1 &\mu_2 &\mu_2\mu_1 \\
0 &1 &0 &\mu_2 \\ 0 &0 &1 &\mu_1 \\ 0 &0 &0 &1 \end{bmatrix}
\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \\ 0 \end{bmatrix}\] This
simplifies to
\[\begin{bmatrix}\beta_0^\delta \\ \beta_1^\delta \\ \beta_2^\delta \\ \beta_3^\delta \end{bmatrix}=
\begin{bmatrix} 1 & \mu_1 &\mu_2 \\
0 &1 &0 \\ 0 &0 &1 \\ 0 &0 &0 \end{bmatrix}
\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}\] Not
surprisingly, this leaves \(\beta_3^\delta=0\), and we can further
simplify
\[\begin{bmatrix}\beta_0^\delta \\ \beta_1^\delta \\ \beta_2^\delta \end{bmatrix}=
\begin{bmatrix} 1 & \mu_1 &\mu_2 \\
0 &1 &0 \\ 0 &0 &1 \end{bmatrix}
\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}\] In other
words, we end up with the only transformation being to \(\beta_0\), which
had to be the result for a recentered additive model. Following this
approach for rescaling, we can derive in matrix form our classic
standardization formula for additive models as well.
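As a quick numerical check of this reduced recentering matrix (a base-R
sketch; the direct transformation should match refitting on recentered
data):
\begin{verbatim}
fit <- lm(mpg ~ wt + disp, data=mtcars)
mu <- colMeans(mtcars[,c("wt","disp")])
C <- rbind(c(1, mu), cbind(0, diag(2)))  # reduced 3x3 recentering matrix
cbind(C %*% coef(fit),
      coef(lm(mpg ~ I(wt - mu[1]) + I(disp - mu[2]), data=mtcars)))
\end{verbatim}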
\hypertarget{dropping-lower-order-terms}{%
\subsection{Dropping Lower Order
Terms}\label{dropping-lower-order-terms}}
Another point worth considering is the effect of recentering variables
on a model where a lower-order term has been dropped beneath a
higher-order term, such as a no-intercept model or a nested terms model.
Consider the model \[y = \beta_0 + \beta_1x_1 + \beta_3x_1x_2\] where
the term \(\beta_2x_2\) has been dropped, setting \(\beta_2=0\).
Here, our coefficient transformation looks like
\[\begin{bmatrix}\beta_0^\delta \\ \beta_1^\delta \\ \beta_2^\delta \\ \beta_3^\delta \end{bmatrix}=
\begin{bmatrix} 1 & \mu_1 &\mu_2 &\mu_2\mu_1 \\
0 &1 &0 &\mu_2 \\ 0 &0 &1 &\mu_1 \\ 0 &0 &0 &1 \end{bmatrix}
\begin{bmatrix}\beta_0 \\ \beta_1 \\ 0 \\ \beta_3 \end{bmatrix}\] We can
simplify this somewhat as
\[\begin{bmatrix}\beta_0^\delta \\ \beta_1^\delta \\ \beta_2^\delta \\ \beta_3^\delta \end{bmatrix}=
\begin{bmatrix} 1 & \mu_1 &\mu_2\mu_1 \\
0 &1 &\mu_2 \\ 0 &0 &\mu_1 \\ 0 &0 &1 \end{bmatrix}
\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_3 \end{bmatrix}\] But here we
see that our recentered model gains a term and a coefficient,
\(\beta_2^\delta (=\mu_1\beta_3)\)!
The highest order term in which a variable appears is always unchanged
by recentering; a lower order term changes whenever any of the
\emph{other} variables appearing with it in some higher order term is
recentered. Going back to the additive model,
consisting of only first-order (slope) and zero-order (intercept) terms,
we see that only the intercept changes when the first order \(x_i\) are
recentered. Recentering a first-order model that had no intercept would
transform the coefficients so that an intercept was included.
If we build recentering and rescaling matrices variable by variable, we
can use less-than-full factorial combinations as building blocks. That
is to say, we could build a matrix for a full-factorial model and then
drop columns for unused terms, or we could approach this piecemeal.
{[}Checking for missing lower order terms is not currently implemented.
However, this should be easy to accomplish: drop columns in a full
factorial \(C\) or \(S\) not included in the coefficient vector
(i.e.~not among the terms), then check for rows that are zero vectors
and drop (only) those; see the sketch below.{]}
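A hypothetical helper along those lines (an untested sketch, not part
of the \texttt{stdParm} functions):
\begin{verbatim}
# drop unused columns of a full-factorial C (or S), then drop
# any rows that are left as zero vectors
reduce.matrix <- function(M, terms) {
  Mr <- M[, colnames(M) %in% terms, drop=FALSE]
  Mr[rowSums(Mr != 0) > 0, , drop=FALSE]
}
\end{verbatim}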
\hypertarget{untransformed-variables}{%
\subsection{Untransformed Variables}\label{untransformed-variables}}
It may be that the analyst wishes to leave some variables untransformed.
One way to view this is that the recentering constant \(\mu=0\) and the
rescaling constant \(\sigma=1\). So the ``transformation'' for this
variable is just the identity matrix
\[S=C=\begin{bmatrix}1 & 0 \\ 0 &1 \end{bmatrix}\] This leads to a
simplification of the full factorial transformation matrix in terms of
direct sums. If we have \(C_2=I_2\), then
\[C_2 \otimes C_1 = \begin{bmatrix}C_1 &0 \\ 0 &C_1\end{bmatrix} = C_1 \oplus C_1 \]
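This identity is easy to verify numerically (a base-R sketch):
\begin{verbatim}
C1 <- matrix(c(1, 0, mean(mtcars$wt), 1), ncol=2)
I2 <- diag(2)       # identity for the untransformed variable
kronecker(I2, C1)   # block diagonal: the direct sum of C1 with itself
\end{verbatim}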
\hypertarget{polynomial-terms}{%
\section{Polynomial Terms}\label{polynomial-terms}}
Polynomial models may also be built as outer products, but with lower
order terms collected.
Like less-than-full factorial models, models with polynomial terms are
worth a little extra scrutiny. Consider the model
\[y = \beta_0 + \beta_1x + \beta_3x^2\] If we rewrite this as
\[y = \beta_0 + \beta_1x + \beta_2x +\beta_3xx\] it looks like a
factorial model. But all the effect of \(x\) is collected in a single
term, \(\beta_1\), so \(\beta_2=0\). This is perhaps easier to see if we
look at the recentering transformation
\[\begin{bmatrix}\beta_0^\delta \\ \beta_1^\delta \\ \beta_2^\delta \\ \beta_3^\delta \end{bmatrix}=
\begin{bmatrix} 1 & \mu &\mu &\mu\mu \\
0 &1 &0 &\mu \\ 0 &0 &1 &\mu \\ 0 &0 &0 &1 \end{bmatrix}
\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}\]
If we take \(\beta_2=0\), then we can simplify as before
\[\begin{bmatrix}\beta_0^\delta \\ \beta_1^\delta \\ \beta_2^\delta \\ \beta_3^\delta \end{bmatrix}=
\begin{bmatrix} 1 & \mu &\mu\mu \\
0 &1 &\mu \\ 0 &0 &\mu \\ 0 &0 &1 \end{bmatrix}
\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_3 \end{bmatrix}\] But now
\(\beta_1^\delta\) and \(\beta_2^\delta\) both have part of the effect
of a recentered \(x\). If we collect these terms in
\(\beta^\delta_{12}=\beta_{1}^\delta+\beta_{2}^\delta\) we end up with
\[\begin{bmatrix}\beta_0^\delta \\ \beta_{12}^\delta \\ \beta_3^\delta \end{bmatrix}=
\begin{bmatrix} 1 & \mu &\mu\mu \\
0 &1 &2\mu \\ 0 &0 &1 \end{bmatrix}
\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_3 \end{bmatrix}\] Models
that include terms to any polynomial degree can be handled in this
manner.
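As a numerical check of the collected matrix (a base-R sketch,
recentering at the sample mean):
\begin{verbatim}
fit <- lm(mpg ~ wt + I(wt^2), data=mtcars)
mu <- mean(mtcars$wt)
Cp <- rbind(c(1, mu, mu^2),
            c(0, 1, 2*mu),
            c(0, 0, 1))
cbind(Cp %*% coef(fit),
      coef(lm(mpg ~ I(wt - mu) + I((wt - mu)^2), data=mtcars)))
\end{verbatim}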
Polynomial models where higher degree terms are included while dropping
lower degree terms, when recentered, will have the lower order terms
re-emerge, just as we saw with less-than-full factorial models with
dropped lower order terms.
Because rescaling transformations are diagonal matrices, the
simplification for a polynomial rescaling just drops a column and a row.
\hypertarget{categorical-terms}{%
\section{Categorical terms}\label{categorical-terms}}
In practice there are a number of approaches used for categorical
variables, i.e. collections of indicator/contrast variables.
\begin{itemize}
\tightlist
\item
Leaving the intercept terms in reference coding amounts to leaving the
coefficients for indicators untransformed, as described previously.
Where there are \(k\) categories, then, we may use an \(I_k\) identity
matrix, and do our computations as direct sums.
\item
Standardizing each term as a z-score is equivalent to treating each
category in the same manner as continuous variables, as described
previously.
\item
Transforming to coding other than reference coding is again a
``recentering'' change of basis in that it changes where we find zero
in the parameter space. Here, however, we need another simple
transformation.
\end{itemize}
For example, the general form of the reference coding to grand-mean
coding recentering matrix for a categorical variable with \(k\)
categories is \[
\begin{bmatrix}
1 &1/k &\cdots &1/k \\
0 &(k-1)/k &\cdots &-1/k \\
0 &-1/k &\cdots &-1/k \\
\vdots &\vdots &\vdots &\vdots \\
0 &-1/k &\cdots &(k-1)/k
\end{bmatrix}
\]
This is then used in the same way as previously discussed recentering
transformations. (The first category remains the dropped column in this
transformation.)
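A hypothetical base-R construction of this matrix (the
\texttt{ref.to.gm()} helper used below returns the same matrix for
\(k=3\)):
\begin{verbatim}
ref.to.gm.sketch <- function(k) {
  # first column (1, 0, ..., 0); top row of the rest 1/k;
  # remaining block has (k-1)/k on the diagonal, -1/k off it
  cbind(c(1, rep(0, k - 1)),
        rbind(rep(1/k, k - 1), diag(k - 1) - 1/k))
}
ref.to.gm.sketch(3)
\end{verbatim}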
\begin{verbatim}
C1 <- ref.to.gm(3)
rownames(C1) <- colnames(C1) <- c("(Intercept)", "cyl6", "cyl8")
C1
\end{verbatim}
\begin{verbatim}
(Intercept) cyl6 cyl8
(Intercept) 1 0.333 0.333
cyl6 0 0.667 -0.333
cyl8 0 -0.333 0.667
\end{verbatim}
Combined with our transformation matrix for a continuous variable
\begin{verbatim}
wtmean <- mean(mtcars$wt)
names(wtmean) <- "wt"
C2 <- mean.to.matrix(wtmean)
C <- kron(C2,C1)
C
\end{verbatim}
\begin{verbatim}
(Intercept) cyl6 cyl8 wt cyl6:wt cyl8:wt
(Intercept) 1 0.333 0.333 3.22 1.072 1.072
cyl6 0 0.667 -0.333 0.00 2.145 -1.072
cyl8 0 -0.333 0.667 0.00 -1.072 2.145
wt 0 0.000 0.000 1.00 0.333 0.333
cyl6:wt 0 0.000 0.000 0.00 0.667 -0.333
cyl8:wt 0 0.000 0.000 0.00 -0.333 0.667
\end{verbatim}
This converts an uncentered model with reference coding for the
indicators to centered \texttt{wt} and grand-mean centered \texttt{cyl}.
\begin{verbatim}
cylf <- as.factor(mtcars$cyl)
excat <- lm(mpg ~ cylf*wt, data=mtcars)
summary(excat)
\end{verbatim}
\begin{verbatim}
Call:
lm(formula = mpg ~ cylf * wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.151 -1.380 -0.639 1.494 5.252
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.57 3.19 12.39 2.1e-12 ***
cylf6 -11.16 9.36 -1.19 0.24358
cylf8 -15.70 4.84 -3.24 0.00322 **
wt -5.65 1.36 -4.15 0.00031 ***
cylf6:wt 2.87 3.12 0.92 0.36620
cylf8:wt 3.45 1.63 2.12 0.04344 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.45 on 26 degrees of freedom
Multiple R-squared: 0.862, Adjusted R-squared: 0.835
F-statistic: 32.4 on 5 and 26 DF, p-value: 2.26e-10
\end{verbatim}
\begin{verbatim}
C %*% coef(excat)
\end{verbatim}
\begin{verbatim}
[,1]
(Intercept) 19.227
cyl6 0.237
cyl8 -2.413
wt -3.540
cyl6:wt 0.760
cyl8:wt 1.347
\end{verbatim}
\begin{verbatim}
contrasts(cylf) <- contr.sum
coef(lm(mpg~cylf*wtcentered, data=mtcars)) # note different dropped level
\end{verbatim}
\begin{verbatim}
(Intercept) cylf1 cylf2 wtcentered
19.227 2.176 0.237 -3.540
cylf1:wtcentered cylf2:wtcentered
-2.107 0.760
\end{verbatim}
\hypertarget{implementation}{%
\section{Implementation}\label{implementation}}
The basic algorithm here, the Kronecker product, is as old as the hills
and implemented in any software that handles matrix operations.
Integrating the pieces into useful software requires a little more work.
A useful implementation would take as input:
\begin{itemize}
\tightlist
\item
a vector of recentering constants, labeled, one per variable in the
model
\item
a vector of rescaling constants, labeled, one per variable
\item
a vector of coefficient terms (the labels themselves), formed so that
constituent variables and polynomial degree can be extracted from each
term
\end{itemize}
As implemented so far, the \texttt{recentering.matrix()} R function
used in the two-variable examples above (which calculates both
recentering and rescaling matrices) handles less-than-full factorial
models.
Missing from this implementation are:
\begin{itemize}
\tightlist
\item
handling dropped lower order terms
\item
handling polynomials
\item
handling categorical contrast conversions
\end{itemize}
The algorithm for collecting like terms in a transformation matrix
(\texttt{collect.terms()}), for a single variable taken to an arbitrary
polynomial degree, is ready, but not yet integrated into the full
construction algorithm.
A useful computational simplification not yet fully implemented is:
\begin{itemize}
\tightlist
\item
using direct sums for untransformed variables and collections of
categorical indicators, rather than direct products,
\texttt{factor.direct.sum()}.
\end{itemize}
Source code can be found at \url{https://github.com/Hemken/stdParm-R}
\hypertarget{errors-in-predictions}{%
\section{Errors in Predictions}\label{errors-in-predictions}}
The main benefit of direct transformation is improved interpretability,
and the general utility of the results for any post-estimation
calculation, without the cost of recalculating all the data values and
re-estimating the model.
An additional benefit is the reduction of numerical error.
In principle, the predicted values from the original model should be
exactly equal to the predicted values from a model estimated from the
centered data.
Here we compare two different methods of generating model coefficients
for recentered data. In the first method, we actually recenter the data,
then re-estimate the model, then calculate predicted values. In the
second method, we calculate the model coefficients directly, without
re-estimating the model. Then we calculate predicted values (using the
same recentered data as in the first method).
The model is
\[ y = \beta_0 + \beta_1x_1 +\beta_2x_2 + \beta_3x_3 + \beta_4x_1x_2 +
\beta_5x_1x_3 + \beta_6x_2x_3 + \beta_7x_1x_2x_3\]
\hypertarget{simulation}{%
\subsection{Simulation}\label{simulation}}
\begin{itemize}
\tightlist
\item
Numerical results: first is recentered data, second is directly
recentered coefficients
\item
Graphical results: black is recentered data, red is directly
recentered coefficients
\end{itemize}
The numerical results are the mean norm of the difference between the
predicted values of the original model and those from each of the two
methods: re-estimation on recentered data and direct coefficient
transformation.
The graphical results are a kernel density plot of the normed
differences for each simulation.
All of the differences are tiny. The differences visible in the graph
are attributable to the QR estimation for the re-estimation method.
\begin{verbatim}
library(parallel)
cl <- makeCluster(8)
nvals <- 100L
clusterExport(cl, c("nvals", "sim.3var.center", "gen_3x",
"mean.to.matrix", "matching.terms", "vars.in.terms", "kron",
"clean.kron.names", "matrix.build.clean"))
devnorms <- parSapply(cl, 1:100000, sim.3var.center, nvals)
rowMeans(devnorms)
\end{verbatim}
\begin{verbatim}
[1] 1.39e-14 6.70e-15
\end{verbatim}
\begin{verbatim}
kd.plot.overlay(t(devnorms), nvals)
\end{verbatim}
\includegraphics{errors/output_6_1.png}
\end{document}