diff --git a/toolbox/helpfiles/FSDA/images/simulateLM_01.png b/toolbox/helpfiles/FSDA/images/simulateLM_01.png index 1403a6652..7a656492a 100644 Binary files a/toolbox/helpfiles/FSDA/images/simulateLM_01.png and b/toolbox/helpfiles/FSDA/images/simulateLM_01.png differ diff --git a/toolbox/helpfiles/FSDA/images/simulateLM_02.png b/toolbox/helpfiles/FSDA/images/simulateLM_02.png index 4d2cff2bb..f52fa66b7 100644 Binary files a/toolbox/helpfiles/FSDA/images/simulateLM_02.png and b/toolbox/helpfiles/FSDA/images/simulateLM_02.png differ diff --git a/toolbox/helpfiles/FSDA/simulateLM.html b/toolbox/helpfiles/FSDA/simulateLM.html index a9529082b..af47a9eb6 100644 --- a/toolbox/helpfiles/FSDA/simulateLM.html +++ b/toolbox/helpfiles/FSDA/simulateLM.html @@ -1,26 +1,21 @@ -
simulateLM simulates linear regression data with prespecified values of statistical indexes.
simulateLM simulates linear regression data. It is possible to specify: - 1) the requested value of R2 (or equivaletly its SNR);
- 2) the values of the beta coefficients (possibly sparse);
- 3) the correlation (covariance) matrix among the explanatory variables.
- 4) the value of the intercept term.
- 5) the distribution to use to generate the Xs;
- 6) the distribution to use to generate the ys.
- 7) the MSOM contamination in Xs and ys.
- 8) the VIOM contamination in ys.
Simulate with prefixed value of R2.out
=simulateLM(n
,
Name, Value
)
- Set value of R2;
R2=0.82; -n=10000; -out=simulateLM(n,'R2',R2); -outLM=fitlm(out.X,out.y);
- Set value of R2;
R2=0.26; -n=10000; -A = gallery('moler',5,0.2); -out=simulateLM(n,'R2',R2,'SigmaX',A); +simulateLM \ No newline at end of file diff --git a/toolbox/regression/simulateLM.m b/toolbox/regression/simulateLM.m index 1f88f279f..f0dd769ac 100644 --- a/toolbox/regression/simulateLM.m +++ b/toolbox/regression/simulateLM.m @@ -12,7 +12,7 @@ % 6) the distribution to use to generate the ys. % 7) the MSOM contamination in Xs and ys. % 8) the VIOM contamination in ys. -% +% % Required input arguments: % % n : sample size. Scalar. n is a positive integer @@ -22,9 +22,15 @@ % Optional input arguments: % % R2 : Squared multiple correlation coefficient (R2). Scalar. The -% requested value of R2. A number in the -% interval [0 1] that specifies the requested value of R2. -% The default is to simulate regression data with R2=0; +% requested value of R2. A number in the interval [0 1] that +% specifies the asymptotic requested value of R2. The default is +% to simulate regression data with R2=0; Note that the value of +% R2 is the one in the population not in the sample in the sense +% that if, for example 'R2',00 the sample data (expecially if n +% is very small) can have a value which is slightly different +% from the prefixed one. If the exact value of R2 is required +% then the user has to use option exactR2. See below for further +% details. % Example - 'R2',0.90 % Data Types - double % @@ -59,7 +65,7 @@ % Data Types - double % distriby : distribution to use to simulate the response. Character. % Character that specifies the distribution to use to -% simulate the values of the explanatory variables. The +% simulate the values of the response. The % default is to use the Standard normal distribution. % Example - 'distriby', 'Lognormal' % Data Types - double @@ -71,6 +77,15 @@ % with parameters mu and sigma respectively equal to 2 and 10. % Example - 'distribypars', '[2 10]' % Data Types - double +% +% exactR2 : exact value of R2. Boolean. 
+% If exactR2 is the sample data have the requested value of +% R2. The default is exactR2 equal to false, that is just +% asymptotically, the sample data have a value of R2 equal +% to the one which is specified in option R2. +% Example - 'exactR2', true +% Data Types - logical +% % nexpl : number of explanatory variables. If vector beta is % supplied, nexpl is equal to length(beta). Similarly if % SigmaX is supplied nexpl is set equal to size(SigmaX,1). @@ -91,33 +106,33 @@ % Example - 'plots',false % Data Types - single | double % -% pMSOM : Proportion of MSOM outliers. The default is 10% MSOM +% pMSOM : Proportion of MSOM outliers. The default is 10% MSOM % contamination. % Example - 'pMSOM',0.25 % Data Types - double -% -% pVIOM : Proportion of VIOM outliers (non-overlapping with MSOM). +% +% pVIOM : Proportion of VIOM outliers (non-overlapping with MSOM). % The default is 10% VIOM contamination. % Example - 'pVIOM',0.25 % Data Types - double -% +% % shiftMSOMe : Mean-shift on the error terms for MSOM outliers. % Default value shiftMSOMe==10. % Example - 'shiftMSOMe',-3 % Data Types - double -% -% predxMSOM : Predictors subject to a mean shift by MSOM. It is a +% +% predxMSOM : Predictors subject to a mean shift by MSOM. It is a % p-dimensional vector indexing design matrix columns. % Default value is to contaminate only the non-zero % entries of beta_true (excluding the intercept). % Example - 'predxMSOM',true(2,1) % Data Types - boolean -% +% % shiftMSOMx : Mean-shift on the predictor terms for MSOM outliers. % Default value shiftMSOMx==10. % Example - 'shiftMSOMx',3 % Data Types - double -% +% % inflVIOMe : Variance-inflation for the errors subject to a VIOM. % Default value is inflVIOMe==10. % Example - 'inflVIOMe',5 @@ -147,8 +162,8 @@ % % References: % -% Insolia, L., F. Chiaromonte, and M. Riani (2020a). -% "A Robust Estimation Approach for Mean-Shift and Variance-Inflation Outliers". +% Insolia, L., F. Chiaromonte, and M. Riani (2020a). 
+% "A Robust Estimation Approach for Mean-Shift and Variance-Inflation Outliers". % Festschrift in Honor of R. Dennis Cook pp 17–41. % % @@ -262,7 +277,7 @@ predxMSOM = ''; shiftMSOMx = 10; inflVIOMe = 10; - +exactR2=false; options=struct('R2',R2,... 'beta',beta,'SigmaX',SigmaX,... @@ -271,7 +286,7 @@ 'nexpl',nexpl,'intercept',intercept,'plots',plots, ... 'SNR', SNR, 'pMSOM', pMSOM, 'pVIOM', pVIOM, ... 'shiftMSOMe', shiftMSOMe, 'predxMSOM', predxMSOM, ... - 'shiftMSOMx', shiftMSOMx, 'inflVIOMe', inflVIOMe); + 'shiftMSOMx', shiftMSOMx, 'inflVIOMe', inflVIOMe,'exactR2',exactR2); %% User options @@ -280,12 +295,12 @@ [varargin{:}] = convertStringsToChars(varargin{:}); UserOptions=varargin(1:2:length(varargin)); if ~isempty(UserOptions) - + % Check if number of supplied options is valid if length(varargin) ~= 2*length(UserOptions) error('FSDA:simulateLM:WrongInputOpt','Number of supplied options is invalid. Probably values for some parameters are missing.'); end - + % Check if all the specified optional arguments were present in % structure options. Remark: the nocheck option has already been dealt % by routine chkinputR. @@ -295,17 +310,17 @@ disp(strcat('Non existent user option found->', char(WrongOptions{:}))) error('FSDA:simulateLM:NonExistInputOpt','In total %d non-existent user options found.', length(WrongOptions)); end - + % Check the presence of input options beta, SigmaX and nexpl. betaboo=max(strcmp(UserOptions,'beta'))==1; SigmaXboo=max(strcmp(UserOptions,'SigmaX'))==1; nexplboo=max(strcmp(UserOptions,'nexpl'))==1; - + % Write in structure 'options' the options chosen by the user. 
for i=1:2:length(varargin) options.(varargin{i})=varargin{i+1}; end - + R2=options.R2; nexpl=options.nexpl; beta=options.beta; @@ -317,6 +332,7 @@ distribypars = options.distribypars; plots=options.plots; intercept=options.intercept; + exactR2=options.exactR2; SNR = options.SNR; pMSOM = options.pMSOM; pVIOM = options.pVIOM; @@ -335,21 +351,21 @@ if SNR<=0 error('FSDA:simulateLM:WrongOpt','SNR must be greater than zero'); end - + % Preliminary checks both beta, nexpl and SigmaX have been supplied. if betaboo==true && SigmaXboo==true && nexplboo==true - + if nexpl~=size(SigmaX,1) error('FSDA:simulateLM:WrongOpt',['Length of supplied vector beta ' ... 'must be equal to number of rows (columns) of matrix SigmaX']); end - + if nexpl~=length(beta) error('FSDA:simulateLM:WrongOpt',['Length of supplied vector beta ' ... 'must be equal to number of rows (columns) of matrix SigmaX']); end end - + % Preliminary checks just beta and SigmaX have been supplied. if betaboo==true && SigmaXboo==true && nexplboo==false nexpl=length(betaboo); @@ -358,7 +374,7 @@ 'must be equal to number of rows (columns) of matrix SigmaX']); end end - + % Preliminary checks just beta and nexpl have been supplied. if betaboo==true && SigmaXboo==false && nexpl==true nexpl=length(beta); @@ -368,7 +384,7 @@ end SigmaX=eye(nexpl); end - + % Preliminary checks just SigmaX and nexpl have been supplied. if betaboo==false && SigmaXboo==true && nexplboo==true nexplchk=size(SigmaXboo,1); @@ -378,60 +394,60 @@ end beta=ones(nexpl,1); end - + % Preliminary checks just beta has been supplied. if betaboo==true && SigmaXboo == false && nexplboo==false nexpl=length(beta); SigmaX=eye(nexpl); end - + % Preliminary checks just SigmaX has been supplied. if betaboo==false && SigmaXboo == true && nexplboo==false nexpl=size(SigmaX,1); beta=ones(nexpl,1); end - + % Preliminary checks just nexpl has been supplied. 
if betaboo==false && SigmaXboo == false && nexplboo==true beta=ones(nexpl,1); SigmaX=eye(nexpl); end - + [T,err] = cholcov(SigmaX); if err ~= 0 error('FSDA:mvnrnd:BadCovariance2DSymPos','WrongSigma'); end lXpars=length(distribXpars); - + %% checks on the explanatory variables. if ischar(distribX) - if lXpars==1 - X = random(distribX,distribXpars,n,nexpl); - elseif lXpars==2 - X = random(distribX,distribXpars(1),distribXpars(2),n,nexpl); - elseif lXpars==3 - X = random(distribX,distribXpars(1),distribXpars(2),distribXpars(3),n,nexpl); - else - X = random(distribX,distribXpars(1),distribXpars(2),distribXpars(3),distribXpars(4),n,nexpl); - end - % Generate the X in such a way their corr is SigmaX. - X=X*T; + if lXpars==1 + X = random(distribX,distribXpars,n,nexpl); + elseif lXpars==2 + X = random(distribX,distribXpars(1),distribXpars(2),n,nexpl); + elseif lXpars==3 + X = random(distribX,distribXpars(1),distribXpars(2),distribXpars(3),n,nexpl); + else + X = random(distribX,distribXpars(1),distribXpars(2),distribXpars(3),distribXpars(4),n,nexpl); + end + % Generate the X in such a way their corr is SigmaX. + X=X*T; else % In this case, the user has directly supplied matrix X. % Make sure that the size of X is n-by-nexpl. X=distribX; [nchk,nexplchk]=size(X); - if nchk~=n + if nchk~=n error('FSDA:simulateLM:WrongOpt',['supplied matrix X must have ' ... num2str(n) ' rows']); - end - if nexpl~=nexplchk + end + if nexpl~=nexplchk error('FSDA:simulateLM:WrongOpt',['supplied matrix X must have ' ... num2str(nexpl) ' columns']); - end + end end - - + + lypars=length(distribypars); if lypars==1 err = random(distriby,distribypars,n,1); @@ -442,35 +458,57 @@ else err = random(distriby,distribypars(1),distribypars(2),distribypars(3),distribypars(4),n,1); end - + p=nexpl+intercept; - + % Divide by std and multiply by a small sample correction factor. 
- err=sqrt((n)/(n-p))*err/std(err,1); + err=sqrt((n)/(n-p))*err/std(err,1); % err=err/std(err,1); - + if R2>0 && isempty(SNR) % Find var(\epsilon) which produces a value of R2 centered around % the one which has been requested. vareps=(intercept+beta'*SigmaX*beta)*((1 - R2)/R2); y=intercept+X*beta(:)+err*sqrt(vareps); + if exactR2==true + step=100; + outTMP=fitlm(X,y); + while abs(outTMP.Rsquared.Ordinary-R2)>0.01 + if outTMP.Rsquared.Ordinary >R2 + while outTMP.Rsquared.Ordinary>R2 + vareps=vareps+step; + y=intercept+X*beta(:)+err*sqrt(vareps); + outTMP=fitlm(X,y); + end + step=step/2; + elseif outTMP.Rsquared.OrdinarysimulateLM
simulateLM simulates linear regression data with pre-specified values of statistical indexes.
Description
simulateLM simulates linear regression data. It is possible to specify: + 1) the requested value of R2 (or equivalently its SNR);
+ 2) the values of the beta coefficients (possibly sparse);
+ 3) the correlation (covariance) matrix among the explanatory variables.
+ 4) the value of the intercept term.
+ 5) the distribution to use to generate the Xs;
+ 6) the distribution to use to generate the ys.
+ 7) the MSOM contamination in Xs and ys.
+ 8) the VIOM contamination in ys.
Simulate with prefixed value of R2.
out
=simulateLM(n
,Name, Value
)Examples
Simulate with prefixed value of R2. + Set value of R2;
R2=0.82; +n=10000; +out=simulateLM(n,'R2',R2); +outLM=fitlm(out.X,out.y);
Related Examples
Use prefixed correlation matrix for cov(X). + Set value of R2;
R2=0.26; +n=10000; +A = gallery('moler',5,0.2); +out=simulateLM(n,'R2',R2,'SigmaX',A); outLM=fitlm(out.X,out.y)outLM = @@ -29,139 +24,147 @@ y ~ 1 + x1 + x2 + x3 + x4 + x5 Estimated Coefficients: - Estimate SE tStat pValue - _________ ________ _______ __________ + Estimate SE tStat pValue + ________ ________ ______ __________ - (Intercept) -0.076653 0.053898 -1.4222 0.15501 - x1 1.0515 0.056414 18.638 3.0647e-76 - x2 0.92447 0.056001 16.508 2.0145e-60 - x3 1.0394 0.055297 18.797 1.72e-77 - x4 0.92012 0.055248 16.654 1.8903e-61 - x5 1.029 0.054653 18.828 9.8022e-78 + (Intercept) 0.075868 0.053908 1.4073 0.15936 + x1 1.1242 0.056285 19.974 4.6082e-87 + x2 1.0032 0.055788 17.982 3.5569e-71 + x3 0.95948 0.055435 17.308 3.7476e-66 + x4 0.97913 0.055195 17.739 2.3933e-69 + x5 0.98381 0.054149 18.169 1.3412e-72 Number of observations: 10000, Error degrees of freedom: 9994 Root Mean Squared Error: 5.39 -R-squared: 0.253, Adjusted R-Squared: 0.253 -F-statistic vs. constant model: 679, p-value = 0 -
Use prefixed values of R2, beta and intercept. - Set value of R2.
R2=0.92; -beta=[3; 4; 5; 2; 7]; -intercept=true; -n=100000; -out=simulateLM(n,'R2',R2,'beta',beta); -outLM=fitlm(out.X,out.y);
Sim study. - Compare the distribution of values of R2 with data generated from - Normal with those generated from Student T with 5 degrees of freedom.
% Set value of R2. -R2=0.92; -beta=[3; 4; 5; 2; 7; 2; 3]; -nsimul=1000; -R2all=zeros(nsimul,2); -n=100; -df=5; -for j=1:nsimul -% Data generated from Normal -out=simulateLM(n,'R2',R2,'beta',beta); -outLM=fitlm(out.X,out.y); -R2all(j,1)=outLM.Rsquared.Ordinary; -% Data generated from T(5) -out=simulateLM(n,'R2',R2,'beta',beta,'distriby','T','distribypars',df); -outLM=fitlm(out.X,out.y); -R2all(j,2)=outLM.Rsquared.Ordinary; -end -boxplot(R2all,'Labels',{'Normal', 'T(5)'});Input Arguments
n
— sample size. Scalar.n is a positive integer - which defines the length of the simulated data. For example - if n=100, y will be 100x1 and X will be 100xp.
Data Types:
single| double
Name-Value Pair Arguments
Specify optional comma-separated pairs of
Example:Name,Value
arguments.Name
is the argument name andValue
is the corresponding value.Name
must appear inside single quotes (' '
). You can specify several name and value pair arguments in any order asName1,Value1,...,NameN,ValueN
.
'R2',0.90 -
,'SNR',10 -
,'beta',[3 5 8] -
,'Sigma', gallery('lehmer',5) -
,'distribX', 'Beta' -
,'distribXpars', '[0.2 0.6]' -
,'distriby', 'Lognormal' -
,'distribypars', '[2 10]' -
,'distribypars', '[2 10]' -
,'intercept', true -
,'plots',false -
,'pMSOM',0.25 -
,'pVIOM',0.25 -
,'shiftMSOMe',-3 -
,'predMSOM',true(2,1) -
,'shiftMSOMx',3 -
,'inflVIOMe',5 -
R2
—Squared multiple correlation coefficient (R2).scalar.The - requested value of R2. A number in the - interval [0 1] which specifies the requested value of R2.
- The default is to simulate regression data with R2=0;
-
Example:
'R2',0.90 -
Data Types:
double
SNR
—Signal to noise ratio characterizing the simulation.this is defined such that sigma_error == sqrt(var(X_u*beta_true)/SNR) The default is SNR=='' and R2 is used instead.Example:
'SNR',10 -
Data Types:
double
beta
—the values of the beta coefficients.vector.Vector which - contains the values of the regression coefficients. The - default is a vector of ones.
-
Example:
'beta',[3 5 8] -
Data Types:
double
SigmaX
—the correlation matrix.matrix.Positive definite matrix - which contains the correlation matrix among regressors. The - default is the identity matrix.
-
Example:
'Sigma', gallery('lehmer',5) -
Data Types:
double
distribX
—distribution to use to simulate the regressors.character.Character which specifies the distribution to use to - simulate the values of the explanatory variables.
- For the list of valid names see MATLAB function random.
- Default is to use the Standard normal distribution.
-
Example:
'distribX', 'Beta' -
Data Types:
double
distribXpars
—parameters of the distribution to use in distribX.vector.Scalar value or array of scalar values containing the - distribution parameters specified in distribX.
-
Example:
'distribXpars', '[0.2 0.6]' -
Data Types:
double
distriby
—distribution to use to simulate the response.character.Character which specifies the distribution to use to - simulate the values of the explanatory variables. The - default is to use the Standard normal distribution.
-
Example:
'distriby', 'Lognormal' -
Data Types:
double
distribypars
—parameters of the distribution to use in distriby.vector.Scalar value or array of scalar values containing the - distribution parameters specified in distriby. For examples - if distriby is 'Lognormal' and 'distribypars' is [2 10], the - errors are generated according to a Log Normal distribution - with parameters mu and sigma respectively equal to 2 and 10.
-
Example:
'distribypars', '[2 10]' -
Data Types:
double
nexpl
—number of explanatory variables.if vector beta is supplied nexpl is equal to length(beta).Similarly if - sigmaX is supplied nexpl is set equal to size(sigmaX,1).
- Note that both nexpl is supplied together with beta and SigmaX it is check that - nexpl =length(beta) = size(SigmaX,1). If options beta and - sigmaX are empty nexpl is set equal to 3.
-
Example:
'distribypars', '[2 10]' -
Data Types:
double
intercept
—value of the intercept to use.boolean.The default value - for intercept is false.
-
Example:
'intercept', true -
Data Types:
boolean
plots
—Plot on the screen.boolean.If plots = true, the yXplot which shows the response - against all the explanatory variables s shown on the - screen. The default value for plots is false, that is no - plot is shown on the screen.
-
Example:
'plots',false -
Data Types:
single | double
pMSOM
—Proportion of MSOM outliers.the default is 10% MSOM contmaination.Example:
'pMSOM',0.25 -
Data Types:
double
pVIOM
—Proportion of VIOM outliers (non-overlapping with MSOM).the default is 10% VIOM contmaination.Example:
'pVIOM',0.25 -
Data Types:
double
shiftMSOMe
—Mean-shift on the error terms for MSOM outliers.default value shiftMSOMe==10.Example:
'shiftMSOMe',-3 -
Data Types:
double
predxMSOM
—Predictors subject to a mean shift by MSOM.it is a p-dimensional vector indexing design matrix columns.Default value is to contaminate only the non-zero - entries of beta_true (excluding the intercept).
-
Example:
'predMSOM',true(2,1) -
Data Types:
boolean
shiftMSOMx
—Mean-shift on the predictor terms for MSOM outliers.default value shiftMSOMx==10.Example:
'shiftMSOMx',3 -
Data Types:
double
inflVIOMe
—Variance-inflation for the errors subject to a VIOM.default value is inflVIOMe==10.Example:
'inflVIOMe',5 -
Data Types:
double
Output Arguments
out
— description StructureStructure which contains the following fields
Value Description y
simulated response. Vector. Column vector of length n - containing the response.
X
simulated regressors. Matrix . Matrix of size - n-times-nexpl containing the values of the regressors.
- - Optional Output (for pVIOM+pMSOM>0):
yc
Contaminated response vector.
Xc
Contaminated response vector.
ind_clean
Indexes for non-outlying cases.
ind_MSOM
Indexes for MSOM outlying cases.
ind_VIOM
Indexes for VIOM outlying cases.
vareps
Variance for the uncontaminated errors.
References
Insolia, L., F. Chiaromonte, and M. Riani (2020a).
- "A Robust Estimation Approach for Mean-Shift and Variance-Inflation Outliers".
+R-squared: 0.26, Adjusted R-Squared: 0.259 +F-statistic vs. constant model: 701, p-value = 0 +
Use prefixed values of R2, beta and intercept. + Set value of R2.
R2=0.92; +beta=[3; 4; 5; 2; 7]; +intercept=true; +n=100000; +out=simulateLM(n,'R2',R2,'beta',beta); +outLM=fitlm(out.X,out.y);
Sim study. + Compare the distribution of values of R2 with data generated from + Normal with those generated from Student T with 5 degrees of freedom.
% Set value of R2. +R2=0.92; +beta=[3; 4; 5; 2; 7; 2; 3]; +nsimul=1000; +R2all=zeros(nsimul,2); +n=100; +df=5; +for j=1:nsimul +% Data generated from Normal. +out=simulateLM(n,'R2',R2,'beta',beta); +outLM=fitlm(out.X,out.y); +R2all(j,1)=outLM.Rsquared.Ordinary; +% Data generated from T(5). +out=simulateLM(n,'R2',R2,'beta',beta,'distriby','T','distribypars',df); +outLM=fitlm(out.X,out.y); +R2all(j,2)=outLM.Rsquared.Ordinary; +end +boxplot(R2all,'Labels',{'Normal', 'T(5)'});
Use SNR and include MSOM (on active features) and VIOM contamination. %% Use SNR and include MSOM (on active features) and VIOM contamination. +SNR=3; +beta=[2, 2, 0, 0]; +intercept=true; +n=100; +out=simulateLM(n,'SNR',SNR,'beta',beta, 'pMSOM', 0.1, 'pVIOM', 0.2, 'plots', 1); +X = out.X; +y = out.y; +outLM=fitlm(X,y); +Xc = out.Xc; +yc = out.yc; +outLM2=fitlm(Xc,yc);Input Arguments
n
— sample size. Scalar.n is a positive integer + which defines the length of the simulated data. For example + if n=100, y will be 100x1 and X will be 100xp.
Data Types:
single| double
Name-Value Pair Arguments
Specify optional comma-separated pairs of
Example:Name,Value
arguments.Name
is the argument name andValue
is the corresponding value.Name
must appear inside single quotes (' '
). You can specify several name and value pair arguments in any order asName1,Value1,...,NameN,ValueN
.
'R2',0.90 +
,'SNR',10 +
,'beta',[3 5 8] +
,'SigmaX', gallery('lehmer',5) +
,'distribX', 'Beta' +
,'distribXpars', '[0.2 0.6]' +
,'distriby', 'Lognormal' +
,'distribypars', '[2 10]' +
,'exactR2', true +
,'nexpl', 3 +
,'intercept', true +
,'plots',false +
,'pMSOM',0.25 +
,'pVIOM',0.25 +
,'shiftMSOMe',-3 +
,'predxMSOM',true(2,1) +
,'shiftMSOMx',3 +
,'inflVIOMe',5 +
R2
—Squared multiple correlation coefficient (R2).scalar.The + requested value of R2. A number in the interval [0 1] that + specifies the requested asymptotic value of R2. The default is + to simulate regression data with R2=0. Note that the value of + R2 is the one in the population, not in the sample, in the sense + that if, for example, 'R2',00 is requested, the sample data (especially if n + is very small) can have a value of R2 which is slightly different + from the requested one. If the exact value of R2 is required, + then the user has to use option exactR2. See below for further + details.
+
Example:
'R2',0.90 +
Data Types:
double
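The link between the requested R2 and the error variance can be illustrated with a minimal, self-contained sketch. It is not the simulateLM implementation: it assumes standard normal regressors and errors, uses only base MATLAB, and leaves out the `intercept` term that the code in this diff adds to the quadratic form. The names `R2sample`, `Z` and `res` are introduced here for illustration only.

```matlab
% Sketch: simulate regression data with a requested population R2.
% Assumptions: standard normal regressors and errors, no intercept term.
rng(1);                          % fix the seed for reproducibility
n = 10000;                       % sample size
R2 = 0.82;                       % requested population R2
beta = [1; 1; 1];                % regression coefficients
SigmaX = eye(3);                 % covariance matrix of the regressors

T = chol(SigmaX);                % upper triangular factor, T'*T = SigmaX
X = randn(n, numel(beta)) * T;   % rows of X have covariance SigmaX

% Error variance that gives the requested population R2
vareps = (beta' * SigmaX * beta) * (1 - R2) / R2;
y = X * beta + sqrt(vareps) * randn(n, 1);

% Sample R2 from an ordinary least squares fit with intercept
Z = [ones(n, 1) X];
res = y - Z * (Z \ y);
R2sample = 1 - sum(res.^2) / sum((y - mean(y)).^2);
fprintf('requested R2 = %.2f, sample R2 = %.4f\n', R2, R2sample);
```

With a large n the sample value should be close to the requested one, while for a small n the two can differ noticeably, which is the situation option exactR2 is meant to address.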
SNR
—Signal to noise ratio characterizing the simulation.this is defined such that sigma_error == sqrt(var(X_u*beta_true)/SNR).The default is SNR=='' and R2 is used instead.
+
Example:
'SNR',10 +
Data Types:
double
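The SNR definition above implies a one-to-one correspondence between SNR and the population R2. The short derivation below is a sketch under the assumption that the errors are uncorrelated with the regressors; the symbol \sigma_{\varepsilon}^{2} stands for the square of sigma_error.

```latex
\sigma_{\varepsilon}^{2} = \frac{\operatorname{Var}(X\beta)}{\mathrm{SNR}}
\quad\Longrightarrow\quad
R^{2} = \frac{\operatorname{Var}(X\beta)}{\operatorname{Var}(X\beta) + \sigma_{\varepsilon}^{2}}
      = \frac{\mathrm{SNR}}{1 + \mathrm{SNR}},
\qquad
\mathrm{SNR} = \frac{R^{2}}{1 - R^{2}}.
```

For instance, SNR=3 corresponds to a population R2 of 3/4=0.75.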
beta
—the values of the beta coefficients.vector.Vector which + contains the values of the regression coefficients. The + default is a vector of ones.
+
Example:
'beta',[3 5 8] +
Data Types:
double
SigmaX
—the correlation matrix.matrix.Positive definite matrix + which contains the correlation matrix among regressors. The + default is the identity matrix.
+
Example:
'SigmaX', gallery('lehmer',5) +
Data Types:
double
distribX
—distribution to use to simulate the regressors.character.Character that specifies the distribution to use to + simulate the values of the explanatory variables.
+ For the list of valid names see MATLAB function random.
+ Default is to use the Standard normal distribution.
+
Example:
'distribX', 'Beta' +
Data Types:
double
distribXpars
—parameters of the distribution to use in distribX.vector.Scalar value or array of scalar values containing the + distribution parameters specified in distribX.
+
Example:
'distribXpars', '[0.2 0.6]' +
Data Types:
double
distriby
—distribution to use to simulate the response.character.Character that specifies the distribution to use to + simulate the values of the response. The + default is to use the Standard normal distribution.
+
Example:
'distriby', 'Lognormal' +
Data Types:
double
distribypars
—parameters of the distribution to use in distriby.vector.Scalar value or array of scalar values containing the + distribution parameters specified in distriby. For example, + if distriby is 'Lognormal' and 'distribypars' is [2 10], the + errors are generated according to a Log Normal distribution + with parameters mu and sigma respectively equal to 2 and 10.
+
Example:
'distribypars', '[2 10]' +
Data Types:
double
exactR2
—exact value of R2.boolean.If exactR2 is true, the sample data have the requested value of + R2. The default is exactR2 equal to false, that is, the sample data have + only asymptotically a value of R2 equal + to the one which is specified in option R2.
+
Example:
'exactR2', true +
Data Types:
logical
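One way to obtain a sample R2 that matches the requested value exactly is sketched below. This is not the iterative search on vareps used in this diff: it is a closed-form alternative, shown only for illustration, that makes the errors orthogonal to the design matrix and then rescales them. All variable names (`s`, `r`, `c`, `Ss`, `R2sample`) are hypothetical, and the residualized errors no longer follow exactly the distribution they were drawn from.

```matlab
% Sketch: force the sample R2 to equal a requested value exactly.
% NOT the step-halving search of the diff; a closed-form alternative.
rng(2);                          % fix the seed for reproducibility
n = 50;                          % small sample, where sample R2 varies most
R2 = 0.26;                       % requested sample R2
beta = [1; 1; 1];
X = randn(n, 3);
Z = [ones(n, 1) X];              % design matrix with intercept
s = X * beta;                    % noiseless signal
e = randn(n, 1);                 % raw errors
r = e - Z * (Z \ e);             % errors made orthogonal to [1 X]

Ss = sum((s - mean(s)).^2);      % sum of squares of the signal
c = sqrt(Ss * (1 - R2) / (R2 * sum(r.^2)));   % scale factor for the errors
y = s + c * r;

% Check: R2 of the least squares fit of y on [1 X]
res = y - Z * (Z \ y);
R2sample = 1 - sum(res.^2) / sum((y - mean(y)).^2);
fprintf('requested R2 = %.4f, sample R2 = %.4f\n', R2, R2sample);
```

Because r is orthogonal to both the constant and the signal, the total sum of squares splits into Ss plus c^2*sum(r.^2), so the chosen c gives the requested R2 up to floating point error.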
nexpl
—number of explanatory variables.if vector beta is supplied, nexpl is equal to length(beta).Similarly if + SigmaX is supplied nexpl is set equal to size(SigmaX,1).
+ Note that if nexpl is supplied together with beta and SigmaX it is checked that + nexpl = length(beta) = size(SigmaX,1). If options beta and + SigmaX are empty nexpl is set equal to 3.
+
Example:
'nexpl', 3 +
Data Types:
double
intercept
—value of the intercept to use.boolean.The default value + for intercept is false.
+
Example:
'intercept', true +
Data Types:
boolean
plots
—Plot on the screen.boolean.If plots = true, the yXplot that shows the response + against all the explanatory variables is shown on the + screen. The default value for plots is false, that is no + plot is shown on the screen.
+
Example:
'plots',false +
Data Types:
single | double
pMSOM
—Proportion of MSOM outliers.the default is 10% MSOM contamination.Example:
'pMSOM',0.25 +
Data Types:
double
pVIOM
—Proportion of VIOM outliers (non-overlapping with MSOM).the default is 10% VIOM contamination.Example:
'pVIOM',0.25 +
Data Types:
double
shiftMSOMe
—Mean-shift on the error terms for MSOM outliers.default value shiftMSOMe==10.Example:
'shiftMSOMe',-3 +
Data Types:
double
predxMSOM
—Predictors subject to a mean shift by MSOM.it is a p-dimensional vector indexing design matrix columns.Default value is to contaminate only the non-zero + entries of beta_true (excluding the intercept).
+
Example:
'predxMSOM',true(2,1) +
Data Types:
boolean
shiftMSOMx
—Mean-shift on the predictor terms for MSOM outliers.default value shiftMSOMx==10.Example:
'shiftMSOMx',3 +
Data Types:
double
inflVIOMe
—Variance-inflation for the errors subject to a VIOM.default value is inflVIOMe==10.Example:
'inflVIOMe',5 +
Data Types:
double
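The contamination options described above can be pictured with the following base MATLAB sketch. It mirrors the MSOM lines visible in this diff (a shift on selected columns of X and on the errors); the rounding rule for the number of outliers, the VIOM step (error scaled by sqrt(inflVIOMe)) and the way yc is assembled are assumptions made here for illustration, not confirmed details of simulateLM.

```matlab
% Sketch of MSOM and VIOM contamination (assumptions flagged in comments).
rng(3);                                % fix the seed for reproducibility
n = 100; p = 4;
beta = [2; 2; 0; 0];                   % sparse coefficients
Xu = randn(n, p);                      % uncontaminated regressors
eu = randn(n, 1);                      % uncontaminated errors
pMSOM = 0.1; pVIOM = 0.2;
shiftMSOMx = 10; shiftMSOMe = 10; inflVIOMe = 10;

nMSOM = floor(n * pMSOM);              % assumption: rounding rule
nVIOM = floor(n * pVIOM);              % assumption: rounding rule
perm = randperm(n);
indMSOM = perm(1:nMSOM);               % MSOM cases
indVIOM = perm(nMSOM + (1:nVIOM));     % VIOM cases, disjoint from MSOM

predxMSOM = (beta ~= 0);               % shift only the active predictors
Xc = Xu;
Xc(indMSOM, predxMSOM) = Xu(indMSOM, predxMSOM) + shiftMSOMx;
ec = eu;
ec(indMSOM) = eu(indMSOM) + shiftMSOMe;
% Assumption: VIOM multiplies the error variance by inflVIOMe
ec(indVIOM) = eu(indVIOM) * sqrt(inflVIOMe);

yc = Xc * beta + ec;                   % assumption: contaminated response
indClean = setdiff(1:n, [indMSOM, indVIOM]);   % non-outlying cases
```

Drawing both index sets from a single permutation is one simple way to keep the VIOM cases non-overlapping with the MSOM cases.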
Output Arguments
out
— description StructureStructure that contains the following fields:
Value Description y
simulated response. Vector. Column vector of length n + containing the response.
X
simulated regressors. Matrix. Matrix of size + n-times-nexpl containing the values of the regressors.
+ + Optional Output (for pVIOM+pMSOM>0):
yc
Contaminated response vector.
Xc
Contaminated matrix of regressors.
ind_clean
Indexes for non-outlying cases.
ind_MSOM
Indexes for MSOM outlying cases.
ind_VIOM
Indexes for VIOM outlying cases.
vareps
Variance for the uncontaminated errors.
References
Insolia, L., F. Chiaromonte, and M. Riani (2020a).
+ "A Robust Estimation Approach for Mean-Shift and Variance-Inflation Outliers".
Festschrift in Honor of R. Dennis Cook pp 17–41.
This page has been automatically generated by our routine publishFSSee Also
0 - + eu = err*sqrt(vareps); Xu = X; @@ -480,7 +518,7 @@ indMSOM = randperm(n, nMSOM); % MSOM (also on X). Xc = Xu; - Xc(indMSOM, predxMSOM) = Xu(indMSOM, predxMSOM) + shiftMSOMx; + Xc(indMSOM, predxMSOM) = Xu(indMSOM, predxMSOM) + shiftMSOMx; ec = eu; ec(indMSOM) = eu(indMSOM) + shiftMSOMe; @@ -497,7 +535,7 @@ indcont = unique([indMSOM, indVIOM]); % clean obs indexes indkeep = setdiff(1:n, indcont); - + if plots==true indunit = zeros(n, 1); indunit(indMSOM) = 1; @@ -513,7 +551,7 @@ out.ind_MSOM = indMSOM; out.ind_VIOM = indVIOM; out.vareps = vareps; - + end out.X = X;