
Commit

Merge pull request #116 from PSIAIMS/Regression
Regression
statasaurus authored Jan 8, 2024
2 parents 3bbc5d4 + 090804e commit 3dca639
Showing 2 changed files with 77 additions and 19 deletions.
25 changes: 23 additions & 2 deletions R/linear-regression.qmd
@@ -1,15 +1,36 @@
---
title: "Linear Regression"
output: html_document
date: "2023-04-22"
date: last-modified
date-format: D MMMM, YYYY
---

To demonstrate the use of linear regression we examine a dataset that illustrates the relationship between Height and Weight in a group of 237 teen-aged boys and girls. The dataset is available at (../data/htwt.csv) and is imported to the workspace.

```{r}
### Descriptive Statistics

The first step is to obtain simple descriptive statistics for the numeric variables of the htwt data, and one-way frequencies for the categorical variables. This is accomplished with the `summary` function. There are 237 participants, aged 13.9 to 25 years. It is a cross-sectional study, with each participant having one observation. We can use this data set to examine the relationship of participants' height to their age and sex.

```{r setup, include=TRUE}
knitr::opts_chunk$set(echo = TRUE)
htwt<-read.csv("../data/htwt.csv")
summary(htwt)
```

In order to create a regression model to demonstrate the relationship between age and height for females, we first need to create a flag variable identifying females and an interaction variable between age and female gender flag.

```{r}
htwt$female <- ifelse(htwt$SEX=='f',1,0)
htwt$fem_age <- htwt$AGE * htwt$female
head(htwt)
```
### Regression Analysis
Next, we fit a regression model representing the relationships between gender, age, height and the interaction variable created above. We use the `subset` argument of `lm` to restrict the analysis to participants who are 19 years old or younger. A 95% confidence interval for each parameter in the model can be obtained with `confint(regression)`. The model that we are fitting is ***height = b0 + b1 x female + b2 x age + b3 x fem_age + e***
```{r regression, include=TRUE}
regression <- lm(HEIGHT ~ female + AGE + fem_age, data = htwt, subset = AGE <= 19)
summary(regression)
```

From the coefficients table, the parameters are estimated as b0 = 28.88, b1 = 13.61, b2 = 2.03 and b3 = -0.93.

The resulting regression model for height, age and gender based on the available data is ***height = 28.8828 + 13.6123 x female + 2.0313 x age - 0.9294 x fem_age***
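
Because `fem_age` is the product of `female` and `age`, the fitted equation can be read as two separate lines, one for each sex. The following is a minimal sketch that uses only the rounded coefficients quoted above; the names `b`, `predict_height`, `male_line` and `female_line` are illustrative and do not appear in the original analysis.

```r
# Coefficients as reported in the text above (rounded to four decimals)
b <- c(intercept = 28.8828, female = 13.6123, age = 2.0313, fem_age = -0.9294)

# Evaluate the fitted equation for a given age and female flag (0 or 1)
predict_height <- function(age, female) {
  unname(b["intercept"] + b["female"] * female +
           b["age"] * age + b["fem_age"] * female * age)
}

# Sex-specific intercepts and slopes implied by the interaction model
male_line <- c(intercept = unname(b["intercept"]),
               slope = unname(b["age"]))
female_line <- c(intercept = unname(b["intercept"] + b["female"]),
                 slope = unname(b["age"] + b["fem_age"]))

print(male_line)
print(female_line)
print(predict_height(16, female = 0))
print(predict_height(16, female = 1))
```

For males the slope is b2, while for females it is b2 + b3, so the negative interaction estimate means the fitted height increases more slowly with age for females than for males over the age range analysed (19 years and younger).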
71 changes: 54 additions & 17 deletions SAS/linear-regression.qmd
@@ -1,32 +1,69 @@
---
title: "Linear Regression"
output: html_document
date: "2023-04-22"
date: last-modified
date-format: D MMMM, YYYY
---

Import sas dataset proc import out= WORK.htwt datafile= "C:\Documents and Settings\kwelch\Desktop\b510\htwt.sav" DBMS=SAV REPLACE; run;
To demonstrate the use of linear regression we examine a dataset that illustrates the relationship between Height and Weight in a group of 237 teen-aged boys and girls. The dataset is available at (../data/htwt.csv) and is imported into SAS using the proc import procedure.

### Descriptive Statistics

title "Descriptive Statistics for HTWT Data Set"; proc means data=htwt; run;
The first step is to obtain simple descriptive statistics for the numeric variables of the htwt data, and one-way frequencies for the categorical variables. This is accomplished with the proc means and proc freq procedures. There are 237 participants, aged 13.9 to 25 years. It is a cross-sectional study, with each participant having one observation. We can use this data set to examine the relationship of participants' height to their age and sex.

**Output** Descriptive Statistics for HTWT Data Set\
The MEANS Procedure
```{r eval=FALSE}
proc means data=htwt;
run;
Descriptive Statistics for HTWT Data Set
The MEANS Procedure
Variable  Label     N           Mean        Std Dev        Minimum        Maximum
---------------------------------------------------------------------------------
AGE       AGE     237     16.4430380      1.8425767     13.9000000     25.0000000
HEIGHT    HEIGHT  237     61.3645570      3.9454019     50.5000000     72.0000000
WEIGHT    WEIGHT  237    101.3080169     19.4406980     50.5000000    171.5000000
---------------------------------------------------------------------------------
```

## Variable Label N Mean Std Dev Minimum Maximum
```{r eval=FALSE}
proc freq data=htwt;
tables sex;
run;
AGE AGE 237 16.4430380 1.8425767 13.9000000 25.0000000 HEIGHT HEIGHT 237 61.3645570 3.9454019 50.5000000 72.0000000 WEIGHT WEIGHT 237 101.3080169 19.4406980 50.5000000 171.5000000 ----------------------------------------------------------------------------
Oneway Frequency Tabulation for Sex for HTWT Data Set
The FREQ Procedure
**Create a new data set with new variables** data htwt2; set htwt;
Cumulative Cumulative
SEX Frequency Percent Frequency Percent
-------------------------------------------------------------
f 111 46.84 111 46.84
m 126 53.16 237 100.00
```

**Create dummy variables for female** if sex="f" then female=1; if sex="m" then female=0;
In order to create a regression model to demonstrate the relationship between age and height for females, we first need to create a flag variable identifying females and an interaction variable between age and female gender flag.

**Create interaction** fem_age = female \* age;\
```{r eval=FALSE}
data htwt2;
set htwt;
if sex="f" then female=1;
if sex="m" then female=0;
*model to demonstrate interaction between female gender and age;
fem_age = female * age;
run;
```
### Regression Analysis
Next, we fit a regression model representing the relationships between gender, age, height and the interaction variable created in the data step above. We use a where statement to restrict the analysis to participants who are 19 years old or younger. We use the clb option to get a 95% confidence interval for each of the parameters in the model. The model that we are fitting is ***height = b0 + b1 x female + b2 x age + b3 x fem_age + e***

title "ANCOVA for Males and Females"; title2 "Relationship of Height to Age"; proc reg data=htwt2; where age \<=19; model height = female age fem_age / clb; quit;
```{r eval=FALSE}
proc reg data=htwt2;
where age <=19;
model height = female age fem_age / clb;
run; quit;
Model: MODEL1 Dependent Variable: HEIGHT
```
Number of Observations Read 219
Number of Observations Used 219
@@ -46,8 +83,8 @@ Model: MODEL1 Dependent Variable: HEIGHT

We examine the parameter estimates in the output below.

```
Parameter Estimates
```{r eval=FALSE}
Parameter Estimates
Parameter Standard
Variable DF Estimate Error t Value Pr > |t| 95% Confidence Limits
Intercept 1 28.88281 2.87343 10.05 <.0001 23.21911 34.54650
@@ -56,6 +93,6 @@ We examine the parameter estimates in the output below.
fem_age 1 -0.92943 0.24782 -3.75 0.0002 -1.41791 -0.44096
```
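
As a quick cross-check, the 95% confidence limits produced by the clb option can be reproduced from each estimate and its standard error as estimate plus or minus t times the standard error. The sketch below is written in R rather than SAS and uses only the two rows visible in the table above. It assumes the residual degrees of freedom are 219 - 4 = 215 (219 observations used, four estimated parameters); the names `est`, `se`, `df_resid` and `ci` are illustrative.

```r
# Estimates and standard errors copied from the Parameter Estimates table above
est <- c(Intercept = 28.88281, fem_age = -0.92943)
se  <- c(Intercept = 2.87343,  fem_age = 0.24782)

# Assumption: residual df = observations used (219) minus estimated parameters (4)
df_resid <- 219 - 4

# Two-sided 95% critical value of the t distribution
t_crit <- qt(0.975, df_resid)

# Lower and upper confidence limits, one row per parameter
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(round(ci, 5))
```

Under that degrees-of-freedom assumption, the computed limits should agree with the 95% Confidence Limits columns of the SAS output up to rounding of the inputs.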

The model that we are fitting is: height=b0 + b1 x female + b2 x age + b3 x fem_age + eij height
From the parameter estimates table, the coefficients are estimated as b0 = 28.88, b1 = 13.61, b2 = 2.03 and b3 = -0.93.

b0=28.88 b1=13.61 b2=2.03 b3=-0.92942
The resulting regression model for height, age and gender based on the available data is ***height = 28.88281 + 13.61231 x female + 2.03130 x age - 0.92943 x fem_age***
