diff --git a/Comp/r-sas_ci_for_prop.qmd b/Comp/r-sas_ci_for_prop.qmd index a1a69b68..81402807 100644 --- a/Comp/r-sas_ci_for_prop.qmd +++ b/Comp/r-sas_ci_for_prop.qmd @@ -6,37 +6,91 @@ execute: ## Introduction -There are several ways to calculate a confidence interval (CI) for a proportion. You need to select the method based on if you have a 1 sample proportion (1 proportion calculated from 1 group of subjects), or if you have 2 samples and you want a CI for the difference in the 2 proportions. The difference in proportion can come from either 2 independent samples (different subjects in each of the 2 groups), or can be matched (the same subject with 1 result in 1 group and 1 result in the other group [paired data]). +The methods to use for calculating a confidence interval (CI) for a proportion depend on the type of proportion you have. + +- 1 sample proportion (1 proportion calculated from 1 group of subjects) + +- 2 sample proportions and you want a CI for the difference in the 2 proportions. + + - If the 2 samples come from 2 independent samples (different subjects in each of the 2 groups) + + - If the 2 samples are matched (i.e. the same subject has 2 results, one on each group \[paired data\]). The method selected is also dependent on whether your proportion is close to 0 or 1 (or near to the 0.5 midpoint), and your sample size. -For more technical derivation and reasons for use for each of the methods listed below, see the corresponding [SAS page](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html). +For more technical derivation and reasons why you would use one method above another see the corresponding [SAS page](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html). + +The tables below provide an overview of findings from R & SAS, for calculation of CIs, for a Single Sample Proportion and for calculation of a difference between 2 matched pair proportions or 2 independent sample proportions. 
## General Comparison Table For Single Sample Proportions -The following table provides an overview of the results comparing between R and SAS for Single Sample Proportions and independent 2 sample proportions . See the corresponding [SAS page](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html) and [R page](https://psiaims.github.io/CAMIS/R/ci_for_prop.html) for results showing a single set of data run through both SAS and R. +See the corresponding [SAS page](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html) and [R page](https://psiaims.github.io/CAMIS/R/ci_for_prop.html) for results showing a single set of data run through both SAS and R. ++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ | Analysis of One Sample Proportion | Supported in R | Supported in SAS | Results Match | -|-----------------|-----------------|-----------------|---------------------| ++====================================================================+================+==================+========================================+ | Clopper-Pearson Exact | Yes {cardx} | Yes (default) | Yes | ++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ | Normal approximation (Wald Method) | Yes {cardx} | Yes (default) | Yes | ++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ | Normal approximation (Wald Method) with continuity correction | Yes {cardx} | Yes | Yes | ++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ | Wilson (Score, Altman, Newcombe) method | Yes {cardx} | Yes | Yes | 
++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ | Wilson (Score, Altman, Newcombe) method with continuity correction | Yes {cardx} | Yes | Yes | ++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ | Agresti Coull | Yes {cardx} | Yes | Yes | ++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ | Jeffreys Bayesian HPD | Yes {cardx} | Yes | Yes | ++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ | midp | Yes {PropCIs} | Yes | results match to the 3rd decimal place | ++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ | Blaker | Yes {PropCIs} | Yes | results match to the 5th decimal place | ++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ | Wilson Stratified score | Yes {cardx} | No | NA | ++--------------------------------------------------------------------+----------------+------------------+----------------------------------------+ + +## General Comparison Table For Two Matched Samples Proportions + ++------------------------------------------------------+-------------------+---------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Analysis of Two Matched Sample Proportions | Supported in R | Supported in SAS | Notes | 
++======================================================+===================+=============================================================================================+===========================================================================================================================================================+ +| Exact method | Yes {ExactCIdiff} | No | | ++------------------------------------------------------+-------------------+---------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Normal approximation (Wald Method) | No | No (proc freq does CIs for the risk difference, not the difference between two proportions) | Using the equations provided in the [SAS page](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html), you could do this programmatically in either package | ++------------------------------------------------------+-------------------+---------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| Wilson (Score method or the Altman, Newcombe method) | No | No (proc freq does CIs for the risk difference, not the difference between two proportions) | Using the equations provided in the [SAS page](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html), you could do this programmatically in either package | ++------------------------------------------------------+-------------------+---------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Calculating the Normal
approximation and Wilson methods by hand and comparing them with the Exact method gave similar results for the one example demonstrated. This suggests that, as long as the proportion of responders is not close to 0 or 1, the approximation methods are faster to compute, may be easier to implement than the exact method, and produce similar results. Hence {ExactCIdiff} is not recommended for most scenarios. + ++-------------+----------------------------------------------------------------------------------------------+--------------+--------------+ +| Method Name | Calculated Using matched pair example from R & SAS pages | Lower 95% CI | Upper 95% CI | ++=============+==============================================================================================+==============+==============+ +| Exact | R | -0.00339 | 0.38065 | ++-------------+----------------------------------------------------------------------------------------------+--------------+--------------+ +| Normal | by hand using equation from [SAS page](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html) | 0.00911 | 0.38289 | ++-------------+----------------------------------------------------------------------------------------------+--------------+--------------+ +| Wilson | by hand using equation from [SAS page](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html) | 0.00260 | 0.36995 | ++-------------+----------------------------------------------------------------------------------------------+--------------+--------------+ ## General Comparison Table For Two Independent Samples Proportions -| Analysis of Two independant Sample Proportions | Supported in R | Supported in SAS | Results Match | -|-----------------|-----------------|-----------------|---------------------| -| Normal approximation (Wald Method) | Yes {cardx} `ard_stats_prop_test function` uses `stats::prop.test` | Yes (default) | Yes, but documentation for stats::prop.test says it's using newcombe method so more research needed given results
match wald method? | -| Normal approximation (Wald Method) with continuity correction | Yes {cardx} `ard_stats_prop_test function` uses `stats::prop.test` | Yes | Unknown. | -| Wilson (Score, Altman, Newcombe) method | Unknown - more research needed | Yes | SAS results match by hand calculation | -| Wilson (Score, Altman, Newcombe) method with continuity correction | Unknown - more research needed | Yes | SAS results match by hand calculation | ++--------------------------------------------------------------------+--------------------------------------------------------------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ +| Analysis of Two Independent Sample Proportions | Supported in R | Supported in SAS | Results Match | ++====================================================================+====================================================================+==================+==================================================================================================================================================+ +| Normal approximation (Wald Method) | Yes {cardx} `ard_stats_prop_test function` uses `stats::prop.test` | Yes (default) | Yes and results match by hand calculation | +| | | | | +| | | | Note that documentation for stats::prop.test says it's using newcombe method. However, the results match the Normal Approximation (wald) method.
| ++--------------------------------------------------------------------+--------------------------------------------------------------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ +| Normal approximation (Wald Method) with continuity correction | Yes {cardx} as per above but with correct=TRUE | Yes | Yes | +| | | | | +| | | | Note that documentation for stats::prop.test says it's using newcombe method. However, the results match the Normal Approximation (wald) method. | ++--------------------------------------------------------------------+--------------------------------------------------------------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ +| Wilson (Score, Altman, Newcombe) method | No | Yes | SAS results match by hand calculation | ++--------------------------------------------------------------------+--------------------------------------------------------------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ +| Wilson (Score, Altman, Newcombe) method with continuity correction | No | Yes | SAS results match by hand calculation | ++--------------------------------------------------------------------+--------------------------------------------------------------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------+ ## Prerequisites: R Packages diff --git a/R/ci_for_prop.qmd b/R/ci_for_prop.qmd index d85f5694..f1fd0463 100644 --- a/R/ci_for_prop.qmd +++ b/R/ci_for_prop.qmd @@ -4,7 +4,15 @@ title: "Confidence Intervals for Proportions in R" ## Introduction -There are 
several ways to calculate a confidence interval (CI) for a proportion. You need to select the method based on if you have a 1 sample proportion (1 proportion calculated from 1 group of subjects), or if you have 2 samples and you want a CI for the difference in the 2 proportions. The difference in proportion can come from either 2 independent samples (different subjects in each of the 2 groups), or can be matched (the same subject with 1 result in 1 group and 1 result in the other group \[paired data\]). +The methods to use for calculating a confidence interval (CI) for a proportion depend on the type of proportion you have. + +- 1 sample proportion (1 proportion calculated from 1 group of subjects) + +- 2 sample proportions and you want a CI for the difference in the 2 proportions. + + - If the 2 samples come from 2 independent samples (different subjects in each of the 2 groups) + + - If the 2 samples are matched (i.e. the same subject has 2 results, one on each group \[paired data\]). The method selected is also dependent on whether your proportion is close to 0 or 1 (or near to the 0.5 midpoint), and your sample size. @@ -48,9 +56,11 @@ adcibc %>% **The {cardx} package** is an extension of the {cards} package, providing additional functions to create Analysis Results Data Objects (ARDs)^1^. It was developed as part of {NEST} and pharmaverse. This package requires the binary endpoint to be a logical (TRUE/FALSE) vector or a numeric/integer coded as (0, 1) with 1 (TRUE) being the success you want to calculate the confidence interval for. +See [here](R:%20Functions%20for%20Calculating%20Proportion%20Confidence%20Intervals) for full description of the {cardx} proportions equations. + If calculating the CI for a difference in proportions, the package requires both the response and the treatment variable to be numeric/integer coded as (0, 1) (or logical vector). 
-Instead of the code presented below, you can use `ard_categorical_ci(data, variables=resp, method ='wilson')` for example. This invokes the code below but returns an analysis results dataset format as the output. Methods included are waldcc, wald, clopper-pearson, wilson, wilsoncc, strat_wilson, strat_wilsoncc, agresti-coull and jeffreys. +Instead of the code presented below, you can use, for example, `ard_categorical_ci(data, variables=resp, method ='wilson')`. This invokes the code below but returns the output in an analysis results dataset (ARD) format. For one-sample proportions, the methods included are waldcc, wald, clopper-pearson, wilson, wilsoncc, strat_wilson, strat_wilsoncc, agresti-coull and jeffreys. The package also has methods for 2 independent samples, but currently does not have a method for 2 matched proportions. Code example: `proportion_ci_clopper_pearson(,conf.level=0.95) %>% as_tibble()` @@ -83,11 +93,11 @@ Code example for Clopper-pearson:\ 1) x = 0 (0% responders), in which case the lower limit does not match. 2) x = n (100% responders), in which case the upper limit does not match. -Because of the relationship between the binomial distirbution and the beta distribution. This package uses quantiles of the beta distribution to derive exact confidence intervals. +Because of the relationship between the binomial distribution and the beta distribution, this package uses quantiles of the beta distribution to derive exact confidence intervals. $$ B(\alpha/2;x, n-x+1) < p < B(1-\alpha/2; x+1, n-x)$$ -RBesT equations are: \ +RBesT equations are:\ pLow \<- qbeta(Low, r + (r == 0), n - r + 1)\ pHigh \<- qbeta(High, r + 1, n - r + ((n - r) == 0)) @@ -97,7 +107,9 @@ pHigh \<- qbeta(High, r + 1, n - r) `BinaryExactCI(x= , n=,alpha=0.05)` -It is currently unclear why the RBesT script has the logical conditions (r==0) and ((n-r)==0. Therefore, we currently do not recommend using this package and suggest cardx or Hmisc is used instead.
+It is currently unclear why the RBesT script has the logical conditions (r==0) and ((n-r)==0). Therefore, we do not currently recommend using this package and suggest using cardx or Hmisc instead. For updates about this, see the [issue](https://github.com/Novartis/RBesT/issues/21). + +**The {ExactCIdiff} package** produces CIs for two dependent proportions (matched pairs) and two independent proportions (unmatched pairs). ## Methods for Calculating Confidence Intervals for a single proportion using cardx @@ -199,11 +211,59 @@ proportion_ci_jeffreys(act2,conf.level=0.95) %>% ``` ``` -## Methods for Calculating Confidence Intervals for a matched pair proportion +## Methods for Calculating Confidence Intervals for a matched pair proportion using {ExactCIdiff} + +For more information about the detailed methods for calculating confidence intervals for a matched pair proportion see [here](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html#methods-for-calculating-confidence-intervals-for-a-matched-pair-proportion). When you have 2 measurements on the same subject, the 2 sets of measures are not independent and you have a matched pair of responses. + +To date we have not found an R package which calculates a CI for matched pair proportions using the normal approximation or Wilson methods, although these can be calculated by hand using the equations provided on the SAS page linked above. + +**The {ExactCIdiff} package** produces exact CIs for two dependent proportions (matched pairs), and claims to be the first R package to implement this method. However, it should only be used when the sample size is not too large, as it can be computationally intensive.\ +NOTE that the {ExactNumCI} package should not be used for this task. More detail on these two packages can be found [here](RJ-2013-026.pdf).
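The by-hand calculation mentioned above can be sketched in base R. This is a minimal sketch rather than a validated implementation: the function names `wilson_single()` and `matched_pair_ci()` are hypothetical, the cell labels follow the 2 x 2 table defined in this section (r = response on both treatments, s = response on active only, t = response on placebo only, u = response on neither), and the formulas are the Normal approximation and Wilson (Newcombe) equations given on the SAS page. The counts are those of the worked example used in this section.

```{r}
# Sketch only: helper names are hypothetical, not part of any package.

# Wilson score limits for a single proportion x / n
wilson_single <- function(x, n, z) {
  a <- 2 * x + z^2
  b <- z * sqrt(z^2 + 4 * x * (1 - x / n))
  denom <- 2 * (n + z^2)
  c(lower = (a - b) / denom, upper = (a + b) / denom)
}

# CIs for the difference between two matched proportions
# r = response on both, s = active only, t = placebo only, u = neither
matched_pair_ci <- function(r, s, t, u, conf.level = 0.95) {
  n  <- r + s + t + u
  z  <- qnorm(1 - (1 - conf.level) / 2)
  p1 <- (r + s) / n   # proportion responding on active
  p2 <- (r + t) / n   # proportion responding on placebo
  d  <- (s - t) / n   # difference p1 - p2

  # Normal approximation (Wald)
  se   <- sqrt(s + t - (s - t)^2 / n) / n
  wald <- c(d - z * se, d + z * se)

  # Wilson (Newcombe): combine the single-sample Wilson limits,
  # allowing for the correlation phi between the paired responses
  w1 <- wilson_single(r + s, n, z)
  w2 <- wilson_single(r + t, n, z)
  A  <- as.numeric(r + s) * (t + u) * (r + t) * (s + u)
  B  <- r * u - s * t
  C  <- if (B > n / 2) B - n / 2 else if (B >= 0) 0 else B
  phi <- if (A == 0) 0 else C / sqrt(A)  # phi taken as 0 if a margin is 0

  dl <- p1 - w1[["lower"]]
  du <- w2[["upper"]] - p2
  lower <- d - sqrt(dl^2 - 2 * phi * dl * du + du^2)

  el <- p2 - w2[["lower"]]
  eu <- w1[["upper"]] - p1
  upper <- d + sqrt(el^2 - 2 * phi * el * eu + eu^2)

  rbind(wald   = c(lower = wald[1], upper = wald[2]),
        wilson = c(lower = lower,   upper = upper))
}

ci <- matched_pair_ci(r = 20, s = 15, t = 6, u = 5)
round(ci, 4)
```

Because `qnorm()` is used rather than the rounded value 1.96, and no intermediate rounding is applied, the results may differ in the later decimal places from a calculation done by hand.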
+ +Using a cross over study as our example, a 2 x 2 table can be formed as follows: + ++-----------------------+---------------+---------------+--------------+ +| | Placebo\ | Placebo\ | Total | +| | Response= Yes | Response = No | | ++=======================+===============+===============+==============+ +| Active Response = Yes | r | s | r+s | ++-----------------------+---------------+---------------+--------------+ +| Active Response = No | t | u | t+u | ++-----------------------+---------------+---------------+--------------+ +| Total | r+t | s+u | N = r+s+t+u | ++-----------------------+---------------+---------------+--------------+ + +: The proportions of subjects responding on each treatment are: + +Active: $\hat p_1 = (r+s)/n$ and Placebo: $\hat p_2= (r+t)/n$ -When you have 2 measurements on the same subject, the 2 sets of measures are not independent and you have matched pair of responses. +Difference between the proportions for each treatment are: $D=p1-p2=(s-t)/n$ -This section is work in progress. 
For more information about methods for calculating confidence intervals for a matched pair proportion see [here](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html#methods-for-calculating-confidence-intervals-for-a-matched-pair-proportion) +Suppose : + ++-----------------------+---------------+---------------+------------------+ +| | Placebo\ | Placebo\ | Total | +| | Response= Yes | Response = No | | ++=======================+===============+===============+==================+ +| Active Response = Yes | r = 20 | s = 15 | r+s = 35 | ++-----------------------+---------------+---------------+------------------+ +| Active Response = No | t = 6 | u = 5 | t+u = 11 | ++-----------------------+---------------+---------------+------------------+ +| Total | r+t = 26 | s+u = 20 | N = r+s+t+u = 46 | ++-----------------------+---------------+---------------+------------------+ + +Active: $\hat p_1 = (r+s)/n$ =35/46 =0.761 and Placebo: $\hat p_2= (r+t)/n$ = 26/46 =0.565 + +Difference = 0.761-0.565 = 0.196, then PairedCI() function can provide an exact confidence interval as shown below + +(-0.00339 to 0.38065) + +```{r} +#| eval: FALSE +#ExactCIdiff: PairedCI(s, r+u, t, conf.level = 0.95) +CI<-PairedCI(15, 25, 6, conf.level = 0.95)$ExactCI +CI +``` ## Methods for Calculating Confidence Intervals for 2 independent samples proportion using {cardx} @@ -215,23 +275,37 @@ For more technical information see the corresponding [SAS page](https://psiaims. #### Example code -`cardx::ard_stats_prop_test function` uses `stats::prop.test` which also allows a continuity correction to be applied. More research is needed into this method. +`cardx::ard_stats_prop_test function` uses `stats::prop.test` which also allows a continuity correction to be applied. 
-Although this website [here](https://rdrr.io/r/stats/prop.test.html) and this one [here](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prop.test) both reference Newcombe for the CI that this function uses, replication of the results by hand and compared to SAS show that the results below match the Normal Approximation (Wald method) not the Newcome method? Further research is needed into this topic. +Although this website [here](https://rdrr.io/r/stats/prop.test.html) and this one [here](https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/prop.test) both reference Newcombe for the CI that this function uses, replicating the results by hand and comparing them with SAS shows that the results below match the Normal Approximation (Wald method). Both the Treatment variable (ACT,PBO) and the Response variable (Yes,No) have to be numeric (0,1) or logical (TRUE,FALSE) variables. -```{R} -adcibc2<-select(adcibc,trtn,respn) -cardx::ard_stats_prop_test(data=adcibc2, by=trtn, variables=respn, conf.level = 0.95, correct=FALSE) -cardx::ard_stats_prop_test(data=adcibc2, by=trtn, variables=respn, conf.level = 0.95, correct=TRUE) +With 2 groups, `prop.test` by default tests the null hypothesis that the proportions in each group are the same and returns a 2-sided CI.
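The agreement with the Wald method can be checked directly in base R. The sketch below uses hypothetical counts (`x` responders out of `n` subjects per group, chosen only for illustration) and compares a hand-calculated Wald interval, with and without continuity correction, against `stats::prop.test()`.

```{r}
# Hypothetical counts, for illustration only
x <- c(36, 12)    # responders in group 1 and group 2
n <- c(154, 77)   # subjects in group 1 and group 2

p  <- x / n
d  <- p[1] - p[2]                  # difference in proportions
z  <- qnorm(0.975)                 # two-sided 95% interval
se <- sqrt(sum(p * (1 - p) / n))   # Wald standard error of the difference

# Wald interval, without and with continuity correction
wald    <- c(d - z * se, d + z * se)
cc      <- 0.5 * sum(1 / n)
wald_cc <- c(d - (z * se + cc), d + (z * se + cc))

# The same intervals from prop.test()
pt    <- prop.test(x, n, conf.level = 0.95, correct = FALSE)
pt_cc <- prop.test(x, n, conf.level = 0.95, correct = TRUE)

all.equal(wald, as.numeric(pt$conf.int))
all.equal(wald_cc, as.numeric(pt_cc$conf.int))
```

Both comparisons should return `TRUE` for counts like these, where the difference is not so close to 0 that the continuity correction is reduced.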
+ +```{r} + +indat1<- adcibc2 %>% + select(AVAL,TRTP) %>% + mutate(resp=if_else(AVAL>4,"Yes","No")) %>% + mutate(respn=if_else(AVAL>4,1,0)) %>% + mutate(trt=if_else(TRTP=="Placebo","PBO","ACT"))%>% + mutate(trtn=if_else(TRTP=="Placebo",1,0)) %>% + select(trt,trtn,resp, respn) + +# cardx package required a vector with 0 and 1s for a single proportion CI +# To get the comparison the correct way around Placebo must be 1, and Active 0 + +indat<- select(indat1, trtn,respn) +cardx::ard_stats_prop_test(data=indat, by=trtn, variables=respn, conf.level = 0.95, correct=FALSE) +cardx::ard_stats_prop_test(data=indat, by=trtn, variables=respn, conf.level = 0.95, correct=TRUE) ``` ### Wilson Method (Also known as the Score method or the Altman, Newcombe method^3^ ) For more technical information see the corresponding [SAS page](https://psiaims.github.io/CAMIS/SAS/ci_for_prop.html). -Further research is needed into this topic. +We have not yet found a package in R which produces these confidence intervals for 2 independent proportions. ## References diff --git a/R/logistic_regr.qmd b/R/logistic_regr.qmd index c40d744f..45352e58 100644 --- a/R/logistic_regr.qmd +++ b/R/logistic_regr.qmd @@ -1,5 +1,5 @@ --- -title: "Logistic Regression" +title: "Logistic Regression in R" --- ```{r} @@ -8,7 +8,7 @@ title: "Logistic Regression" library(tidyverse) ``` -In binary logistic regression, there is a single binary dependent variable, coded by an indicator variable. For example, if we respresent a response as 1 and non-response as 0, then the corresponding probability of response, can be between 0 (certainly not a response) and 1 (certainly a response) - hence the labeling ! +In binary logistic regression, there is a single binary dependent variable, coded by an indicator variable. For example, if we represent a response as 1 and non-response as 0, then the corresponding probability of response, can be between 0 (certainly not a response) and 1 (certainly a response) - hence the labeling ! 
The logistic model models the log-odds of an event as a linear combination of one or more independent variables (explanatory variables). If we observed $(y_i, x_i),$ where $y_i$ is a Bernoulli variable and $x_i$ a vector of explanatory variables, the model for $\pi_i = P(y_i=1)$ is @@ -31,7 +31,9 @@ glimpse(lung) # Model Fit -We analyze the weight loss in lung cancer patients in dependency of age, sex, ECOG performance score and calories consumed at meals. +We analyze the event of weight gain (or staying the same weight) in lung cancer patients as a function of age, sex, ECOG performance score and calories consumed at meals. In the original data, a positive value of the `wt.loss` variable is a weight loss and a negative value is a gain. We start by dichotomising the response, such that a result \>0 is a weight loss and a result \<=0 is a weight gain, and creating a factor variable `wt_grp`. + +One of the most important things to remember is to tell R what your event is! We want to model Events / Non-events, so the reference category for the dichotomous `wt_grp` variable below is the weight loss level. By telling R that your reference category is weight loss, you are effectively telling R that your Event = Weight Gain!
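How the factor levels determine the event can be checked on a few made-up values before fitting the model. In the sketch below the `wt_loss` vector is hypothetical; for a factor response, `glm()` with a binomial family treats the first level as the non-event and all other levels as the event.

```{r}
# Hypothetical wt.loss values: positive = weight loss, zero or negative = gain
wt_loss <- c(-3, 0, 5, 12, -1, 8)

# Name the levels explicitly so each label is tied to the right condition
wt_grp <- factor(wt_loss > 0,
                 levels = c(TRUE, FALSE),
                 labels = c("weight loss", "weight gain"))

# The first level is the reference (non-event) category
wt_grp <- relevel(wt_grp, ref = "weight loss")
levels(wt_grp)

# 0/1 coding implied for a binomial model: 1 = event = weight gain
event <- as.integer(wt_grp != levels(wt_grp)[1])
data.frame(wt_loss, wt_grp, event)
```

Supplying `levels` as well as `labels` avoids relying on the default ordering of a logical vector (FALSE before TRUE), which is an easy place for the labels to end up attached to the wrong condition.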
```{r} lung2 <- survival::lung %>% @@ -39,6 +41,8 @@ lung2 <- survival::lung %>% wt_grp = factor(wt.loss > 0, labels = c("weight gain", "weight loss")) ) +#specify that weight loss should be used as baseline level (i.e. we want to model weight gain as the event) +lung2$wt_grp <- relevel(lung2$wt_grp, ref='weight loss') m1 <- glm(wt_grp ~ age + sex + ph.ecog + meal.cal, data = lung2, family = binomial(link="logit")) summary(m1) @@ -53,15 +57,17 @@ exp(confint(m1)) # Model Comparison -To compare two logistic models, one tests the difference in residual variances from both models using a $\chi^2$-distribution with a single degree of freedom (here at the $5$% level): +To compare two nested logistic models, the difference in their `residual deviances` (-2 \* log likelihoods) is compared against a $\chi^2$-distribution with degrees of freedom equal to the difference in the number of parameters in the two models. Below, the only difference is the inclusion/exclusion of age in the model, hence we test using $\chi^2$ with 1 df, here at the $5$% level. ```{r} m2 <- glm(wt_grp ~ sex + ph.ecog + meal.cal, data = lung2, family = binomial(link="logit")) summary(m2) -anova(m1, m2, test = "Chisq") +anova(m1, m2, test = "LRT") ``` +Stack Exchange [here](https://stats.stackexchange.com/questions/59879/logistic-regression-anova-chi-square-test-vs-significance-of-coefficients-ano) has a good discussion of this method and of the difference between comparing 2 models using likelihood ratio tests versus using Wald tests and Pr\>chisq (from the maximum likelihood estimates). Note: `anova(m1, m2, test = "Chisq")` and `anova(m1, m2, test = "LRT")` as above are synonymous in this context.
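The likelihood ratio test above can also be reproduced by hand from the residual deviances. The sketch below is self-contained and assumes only that the {survival} package is installed (for the `lung` data); the object names `dat`, `full` and `reduced`, and the 0/1 `gain` variable, are introduced here for illustration.

```{r}
# Complete cases only, so that both models are fitted to the same subjects
vars <- c("wt.loss", "age", "sex", "ph.ecog", "meal.cal")
dat  <- na.omit(survival::lung[, vars])
dat$gain <- as.integer(dat$wt.loss <= 0)   # 1 = weight gain or no change

full    <- glm(gain ~ age + sex + ph.ecog + meal.cal,
               data = dat, family = binomial(link = "logit"))
reduced <- glm(gain ~ sex + ph.ecog + meal.cal,
               data = dat, family = binomial(link = "logit"))

# Likelihood ratio statistic = difference in residual deviances
lr_stat <- deviance(reduced) - deviance(full)
lr_df   <- df.residual(reduced) - df.residual(full)
p_hand  <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# The same test via anova()
tab <- anova(reduced, full, test = "LRT")
c(by_hand = p_hand, anova = tab[["Pr(>Chi)"]][2])
```

The two p-values should agree, which illustrates that the `anova()` comparison is a 1 df chi-square test on the change in deviance.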
+ # Prediction Predictions from the model for the log-odds of a patient with new data to experience a weight loss are derived using `predict()`: diff --git a/SAS/ci_for_prop.qmd b/SAS/ci_for_prop.qmd index 55df0756..efefbcc5 100644 --- a/SAS/ci_for_prop.qmd +++ b/SAS/ci_for_prop.qmd @@ -4,7 +4,15 @@ title: "Confidence intervals for Proportions in SAS" ## Introduction -There are several ways to calculate a confidence interval (CI) for a proportion. You need to select the method based on if you have a 1 sample proportion (e.g 1 proportion calculated from 1 group of subjects), or if you have 2 samples and you want a CI for the difference in the 2 proportions. The difference in proportion can come from either 2 independent samples (e.g different subjects in each of the 2 groups), or can be matched (e.g the same subject with 1 result in 1 group and 1 result in the other group \[paired data\]). +The methods to use for calculating a confidence interval (CI) for a proportion depend on the type of proportion you have. + +- 1 sample proportion (1 proportion calculated from 1 group of subjects) + +- 2 sample proportions and you want a CI for the difference in the 2 proportions. + + - If the 2 samples come from 2 independent samples (different subjects in each of the 2 groups) + + - If the 2 samples are matched (i.e. the same subject has 2 results, one on each group \[paired data\]). The method selected is also dependent on whether your proportion is close to 0 or 1 (or near to the 0.5 midpoint), and your sample size. 
@@ -214,16 +222,16 @@ In all these cases, the calculated proportions for the 2 groups are not independ Using a cross over study as our example, a 2 x 2 table can be formed as follows: -+-----------------------+---------------+---------------+---------------+ -| | Placebo\ | Placebo\ | Total | -| | Response= Yes | Response = No | | -+=======================+===============+===============+===============+ -| Active Response = Yes | r | s | r+s | -+-----------------------+---------------+---------------+---------------+ -| Active Response = No | t | u | t+u | -+-----------------------+---------------+---------------+---------------+ -| Total | r+t | s+u | N = r+s+t+u | -+-----------------------+---------------+---------------+---------------+ ++-----------------------+---------------+---------------+--------------+ +| | Placebo\ | Placebo\ | Total | +| | Response= Yes | Response = No | | ++=======================+===============+===============+==============+ +| Active Response = Yes | r | s | r+s | ++-----------------------+---------------+---------------+--------------+ +| Active Response = No | t | u | t+u | ++-----------------------+---------------+---------------+--------------+ +| Total | r+t | s+u | N = r+s+t+u | ++-----------------------+---------------+---------------+--------------+ The proportions of subjects responding on each treatment are: @@ -231,6 +239,19 @@ Active: $\hat p_1 = (r+s)/n$ and Placebo: $\hat p_2= (r+t)/n$ Difference between the proportions for each treatment are: $D=p1-p2=(s-t)/n$ +Suppose : + ++-----------------------+---------------+---------------+------------------+ +| | Placebo\ | Placebo\ | Total | +| | Response= Yes | Response = No | | ++=======================+===============+===============+==================+ +| Active Response = Yes | r = 20 | s = 15 | r+s = 35 | ++-----------------------+---------------+---------------+------------------+ +| Active Response = No | t = 6 | u = 5 | t+u = 11 | 
++-----------------------+---------------+---------------+------------------+ +| Total | r+t = 26 | s+u = 20 | N = r+s+t+u = 46 | ++-----------------------+---------------+---------------+------------------+ + ### Normal Approximation Method (Also known as the Wald or asymptotic CI Method) In large random samples from independent trials, the sampling distribution of the difference between two proportions approximately follows the normal distribution. Hence the SE for the difference and 95% confidence interval can be calculated using the following equations. @@ -241,6 +262,14 @@ $D-z_\alpha * SE(D)$ to $D+z_\alpha * SE(D)$ where $z_\alpha$ is the $1-\alpha/2$ quantile of a standard normal distribution corresponding to level $\alpha$, +D = (15-6)/46 = 0.196 + +SE(D) = 1/46 \* sqrt(15+6 - ((15-6)\^2)/46) = 0.095353 + +Lower CI = 0.196 - 1.96 \* 0.095353 = 0.009108 + +Upper CI = 0.196 + 1.96 \* 0.095353 = 0.382892 + ### Wilson Method (Also known as the Score method or the Altman, Newcombe method^7^ ) Derive the confidence intervals using the Wilson Method equations above for each of the individual single samples 1 and 2. @@ -257,41 +286,71 @@ Otherwise we calculate A, B and C, and $\phi=C / sqrt A$ In the above: $A=(r+s)(t+u)(r+t)(s+u)$ and $B=(ru-st)$ -To calculate C follow the table below. +To calculate C follow the table below. n=sample size.
-+---------------------------+----------------+ | Condition of B | Set C equal to | -+===========================+================+ +|---------------------------|----------------| | If B is greater than n/2 | B - n/2 | -+---------------------------+----------------+ | If B is between 0 and n/2 | 0 | -+---------------------------+----------------+ | If B is less than 0 | B | -+---------------------------+----------------+ Let D = p1-p2 (the difference between the observed proportions of responders) -The Confidence interval for the difference between two population proportions is: $D - sqrt((p_1-l_1)^2)-2\phi(p_1-l_1)(u_2-p_2)+(u_2-p_2)^2 )$ to +The confidence interval for the difference between two population proportions is: $D - sqrt((p_1-l_1)^2-2\phi(p_1-l_1)(u_2-p_2)+(u_2-p_2)^2 )$ to + +$D + sqrt((p_2-l_2)^2-2\phi(p_2-l_2)(u_1-p_1)+(u_1-p_1)^2 )$ + +First, apply the Wilson Method equations to each of the individual single samples 1 and 2. + +| | Active | Placebo | +|----------|------------|------------| +| a | 73.842 | 55.842 | +| b | 11.974 | 13.728 | +| c | 99.683 | 99.683 | +| Lower CI | 0.621 = L1 | 0.422 = L2 | +| Upper CI | 0.861 = U1 | 0.698 = U2 | + +$A=(r+s)(t+u)(r+t)(s+u)$ = 35 \* 11 \* 26 \* 20 = 200200 -$D + sqrt((p_2-l_2)^2)-2\phi(p_2-l_2)(u_1-p_1)+(u_1-p_1)^2 )$ +B = 20 \* 5 - 15 \* 6 = 10 + +C = 0 (as B is between 0 and n/2 = 23) + +$\phi$ = 0. + +Hence the middle term of the equation is 0, and the interval simplifies to: + +Lower CI = $D - sqrt((p_1-l_1)^2+(u_2-p_2)^2 )$ = 0.196 - sqrt \[ (0.761-0.621)\^2 + (0.698-0.565) \^2 \] + +Upper CI = $D + sqrt((p_2-l_2)^2+(u_1-p_1)^2 )$ = 0.196 + sqrt \[ (0.565-0.422)\^2 + (0.861-0.761) \^2 \] + +CI = 0.003 to 0.370 ## Example Code using PROC FREQ -SAS Proc Freq has 3 methods for analysis of paired data (Common risk difference). +Unfortunately, SAS does not have a procedure which outputs the confidence intervals for matched proportions. -The default method is Mantel-Haenszel confidence limits. 
SAS can also Score (Miettinen-Nurminen) CIs and Stratified Newcombe CIs (constructed from stratified Wilson Score CIs). +Instead it calculates the risk difference = (r / (r+s) - t / (t+u) ) -See [here](https://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/viewer.htm#procstat_freq_details63.htm) for equations. +Which in the example above is: 20 / (20+15) - 6 / (6+5) = 0.0260 -```{R} -#| eval: false +This is more applicable when you have exposed / non-exposed groups and are looking at who has experienced the outcome. SAS Proc Freq has 3 methods for analysis of paired data using a Common risk difference. + +The default method is Mantel-Haenszel confidence limits. SAS can also produce Score (Miettinen-Nurminen) CIs and Stratified Newcombe CIs (constructed from stratified Wilson Score CIs). See [here](https://support.sas.com/documentation/cdl/en/procstat/67528/HTML/default/viewer.htm#procstat_freq_details63.htm) for equations. +As you can see below, this is not a CI for the difference in proportions of 0.196; it is a CI for the risk difference of 0.0260, so it must be interpreted with care. -proc freq data=adcibc order=data; -table trt*resp/commonriskdiff(cl=MH NEWCOMBE); +```{R} +#| eval: false +proc freq data=adcibc; +table act*pbo/commonriskdiff(cl=MH); run; ``` +```{r echo=FALSE, fig.align='center', out.width="50%"} +knitr::include_graphics("../images/ci_for_prop/riskdiff_matched.png") +``` + ## Methods for Calculating Confidence Intervals for 2 independent samples proportion This [paper](https://www.lexjansen.com/wuss/2016/127_Final_Paper_PDF.pdf)^8^ described many methods for the calculation of confidence intervals for 2 independent proportions. The most commonly used are: Wald with continuity correction and Wilson with continuity correction. The Wilson method may be more applicable when sample sizes are smaller and/or the proportion is closer to 0 or 1. @@ -330,7 +389,7 @@ Let l2 = Lower CI for sample 2, and u2 be the upper CI for sample 2. 
Let D = p1-p2 (the difference between the observed proportions) -The Confidence interval for the difference between two population proportions is: $$ D - sqrt((p_1-l_1)^2)+(u_2-p_2)^2 )\quad to\quad D + sqrt((p_2-l_2)^2)+(u_1-p_1)^2 ) $$ +The confidence interval for the difference between two population proportions is: $$ D - sqrt((p_1-l_1)^2+(u_2-p_2)^2) \quad to\quad D + sqrt((p_2-l_2)^2+(u_1-p_1)^2 ) $$ ## Example Code using PROC FREQ @@ -380,4 +439,4 @@ knitr::include_graphics("../images/ci_for_prop/binomial_2sampleCI_CC.png") 7. D. Altman, D. Machin, T. Bryant, M. Gardner (eds). Statistics with Confidence: Confidence Intervals and Statistical Guidelines, 2nd edition. John Wiley and Sons 2000. -8. https://www.lexjansen.com/wuss/2016/127_Final_Paper_PDF.pdf +8. <https://www.lexjansen.com/wuss/2016/127_Final_Paper_PDF.pdf> diff --git a/SAS/logistic-regr.qmd b/SAS/logistic-regr.qmd index 67c06764..663499c6 100644 --- a/SAS/logistic-regr.qmd +++ b/SAS/logistic-regr.qmd @@ -153,7 +153,13 @@ meal_cal 1 -0.00083 0.000435 3.6770 0.0552 ---------------------------------------------------------------------------- ``` -NOTE: the chi-square test summary of backward elimination, p=0.6250 is different to the results in R, which gave a difference in deviance of -0.24046, p=0.6239. This difference is currently being investigated. +NOTE: the chi-square test summary of backward elimination, p=0.6250 is different to the results in R, which gave a difference in deviance of -0.24046, p=0.6239. + +This is because R performs a likelihood ratio test (the difference in -2 \* log likelihood) comparing the model with sex + ph_ecog + meal_cal + age vs sex + ph_ecog + meal_cal, based on a $\chi^2$ distribution with 1 df. + +However, the backward elimination process in SAS uses the residual sums of squares and the F statistic. Starting with the full model, it removes the parameter with the least significant F statistic until all effects in the model have F statistics significant at a certain level. 
The F statistic is calculated as: + +$$F=\frac{(RSS_{p-k}-RSS_p)/k}{RSS_p /(n-p-k)}$$ where RSS = residual sum of squares, n = number of observations in the analysis, p = number of parameters in the fuller model (excluding the intercept), k = number of degrees of freedom associated with the effect you are dropping, $RSS_p$ = RSS for the fuller model, $RSS_{p-k}$ = RSS for the reduced model. ### Parameterization of model effects (categorical covariates) in SAS diff --git a/SAS/rmst.qmd b/SAS/rmst.qmd new file mode 100644 index 00000000..55775ea4 --- /dev/null +++ b/SAS/rmst.qmd @@ -0,0 +1,126 @@ +--- +title: "Restricted Mean Survival Time (RMST) in SAS" +output: html_document +date: last-modified +date-format: D MMMM, YYYY +--- + +SAS has a User's Guide for the RMSTREG Procedure [here](https://support.sas.com/documentation/onlinedoc/stat/151/rmstreg.pdf) which explains RMST analysis. + +There are two things you need to be aware of in the SAS documentation. + +**Issue 1:** on page 8615 SAS says it expects the event indicator (Status) to be 1=event (death time) and 0=censor. If you follow this guidance, then you must ensure that you use: + +`model time*status(0)` as the model to ensure SAS knows that 0 is the censored observation. + +This is a little confusing, because the value in brackets identifies the censored observations, which is at odds with the variable being named 'status'! + +In other survival procedures, we often have a variable `cnsr` which we set to 1=censored or 0=event, and hence we use `model time*cnsr(1)`. We find this more straightforward than using `status` and hence `cnsr` is used throughout this example. + +```{r, echo=FALSE, fig.align='center', out.width="50%"} +knitr::include_graphics("../images/rmst/SASrmstreg2.png") +``` + +**Issue 2:** page 8616 tells us that if we omit the option `tau=xx` then SAS sets tau using the largest **event** time. 
However, what SAS actually does is use the largest `time` from either events or censored observations, which will result in an incorrect analysis. Therefore, you must calculate tau yourself (using events only) and include it as an option in the SAS code. + +```{r, echo=FALSE, fig.align='center', out.width="50%"} +knitr::include_graphics("../images/rmst/SASrmstreg1.png") +``` + +## Data used + +We are using the lung_cancer.csv dataset found [here](CAMIS/data%20at%20main%20·%20PSIAIMS/CAMIS) with some manipulation as shown below. + +We just create a `cnsr` variable to use in the analysis, which has 165 events and 63 censored values. + +```{r eval=FALSE} +data adcibc (keep=age sex trt time cnsr); + set lung_cancer; + if _N_<=100 then trt="PBO"; + else trt="Active"; + if status=1 then cnsr=1; + else cnsr=0; +run; +``` + +The data consist of: + +- time - Time (days) to event + +- cnsr - 1=censored, 0=event + +- age - Age of subject + +- sex - 1=male, 2=female + +- trt - PBO or Act + +- ref - column of 1's just used for sorting + +For example: + +| time | cnsr | trt | age | sex | ref | +|------|------|-----|-----|-----|-----| +| 279 | 1 | Act | 64 | 1 | 1 | +| 276 | 1 | Act | 52 | 2 | 1 | +| 79 | 0 | Act | 64 | 2 | 1 | +| 654 | 0 | PBO | 68 | 2 | 1 | + +## Example Code using proc rmstreg + +Firstly we have to calculate tau from the data we have (using events only - cnsr=0). As explained in issue 2 above, if you do not do this SAS uses both events and censored observations to calculate tau, which is incorrect. The code below calculates `tau` as 883 (highest event time). Following the calculation of tau, we then run proc rmstreg as shown. 
+ +```{r eval=FALSE} +proc sort data=adcibc (where=(cnsr=0)) out=timord; + by ref time; +run; + +data adcibc2; + set timord; /* events only, sorted by time */ + by ref time; + if last.ref then call symput("_tau",put(time,best8.)); +run; + +%put &_tau; + +proc rmstreg data=adtte tau=&_tau; +class trtp sex; + model aval*cnsr(1) =trtp sex age /link=linear method=ipcw (strata=trtp); + lsmeans trtp/pdiff=control('Placebo') cl alpha=0.05; +ods output lsmeans=lsm diffs= diff; +Run; + +``` + +To ensure you have the cnsr/event flag the right way around and tau set correctly, check the output closely. As you can see in the images below, tau=883 and number of events = 165, which is correct. + +```{r, echo=FALSE, fig.align='center', out.width="50%"} +knitr::include_graphics("../images/rmst/rmstreg_output1.png") +``` + +The above model results in a difference in expected value of the time-to-event (Active-Placebo) of -57 days. + +```{r, echo=FALSE, fig.align='center', out.width="50%"} +knitr::include_graphics("../images/rmst/rmstreg_output2.png") +``` + +However, fitting the analysis without the tau=XX option, you can see that the output shows tau=1022, which is the highest censored observation. This can radically change the analysis and should not be used. In this example, the difference in expected value of the time-to-event for Active-Placebo is estimated at -65 days. 
+ +```{r eval=FALSE} + +proc rmstreg data=adtte ; +class trtp sex; + model aval*cnsr(0) =trtp sex age /link=linear method=ipcw (strata=trtp); + lsmeans trtp/pdiff=control('Placebo') cl alpha=0.05; +ods output lsmeans=lsm diffs= diff; +Run; + +``` + +```{r, echo=FALSE, fig.align='center', out.width="50%"} +knitr::include_graphics("../images/rmst/rmstreg_output3.png") +``` + +```{r, echo=FALSE, fig.align='center', out.width="50%"} +knitr::include_graphics("../images/rmst/rmstreg_output4.png") +``` diff --git a/images/ci_for_prop/riskdiff_matched.png b/images/ci_for_prop/riskdiff_matched.png new file mode 100644 index 00000000..c2d35028 Binary files /dev/null and b/images/ci_for_prop/riskdiff_matched.png differ diff --git a/images/rmst/SASrmstreg1.png b/images/rmst/SASrmstreg1.png new file mode 100644 index 00000000..3779fcba Binary files /dev/null and b/images/rmst/SASrmstreg1.png differ diff --git a/images/rmst/SASrmstreg2.png b/images/rmst/SASrmstreg2.png new file mode 100644 index 00000000..063ef8ab Binary files /dev/null and b/images/rmst/SASrmstreg2.png differ diff --git a/images/rmst/rmstreg_output1.png b/images/rmst/rmstreg_output1.png new file mode 100644 index 00000000..75ce711d Binary files /dev/null and b/images/rmst/rmstreg_output1.png differ diff --git a/images/rmst/rmstreg_output2.png b/images/rmst/rmstreg_output2.png new file mode 100644 index 00000000..d74500d0 Binary files /dev/null and b/images/rmst/rmstreg_output2.png differ diff --git a/images/rmst/rmstreg_output3.png b/images/rmst/rmstreg_output3.png new file mode 100644 index 00000000..2306b111 Binary files /dev/null and b/images/rmst/rmstreg_output3.png differ diff --git a/images/rmst/rmstreg_output4.png b/images/rmst/rmstreg_output4.png new file mode 100644 index 00000000..738275ff Binary files /dev/null and b/images/rmst/rmstreg_output4.png differ