Commit

documentation (#23)
* net survival (excluding test)

* might be better to explain the theory in one place and the interface in another

* adding refs Ederer II

* test doc

* p-value

* autodocs update

* ref autodocs ?

* some improvements

* merging examples with getting started

* fix error

* fix error

* ref + example

* Better name for Graffeo Reference

* fixup index a bit

* Theory partial rewrite

* grammar

* update

* error

* bold marks for observations

* typo

* point on covariates

* rephrase

* adding refs

* fix

* some changes

* maybe better

* remove repetition

---------

Co-authored-by: Oskar Laverny <[email protected]>
rimhajal and lrnv authored Apr 17, 2024
1 parent b34d2a9 commit 333858f
Showing 9 changed files with 300 additions and 82 deletions.
2 changes: 1 addition & 1 deletion docs/make.jl
@@ -21,8 +21,8 @@ makedocs(;
),
pages=[
"Home" => "index.md",
"theory.md",
"getting_started.md",
"examples.md",
"references.md",
],
)
92 changes: 91 additions & 1 deletion docs/src/assets/references.bib
@@ -11,7 +11,7 @@ @article{PoharPerme2012
doi = {10.1111/j.1541-0420.2011.01640.x},
}
@article{GraffeoTest,
@article{Graffeo2016,
author = {Grafféo, Nathalie and Castell, Fabienne and Belot, Aurélien and Giorgi, Roch},
title = "{A Log-Rank-Type Test to Compare Net Survival Distributions}",
journal = {Biometrics},
@@ -22,4 +22,94 @@ @article{GraffeoTest
month = {01},
issn = {0006-341X},
doi = {10.1111/biom.12477},
}
@article{Ederer1961,
title={The relative survival rate: a statistical methodology},
author={Ederer, Fred},
journal={Natl. Cancer Inst. Monogr.},
volume={6},
pages={101--121},
year={1961}
}
@article{Ederer1959,
title={The effect of eliminating deaths from cancer in general survival rates, methodological notes 11},
author={Ederer, F and Heise, H},
journal={End Result Evaluation Section, National Cancer Institute},
year={1959}
}
@article{Hakulinen1977,
title={On long-term relative survival rates},
author={Hakulinen, Timo},
journal={Journal of Chronic Diseases},
volume={30},
number={7},
pages={431--443},
year={1977},
publisher={Elsevier}
}
@article{Hakulinen1985,
title={A computer program package for relative survival analysis},
author={Hakulinen, Timo and Abeywickrama, Kamal H},
journal={Computer programs in biomedicine},
volume={19},
number={2-3},
pages={197--207},
year={1985},
publisher={Elsevier}
}
@article{Hakulinen1987,
title={Regression analysis of relative survival rates},
author={Hakulinen, Timo and Tenkanen, Leena},
journal={Journal of the Royal Statistical Society Series C: Applied Statistics},
volume={36},
number={3},
pages={309--317},
year={1987},
publisher={Oxford University Press}
}
@book{FlemingHarington2013,
title = {Counting Processes and Survival Analysis},
author = {Fleming, Thomas R and Harrington, David P},
year = {2013},
volume = {625},
publisher = {John Wiley \& Sons},
}
@book{ABGK1993,
title = {Statistical {{Models Based}} on {{Counting Processes}}},
author = {Andersen, Per Kragh and Borgan, Ornulf and Gill, Richard D. and Keiding, Niels},
year = {1993},
series = {Springer {{Series}} in {{Statistics}}},
publisher = {Springer US},
address = {New York, NY},
doi = {10.1007/978-1-4612-4348-9},
urldate = {2024-02-22},
isbn = {978-0-387-94519-4 978-1-4612-4348-9},
langid = {english},
file = {C:\Users\lrnv\Zotero\storage\TPUTGJHI\Andersen et al. - 1993 - Statistical Models Based on Counting Processes.pdf}
}
@article{PermePavlik2018,
title={Nonparametric relative survival analysis with the R package relsurv},
author={Perme, Maja Pohar and Pavlic, Klemen},
journal={Journal of Statistical Software},
volume={87},
pages={1--27},
year={2018}
}
@article{CharvatBelot2021,
title={Mexhaz: An R package for fitting flexible hazard-based regression models for overall and excess mortality with a random effect},
author={Charvat, Hadrien and Belot, Aur{\'e}lien},
journal={Journal of Statistical Software},
volume={98},
pages={1--36},
year={2021}
}
51 changes: 0 additions & 51 deletions docs/src/examples.md

This file was deleted.

48 changes: 34 additions & 14 deletions docs/src/getting_started.md
@@ -4,29 +4,49 @@ CurrentModule = NetSurvival

# Getting Started

## Pohar Perme
## Fitting the non-parametric estimators

The Pohar Perme[PoharPerme2012](@cite) is a statistical method used in survival analysis to estimate net survival probabilities, particularly designed to handle situations where covariates may change over time. The net survival function is defined as:
```@docs
PoharPerme
```

### Example

$$S_{E}(t) = \exp\left(-\int_0^t\lambda_{E}(u)\,du\right)$$
We illustrate with the `colrec` dataset, which comprises $5971$ patients diagnosed with colon or rectal cancer between 1994 and 2000. This dataset comes from the Slovenian cancer registry. Since the patients are most likely Slovenian, we use the Slovenian mortality table `slopop` as the reference for population mortality rates. We can then apply various non-parametric estimators of net survival.

The $\lambda_E$ is the associated hazard given by :
!!! note "N.B."
Mortality tables may vary in structure, with options such as the addition or removal of specific covariates. To confirm that the mortality table is in the correct format, please refer to the documentation of `RateTables.jl`, or directly extract it from there.

$$\lambda_E (t) = \frac{\sum_{i=1}^{N}S_{E_i}(t)\lambda_{E_i}(t)}{\sum_{i=1}^{N}S_{E_i}(t)}$$
By examining `slopop`, we notice it contains information regarding `age` and `year`, as expected for mortality tables. Additionally, it incorporates the covariate `sex`, which has two possible entries (`:male` or `:female`).

This weighted average is thus based on the likelihood that an individual remains alive in a hypothetical scenario where the disease is the sole cause of death.
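
The weighted average above can be checked numerically. The following is a minimal sketch in plain Julia, not part of `NetSurvival.jl`: the helper `weighted_excess_hazard` and the toy values are hypothetical, and the inputs stand for the individual quantities $S_{E_i}(t)$ and $\lambda_{E_i}(t)$ evaluated at one fixed time $t$.

```julia
# Hypothetical helper (an illustration, not the package's API):
# cohort-level excess hazard at a fixed time t, computed as the average of the
# individual excess hazards λ[i], weighted by the individual net survivals S[i].
function weighted_excess_hazard(S::AbstractVector{<:Real}, λ::AbstractVector{<:Real})
    length(S) == length(λ) || throw(ArgumentError("S and λ must have the same length"))
    return sum(S .* λ) / sum(S)
end

# Toy values for three individuals (made up for illustration only).
S = [0.9, 0.5, 0.1]      # S_{E_i}(t): individual net survival at time t
λ = [0.02, 0.10, 0.30]   # λ_{E_i}(t): individual excess hazard at time t

λ_E = weighted_excess_hazard(S, λ)
println(λ_E)
```

Because the weights are the net survivals, individuals who are more likely to still be alive in the hypothetical disease-only world contribute more to the cohort hazard, and the result always lies between the smallest and largest individual hazards.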
**Pohar Perme**
```@example 1
using NetSurvival, RateTables
pp1 = fit(PoharPerme, @formula(Surv(time,status)~1), colrec, slopop)
```

## Ederer II
## Applying the Grafféo log-rank test

```@docs
GraffeoTest
```

## Grafféo Log-Rank Test
### Example

The Grafféo Log-Rank Test [GraffeoTest](@cite) was constructed as a complement to the Pohar Perme estimator, aiming to compare the net survival functions provided by the latter. The test is designed to compare these functions across multiple groups, including stratified covariables, and to ultimately determine, with the given p-value, which covariables are impactful to the study.
When applying the test to the same data as before, we get:

For this test, we first define the number of deaths caused by the event studied for a time $s$ within the group $h$ noted $N_{E,h}(s)$ and the process of individuals at risk within the same group $h$ at time $s$ noted $Y_h(s)$. Both of these values are weighted with the populational estimated survival for the given patient, same as in Pohar Perme.
```@example 1
test1 = fit(GraffeoTest, @formula(Surv(time,status)~stage), colrec, slopop)
```

The p-value is well under $0.05$, so we reject the hypothesis that the groups identified by the `stage` variable share the same net survival. This variable should therefore be taken into account in the study.

```@example 1
test2 = fit(GraffeoTest, @formula(Surv(time,status)~sex), colrec, slopop)
```

The $(H_0)$ hypothesis tested
For the `sex` variable, the p-value is above $0.05$, so the test does not detect a difference in net survival between male and female patients.

```@bibliography
Pages = ["getting_started.md"]
Canonical = false
```@example 1
test4 = fit(GraffeoTest, @formula(Surv(time,status)~stage+Strata(sex)), colrec, slopop)
```
32 changes: 21 additions & 11 deletions docs/src/index.md
@@ -4,26 +4,36 @@ CurrentModule = NetSurvival

## Introduction

This package serves to provide the necessary tools to perform net survival analysis, a branch of survival analysis dedicated to estimating the probability of survival from a particular event of interest compared to the general public. Some key features in `NetSurvival.jl` are:

- Fitting different non-parametric estimators (Pohar Perme[PoharPerme2012](@cite), Ederer II, ...)
- Applying Grafféo's log-rank test[GraffeoTest](@cite) on different groups, including stratified covariables
- ...
The `NetSurvival.jl` package provides the tools needed to perform estimation and analysis in the field of net survival. This specialized branch of survival analysis focuses on estimating the probability of surviving a specific event of interest, for example a given cancer, without considering other causes of death. This is especially relevant in the (unfortunately quite common) case where the cause-of-death indicatrix is either unavailable or untrustworthy. This so-called *missing indicatrix* issue prevents the use of standard competing risks survival analysis methods on these datasets. To address it, several standard estimators have been established over the last 50 years, backed by an extensive literature.

By integrating observed data from the target population with historical population mortality data (usually sourced from national census datasets), Net Survival allows the extraction of the specific mortality hazard associated with the particular disease, even under the missing indicatrix issue. The concept of relative survival analysis dates back several decades to the seminal article by Ederer, Axtell, and Cutler in 1961 [Ederer1961](@cite) and the one by Ederer and Heise in 1959 [Ederer1959](@cite).
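
As a hedged sketch of the idea, in standard relative survival notation that is assumed here rather than taken from this page: for patient $i$, let $\lambda_{O_i}$ be the observed (all-cause) hazard, $\lambda_{P_i}$ the population hazard read from the mortality table, and $\lambda_{E_i}$ the excess hazard due to the disease. Under the usual additive assumption:

$$\lambda_{O_i}(t) = \lambda_{P_i}(t) + \lambda_{E_i}(t) \quad\Longrightarrow\quad S_{O_i}(t) = S_{P_i}(t)\, S_{E_i}(t)$$

Since $\lambda_{P_i}$ is known from the mortality table, the excess part can be targeted without knowing each individual cause of death.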

For years, the Hakulinen estimator (1977) [Hakulinen1977](@cite) and the Ederer I and II estimators were widely regarded as the gold standard for non-parametric survival curve estimation. However, the introduction of the Pohar-Perme, Stare, and Estève estimator in 2012 [PoharPerme2012](@cite) resolved several issues inherent in previous estimators, providing a reliable and consistent non-parametric estimator for net survival analysis.

## Features

Today's standard tools are R packages with underlying C and C++ routines that are hard to read, maintain, and use. This package is an attempt to bring standard relative survival analysis modeling routines to Julia, with an interface close to that of `relsurv`, while being significantly faster and easier to maintain. Our hope is that integration with Julia's classical modeling APIs will allow later extensions of the existing modeling methods, with a simple interface for practitioners.

Some key features in `NetSurvival.jl` are:

- A panel of different non-parametric net survival estimators (Ederer I [Ederer1961](@cite), Ederer II [Ederer1959](@cite), Hakulinen [Hakulinen1977](@cite), Pohar Perme [PoharPerme2012](@cite)) with an interface compliant with Julia's standards.
- Grafféo's log-rank test [Graffeo2016](@cite) to compare net survival curves across groups, including stratified testing.
- A compact, readable and efficient codebase (up to 1000x fewer lines of code than `relsurv` for the same functionality), ensuring long-term maintainability.
- Significant performance improvements (up to 50x) compared to the R package `relsurv`.

## Installation

The package is available on Julia's general registry, and can be installed either with the command `Pkg.add("NetSurvival")` or via the Pkg REPL mode:
The package is not yet available on Julia's general registry, and thus can be installed through the following command:

```julia
] add NetSurvival
using Pkg
Pkg.add(url="https://github.com/JuliaSurv/NetSurvival.jl.git")
```

```@index
```
See the rest of this documentation for an overview of the available functionality.

```@autodocs
Modules = [NetSurvival]
```
# References

```@bibliography
Pages = ["index.md"]