[Readme timings] Add timings to readme.md (#36)
* Update README.md

* Add benchmarks to the docs

* Add my benchmark results to the

* input my numbers

---------

Co-authored-by: rimhajal <[email protected]>
lrnv and rimhajal authored May 10, 2024
1 parent e8b1751 commit 050b0f6
Showing 2 changed files with 98 additions and 23 deletions.
39 changes: 34 additions & 5 deletions README.md
@@ -1,4 +1,6 @@
# NetSurvival
# NetSurvival.jl

*A pure-Julia take on standard net survival routines*

[![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://JuliaSurv.github.io/NetSurvival.jl/stable/)
[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://JuliaSurv.github.io/NetSurvival.jl/dev/)
@@ -12,18 +14,18 @@
[![Code Style: Blue](https://img.shields.io/badge/code%20style-blue-4495d1.svg)](https://github.com/JuliaDiff/BlueStyle)


The `NetSurvival.jl` package provides the necessary tools to perform estimations and analysis in the Net Survival field. This specialized branch of Survival Analysis focuses on estimating the probability of survival from a specific event of interest, for example a given cancer, without considering other causes of death. This is especially relevant in the (unfortunately quite common) case where the cause-of-death indicatrix is either unavailable or untrustworthy. Consequently, the so-called *missing indicatrix* issue forbids the use of standard competing risks survival analysis methods on these datasets. For that, a few standard estimators were established in the last 50 years, backed by a wide literature.
The `NetSurvival.jl` package provides the necessary tools to perform estimations and analysis in the Net Survival field. This specialized branch of Survival Analysis focuses on estimating the probability of survival from a specific event of interest, for example a given cancer, without considering other causes of death, in the (unfortunately quite common) case where the cause-of-death indicatrix is unavailable (or, e.g., untrustworthy). Consequently, the so-called *missing indicatrix* issue forbids the use of standard competing risks survival analysis methods on these datasets. Thus, a few standard estimators were established in the last 50 years, backed by a wide literature.

# Features

This package is an attempt to bring standard relative survival analysis modeling routines to Julia, while providing an interface that is close to the `relsurv` standard, albeit significantly faster and easier to maintain in the future.
This package is an attempt to bring standard relative survival analysis modeling routines to Julia, while providing an interface that is close to the R package `relsurv`, albeit significantly faster and easier to maintain in the future. We aim to cover the standard estimators needed for routine analyses and comparisons, and also to provide up-to-date, state-of-the-art methods.

Some key features in `NetSurvival.jl` are:

- A panel of different non-parametric net survival estimators (Ederer I, Ederer II, Hakulinen, Pohar Perme) with an interface compliant with Julia's standards.
- Grafféo's log-rank test to compare net survival curves across groups, including stratified testing.
- A compact, readable and efficient codebase (up to 1000x fewer LOC than `relsurv` for the same functionalities), ensuring long-term maintainability.
- Significant performance improvements (up to 50x) compared to the R package `relsurv`.
- A compact, readable and efficient codebase (up to 100x fewer LOC than `relsurv` for the same functionalities), ensuring long-term maintainability.
- Significant performance improvements (see below) compared to `relsurv`.

# Getting Started

@@ -36,3 +38,30 @@ Pkg.add("https://github.com/JuliaSurv/NetSurvival.jl.git")

See the rest of this [documentation](https://juliasurv.github.io/NetSurvival.jl/dev/) for a glimpse of the package's functionalities!

# Benchmarks

`NetSurvival.jl` is *fast*. The numbers below give runtime multipliers w.r.t. [`R::relsurv`](https://cran.r-project.org/web/packages/relsurv/index.html), computed on an i9-13900 processor. A version of these numbers computed on (even slower) GitHub Actions runners is available in [our documentation](https://juliasurv.github.io/NetSurvival.jl/dev/benches/), alongside the code needed to rerun them in your own environment.

The comparison is done on the `colrec` dataset with the `slopop` ratetable. The first set of numbers compares the time needed to obtain the net survival curve:

| | **Unstratified**<br>`Surv(time,status)~1` | **Stratified**<br>`Surv(time,status)~sex` |
|--------------------------:|------------------------------:|----------------------------:|
| Pohar Perme | 20.8431 | 20.1461 |
| EdererI | 7.216 | 4.1363 |
| EdererII | 29.2397 | 29.0399 |
| Hakulinen | 23.493 | 15.6676 |

The second set of numbers compares the implementations of Grafféo's log-rank-type test:

| | **Unstratified**<br>`Surv(time,status)~stage` | **Stratified**<br>`Surv(time,status)~stage+Strata(sex)` |
|--------------------------:|------------------------------:|----------------------------:|
| Graffeo's LRT | 13.1556 | 18.156 |
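
As a reading aid, each multiplier can be interpreted as the `relsurv` runtime divided by the `NetSurvival.jl` runtime (this is how the benchmark code in the documentation computes it), so its inverse is the share of the R runtime that the Julia call needs. The following minimal sketch only re-expresses the unstratified numbers from the tables above; the names `multipliers` and `relative_runtime` are illustrative and not part of the package.

```julia
# Runtime multipliers copied from the unstratified columns of the tables above.
multipliers = [
    "Pohar Perme"   => 20.8431,
    "EdererI"       => 7.216,
    "EdererII"      => 29.2397,
    "Hakulinen"     => 23.493,
    "Graffeo's LRT" => 13.1556,
]

# A multiplier m = t_R / t_Julia means the Julia run takes 1/m of the R runtime.
relative_runtime(m) = 1 / m

for (name, m) in multipliers
    pct = round(100 * relative_runtime(m); digits = 1)
    println(rpad(name, 14), " Julia runtime is about ", pct, "% of the relsurv runtime")
end
```

The same helper can be applied to the stratified columns or to numbers measured on your own hardware.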

*Call for contributions*: If you have access to Stata's implementation (which is not free) and want to report timings, do not hesitate to open an issue.


# Contributions are welcome

If you want to contribute to the package, ask a question, report a bug, or simply chat, do not hesitate to open an issue on this repo. General guidelines on collaborative practices (ColPrac) are available at https://github.com/SciML/ColPrac.


82 changes: 64 additions & 18 deletions docs/src/benches.md
@@ -4,31 +4,77 @@ CurrentModule = NetSurvival

# Benchmarking results

The following benchmarks are run on the GitHub Actions continuous integration platform, which is a very slow computing engine. Local experiments suggest performance that is twice as fast on decent hardware; note that we do not use multithreading at all, but underlying BLAS calls might.
This page provides benchmark results of several standard net survival routines implemented in this package. Note that the runtime also depends on other packages, in particular on [`RateTables.jl`](https://github.com/JuliaSurv/RateTables.jl).

!!! note "Take numbers displayed here carefully"
    All the following benchmarks are run on the GitHub Actions continuous integration platform, which is a very slow computing engine. Thus, these numbers may not represent your local performance. A locally run version of these benchmarks is available in the GitHub README, and the code blocks below can be used to check performance on your own hardware.

## Benchmarks w.r.t. `relsurv`

This first set of benchmarks compares standard functionalities with their implementation in `relsurv`. The numbers below give runtime multipliers w.r.t. [`R::relsurv`](https://cran.r-project.org/web/packages/relsurv/index.html), computed on GitHub Actions CI.

```@example 1
using RCall
using NetSurvival, RateTables, BenchmarkTools
R_bench = @benchmark R"""
relsurv::rs.surv(
survival::Surv(time, stat) ~1,
rmap=list(age = age, sex = sex, year = diag),
data = relsurv::colrec,
ratetable = relsurv::slopop,
method = "pohar-perme",
add.times=1:8149)
"""
jl_bench = @benchmark fit(PoharPerme, @formula(Surv(time,status)~1), colrec, slopop)
ratio = time(minimum(R_bench)) / time(minimum(jl_bench))
using RateTables, NetSurvival, RCall, DataFrames
# Time one net survival estimator in Julia and in R, and return the R/Julia runtime ratio.
function test_surv(r_method, ::Type{E}, stratified) where E
    if stratified
        jl = @timed fit(E, @formula(Surv(time,status)~sex), colrec, slopop)
        @rput r_method  # copy `r_method` into the R session
        r = @timed R"""
        rez = relsurv::rs.surv(survival::Surv(time, stat) ~ sex, rmap=list(age = age, sex = sex, year = diag), data = relsurv::colrec, ratetable = relsurv::slopop, method = r_method, add.times=1:8149)
        """
    else
        jl = @timed fit(E, @formula(Surv(time,status)~1), colrec, slopop)
        @rput r_method  # copy `r_method` into the R session
        r = @timed R"""
        rez = relsurv::rs.surv(survival::Surv(time, stat) ~ 1, rmap=list(age = age, sex = sex, year = diag), data = relsurv::colrec, ratetable = relsurv::slopop, method = r_method, add.times=1:8149)
        """
    end
    return r.time / jl.time
end
# Same comparison for Grafféo's log-rank-type test (`relsurv::rs.diff` on the R side).
function test_graffeo(stratified)
    if stratified
        jl = @timed fit(GraffeoTest, @formula(Surv(time,status)~stage+Strata(sex)), colrec, slopop)
        r = @timed R"""
        rez = relsurv::rs.diff(survival::Surv(time, stat) ~ stage + survival::strata(sex), rmap=list(age = age, sex = sex, year = diag), data = relsurv::colrec, ratetable = relsurv::slopop)
        """
    else
        jl = @timed fit(GraffeoTest, @formula(Surv(time,status)~stage), colrec, slopop)
        r = @timed R"""
        rez = relsurv::rs.diff(survival::Surv(time, stat) ~ stage, rmap=list(age = age, sex = sex, year = diag), data = relsurv::colrec, ratetable = relsurv::slopop)
        """
    end
    return r.time / jl.time
end
test_all(stratified) = [
test_surv("pohar-perme", PoharPerme, stratified),
test_surv("ederer1", EdererI, stratified),
test_surv("ederer2", EdererII, stratified),
test_surv("hakulinen", Hakulinen, stratified),
test_graffeo(stratified),
]
test_all() = DataFrame(
Algorithm = ["Pohar Perme", "EdererI", "EdererII", "Hakulinen", "Graffeo's LRT"],
unstratified = test_all(false),
stratified = test_all(true)
)
rez = test_all()
rez = test_all() # discard first run.
# note: to obtain the pretty printing from the readme, you need to install PrettyTables.jl and do :
# using PrettyTables
# pretty_table(rez, backend = Val(:markdown))
rez
```
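
The timing pattern used in the block above (run once so that compilation is not counted, then measure with `@timed` and take the ratio of elapsed times) can be tried without R or any package. The following is only a minimal sketch under stated assumptions: `runtime_ratio` is a hypothetical helper that is not part of `NetSurvival.jl`, and the `sleep`-based functions are stand-ins for the R and Julia calls. Unlike the block above, which runs the whole suite twice and keeps the second result, this sketch warms up each function individually.

```julia
# Hypothetical helper (not part of NetSurvival.jl): ratio of elapsed times
# between a reference implementation and a candidate implementation.
function runtime_ratio(reference, candidate)
    # Warm-up calls: the first call of a Julia function includes compilation
    # time, so these runs are not timed.
    reference()
    candidate()
    t_ref = (@timed reference()).time   # elapsed seconds for the reference
    t_new = (@timed candidate()).time   # elapsed seconds for the candidate
    return t_ref / t_new
end

# Stand-ins for the real calls: `sleep` mimics a slow and a fast routine.
slow() = sleep(0.2)
fast() = sleep(0.05)

ratio = runtime_ratio(slow, fast)
println("runtime multiplier: ", round(ratio; digits = 1))
```

A single `@timed` measurement is noisy; for more careful comparisons, repeated measurements (for instance with `BenchmarkTools.jl`, as in the earlier version of this page) are likely a better choice.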




# Benchmarking across time

The folloiwng charts provide a glimpse of `NetSurvival.jl`'s performance along time:
The following charts provide a glimpse of `NetSurvival.jl`'s performance over time, also run on GitHub CI:

```@raw html
<iframe src="../../benchmarks/" style="height:500px;width:100%;"></iframe>