[Readme timings] Add timings to readme.md (#36)

* Update README.md
* Add benchmarks to the docs
* Add my benchmark results to the
* input my numbers

---------

Co-authored-by: rimhajal <[email protected]>
JuliaSurv · May 10, 2024 · 050b0f6
1 parent e8b1751
commit 050b0f6

Showing 2 changed files with 98 additions and 23 deletions.
diff --git a/README.md b/README.md
@@ -1,4 +1,6 @@
-# NetSurvival
+# NetSurvival.jl
+
+*A pure-Julia take on standard net survival routines*
 
 [![Stable](https://img.shields.io/badge/docs-stable-blue.svg)](https://JuliaSurv.github.io/NetSurvival.jl/stable/)
 [![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://JuliaSurv.github.io/NetSurvival.jl/dev/)
@@ -12,18 +14,18 @@
 [![Code Style: Blue](https://img.shields.io/badge/code%20style-blue-4495d1.svg)](https://github.com/JuliaDiff/BlueStyle)
 
 
-The `NetSurvival.jl` package provides the necessary tools to perform estimations and analysis in the Net Survival field. This specialized branch of Survival Analysis focuses on estimating the probability of survival from a specific event of interest, for example a given cancer, without considering other causes of death. This is especially relevant in the (unfortunately quite common) case where the cause of death indicatrix is either unavailable or untrustworthy. Consequently, the so-called *missing indicatrix* issue forbids the use of standard competitive risks survival analysis methods on these datasets.  For that, a few standard estimators were established in the last 50 years, backed by a wide literature.
+The `NetSurvival.jl` package provides the necessary tools to perform estimations and analysis in the Net Survival field. This specialized branch of Survival Analysis focuses on estimating the probability of survival from a specific event of interest, for example a given cancer, without considering other causes of death, in the (unfortunately quite common) case where the cause of death indicatrix is unavailable or untrustworthy. This so-called *missing indicatrix* issue forbids the use of standard competing risks survival analysis methods on these datasets. To address it, a few standard estimators have been established over the last 50 years, backed by a wide literature.
 
 # Features 
 
-This package is an attempt to bring standard relative survival analysis modeling routines to Julia, while providing an interface that is close to the `relsurv` standard, albeit significantly faster and easier to maintain in the future.
+This package is an attempt to bring standard relative survival analysis modeling routines to Julia, while providing an interface that is close to the R package `relsurv`, albeit significantly faster and easier to maintain in the future. We aim to cover the standard estimators needed for routine use and comparisons, and also to provide up-to-date, state-of-the-art methods. 
 
 Some key features in `NetSurvival.jl` are:
 
 - A panel of different non-parametric net survival estimators (Ederer I, Ederer II, Hakulinen, Pohar Perme) with an interface compliant with Julia's standards. 
 - Grafféo's log-rank test to compare net survival curves accross groups, including stratified testing.
-- A compact, readable and efficient codebase (up to 1000x less LOC than `relsurv` for the same functionalities), ensuring long-term maintenability.
-- Significant performance improvements (up to 50x) compared to the R package `relsurv`.
+- A compact, readable and efficient codebase (up to 100x fewer lines of code than `relsurv` for the same functionalities), ensuring long-term maintainability.
+- Significant performance improvements (see below) compared to `relsurv`.
 
 # Getting Started
 
@@ -36,3 +38,30 @@ Pkg.add("https://github.com/JuliaSurv/NetSurvival.jl.git")
 
 See the rest of this [documentation](https://juliasurv.github.io/NetSurvival.jl/dev/) to have a glimpse of the functionalities!
 
+# Benchmarks
+
+`NetSurvival.jl` is *fast*. The numbers below give runtime multipliers w.r.t. [`R::relsurv`](https://cran.r-project.org/web/packages/relsurv/index.html), computed on an i9-13900 processor. A version of these numbers computed on (even slower) GitHub Actions runners is available in [our documentation](https://juliasurv.github.io/NetSurvival.jl/dev/benches/), alongside the code needed to rerun these benchmarks in your own environment. 
+
+The comparison is done on the `colrec` dataset with the `slopop` ratetable. The first set of numbers compares the time needed to obtain the net survival curve: 
+
+|  | **Unstratified**<br>`Surv(time,status)~1` | **Stratified**<br>`Surv(time,status)~sex` |
+|--------------------------:|------------------------------:|----------------------------:|
+| Pohar Perme               | 20.8431                       | 20.1461                     |
+| EdererI                   | 7.216                         | 4.1363                      |
+| EdererII                  | 29.2397                       | 29.0399                     |
+| Hakulinen                 | 23.493                        | 15.6676                     |
+
+The second set of numbers compares the implementations of Grafféo's log-rank-type test:
+
+|  | **Unstratified**<br>`Surv(time,status)~stage` | **Stratified**<br>`Surv(time,status)~stage+Strata(sex)` |
+|--------------------------:|------------------------------:|----------------------------:|
+| Graffeo's LRT             | 13.1556                       | 18.156                      |
+
+*Call for contributions*: If you have access to Stata's implementation (which is not free) and want to report timings, do not hesitate to open an issue.
+
+
+# Contributions are welcome
+
+If you want to contribute to the package, ask a question, report a bug, or simply chat, do not hesitate to open an issue on this repo. General guidelines on collaborative practices (ColPrac) are available at https://github.com/SciML/ColPrac.
+
+
diff --git a/docs/src/benches.md b/docs/src/benches.md
@@ -4,31 +4,77 @@ CurrentModule = NetSurvival
 
 # Benchmarking results
 
-The following benchmarks are run on github actions continuous integration platform, which is a very slow computing engine. Local experiments suggests performances that are twice as fast on correct hardware -- note that we do not use multithreading at all, but underlying BLAS calls might. 
+This page provides benchmark results of several standard net survival routines implemented in this package. Note that the runtime also depends on other packages, in particular on [`RateTables.jl`](https://github.com/JuliaSurv/RateTables.jl). 
+
+!!! note "Interpret the numbers displayed here with care"
+    All the following benchmarks are run on the GitHub Actions continuous integration platform, which is a very slow computing engine. Thus, these numbers may not represent your local performance. A locally run version of these benchmarks is available in the GitHub readme, and the code blocks below can be used to check performance on your own hardware.
+
+## Benchmarks w.r.t. `relsurv`
+
+This first set of benchmarks compares standard functionalities with their implementation in `relsurv`. The numbers below give runtime multipliers w.r.t. [`R::relsurv`](https://cran.r-project.org/web/packages/relsurv/index.html), computed on GitHub Actions CI.
 
 ```@example 1
-using RCall
-using NetSurvival, RateTables, BenchmarkTools
-
-R_bench = @benchmark R"""
-relsurv::rs.surv(
-    survival::Surv(time, stat) ~1, 
-    rmap=list(age = age, sex = sex, year = diag), 
-    data = relsurv::colrec, 
-    ratetable = relsurv::slopop, 
-    method = "pohar-perme", 
-    add.times=1:8149)
-"""
-
-jl_bench = @benchmark fit(PoharPerme, @formula(Surv(time,status)~1), colrec, slopop)
-
-ratio = time(minimum(R_bench)) / time(minimum(jl_bench))
+using RateTables, NetSurvival, RCall, DataFrames
+
+function test_surv(r_method,::Type{E}, stratified) where E
+    if stratified
+        jl = @timed fit(E, @formula(Surv(time,status)~sex), colrec, slopop)
+        @rput r_method
+        r = @timed R"""
+            rez = relsurv::rs.surv(survival::Surv(time, stat) ~ sex, rmap=list(age = age, sex = sex, year = diag), data = relsurv::colrec, ratetable = relsurv::slopop, method = r_method, add.times=1:8149)
+        """
+    else
+        jl = @timed fit(E, @formula(Surv(time,status)~1), colrec, slopop)
+        @rput r_method
+        r = @timed R"""
+            rez = relsurv::rs.surv(survival::Surv(time, stat) ~ 1, rmap=list(age = age, sex = sex, year = diag), data = relsurv::colrec, ratetable = relsurv::slopop, method = r_method, add.times=1:8149)
+        """
+    end
+    return r.time / jl.time
+end
+
+function test_graffeo(stratified)
+    if stratified
+        jl = @timed fit(GraffeoTest, @formula(Surv(time,status)~stage+Strata(sex)), colrec, slopop)
+        r = @timed R"""
+        rez = relsurv::rs.diff(survival::Surv(time, stat) ~ stage + survival::strata(sex), rmap=list(age = age, sex = sex, year = diag), data = relsurv::colrec, ratetable = relsurv::slopop)
+        """
+    else 
+        jl = @timed fit(GraffeoTest, @formula(Surv(time,status)~stage), colrec, slopop)
+        r = @timed R"""
+        rez = relsurv::rs.diff(survival::Surv(time, stat) ~ stage, rmap=list(age = age, sex = sex, year = diag), data = relsurv::colrec, ratetable = relsurv::slopop)
+        """
+    end
+    return r.time / jl.time 
+end
+test_all(stratified) = [
+    test_surv("pohar-perme", PoharPerme, stratified),
+    test_surv("ederer1", EdererI, stratified),
+    test_surv("ederer2", EdererII, stratified),
+    test_surv("hakulinen", Hakulinen, stratified),
+    test_graffeo(stratified),
+]
+test_all() = DataFrame(
+    Algorithm = ["Pohar Perme", "EdererI", "EdererII", "Hakulinen", "Graffeo's LRT"], 
+    unstratified = test_all(false), 
+    stratified = test_all(true)
+)
+rez = test_all()
+rez = test_all() # run twice and keep the second result: the first run includes compilation time.
+
+# Note: to obtain the pretty printing from the readme, install PrettyTables.jl and run:
+# using PrettyTables
+# pretty_table(rez, backend = Val(:markdown))
+
+rez
 ```
 
 
+
+
 # Benchmarking across time
 
-The folloiwng charts provide a glimpse of `NetSurvival.jl`'s performance along time: 
+The following charts provide a glimpse of `NetSurvival.jl`'s performance over time, also run on GitHub CI: 
 
 ```@raw html
 <iframe src="../../benchmarks/" style="height:500px;width:100%;"></iframe>