Nessie #47

Merged
merged 30 commits into from
May 23, 2024

2 changes: 1 addition & 1 deletion Project.toml
@@ -19,7 +19,7 @@ CSV = "0.10"
DataFrames = "1"
Distributions = "0.25"
LinearAlgebra = "1.6"
RateTables = "0.1"
RateTables = "0.1.1"
RCall = "0.14"
StatsAPI = "1"
StatsBase = "0.34"
1 change: 1 addition & 0 deletions README.md
@@ -24,6 +24,7 @@ Some key features in `NetSurvival.jl` are:

- A panel of different non-parametric net survival estimators (Ederer I, Ederer II, Hakulinen, Pohar Perme) with an interface compliant with Julia's standards.
- Grafféo's log-rank test to compare net survival curves across groups, including stratified testing.
- A `nessie` function that outputs the expected sample size at yearly intervals and the average remaining life expectancy for a given group.
- A compact, readable and efficient codebase (up to 100x less LOC than `relsurv` for the same functionalities), ensuring long-term maintainability.
- Significant performance improvements (see below) compared to `relsurv`.

20 changes: 20 additions & 0 deletions docs/src/example.md
@@ -238,3 +238,23 @@ plot(plot1, plot2, layout = (1, 2))
```

Visually, it is almost immediately apparent that there are no noteworthy differences between the two sexes, whereas the `age65` variable seems to play a big role.


## Estimated sample size and life expectancy

Given that the age group plays a significant role in the study, we will now estimate the sample size by yearly intervals in order to better compare the age groups.

```@example 2
elt, ess = nessie(@formula(Surv(time,status)~age65), colrec, slopop)
elt
```

The expected lifetime of the younger patients is considerably higher than that of the older patients (24.78 years versus 10.29 years).

```@example 2
hcat(ess[:,3]...)
```

Finally, the table above shows the yearly expected sample sizes for the two age groups (under 65, and 65 and above), with the second column corresponding to the older group. The expected sample size decreases much more sharply for the older patients than for the younger ones.

Unsurprisingly, we can thus conclude that age plays an important role in the study.
16 changes: 16 additions & 0 deletions docs/src/getting_started.md
@@ -115,6 +115,22 @@ Under $H_0$, the statistic $U(T)$ is asymptotically $\chi^2(k-1)$-distributed. W
GraffeoTest
```

## Nessie

The `nessie` function estimates the expected sample size at yearly intervals, as well as the average expected remaining lifespan, for a given group.

This function relies heavily on the `Life` function from the `RateTables.jl` package, which is documented [here](https://juliasurv.github.io/RateTables.jl/dev/).

The expected sample size at time $t$ is the sum, over the $N$ individuals of the group, of their population survival probabilities:

$$ESS(t) = \sum_{i=1}^N S_{P_i}(t) = \sum_{i=1}^N \exp(-\Lambda_{P_i}(t))$$

The estimated lifespan is taken directly from the `expectation` function, averaged over the individuals of the group and converted from days to years.
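
As a hedged restatement in the same notation, and assuming that `expectation` returns the mean of each individual's remaining population lifetime $P_i$, the reported life expectancy can be read as a group average. The symbol $ELT$ is introduced here for illustration only:

$$ELT = \frac{1}{N} \sum_{i=1}^N \mathbb{E}[P_i] = \frac{1}{N} \sum_{i=1}^N \int_0^\infty S_{P_i}(t)\,dt$$

The second equality is the standard identity for the mean of a non-negative random variable.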

```@docs
nessie
```
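
To make the expected sample size described above more concrete, here is a minimal, self-contained sketch that uses neither `NetSurvival.jl` nor `RateTables.jl`. It assumes, purely for illustration, that each individual has a constant yearly population hazard $\lambda_i$, so that $S_{P_i}(t) = \exp(-\lambda_i t)$ and the expected remaining lifetime is $1/\lambda_i$. The hazard values and the names `hazards`, `population_survival`, `ess` and `elt` are hypothetical; real rate tables depend on age, calendar year and other covariates.

```julia
# Hypothetical constant yearly hazards for three individuals (not taken from any real rate table).
hazards = [0.02, 0.05, 0.10]

# Population survival S_P(t) = exp(-Λ_P(t)), with Λ_P(t) = λ * t under a constant hazard.
population_survival(λ, t) = exp(-λ * t)

# Yearly grid (in years), analogous to an annual evaluation grid.
grid = 0:5

# ESS(t): sum over individuals of their population survival at time t.
ess = [sum(population_survival(λ, t) for λ in hazards) for t in grid]

# Under a constant hazard, the expected remaining lifetime is 1/λ; average it over the group.
elt = sum(1 / λ for λ in hazards) / length(hazards)
```

Under these assumptions, `ess[1]` equals the group size, since everyone is alive at time zero, and the values decrease along the grid.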

## References

```@bibliography
3 changes: 2 additions & 1 deletion docs/src/index.md
@@ -18,7 +18,8 @@ Some key features in `NetSurvival.jl` are:

- A panel of different non-parametric net survival estimators (Ederer I [Ederer1961](@cite), Ederer II [Ederer1959](@cite), Hakulinen [Hakulinen1977](@cite), Pohar Perme [PoharPerme2012](@cite)) with an interface compliant with Julia's standards.
- Grafféo's log-rank test [Graffeo2016](@cite) to compare net survival curves across groups, including stratified testing.
- Crude mortality, Expected Sample Size, and other usefull metrics in net survival field.
- Crude mortality, Expected Sample Size, and other useful metrics in net survival field.
- A `nessie` function that outputs the expected sample size at yearly intervals and the average remaining life expectancy for a given group.
- A compact, readable and efficient codebase (up to 1000x less LOC than `relsurv` for the same functionalities), ensuring long-term maintainability.
- Significant performance improvements (up to 50x) compared to the R package `relsurv`.

2 changes: 1 addition & 1 deletion src/NPNSEstimator.jl
@@ -40,7 +40,7 @@ function _get_rate_predictors(rt,df)
    return prd
end

function StatsBase.fit(::Type{E}, formula::FormulaTerm, df::DataFrame, rt::RateTables.AbstractRateTable) where {E<:NPNSEstimator}
function StatsBase.fit(::Type{E}, formula::FormulaTerm, df::DataFrame, rt::RateTables.AbstractRateTable) where {E<:Union{NPNSEstimator, Nessie}}
    rate_predictors = _get_rate_predictors(rt,df)
    formula_applied = apply_schema(formula,schema(df))

44 changes: 44 additions & 0 deletions src/Nessie.jl
@@ -0,0 +1,44 @@
struct Nessie
    expected_sample_size::Vector{Float64}
    expected_life_time::Float64
    grid::Vector{Float64}
    function Nessie(T, Δ, age, year, rate_preds, ratetable)
        annual_grid = 0:RateTables.RT_DAYS_IN_YEAR:maximum(T)
        exp_spl_size = zeros(length(annual_grid))
        exp_life_time = 0.0
        for i in eachindex(age)
            Lᵢ = Life(ratetable[rate_preds[i,:]...], age[i], year[i])
            for j in eachindex(annual_grid)
                exp_spl_size[j] += ccdf(Lᵢ, annual_grid[j])
            end
            exp_life_time += expectation(Lᵢ)
        end
        return new(exp_spl_size, exp_life_time / RateTables.RT_DAYS_IN_YEAR / length(age), annual_grid)
    end
end

"""
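    nessie

Compute the expected sample size on a yearly grid and the average expected lifetime for each group.

To call this function, use the formula below:

    nessie(@formula(Surv(time,status)~covariate), data, ratetable)
"""
function nessie(args...)
    r = fit(Nessie, args...)
    # If `fit` returned a single `Nessie` object, return it as is.
    if r isa Nessie
        return r
    end
    # Otherwise, unpack the fields of the `Nessie` stored in each row of `:estimator` into columns.
    transform!(r, :estimator => ByRow(x -> (x.grid, x.expected_life_time, x.expected_sample_size)) => [:grid, :expected_life_time, :expected_sample_size])
    select!(r, Not(:estimator))

    # `lt` keeps the expected life times; `r` keeps the grid and the expected sample sizes.
    lt = deepcopy(r)
    select!(lt, Not([:expected_sample_size, :grid]))

    select!(r, Not(:expected_life_time))
    return lt, r
end

# Possibly unnecessary accessors: no need to clutter the interface too much.
expected_life_time(x::Nessie) = x.expected_life_time
expected_sample_size(x::Nessie) = x.expected_sample_size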
5 changes: 2 additions & 3 deletions src/NetSurvival.jl
@@ -13,19 +13,18 @@ using RateTables

include("fetch_datasets.jl")
include("Surv_and_Strata.jl")

include("Nessie.jl")
include("NPNSEstimator.jl")
include("PoharPerme.jl")
include("EdererI.jl")
include("EdererII.jl")
include("Hakulinen.jl")

include("CrudeMortality.jl")

include("GraffeoTest.jl")

export PoharPerme, EdererI, EdererII, Hakulinen
export CrudeMortality
export Nessie, nessie
export fit, confint
export GraffeoTest
export Surv, Strata
30 changes: 28 additions & 2 deletions test/sampletest.jl
@@ -108,11 +108,11 @@ end

# Compare results with R:
compare_with_R(v1, vR)
compare_with_R(v1_strat, vR_strat) # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<------------------- This ones fails
compare_with_R(v1_strat, vR_strat)

# Check for equality of the two interfaces:
check_equal(v1,v2)
check_equal(v1_strat,v2_strat) # <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<------------------- This ones fails
check_equal(v1_strat,v2_strat)
end


@@ -146,4 +146,30 @@ end
    err_pop = (r[:population][:est][2:end, :] .- instance.Λₚ[1:end, :]) ./ r[:population][:est][2:end, :]
    @test all(abs.(err_causeSpec) .<= 0.01)
    @test all(abs.(err_pop) .<= 0.01)
end

@testitem "Assess Nessie" begin
    using RateTables
    using RCall

    R"""
    rez = relsurv::nessie(survival::Surv(time, stat) ~ sex, data = relsurv::colrec, ratetable = relsurv::slopop, rmap = list(age = age, sex = sex, year = diag))
    mata = t(as.matrix(rez$mata))
    povp = rez$povp
    """
    r_mata = @rget mata
    r_povp = @rget povp
    r_male, r_female = r_mata[:,1], r_mata[:,2]

    instance = nessie(@formula(Surv(time,status)~sex), colrec, slopop)
    jl_male, jl_female = instance[2].expected_sample_size
    jl_povp = instance[1].expected_life_time

    err_male = (r_male[1:end-1] .- jl_male) ./ r_male[1:end-1]
    err_female = (r_female[1:end-1] .- jl_female) ./ r_female[1:end-1]
    err_povp = (r_povp .- jl_povp) ./ r_povp

    @test all(abs.(err_male) .<= 0.01)
    @test all(abs.(err_female) .<= 0.01)
    @test all(abs.(err_povp) .<= 0.01)
end