Cox Regression #2
Conversation
Here's a code snippet to try it out:
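The snippet itself didn't survive the thread export; below is a minimal sketch of what trying it out could look like, assuming the Rossi recidivism dataset used in the tests (the file path and column names are assumptions):

using Survival, DataFrames, CSV, StatsModels

# Assumed file location and column names from the Rossi recidivism data
rossi = CSV.read(joinpath("test", "data", "rossi.csv"))
rossi[:event] = EventTime.(rossi[:week], rossi[:arrest] .== 1)

# 0 + ... drops the implicit intercept, which the Cox model doesn't use
model = fit(CoxModel, @formula(event ~ 0 + fin + age + race + wexp + mar + paro + prio), rossi)
coeftable(model)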
I'M JUST SO HAPPY. This is awesome, @piever, thank you! The 0.7 CI failures are due to some macro changes in Base that have been breaking everything. I'll do a more thorough review of this soon. Thanks again!
I've actually been meaning to generalize the internal Kaplan-Meier function to update a state object that can be specialized for different estimators. I realized when I went to implement cumulative incidence with competing risks that it's a trivial modification of that function. I hadn't looked into Nelson-Aalen yet but if it's in a similar boat then that's all the more reason for a generalization there. 🙂
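To make that concrete, here's a rough sketch of the state-object pattern (all names hypothetical):

# Each estimator carries a state that is updated at every distinct event time;
# Kaplan-Meier, Nelson-Aalen, etc. then differ only in their update! method.
mutable struct KaplanMeierState
    survival::Float64          # current survival estimate S(t)
end

mutable struct NelsonAalenState
    chaz::Float64              # current cumulative hazard estimate Λ(t)
end

# d = number of events, n = number at risk at the current event time
update!(s::KaplanMeierState, d, n) = (s.survival *= 1 - d / n; s)
update!(s::NelsonAalenState, d, n) = (s.chaz += d / n; s)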
👍 I had started working on Cox regression a while back and I had only done Breslow ties because I'm lazy. I do love that Norm Breslow himself has said not to use that estimator. 😂 It's just so easy to implement though!
💯
I wonder if we could inspect the …

As an aside, I'd be interested to hear your opinion of …
Looks great! The details of the algorithm would take me far too much time to understand, but I have a few more basic comments.
test/runtests.jl
@@ -199,3 +201,23 @@ end
@test_throws DimensionMismatch fit(KaplanMeier, [1, 2], [true])
@test_throws ArgumentError fit(KaplanMeier, Float64[], Bool[])
end

@testset "Cox" begin
I think this deserves more tests, with a few variants of the model covering all common scenarios and corner cases.
More tests never hurt 🙂
Agreed! I've just noticed that, as ModelFrame always gives you matrices of Float64, that's the only thing I've been testing, so I should probably make sure that other types behave well. What else do you have in mind?
test/runtests.jl
@test nobs(outcome) == size(rossi, 1)
@test dof(outcome) == 7
@test loglikelihood(outcome) > nullloglikelihood(outcome)
I'd just hardcode the values in the file.
src/optimization.jl
@@ -0,0 +1,17 @@
function newton_raphson(fgh!, x::AbstractArray{T}; ρ = 0.5, c = 1e-4, tol = 1e-4, max_iter = 1000) where T
    grad= zeros(T, length(x))
Space before = here and below.
I've also noted a few missing spaces after commas here and there.
src/optimization.jl
    end
    x = x + search_dir*step_size
end
error("Reached maximum number of iterations")
Could use StatsBase.ConvergenceException.
src/coxph.jl
function _coxph(X::AbstractArray{T}, s::AbstractVector; l2_cost = zero(T), kwargs...) where T
    c = CoxAux(X, s, l2_cost)

    fgh! = (β,grad,hes, compute_derivatives) ->
Can't you just pass _cox_fgh! to newton_raphson?
_cox_fgh! has one extra argument (the CoxAux type with all the model info). As I plan to use newton_raphson for other fits related to survival analysis, I prefer to keep it as a generic minimization routine.
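In other words, the wrapping is just a one-line closure over the model data (a sketch; the zero starting point and return value are assumptions):

# newton_raphson only ever sees a generic fgh!(x, grad, hes, flag) callable:
fgh! = (β, grad, hes, compute_derivatives) ->
    _cox_fgh!(β, grad, hes, c, compute_derivatives)
β̂ = newton_raphson(fgh!, zeros(T, size(X, 2)))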
src/optimization.jl
        step_size *= ρ
        step_size > 1e-10 || error("Linesearch failed! Problematic Hessian or Gradient?")
    end
    x = x + search_dir*step_size
Wouldn't it make sense to update x in place?
src/coxph.jl
end
end
end
y += λ*(β' * β)
Use .+=?
Both the right- and left-hand sides of += are scalars in this case though; note that y is defined to be zero(T) and β'β will be a scalar since it's a RowVector times a Vector.
I wonder how β'β compares in terms of performance with dot(β, β) and sum(x->x^2, β).
I've done some testing, and the first two forms are essentially equivalent in speed. The sum approach is noticeably slower though, which is quite surprising (maybe worth filing a bug in Base?).
At least on 0.6, I see the opposite:
julia> x = rand(10_000);
julia> @benchmark $x'*$x
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 6.333 μs (0.00% GC)
median time: 7.041 μs (0.00% GC)
mean time: 7.718 μs (0.00% GC)
maximum time: 39.177 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 5
julia> @benchmark dot($x, $x)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 6.545 μs (0.00% GC)
median time: 6.647 μs (0.00% GC)
mean time: 6.931 μs (0.00% GC)
maximum time: 20.233 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 5
julia> @benchmark sum(i->i^2, $x)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 2.808 μs (0.00% GC)
median time: 2.831 μs (0.00% GC)
mean time: 2.912 μs (0.00% GC)
maximum time: 10.364 μs (0.00% GC)
--------------
samples: 10000
evals/sample: 9
Not surprising that the x'x form is equivalent in speed to dot(x, x), since @code_lowered shows that the former is calling dot. TIL. 🙂
I tried to benchmark on master too, but BenchmarkTools seems completely broken there.
src/coxph.jl

if compute_derivatives
    for k1 in 1:length(β)
        grad[k1] +=  2*λ*β[k1]
Double space.
src/cox_utils.jl
end

# fs = index first deaths, ls = index last deaths,
Would be nice to use a more neutral term than "death". :-) Maybe "time-to-event" or just "time"?
src/cox_utils.jl
@@ -0,0 +1,78 @@
# Compute first and last
firsts(s) = [s[t].status && (t==1 || s[t] > s[t-1]) for t = 1:length(s)]
Looks like you could create generators rather than arrays here, that would avoid a few allocations (not sure it matters).
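For reference, the generator version is just a change of brackets, e.g. for firsts:

# Same logic as the comprehension above, but lazily evaluated (no array allocated)
firsts(s) = (s[t].status && (t == 1 || s[t] > s[t-1]) for t = 1:length(s))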
src/cox_utils.jl
end

function CoxSum(values::Array{T,N}, fs, ls) where {T, N}
    sums = zeros(T, size(values,1)+1, Base.tail(size(values))...)
Probably just micro-optimizing here, but it might be a little more efficient to do it like this:
function CoxSum(values::Array{T,N}, fs, ls) where {T,N}
tailsize = Base.tail(size(values))
sums = zeros(T, size(values, 1) + 1, tailsize...)
chunks = zeros(T, length(fs), tailsize...)
CoxSum{T,N}(values, sums, chunks, copy(chunks))
end
test/runtests.jl
outcome_coefmat = coeftable(outcome)

filepath_coefs = joinpath(Pkg.dir("Survival", "test"), "expected_coefmat.jld")
expected_coefmat = JLD.load(filepath_coefs, "expected_coefmat")
Is there any way we could just test against hard-coded values here rather than adding a test dependency on JLD? JLD has a tendency to break easily on Julia nightly since it relies a lot on internal details.
In GLM.jl we just hardcode the coefficient values as inline arrays.
Good point; also, because I keep getting annoying warnings from JLD, it's probably better to just type the values into the Julia file.
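A sketch of what the hardcoded version could look like (the numbers below are from a standard R coxph fit of the Rossi data and are shown purely for illustration):

# Illustrative reference values; the actual test should use values produced
# by a trusted implementation on the same dataset.
expected_coefs = [-0.3794, -0.0574, 0.3139, -0.1498, -0.4337, -0.0849, 0.0915]
@test coef(outcome) ≈ expected_coefs atol = 1e-4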
REQUIRE
@@ -1,3 +1,4 @@
julia 0.6.0-pre
StatsBase
Distributions
PositiveFactorizations
I'm not familiar with this package. What does it provide that's used here?
In order for Newton's algorithm to work far from a local minimum, when the Hessian may not be positive definite, it's good to have a positive definite approximation of the inverse Hessian, to ensure that the value of the objective function goes down. PositiveFactorizations is a small package that does just that: I use it here. Just using the inverse of the Hessian would probably still work, but less reliably. Optim also has it as a dependency, I guess for the same reason.
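Concretely, the usage pattern is something like the following sketch (the grad and hes names are assumed from the snippets above):

using PositiveFactorizations

# Factor the Hessian as if it were positive definite; the resulting Newton
# step is then a descent direction even when hes itself is indefinite.
F = cholfact(Positive, hes)        # cholesky(Positive, hes) on Julia ≥ 0.7
search_dir = -(F \ grad)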
Okay great, thanks for the explanation.
Thanks for all the feedback!
Nelson-Aalen gives you the cumulative hazard (i.e. the integral of the hazard up to a given time). I'm not very fond of the …
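For reference, the Nelson-Aalen estimator of the cumulative hazard is the standard

$\hat\Lambda(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$

where $d_i$ is the number of events at time $t_i$ and $n_i$ is the number at risk just before it (hence the chaz += d_i / n_i increment mentioned in the PR description below).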
I should have addressed most remarks. I've improved the type management, so that now it should be able to accept matrices of any type without giving unexpected method errors (which I guess were one of the corner cases @nalimilan had in mind). The remaining big thing left to do for Cox is an estimate of the baseline cumulative hazard, but I guess that should go together with an implementation of Nelson-Aalen (which I'd be happy to do in a separate PR, after this one is merged). In terms of performance I'm reasonably satisfied with this implementation (approx. 5 times faster than Matlab on my colleague's computer on the dataset I'm testing with). The main thing it lacks is multithreading, but I thought that should wait until threading in Julia becomes a bit more stable.
src/optimization.jl
@@ -13,5 +13,6 @@ function newton_raphson(fgh!, x::AbstractArray{T}; ρ = one(T)/2, c = 1e-4, tol
    end
    x .+= search_dir*step_size
end
error("Reached maximum number of iterations")
warn("Reached max iteration number: try increasing `max_iter` or `tol` parameters")
Probably not necessary to warn before throwing an error. The ConvergenceException text should be reasonably informative.
docs/src/cox.md
```math
\lambda(t|X_i) = \lambda_0(t)\exp(X_i\cdot\beta)
```

where $\lambda(t|X_i)$ is the estimated hazard for sample $i$, $\lambda_0$ is the baseline hazard, $X_i$ the vector of covariates for sample $i$, and $\beta$ are the coefficients of the regression.
Documenter uses double backticks, e.g. ``i``, for inline math rather than TeX-style $i$.
I see, I'm a Documenter newbie: how can I look at the docs locally? I've tried:
MacBook-Pro:docs pietro$ mkdocs serve
INFO - Building documentation...
Config file '/Users/pietro/.julia/v0.6/Survival/docs/mkdocs.yml' does not exist.
Never mind, I can just julia make.jl and open the generated index.html.
Oh, Documenter doesn't use mkdocs. On the command line, cd'd into the Survival.jl directory, you can do
julia -e 'include(joinpath("src", "Survival.jl")); include(joinpath("docs", "make.jl"))'
Just make sure you've done Pkg.add("Documenter") from within Julia first.
Heh, you beat me to the explanation. 🙂
That's awesome. Nice work!!
Yeah, agreed. Multithreading is still considered experimental in Julia, though some packages do use it, e.g. GLM. Probably not worth going down that road until it's solid though. Hopefully in Julia 1.0.
Okay thanks, I appreciate your input. Maybe we should just get rid of it then in a future PR. Sometimes convenience is more important than speed. (A good example of that is …)
Don't get me wrong: I'm impressed that you got a more efficient version of a …
I agree completely. Like I said, I was never that convinced that it was useful to have …
Please let me know if there is more feedback I need to incorporate. As far as I'm concerned, I'm happy with the current version :)
The implementation looks great to me. Tests for more corner cases would be good but I'm not really sure what to suggest offhand. Maybe @nalimilan would have some ideas?
Actually, there still is one thing to figure out: the test using lower-precision floating point converges with a given accuracy in a given number of iterations on my machine but not on Travis; maybe I should try a larger tolerance in the test. The only other tests that come to mind are: …
Huh, interesting. Travis runs on x86-64 Linux, dunno what the other specs are for their VMs. What is the machine you're testing that on?
I was also surprised, this is my versioninfo:

Maybe they have some different libraries (MKL versus OpenBLAS or something like that). I guess the only thing to do is lower the accuracy required when fitting with Float32 in the tests, to make sure it also passes on Travis.
This is Travis':
Travis uses the official Julia binaries, so it should be OpenBLAS rather than MKL. Kinda mysterious... I can't imagine that it would be an Ivy Bridge vs. Haswell thing.
Doesn't sound impossible to me. Maybe we're just unlucky that a small difference affects convergence. Regarding cases to test, I would try to cover a few scenarios, like: model without independent variables, model with a single continuous variable, model with a single categorical variable, model with both. Same for interactions: interaction between continuous variables, interaction between categorical variables, interaction between continuous and categorical. Of course there's no limit to the number of combinations one can try, so you need to stop at some point. :-)
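A sketch of what such a testset could look like (the rossi data frame, its event column, and the covariate names are assumed from the existing tests):

@testset "Cox model variants" begin
    for f in [@formula(event ~ 0 + age),          # single continuous variable
              @formula(event ~ 0 + race),         # single categorical variable
              @formula(event ~ 0 + age + race),   # both together
              @formula(event ~ 0 + age & prio),   # continuous-continuous interaction
              @formula(event ~ 0 + age & race)]   # continuous-categorical interaction
        @test fit(CoxModel, f, rossi) isa CoxModel
    end
end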
src/coxph.jl
"""
    fit(::Type{CoxModel}, M::AbstractMatrix, y::AbstractVector; kwargs...)

Given a matrix M of predictors and a corresponding vector of events, compute the
M
src/Survival.jl

abstract type AbstractEstimator end
abstract type NonparametricEstimator <: AbstractEstimator end

include("eventtimes.jl")
include("kaplanmeier.jl")
include("cox_utils.jl")
@ararslan will tell, but why not group all three Cox-related files into a single file? It doesn't sound too big, and that will keep the package cleaner when more models are supported.
Yeah, I like the idea of the Cox stuff grouped as a single file.
src/coxph.jl
end

function _cox_fgh!(β, grad, hes, c::CoxAux{T}, compute_derivatives)::T where T
    #get relevant quantities to compute negative loglikelihood, gradient and hessian
It's more common to put comments about the whole function before its signature.
Good intuition, there actually is an issue with categorical variables: I'm tempted to say that it is a bug in …
Glad to hear that being a pain can sometimes be useful! Actually, it's perfectly expected that … Anyway, the … @kleinschmidt Do you have ideas about the best way of supporting this? Maybe we should open an issue in StatsModels.jl.
I'm happy to take a look at StatsModels and see if there's a way to add an option to avoid the intercept for continuous and categorical variables, and try to make that option the default for Cox regression via a dependency on StatsModels. A possible design is to add a function in StatsModels:
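A sketch of such a function (the name here is illustrative):

drop_intercept(::Type) = false      # fallback: models keep the implicit intercept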
that determines whether an intercept should be automatically included or not. Then all we'd need to do would be to overload it in Survival with:
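For example (same illustrative name):

StatsModels.drop_intercept(::Type{CoxModel}) = true   # Cox models drop the implicit intercept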
But first it'd be good to understand a bit better what the dependency situation with DataFrames/DataTables/StatsModels is now and in the immediate future; I'm still a bit confused about it.
StatsModels used to depend on DataTables, but it's going to switch to DataFrames once it has been ported from DataArrays to Nulls. Anyway, that doesn't really change its principle, which is that packages implementing models do not need to depend on either of these packages, and yet get the formula interface for free.
I'm trying to catch up, but it's not clear to me what the problem is with a full-rank matrix with …
I think there's pretty good consensus about getting rid of the default intercept term, which would make the … One possibility is that if you specify contrasts manually it'll use them even if it's nonsensical for a regression analysis (generates an overcomplete or deficient matrix...). Or we could add a flag that would do that.
No problem, I've just rebased, added tests for categorical and continuous cases (and all three pairwise combinations) and added the correct StatsModels dependency. Should be good to go now.
Great! I'll let @ararslan do the honours...
step_size = one(T)
while fgh!(x+search_dir*step_size, grad, hes, false) > y+c*step_size*(grad' * search_dir)
    step_size *= ρ
    step_size > 1e-10 || error("Linesearch failed! Problematic Hessian or Gradient?")
We could get coverage to 100% if we could come up with a test that hits this as well as the ConvergenceException below. Is that doable?
Sure, I'll add a small testset for the Newton algorithm tomorrow.
@test_throws
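A sketch of how such a test could look (assuming the newton_raphson signature quoted above and that it throws ConvergenceException when the iteration budget is exhausted):

using Base.Test                       # using Test on Julia ≥ 0.7
using StatsBase: ConvergenceException

# f(x) = x^4 has a degenerate minimum at 0, so Newton converges only linearly
# and a tiny iteration budget should trigger the exception.
function quartic_fgh!(x, grad, hes, compute_derivatives)
    if compute_derivatives
        grad[1] = 4x[1]^3
        hes[1, 1] = 12x[1]^2
    end
    return x[1]^4
end

@test_throws ConvergenceException newton_raphson(quartic_fgh!, [1.0]; max_iter = 3)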
Sounds great. Other than that, I think this is good to go. You've done such excellent work here!!
@@ -1,3 +1,5 @@
julia 0.6
StatsBase
Distributions
PositiveFactorizations
StatsModels 0.2.1
Heh, technically the StatsModels tag should have been 0.3.0 since it introduces a new feature, but whatevs.
We'll have to make a decision about this, but in my book major releases are only needed when breaking backward compatibility. Otherwise, almost all of our releases would need to be major ones...
My understanding is that, per semantic versioning, patch releases (i.e. the z in x.y.z) can only contain fixes which do not introduce new behaviors or change existing behaviors. Minor releases (the y) should be used when introducing new features or when making a backwards-incompatible change only if x = 0. Major releases (the x) should be used for backwards-incompatible changes when x >= 1.
Yeah, but before 1.0 we use the patch release as the minor release.
Okay. I don't actually care that much anyway. 😛
Added tests for Newton's algorithm: should be back to full coverage now.
I've also added a fix to the docs (…).
Thanks so much for this!
I've added Cox regression (from my previous implementation in my own package, after some cleanup). It took longer than expected, but my old code needed quite a bit of polishing...
There are a few caveats though:
- … 0+... for the regression to work.
- … Vector{EventTime} and EventTimeVector, except DataFrames + ModelFrame only work well together with EventTimeVector. Even if I add some DataFrames.upgrade_vector method to EventTimeVector, for some reason ModelFrame screws it up and only takes the time component of the vector. Also, DataArray conversion of an EventTimeVector does not really seem feasible, as the data part of a DataArray is supposed to be an Array, which EventTimeVector is not. Vector{EventTime} seems to work just fine on the other hand.
- … chaz += d_i / n_i. Maybe it makes more sense that you add this feature, as you know better how to integrate it with the rest of the package; otherwise I can give it a shot.
Everything else seems to work well: I get the same results as this paper (check page 6) on a test dataset that I have included in the tests. Performance seems pretty good, but I no longer have anything to compare it with, as I'm only set up to do Cox regression in Julia on my machine: if you have some other setup (e.g. R or Matlab) I'd be curious to see some benchmarking.
On the more technical side, I'm fitting Cox regression by maximizing the "partial log-likelihood" (formulas here) using Efron's method in case of ties. I'm computing the value, gradient, and Hessian, and using Newton's algorithm with backtracking (implemented in the optimization.jl file; it is such a simple algorithm that I don't think there's much need to depend on an optimization package, and I preferred to simply re-implement it).
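For reference, the partial log-likelihood with Efron's correction for ties has the standard form

$\log L(\beta) = \sum_j \left[ \sum_{i \in D_j} X_i \beta - \sum_{\ell=0}^{d_j-1} \log\left( \sum_{i \in R_j} e^{X_i \beta} - \frac{\ell}{d_j} \sum_{i \in D_j} e^{X_i \beta} \right) \right]$

where $D_j$ is the set of $d_j$ subjects with an event at time $t_j$ and $R_j$ is the risk set at $t_j$; with no ties ($d_j = 1$) this reduces to the usual Cox partial likelihood.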