
Cox Regression #2

Merged
merged 26 commits into from
Dec 16, 2017

Conversation

piever
Contributor

@piever piever commented Aug 8, 2017

I've added Cox Regression (from my previous implementation in my own package, after some cleanup). It took longer than expected, but my old code needed quite a bit of polishing...

There are a few caveats though:

  • The formula needs to have an explicit 0+... for the regression to work
  • The code I wrote works both for Vector{EventTime} and EventTimeVector, except DataFrames + ModelFrame only work well together with EventTimeVector. Even if I add a DataFrames.upgrade_vector method for EventTimeVector, for some reason ModelFrame mangles it and only takes the time component of the vector. Also, DataArray conversion of an EventTimeVector does not really seem feasible, as the data part of a DataArray is supposed to be an Array, which EventTimeVector is not. Vector{EventTime}, on the other hand, seems to work just fine.
  • I'm not sure how to set up my fork to build the docs locally and check that everything is fine when I change/add things, how would that work?
  • In my package I have the Nelson-Aalen estimator for the cumulative hazard, which can be easily generalized to get an estimate of baseline cumulative hazard for a fitted Cox Model. It really is the same as Kaplan Meier except this line would be replaced by something like:
    chaz += d_i / n_i
    Maybe it makes more sense that you add this feature as you know better how to integrate it with the rest of the package, otherwise I can give it a shot.
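To make the one-line difference concrete, here's a small standalone sketch (toy code, not the package's internals) of how Kaplan-Meier and Nelson-Aalen can share the same counting loop, with only the per-time update differing — the `chaz += d_i / n_i` line above is the Nelson-Aalen version:

```julia
# Toy sketch: Kaplan-Meier vs Nelson-Aalen over grouped event times.
# d_i = events at time t_i, n_i = subjects at risk just before t_i.
function estimate(times, status; method::Symbol = :km)
    out = Float64[]
    surv, chaz = 1.0, 0.0
    for ti in sort(unique(times))
        n_i = count(t -> t >= ti, times)                         # at risk at ti
        d_i = count(j -> times[j] == ti && status[j], eachindex(times))
        if method == :km
            surv *= 1 - d_i / n_i   # Kaplan-Meier: product of survival fractions
            push!(out, surv)
        else
            chaz += d_i / n_i       # Nelson-Aalen: sum of hazard increments
            push!(out, chaz)
        end
    end
    return out
end

estimate([1, 2, 2, 3], [true, true, false, true])                # KM survival
estimate([1, 2, 2, 3], [true, true, false, true]; method = :na)  # NA cumulative hazard
```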

Everything else seems to work well: I get the same results as this paper (check page 6) on a test dataset that I have included in the tests. Performance seems pretty good, but I no longer have anything to compare it with, as I'm only set up to do Cox Regression in Julia on my machine: if you have some other setup (e.g. R or MATLAB) I'd be curious to see some benchmarking.

On the more technical side, I'm fitting Cox Regression by maximizing the "partial log-likelihood" (formulas here) using Efron's method in case of ties. I'm computing the value, gradient, and Hessian, and using Newton's algorithm with backtracking (implemented in the optimization.jl file; it really is such a simple algorithm that I don't think there's much need to depend on an optimization package, so I preferred to simply re-implement it).
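As a hypothetical minimal sketch of that scheme (the names and tolerances here are illustrative, not the PR's exact code): Newton's method where `fgh!(x, g, H)` returns the objective value and fills the gradient and Hessian in place, with an Armijo backtracking line search:

```julia
using LinearAlgebra

# Sketch: Newton's method with backtracking line search.
# fgh!(x, g, H) must return the objective value at x and fill g and H
# with the gradient and Hessian in place.
function newton_backtracking(fgh!, x; ρ = 0.5, c = 1e-4, tol = 1e-8, max_iter = 100)
    n = length(x)
    g, H = zeros(n), zeros(n, n)
    tmpg, tmpH = similar(g), similar(H)
    for _ in 1:max_iter
        y = fgh!(x, g, H)
        norm(g) < tol && return x
        dir = -(H \ g)               # Newton search direction
        slope = dot(g, dir)          # directional derivative, < 0 for descent
        step = 1.0
        # Armijo backtracking: shrink the step until sufficient decrease
        while fgh!(x .+ step .* dir, tmpg, tmpH) > y + c * step * slope
            step *= ρ
            step > 1e-10 || error("line search failed")
        end
        x = x .+ step .* dir
    end
    error("no convergence in $max_iter iterations")
end

# Example: minimize f(x) = (x₁ - 1)² + 2x₂²
fgh! = (x, g, H) -> begin
    g .= [2(x[1] - 1), 4x[2]]
    H .= [2.0 0.0; 0.0 4.0]
    (x[1] - 1)^2 + 2x[2]^2
end
newton_backtracking(fgh!, [5.0, -3.0])  # ≈ [1.0, 0.0]
```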

@piever
Contributor Author

piever commented Aug 8, 2017

Here's a code snippet to try it out:

using Survival, DataFrames
filepath = joinpath(Pkg.dir("Survival", "test"), "rossi.csv")
rossi = readtable(filepath)
rossi[:event] = EventTime.(rossi[:week], rossi[:arrest] .== 1)
outcome = coxph(@formula(event ~ 0 + fin + age + race + wexp + mar + paro + prio), rossi; tol = 1e-8)

@ararslan
Member

ararslan commented Aug 8, 2017

I'M JUST SO HAPPY. This is awesome, @piever, thank you! The 0.7 CI failures are due to some macro changes in Base that have been breaking everything.

I'll do a more thorough review of this soon. Thanks again!

@ararslan
Member

ararslan commented Aug 8, 2017

Maybe it makes more sense that you add this feature as you know better how to integrate it with the rest of the package

I've actually been meaning to generalize the internal Kaplan-Meier function to update a state object that can be specialized for different estimators. I realized when I went to implement cumulative incidence with competing risks that it's a trivial modification of that function. I hadn't looked into Nelson-Aalen yet but if it's in a similar boat then that's all the more reason for a generalization there. 🙂

@ararslan
Member

ararslan commented Aug 8, 2017

using Efron's method in case of ties.

👍 I had started working on Cox regression a while back and I had only done Breslow ties because I'm lazy. I do love that Norm Breslow himself has said not to use that estimator. 😂 It's just so easy to implement though!

I don't think there's much need to depend on some optimization package and I preferred to simply re-implement it

💯

The formula needs to have an explicit 0+... for the regression to work

I wonder if we could inspect the Formula object and remove an intercept if it exists.
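One possible sketch of that idea, operating on the right-hand-side expression (the real Formula type stores its RHS as an Expr, so the same transformation should carry over; treat this as an illustration, not the StatsModels API):

```julia
# Prepend an explicit 0 to a formula's right-hand side unless the user
# already suppressed the intercept with `0 + ...` or `-1 + ...`.
function force_no_intercept(rhs)
    if isa(rhs, Expr) && rhs.head == :call && rhs.args[1] == :+ &&
       (rhs.args[2] == 0 || rhs.args[2] == -1)
        return rhs             # already intercept-free
    end
    return :(0 + $rhs)         # force the 0 + ... form
end

force_no_intercept(:(fin + age))      # becomes :(0 + (fin + age))
force_no_intercept(:(0 + fin + age))  # unchanged
```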

As an aside, I'd be interested to hear your opinion of EventTimeVector in general. I'm not super convinced that it's all that convenient or useful and a while back I considered just ditching it altogether.

Member

@nalimilan nalimilan left a comment

Looks great! The details of the algorithm would take me far too much time to understand, but I have a few more basic comments.

test/runtests.jl Outdated
@@ -199,3 +201,23 @@ end
@test_throws DimensionMismatch fit(KaplanMeier, [1, 2], [true])
@test_throws ArgumentError fit(KaplanMeier, Float64[], Bool[])
end

@testset "Cox" begin
Member

I think this deserves more tests, with a few variants of the model covering all common scenarios and corner cases.

Member

More tests never hurt 🙂

Contributor Author

Agreed! I've just noticed that, as ModelFrame always gives you matrices of Float64, that's the only thing I've been testing, so I should probably make sure that other types behave well. What else do you have in mind?

test/runtests.jl Outdated

@test nobs(outcome) == size(rossi, 1)
@test dof(outcome) == 7
@test loglikelihood(outcome) > nullloglikelihood(outcome)
Member

I'd just hardcode the values in the file.

@@ -0,0 +1,17 @@
function newton_raphson(fgh!, x::AbstractArray{T}; ρ = 0.5, c = 1e-4, tol = 1e-4, max_iter = 1000) where T
grad= zeros(T, length(x))
Member

Space before = here and below.

Member

I've also noted a few missing spaces after commas here and there.

end
x = x + search_dir*step_size
end
error("Reached maximum number of iterations")
Member

Could use StatsBase.ConvergenceException.

src/coxph.jl Outdated
function _coxph(X::AbstractArray{T}, s::AbstractVector; l2_cost = zero(T), kwargs...) where T
c = CoxAux(X, s, l2_cost)

fgh! = (β,grad,hes, compute_derivatives) ->
Member

Can't you just pass _cox_fgh! to newton_raphson?

Contributor Author

_cox_fgh! has one extra argument (the CoxAux type with all the model info). As I plan to use newton_raphson for other fits related to survival analysis, I prefer to keep it as a generic minimization routine.

step_size *= ρ
step_size > 1e-10 || error("Linesearch failed! Problematic Hessian or Gradient?")
end
x = x + search_dir*step_size
Member

Wouldn't it make sense to update x in place?

src/coxph.jl Outdated
end
end
end
y += λ*(β' * β)
Member

Use .+=?

Member

Both the right- and left-hand sides of += are scalars in this case though; note that y is defined to be zero(T) and β'β will be a scalar since it's a RowVector times a Vector.

Member

I wonder how β'β compares in terms of performance with dot(β, β) and sum(x->x^2, β).

Member

I've done some testing, and the first two forms are essentially equivalent in speed. The sum approach is quite a bit slower though, which is surprising (maybe worth filing a bug against Base?).

Member

At least on 0.6, I see the opposite:

julia> x = rand(10_000);

julia> @benchmark $x'*$x
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.333 μs (0.00% GC)
  median time:      7.041 μs (0.00% GC)
  mean time:        7.718 μs (0.00% GC)
  maximum time:     39.177 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     5

julia> @benchmark dot($x, $x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     6.545 μs (0.00% GC)
  median time:      6.647 μs (0.00% GC)
  mean time:        6.931 μs (0.00% GC)
  maximum time:     20.233 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     5

julia> @benchmark sum(i->i^2, $x)
BenchmarkTools.Trial:
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     2.808 μs (0.00% GC)
  median time:      2.831 μs (0.00% GC)
  mean time:        2.912 μs (0.00% GC)
  maximum time:     10.364 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     9

Not surprising that the x'x form is equivalent in speed to dot(x, x), since @code_lowered shows that the former is calling dot. TIL. 🙂

I tried to benchmark on master too but BenchmarkTools seems completely broken on master.

src/coxph.jl Outdated

if compute_derivatives
for k1 in 1:length(β)
grad[k1] += 2*λ*β[k1]
Member

Double space.

src/cox_utils.jl Outdated
end


# fs = index first deaths, ls = index last deaths,
Member

Would be nice to use a more neutral term than "death". :-) Maybe "time-to-event" or just "time"?

src/cox_utils.jl Outdated
@@ -0,0 +1,78 @@
# Compute first and last
firsts(s) = [s[t].status && (t==1 || s[t] > s[t-1]) for t = 1:length(s)]
Member

Looks like you could create generators rather than arrays here, that would avoid a few allocations (not sure it matters).
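For instance, with a minimal stand-in type for EventTime (only a status field and an ordering are needed for the illustration), the comprehension could become a lazy generator:

```julia
# Minimal stand-in for EventTime, just for this example.
struct Ev
    time::Int
    status::Bool
end
# Order by (time, status), a simplified version of event-time ordering.
Base.isless(a::Ev, b::Ev) = (a.time, a.status) < (b.time, b.status)

# The comprehension rewritten as a generator: lazy, no intermediate Bool array.
firsts_gen(s) = (s[t].status && (t == 1 || s[t] > s[t-1]) for t in 1:length(s))

s = [Ev(1, true), Ev(1, true), Ev(2, false), Ev(3, true)]
collect(firsts_gen(s))  # Bool[true, false, false, true]
```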

@nalimilan
Member

The upgrade_vector issue should soon go away when we backport the DataTables improvements to DataFrames: the constructor will now just use whatever column type you pass without trying to "upgrade" it. OTOH, supporting missing values with EventTimeVector is going to require a custom implementation. Maybe it's better to leave this as an optimization for the compiler, which could decide to store a Vector{EventTime} as two separate vectors, one for each of the fields (which should save some memory due to alignment constraints).

src/cox_utils.jl Outdated
end

function CoxSum(values::Array{T,N}, fs, ls) where {T, N}
sums = zeros(T, size(values,1)+1, Base.tail(size(values))...)
Member

Probably just micro-optimizing here, but it might be a little more efficient to do it like

function CoxSum(values::Array{T,N}, fs, ls) where {T,N}
    tailsize = Base.tail(size(values))
    sums = zeros(T, size(values, 1) + 1, tailsize...)
    chunks = zeros(T, length(fs), tailsize...)
    CoxSum{T,N}(values, sums, chunks, copy(chunks))
end

test/runtests.jl Outdated
outcome_coefmat = coeftable(outcome)

filepath_coefs = joinpath(Pkg.dir("Survival", "test"), "expected_coefmat.jld")
expected_coefmat = JLD.load(filepath_coefs, "expected_coefmat")
Member

Is there any way we could just test against hard-coded values here rather than adding a test dependency on JLD? JLD has a tendency to break easily on Julia nightly since it relies a lot on internal details.

Member

In GLM.jl we just hardcode the coefficient values as inline arrays.

Contributor Author

Good point; also, because I keep getting annoying warnings from JLD, it's probably just better to type the values into the Julia file.

REQUIRE Outdated
@@ -1,3 +1,4 @@
julia 0.6.0-pre
StatsBase
Distributions
PositiveFactorizations
Member

I'm not familiar with this package. What does it provide that's used here?

Contributor Author

In order for Newton's algorithm to work far from a local minimum, where the Hessian may not be positive definite, it's good to have an approximation of the inverse Hessian that is positive definite, to ensure that the value of the objective function goes down; PositiveFactorizations is a small package that does just that: I use it here. Just using the inverse of the Hessian would probably still work, but less reliably. Optim also has it as a dependency, for the same reason I guess.

Member

Okay great, thanks for the explanation.

@piever
Contributor Author

piever commented Aug 9, 2017

Thanks for all the feedback!

I've actually been meaning to generalize the internal Kaplan-Meier function to update a state object that can be specialized for different estimators. I realized when I went to implement cumulative incidence with competing risks that it's a trivial modification of that function. I hadn't looked into Nelson-Aalen yet but if it's in a similar boat then that's all the more reason for a generalization there.

Nelson-Aalen gives you the cumulative hazard (i.e. -log(survival)) and it has the advantage that generalizing it to a Cox model fit is completely trivial. Some people actually estimate the survival function by exponentiating the Nelson-Aalen estimator. Instead, generalizing Kaplan-Meier to a Cox model fit is slightly more convoluted, especially in case of ties.
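The exponentiation trick mentioned above (often called the Fleming-Harrington estimator), on toy numbers:

```julia
# Survival from cumulative hazard: S(t) = exp(-H(t)), pointwise.
chaz = [0.25, 0.5833, 1.5833]   # hypothetical Nelson-Aalen values at event times
surv = exp.(-chaz)              # ≈ [0.779, 0.558, 0.205]
```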

As an aside, I'd be interested to hear your opinion of EventTimeVector in general. I'm not super convinced that it's all that convenient or useful and a while back I considered just ditching it altogether.

I'm not very fond of EventTimeVector, to be entirely honest. Creating a Vector{EventTime} is very easy with the dot broadcast syntax, plus it integrates nicely with DataFrames and DataArrays and already works in the case of missing data.

@piever
Contributor Author

piever commented Aug 9, 2017

I should have addressed most of the remarks. I've improved the type handling, so it should now accept matrices of any type without unexpected method errors (which I guess were one of the corner cases @nalimilan had in mind).

The remaining big thing left to do for Cox is an estimate of the baseline cumulative hazard, but I guess that should come together with an implementation of Nelson-Aalen (which I'd be happy to do in a separate PR, after this one is merged).

In terms of performance, I'm reasonably satisfied with this implementation (approx. 5 times faster than MATLAB on my colleague's computer on the dataset I'm testing with). The main thing it lacks is multithreading, but I thought that should wait until threading in Julia becomes a bit more stable.

@@ -13,5 +13,6 @@ function newton_raphson(fgh!, x::AbstractArray{T}; ρ = one(T)/2, c = 1e-4, tol
end
x .+= search_dir*step_size
end
error("Reached maximum number of iterations")
warn("Reached max iteration number: try increasing `max_iter` or `tol` parameters")
Member

Probably not necessary to warn before throwing an error. The ConvergenceException text should be reasonably informative.

docs/src/cox.md Outdated
\lambda(t|X_i) = \lambda_0(t)\exp(X_i\cdot\beta)
```

where $\lambda(t|X_i)$ is the estimated hazard for sample $i$, $\lambda_0$ is the baseline hazard, $X_i$ the vector of covariates for sample $i$ and $\beta$ are the coefficients of the regression.
Member

Documenter uses double backticks, e.g. ``i``, for inline math rather than TeX-style $i$.

Contributor Author

@piever piever Aug 9, 2017

I see, I'm a Documenter newbie: how can I look at the docs locally? I've tried:

MacBook-Pro:docs pietro$ mkdocs serve
INFO    -  Building documentation... 

Config file '/Users/pietro/.julia/v0.6/Survival/docs/mkdocs.yml' does not exist.

Contributor Author

Never mind, I can just run julia make.jl and open the generated index.html

Member

Oh, Documenter doesn't use mkdocs. On the command line, cd'd into the Survival.jl directory, you can do

julia -e 'include(joinpath("src", "Survival.jl")); include(joinpath("docs", "make.jl"))'

Just make sure you've done Pkg.add("Documenter") from within Julia first.

Member

Heh, you beat me to the explanation. 🙂

@ararslan
Member

ararslan commented Aug 9, 2017

In terms of performance I'm reasonably satisfied with this implementation (approx. 5 times faster than matlab on my colleague's computer on this dataset that I'm testing)

That's awesome. Nice work!!

The main thing it lacks is multithreading, but I thought that should wait until threading in julia becomes a bit more stable.

Yeah, agreed. Multithreading is still considered experimental in Julia, though some packages do use it, e.g. GLM. Probably not worth going down that road until it's solid though. Hopefully in Julia 1.0.

I'm not very fond of the EventTimeVector to be entirely honest.

Okay thanks, I appreciate your input. Maybe we should just get rid of it then in a future PR. Sometimes convenience is more important than speed. (A good example of that is DataArrays, which are convenient but slow, versus NullableArrays, which are fast but very inconvenient.)

@piever
Contributor Author

piever commented Aug 9, 2017

Okay thanks, I appreciate your input. Maybe we should just get rid of it then in a future PR. Sometimes convenience is more important than speed. (A good example of that is DataArrays, which are convenient but slow, versus NullableArrays, which are fast but very inconvenient.)

Don't get me wrong: I'm impressed that you got a more efficient version of Vector{EventTime} to behave just like Vector{EventTime}, to the extent that, even though I had designed Cox for Vector{EventTime}, it still worked when I fed it an EventTimeVector. It's mainly that I believe a lot of users could be confused by having both. Also, having it interact with the rest of the Julia data ecosystem could be tricky:

  • missing-data support would have to be reimplemented
  • I'm not sure how it would work to write/read an EventTimeVector to CSV with the existing tools (i.e. CSV or TextParse or DataFrames.readtable). Instead, adding a simple parser for EventTime would allow writing and reading it easily just by specifying the type in the CSV reader
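As a sketch of that parser idea (the "time+" suffix for right-censored observations, e.g. "50+", is my assumed serialization, not an existing format in the package, and the type here is a stand-in for EventTime):

```julia
# Stand-in for Survival.jl's EventTime, for illustration only.
struct SimpleEventTime{T}
    time::T
    status::Bool   # true = event observed, false = right-censored
end

# Parse "50" as an observed event at t = 50 and "50+" as censored at t = 50.
function parse_eventtime(str::AbstractString)
    censored = endswith(str, "+")
    t = parse(Float64, censored ? chop(str) : str)
    return SimpleEventTime(t, !censored)
end

parse_eventtime("50+")  # SimpleEventTime(50.0, false)
parse_eventtime("12")   # SimpleEventTime(12.0, true)
```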

@ararslan
Member

ararslan commented Aug 9, 2017

I agree completely. Like I said I was never that convinced that it was useful to have EventTimeVector. (Plus I never did benchmarks, so who knows whether it was actually more efficient than Vector{EventTime} 😛)

@piever piever changed the title RFC: Cox Regression Cox Regression Aug 14, 2017
@piever
Contributor Author

piever commented Aug 14, 2017

Please let me know if there is more feedback I need to incorporate. As far as I'm concerned, I'm happy with the current version :)

@ararslan
Member

The implementation looks great to me. Tests for more corner cases would be good but I'm not really sure what to suggest offhand. Maybe @nalimilan would have some ideas?

@piever
Contributor Author

piever commented Aug 17, 2017

Actually there still is one thing to figure out: the test using lower-precision floating point converges with a given accuracy in a given number of iterations on my machine but not on Travis; I should maybe use a larger tolerance in the test. The only other tests that come to mind are:

  • using interaction terms
  • using a nonzero L2 cost

@ararslan
Member

Actually there still is one thing to figure out: the test using lower precision floating points converges with a given accuracy in a given number of iterations on my machine but not on Travis

Huh, interesting. Travis runs on x86-64 Linux, dunno what the other specs are for their VMs. What is the machine you're testing that on?

@piever
Contributor Author

piever commented Aug 17, 2017

I was also surprised, this is my versioninfo:

julia> versioninfo()
Julia Version 0.6.0
Commit 903644385b (2017-06-19 13:05 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin13.4.0)
  CPU: Intel(R) Core(TM) i5-4278U CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

Maybe they have some different libraries (MKL versus openblas or something like that). I guess the only thing to do is lower the accuracy required when fitting with Float32 in the tests to make sure it also passes on Travis.

@ararslan
Member

This is Travis':

$ julia -e 'versioninfo()'
Julia Version 0.6.0
Commit 9036443 (2017-06-19 13:05 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)

Travis uses the official Julia binaries, so it should be OpenBLAS rather than MKL. Kinda mysterious... I can't imagine that it would be an Ivy Bridge vs. Haswell thing.

@nalimilan
Member

I can't imagine that it would be an Ivy Bridge vs. Haswell thing.

Doesn't sound impossible to me. Maybe we're just unlucky that a small difference affects convergence. Float32 is a fragile type, isn't it? What's funny is that it works fine here on my skylake-avx512 under Linux... Maybe remove one or two variables from the formula?

Regarding cases to test, I would try to cover a few scenarios, like: model without independent variables, model with a single continuous variable, model with a single categorical variable, model with both. Same for interactions: interaction between continuous variables, interaction between categorical variables, interaction between continuous and categorical. Of course there's no limit to the number of combinations one can try, so you need to stop at some point. :-)

src/coxph.jl Outdated
"""
fit(::Type{CoxModel}, M::AbstractMatrix, y::AbstractVector; kwargs...)

Given a matrix M of predictors and a corresponding vector of events, compute the
Member

M

src/Survival.jl Outdated

abstract type AbstractEstimator end
abstract type NonparametricEstimator <: AbstractEstimator end

include("eventtimes.jl")
include("kaplanmeier.jl")
include("cox_utils.jl")
Member

@ararslan will know best, but why not group all three Cox-related files into a single file? It doesn't sound too big, and that will keep the package cleaner when more models are supported.

Member

Yeah, I like the idea of the Cox stuff grouped as a single file.

src/coxph.jl Outdated
end

function _cox_fgh!(β, grad, hes, c::CoxAux{T}, compute_derivatives)::T where T
#get relevant quantities to compute negative loglikelihood, gradient and hessian
Member

It's more common to put comments about the whole function before its signature.

@piever
Contributor Author

piever commented Aug 31, 2017

Regarding cases to test, I would try to cover a few scenarios, like: model without independent variables, model with a single continuous variable, model with a single categorical variable, model with both. Same for interactions: interaction between continuous variables, interaction between categorical variables, interaction between continuous and categorical. Of course there's no limit to the number of combinations one can try, so you need to stop at some point. :-)

Good intuition, there actually is an issue with categorical variables: when I do @formula(y ~ 0 + x) and x is categorical, I get one term per value, which is actually overcomplete. Do you know a clean solution?

I'm actually tempted to say that it is a bug in @formula, because even when it is explicitly requested not to have an intercept, it effectively puts one back in the case of PooledDataArrays

@nalimilan
Member

Glad to hear that being a pain can sometimes be useful! Actually, it's perfectly expected that 0 + x gives a full-rank matrix when x is categorical. This is the correct specification, e.g., for a linear regression model without a separate intercept (but where the effects are included in the dummies).

Anyway, the 0 + hack was not ideal for Cox models, so this seems to indicate we need a deeper solution. Maybe there should be an argument when building the model matrix to say that the model family does not allow estimating the intercept/baseline hazard.

@kleinschmidt Do you have ideas about the best way of supporting this? Maybe we should open an issue in StatsModels.jl.

@piever
Contributor Author

piever commented Sep 2, 2017

I'm happy to take a look at StatsModels and see if there's a way to add an option to avoid the intercept for continuous and categorical variables, and try to make that option the default in the case of Cox regression via a dependency on StatsModels. A possible design is to add a function in StatsModels:

add_intercept(T) = true

that determines whether an intercept should be automatically included or not. Then all we'd need to do is overload it in Survival with:

StatsModels.add_intercept(::Type{CoxModel}) = false

But first it'd be good to understand a bit better what the dependency situation with DataFrames/DataTables/StatsModels is now and in the immediate future; I'm still a bit confused about it.

@nalimilan
Member

But first it'd be good to understand a bit better what's the dependency situation with DataFrames/DataTables/StatModels now and in the immediate future, I'm still a bit confused about it.

StatsModels used to depend on DataTables, but it's going to switch to DataFrames once it is ported from DataArrays to Nulls. Anyway, that doesn't really change its principle, which is that packages implementing models do not need to depend on either of these packages, and yet get the formula interface for free.

StatsModels.add_intercept is an interesting approach, but that would force packages to depend on StatsModels, and therefore on DataFrames/DataTables. We could define the function in StatsBase instead to avoid this. Better to file an issue against StatsModels with this suggestion to discuss it.

@kleinschmidt
Member

kleinschmidt commented Sep 2, 2017

I'm trying to catch up, but it's not clear to me what the problem with a full-rank matrix from ~0+x is. Perhaps it would be best to open an issue over at JuliaStats/StatsModels.jl and explain what the desired behavior is? The current behavior is this:

  • With ~1+... or ~..., an intercept column is generated and default (dummy coded) contrasts are used for categorical variables (n-1 indicator columns are generated for n levels)
  • With ~0+..., no intercept column is generated, and the first categorical variable in a main effect is "promoted" to full-rank dummy coding (n indicator columns for n levels) in order to avoid a rank-deficient model matrix.
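To make the two codings concrete, here's a small standalone illustration (toy code, not StatsModels internals) of full-rank vs reduced dummy coding for a 3-level variable:

```julia
levels_of(x) = unique(x)

# Full-rank dummy coding: n indicator columns for n levels.
full_dummies(x) = [xi == l ? 1.0 : 0.0 for xi in x, l in levels_of(x)]

# Default (reduced) dummy coding: drop the first level's column (n - 1 columns);
# the dropped level becomes the reference absorbed by the intercept.
reduced_dummies(x) = full_dummies(x)[:, 2:end]

x = ["a", "b", "c", "a"]
full_dummies(x)     # 4×3 matrix
reduced_dummies(x)  # 4×2 matrix
```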

I think there's pretty good consensus about getting rid of the default intercept term, which would make the ~x equivalent to ~0+x instead of ~1+x, and get rid of the 0+x altogether. I want to make that change in the first proper release of StatsModels since it's a pretty breaking change...

One possibility is that if you specify contrasts manually, it'll use them even if they're nonsensical for a regression analysis (generating an overcomplete or deficient matrix...). Or we could add a flag that would do that.

@piever
Contributor Author

piever commented Dec 15, 2017

No problem, I've just rebased, added tests for categorical and continuous cases (and all three pairwise combinations) and added the correct StatsModels dependency. Should be good to go now.

@nalimilan
Member

Great! I'll let @ararslan do the honours...

step_size = one(T)
while fgh!(x+search_dir*step_size, grad, hes, false) > y+c*step_size*(grad' * search_dir)
step_size *= ρ
step_size > 1e-10 || error("Linesearch failed! Problematic Hessian or Gradient?")
Member

We could get coverage to 100% if we could come up with a test that hits this as well as the ConvergenceException below. Is that doable?

Contributor Author

@piever piever Dec 15, 2017

Sure, I'll add a small testset for the Newton algorithm tomorrow.

Member

@test_throws

Member

Sounds great. Other than that, I think this is good to go. You've done such excellent work here!!

@@ -1,3 +1,5 @@
julia 0.6
StatsBase
Distributions
PositiveFactorizations
StatsModels 0.2.1
Member

Heh, technically the StatsModels tag should have been 0.3.0 since it introduces a new feature, but whatevs.

Member

We'll have to make a decision about this, but in my book major releases are only needed when breaking backward compatibility. Otherwise, almost all of our releases would need to be major ones...

Member

@ararslan ararslan Dec 15, 2017

My understanding is that, per semantic versioning, patch releases (i.e. the z in x.y.z) can only contain fixes which do not introduce new behaviors or change existing behaviors. Minor releases (the y) should be used when introducing new features or when making a backwards-incompatible change only if x = 0. Major releases (the x) should be used for backwards-incompatible changes when x >= 1.

Member

Yeah, but before 1.0 we use the patch release as the minor release.

Member

Okay. I don't actually care that much anyway. 😛

@piever
Contributor Author

piever commented Dec 16, 2017

Added tests for Newton's algorithm: should be back to full coverage now.

@piever
Contributor Author

piever commented Dec 16, 2017

I've also added a fix to the docs (fit needed a type signature as both KaplanMeier and Cox are using it in the API). Cox could potentially have more extensive documentation, but I'd rather have this merged first (before it gets "conflicts" again) and add more docs later, especially because there's still a fair amount of stuff to port from AcceleratedFailure.

@ararslan ararslan merged commit 3cbcaf0 into JuliaStats:master Dec 16, 2017
@ararslan
Member

Thanks so much for this!

@piever piever deleted the cox branch December 19, 2017 01:39