Support for R Dates and POSIXct but no support for handling timezone support #35

Merged: 28 commits, Mar 27, 2018

Commits
500aa93
initial support for rds files
jsams Nov 2, 2017
d9dc2ec
add support for keyword arguments (can at least manually handle list …
jsams Nov 2, 2017
97d47a7
added tests and test data
jsams Nov 21, 2017
78cae2c
update comments in generate_rda
jsams Nov 21, 2017
4a200ff
Merge branch 'master' into rds
jsams Nov 21, 2017
de6e187
update readRDS to use new CodecZlib library
jsams Nov 21, 2017
4c2edd4
update RDS tests to use testsets
jsams Nov 21, 2017
b676c46
remove todo comments
jsams Nov 21, 2017
d0fbcdc
factor out decompress function
jsams Nov 22, 2017
363d47c
minimize testing of rds files
jsams Nov 22, 2017
0c67a70
replace readRDS with load interface
jsams Nov 22, 2017
8d8c1b7
remove readRDS from export list
jsams Nov 22, 2017
0b6d9bf
add tests for convert=true with load of rds files
jsams Nov 22, 2017
74713ef
mention rds support in news
jsams Nov 22, 2017
4547834
add test for isa DataFrame for rds files
jsams Nov 23, 2017
8057bce
support for R Dates and POSIXct, excluding timezone
jsams Nov 24, 2017
0ceae8e
support for NA dates and datetimes
jsams Nov 24, 2017
4c032ba
use constants for referring to R's date and datetime classes
jsams Nov 25, 2017
cb877c3
use TimeZones to support R's POSIXct
jsams Nov 27, 2017
7fafd0a
merge from master and update to using 'missing'
jsams Mar 15, 2018
e35dc7e
Bring in line with requests on PR #35
jsams Mar 16, 2018
342dfa5
move jlvec date/time functions to be with others
jsams Mar 16, 2018
c1bd8db
more reliable lookup of timezone
jsams Mar 16, 2018
f009ad4
more refactoring of r2juliatz, added back a deleted comment
jsams Mar 16, 2018
982520d
update news
alyst Mar 27, 2018
a8da9b3
update conversion table
alyst Mar 27, 2018
676cc88
refactor timezone handling
alyst Mar 27, 2018
e86ee6d
mention this PR in the news
alyst Mar 27, 2018
3 changes: 3 additions & 0 deletions NEWS.md
@@ -2,9 +2,12 @@

##### Changes
* add support for `.rds` files (single object data files from R) [#22], [#33]
* add support for `Date` and `POSIXct` (only for timezone codes supported by [TimeZones](https://github.com/JuliaTime/TimeZones.jl)) data [#34], [#35]

[#22]: https://github.com/JuliaStats/RData.jl/issues/22
[#33]: https://github.com/JuliaStats/RData.jl/issues/33
[#34]: https://github.com/JuliaStats/RData.jl/issues/34
[#35]: https://github.com/JuliaStats/RData.jl/issues/35

## RData v0.3.0 Release Notes

2 changes: 2 additions & 0 deletions README.md
@@ -40,6 +40,8 @@ convert R objects into Julia equivalents:
| named vector, list | `DictoVec` | `DictoVec` allows indexing both by element index and by its name, just as R vectors and lists |
| vector | `Vector{T}` | `T` is the appropriate Julia type. If the R vector contains `NA` values, they are converted to [`missing`](https://github.com/JuliaData/Missings.jl), and the element type of the resulting `Vector` is `Union{T, Missing}`. |
| factor | `CategoricalArray` | [CategoricalArrays.jl](https://github.com/JuliaData/CategoricalArrays.jl) |
| `Date` | `Dates.Date` | |
| `POSIXct` date time | `ZonedDateTime` | [TimeZones.jl](https://github.com/JuliaTime/TimeZones.jl) |
| data frame | `DataFrame` | [DataFrames.jl](https://github.com/JuliaData/DataFrames.jl) |

If conversion to the Julia type is not supported (e.g. R closure or language expression), `load()` will return the internal RData representation of the object (`RSEXPREC` subtype).
1 change: 1 addition & 0 deletions REQUIRE
@@ -4,3 +4,4 @@ Missings 0.2
CategoricalArrays 0.3
FileIO 0.1.2
CodecZlib 0.4
TimeZones
2 changes: 1 addition & 1 deletion src/RData.jl
@@ -2,7 +2,7 @@ __precompile__()

module RData

using DataFrames, CategoricalArrays, Missings, CodecZlib, FileIO
using DataFrames, CategoricalArrays, Missings, CodecZlib, FileIO, TimeZones
import DataFrames: identifier
import FileIO: load

4 changes: 4 additions & 0 deletions src/config.jl
@@ -37,3 +37,7 @@ const Hash = Dict{RString, Any}

const emptyhash = Hash()
const emptyhashkey = RString("\0")

const R_Date_Class = ["Date"]
const R_POSIXct_Class = ["POSIXct", "POSIXt"]

74 changes: 73 additions & 1 deletion src/convert.jl
@@ -1,6 +1,8 @@
# converters from selected RSEXPREC to Hash
# They are used to translate SEXPREC attributes into Hash

import TimeZones: istimezone, unix2zdt, ZonedDateTime

function Base.convert(::Type{Hash}, pl::RPairList)
res = Hash()
for i in eachindex(pl.items)
@@ -51,7 +53,16 @@ function jlvec(::Type{T}, rv::RNullableVector{R}, force_missing::Bool=true) wher
end

# convert R vector into Vector of appropriate type
jlvec(rv::RVEC, force_missing::Bool=true) = jlvec(eltype(rv.data), rv, force_missing)
function jlvec(rv::RVEC, force_missing::Bool=true)
    cls = class(rv)
    if cls == R_Date_Class
        return jlvec(Dates.Date, rv, force_missing)
    elseif cls == R_POSIXct_Class
        return jlvec(ZonedDateTime, rv, force_missing)
    else
        return jlvec(eltype(rv.data), rv, force_missing)
    end
end

# convert R logical vector (uses Int32 to store values) into Vector{Bool[?]}
function jlvec(rl::RLogicalVector, force_missing::Bool=true)
@@ -89,6 +100,33 @@ function jlvec(ri::RIntegerVector, force_missing::Bool=true)
    end
end

# convert R Date to Dates.Date
function jlvec(::Type{Dates.Date}, rv::RVEC, force_missing::Bool=true)
    @assert class(rv) == R_Date_Class
    nas = isnan.(rv.data)
    if force_missing || any(nas)
        dates = Union{Dates.Date, Missing}[isna ? missing : rdays2date(dtfloat)
                                           for (isna, dtfloat) in zip(nas, rv.data)]
    else
        dates = rdays2date.(rv.data)
    end
    return dates
end

# convert R POSIXct to ZonedDateTime
function jlvec(::Type{ZonedDateTime}, rv::RVEC, force_missing::Bool=true)
    @assert class(rv) == R_POSIXct_Class
    tz, validtz = getjuliatz(rv)
    nas = isnan.(rv.data)
    if force_missing || any(nas)
        datetimes = Union{ZonedDateTime, Missing}[isna ? missing : unix2zdt(dtfloat, tz=tz)
                                                  for (isna, dtfloat) in zip(nas, rv.data)]
    else
        datetimes = unix2zdt.(rv.data, tz=tz)
    end
    return datetimes
end

function sexp2julia(rex::RSEXPREC)
    warn("Conversion of $(typeof(rex)) to Julia is not implemented")
    return nothing
@@ -128,3 +166,37 @@ function sexp2julia(rl::RList)
        map(sexp2julia, rl.data)
    end
end

function rdays2date(days::Real)
    const epoch_conv = 719528 # Dates.date2epochdays(Date("1970-01-01"))
    Dates.epochdays2date(days + epoch_conv)
end

# gets R timezone from the data attribute and converts it to TimeZones.TimeZone
# see r2juliatz()
function getjuliatz(rv::RVEC, deftz=tz"UTC")
    tzattr = getattr(rv, "tzone", [""])[1]
    if tzattr == ""
        return deftz, true # R will store a blank for tzone
    else
        return r2juliatz(tzattr, deftz)
    end
end

# converts R timezone code to TimeZones.TimeZone
# returns a tuple:
#  - timezone (or `deftz` if `rtz` is not recognized as a valid time zone)
#  - boolean flag: true if `rtz` is recognized as a valid time zone, false otherwise
function r2juliatz(rtz::AbstractString, deftz=tz"UTC")
    valid = istimezone(rtz)
    if !valid
        warn("Could not determine the timezone of '$(rtz)', treating as $deftz.")
        return deftz, false
    else
        return TimeZone(rtz), true
    end
end

function unix2zdt(seconds::Real; tz::TimeZone=tz"UTC")
    ZonedDateTime(Dates.unix2datetime(seconds), tz, from_utc=true)
end

Review comments on this method:

Collaborator: What if tz != tz"UTC", is from_utc still correct?

Contributor Author: Yes. The tests ensure that the time and timezone is preserved for a non-UTC timezone.

Contributor: this method definition is a duplicate of that imported from TimeZones
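
The review question about `from_utc` comes down to this: Unix seconds name an instant in UTC, and the time zone only changes how that instant is displayed. Below is a minimal sketch of that idea, assuming Julia 1.x and only the `Dates` standard library (the PR itself targets an older Julia). `wallclock` is a hypothetical helper, not part of RData.jl or TimeZones.jl, and it uses a fixed UTC offset, so it ignores daylight saving rules.

```julia
using Dates

# Hypothetical helper (not from RData.jl or TimeZones.jl): render Unix seconds
# as wall-clock time in a zone with a fixed UTC offset given in whole hours.
function wallclock(seconds::Real, utc_offset_hours::Integer)
    utc = Dates.unix2datetime(seconds)   # the instant, expressed in UTC
    return utc + Hour(utc_offset_hours)  # shift to the zone's wall clock
end

# 13:23 on 2017-01-01 in America/Chicago (UTC-6 in January) is 19:23 UTC.
secs = Dates.datetime2unix(DateTime(2017, 1, 1, 19, 23))

chicago = wallclock(secs, -6)  # wall clock for a UTC-6 zone
utc = wallclock(secs, 0)       # the same instant shown in UTC
```

The stored number never changes; only the rendering does. That is consistent with the author's reply that the tests preserve both the time and the zone for the `America/Chicago` value.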
66 changes: 65 additions & 1 deletion test/RDS.jl
@@ -2,6 +2,7 @@ module TestRDS
using Base.Test
using DataFrames
using RData
using TimeZones

testdir = dirname(@__FILE__)

@@ -42,5 +43,68 @@ module TestRDS
@test eltypes(rdf_decomp) == eltypes(df)
@test isequal(rdf_decomp, df)
end
end

@testset "Test Date conversion" begin
    dates = load("$testdir/data/dates.rds")
    @test dates[1] == Date("2017-01-01") + Dates.Day.(1:4)
    @test dates[2] == Date("2017-01-02")
    @test dates[3] isa DictoVec
    @test dates[3].data == Date("2017-01-01") + Dates.Day.(1:4)
    @test [dates[3].index2name[i] for i in 1:length(dates[3])] == ["A", "B", "C", "D"]
    @test dates[4] isa DictoVec
    @test dates[4].data == [Date("2017-01-02")]
    @test dates[4].index2name[1] == "A"
end

@testset "Test DateTime conversion" begin
    datetimes = load("$testdir/data/datetimes.rds")
    testdts = ZonedDateTime.(DateTime("2017-01-01T13:23") + Dates.Second.(1:4),
                             TimeZone("UTC"))
    @test datetimes[1] == testdts
    @test datetimes[2] == testdts[1]
    @test datetimes[3] isa DictoVec
    @test datetimes[3].data == testdts
    @test [datetimes[3].index2name[i] for i in 1:length(datetimes[3])] == ["A", "B", "C", "D"]
    @test datetimes[4] isa DictoVec
    @test datetimes[4].data == [testdts[1]]
    @test datetimes[4].index2name[1] == "A"
end

@testset "Test Date and DateTime in a DataFrame" begin
    rdfs = load("$testdir/data/datedfs.rds")
    df = DataFrame(date=Date("2017-01-01") + Dates.Day.(1:4),
                   datetime=ZonedDateTime.(DateTime("2017-01-01T13:23") + Dates.Second.(1:4),
                                           tz"UTC"))
    @test length(rdfs) == 2
    @test rdfs[1] isa DataFrame
    @test rdfs[2] isa DataFrame
    @test eltypes(df) == eltypes(rdfs[1])
    @test eltypes(df) == eltypes(rdfs[2])
    @test isequal(df[1, :], rdfs[1])
    @test isequal(df, rdfs[2])
end

@testset "Test NA Date and DateTime conversion" begin
    dates = load("$testdir/data/datesNA.rds")

    testdates = [Date("2017-01-01") + Dates.Day.(1:4); missing]
    @test all(dates[1] .=== testdates)

    testdts = [ZonedDateTime.(DateTime("2017-01-01T13:23") + Dates.Second.(1:4), tz"UTC");
               missing]
    @test all(dates[2] .=== testdts)
end

@testset "Test DateTime timezones" begin
    # tz"CST" is not supported by TimeZones.jl
    datetimes = @test_warn "Could not determine the timezone of 'CST', treating as UTC." begin
        load("$testdir/data/datetimes_tz.rds")
    end
    # assumes generate_rda.R was run on a system set to PST!
    @test datetimes[1] == ZonedDateTime(DateTime("2017-01-01T21:23"), tz"UTC")
    # should be tz"CST", but gets substituted to tz"UTC"
    # FIXME update the test when CST is supported
    @test datetimes[2] == ZonedDateTime(DateTime("2017-01-01T13:23"), tz"UTC")
    @test datetimes[3] == ZonedDateTime(DateTime("2017-01-01T13:23"), tz"America/Chicago")
end
end
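
The timezone testset above exercises a fallback: a blank `tzone` is accepted as the default, a recognized name is used as-is, and an unrecognized name such as `CST` produces a warning and is replaced by UTC. Here is a rough sketch of that decision logic in plain Julia 1.x. `KNOWN_ZONES` and `lookup_zone` are hypothetical stand-ins for `TimeZones.istimezone` and the PR's `getjuliatz`/`r2juliatz`, and zones are represented as plain strings rather than `TimeZone` objects.

```julia
# Hypothetical stand-in for TimeZones.istimezone: a tiny set of accepted names.
const KNOWN_ZONES = Set(["UTC", "America/Chicago"])

# Returns (zone name, valid flag). The flag is false only when an
# unrecognized name was replaced by the default.
function lookup_zone(rtz::AbstractString, default::AbstractString="UTC")
    if rtz == ""
        return default, true      # R stores a blank tzone when no zone was given
    elseif rtz in KNOWN_ZONES
        return String(rtz), true  # recognized name, used as-is
    else
        @warn "Could not determine the timezone of '$rtz', treating as $default."
        return default, false     # fallback: default zone, flagged as invalid
    end
end

cst = lookup_zone("CST")                  # unrecognized, falls back with a warning
chicago = lookup_zone("America/Chicago")  # recognized
blank = lookup_zone("")                   # blank attribute, default without warning
```

Returning the flag alongside the zone lets a caller distinguish "no zone was stored" from "a zone was stored but could not be resolved", which the warning alone does not convey programmatically.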
Binary file added test/data/datedfs.rds
Binary file not shown.
Binary file added test/data/dates.rds
Binary file not shown.
Binary file added test/data/datesNA.rds
Binary file not shown.
Binary file added test/data/datetimes.rds
Binary file not shown.
Binary file added test/data/datetimes_tz.rds
Binary file not shown.
26 changes: 26 additions & 0 deletions test/generate_rda.R
@@ -55,3 +55,29 @@ save(test.cmpfun0, test.cmpfun1, test.cmpfun2, file = "data/cmpfun.rda")
x <- factor(c("a", "b", "c"))
y <- ordered(x, levels=c("b", "a", "c"))
save(x, y, file="data/ord.rda")

dates = as.Date("2017-01-01") + 1:4
datetimes = as.POSIXct("2017-01-01 13:23", tz="UTC") + 1:4
dateNAs = list(c(dates, NA), c(datetimes, NA))
saveRDS(dateNAs, file="data/datesNA.rds")
datelst = list(dates, dates[1])
names(dates) = LETTERS[1:length(dates)]
datelst = c(datelst, list(dates), list(dates[1]))
saveRDS(datelst, file="data/dates.rds")
dtlst = list(datetimes, datetimes[1])
names(datetimes) = LETTERS[1:length(datetimes)]
dtlst = c(dtlst, list(datetimes), list(datetimes[1]))
saveRDS(dtlst, file="data/datetimes.rds")
datedfs = list(data.frame(date=dates[1], datetime=datetimes[1]),
               data.frame(date=dates, datetime=datetimes))
saveRDS(datedfs, file="data/datedfs.rds")

# The first element has no explicit timezone, so R interprets the string in the
# local timezone and stores it as UTC seconds without a tzone attribute. When R
# reads it back, it again assumes local time. The test for this first element
# therefore depends on the timezone in which the data was generated (PST, UTC-8).
saveRDS(list(as.POSIXct("2017-01-01 13:23"),
             as.POSIXct("2017-01-01 13:23", tz="CST"),
             as.POSIXct("2017-01-01 13:23", tz="America/Chicago")),
        file="data/datetimes_tz.rds")
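
The `datesNA.rds` fixture generated by this script holds R `Date` values, which are doubles counting days since 1970-01-01, followed by an `NA`. The PR detects `NA` with `isnan` and maps it to `missing`. A small sketch of that conversion, assuming Julia 1.x and only the `Dates` standard library, follows. `R_DATE_ORIGIN` and `r_days_to_date` are hypothetical names, and the sketch assumes whole-day values.

```julia
using Dates

# R's Date origin; an R Date is a count of days from this day.
const R_DATE_ORIGIN = Date(1970, 1, 1)

# Hypothetical helper: NaN (standing in for R's NA) becomes `missing`;
# any other value is treated as a whole number of days since the origin.
r_days_to_date(days::Real) = isnan(days) ? missing : R_DATE_ORIGIN + Day(Int(days))

# as.Date("2017-01-01") + 1:4 followed by NA, written as raw day counts
raw = [17168.0, 17169.0, 17170.0, 17171.0, NaN]
dates = r_days_to_date.(raw)
```

Broadcasting over a vector that contains `NaN` yields an element type of `Union{Missing, Date}`, which mirrors the `Union{Dates.Date, Missing}` vector the PR builds when `force_missing` is set or any `NA` is present.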