Allows creating data.table rowwisely (#4291)
* add a new argument .rowwise to data.table()
which allows creating a data.table object rowwisely

* reimplement rowwiseDT (uses `name=` syntax)

* export rowwiseDT()

* update the docs using rowwiseDT

* update NEWS

* update and improve the tests

* avoid using the base function name as the variable name

* use ncols to avoid name collision to base::ncol

* doc rowwiseDT() in a separate Rd file

* sprintf -> gettextf

* remove the key argument

* tweak the doc

* tweak the error message

* re-site NEWS

* modernize: stopf()

* grammar, style

* trailing ws

* Give a proper shout-out to tibble::tribble()

* correct message text in test

* move to new R/ file

* slightly simpler, remove lambda

* don't repeat calculation of nrows (also more readable)

* also list() up 0-length

---------

Co-authored-by: Michael Chirico <[email protected]>
3 people authored Sep 6, 2024
1 parent 97980d9 commit b566822
Showing 6 changed files with 91 additions and 1 deletion.
1 change: 1 addition & 0 deletions NAMESPACE
@@ -6,6 +6,7 @@ exportClasses(data.table, IDate, ITime)
##

export(data.table, tables, setkey, setkeyv, key, "key<-", haskey, CJ, SJ, copy)
export(rowwiseDT)
export(setindex, setindexv, indices)
export(as.data.table,is.data.table,test.data.table)
export(last,first,like,"%like%","%ilike%","%flike%","%plike%",between,"%between%",inrange,"%inrange%", "%notin%")
19 changes: 19 additions & 0 deletions NEWS.md
@@ -6,6 +6,25 @@

1. In `DT[, variable := value]`, when value is class `POSIXlt`, we automatically coerce it to class `POSIXct` instead, [#1724](https://github.com/Rdatatable/data.table/issues/1724). Thanks to @linzhp for the report, and Benjamin Schwendinger for the fix.

## NEW FEATURES

1. New function `rowwiseDT()` for creating a data.table object "row-wise", often convenient for readability of small, literally-defined tables. Thanks to @shrektan for the suggestion and PR and @tdeenes for the idea of the `name=` syntax. Inspired by `tibble::tribble()`.

```r
library(data.table)
rowwiseDT(
  a=,b=,c=,  d=,
  1, 2, "a", 2:3,
  3, 4, "b", list("e"),
  5, 6, "c", ~a+b
)
#>        a     b      c      d
#>    <num> <num> <char> <list>
#> 1:     1     2      a    2,3
#> 2:     3     4      b      e
#> 3:     5     6      c ~a + b
```
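
For comparison, here is a minimal sketch of the same kind of table built both ways. It assumes a data.table build that includes this commit (so that `rowwiseDT()` is exported); the object names `dt_rowwise` and `dt_colwise` are illustrative only.

```r
library(data.table)

# Row-wise: the header comes first as empty named arguments, then one row per line
dt_rowwise = rowwiseDT(
  a=, b=, c=,
  1,  2,  "a",
  3,  4,  "b"
)

# Column-wise: the usual data.table() constructor
dt_colwise = data.table(a = c(1, 3), b = c(2, 4), c = c("a", "b"))

# Both definitions should describe the same table
all.equal(dt_rowwise, dt_colwise)
```

The row-wise form keeps each record on one line, which is the readability benefit the entry refers to; for larger tables the column-wise constructor is likely still the more practical choice.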

## BUG FIXES

1. Using `print.data.table()` with character truncation using `datatable.prettyprint.char` no longer errors with `NA` entries, [#6441](https://github.com/Rdatatable/data.table/issues/6441). Thanks to @r2evans for the bug report, and @joshhwuu for the fix.
23 changes: 23 additions & 0 deletions R/rowwiseDT.R
@@ -0,0 +1,23 @@
rowwiseDT = function(...) {
  x = substitute(list(...))[-1L]
  if (is.null(nms <- names(x)))
    stopf("Must provide at least one column (use `name=`). See ?rowwiseDT for details")
  header_pos = which(nzchar(nms))
  if (any(nzchar(x[header_pos])))
    stopf("Named arguments must be empty")
  if (!identical(header_pos, seq_along(header_pos)))
    stopf("Header must be the first N arguments")
  header = nms[header_pos]
  ncols = length(header)
  body = lapply(x[-header_pos], eval, envir = parent.frame())
  nrows = length(body) %/% ncols
  if (length(body) != nrows * ncols)
    stopf("There are %d columns but the number of cells is %d, which is not an integer multiple of the columns", ncols, length(body))
  # make all the non-scalar elements to a list
  needs_list = lengths(body) != 1L
  body[needs_list] = lapply(body[needs_list], list)
  body = split(body, rep(seq_len(nrows), each = ncols))
  ans = rbindlist(body)
  setnames(ans, header)
  ans
}
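
The function above relies on `substitute(list(...))` to capture the arguments unevaluated, so that empty named arguments such as `A=` can serve as the header. The following base-R sketch illustrates only that capture step; `show_args()` is a hypothetical helper written for this illustration and is not part of data.table.

```r
# Hypothetical helper (not in data.table): inspect what substitute() captures
show_args = function(...) {
  # Unevaluated arguments; [-1L] drops the leading `list` symbol
  x = substitute(list(...))[-1L]
  nms = names(x)
  # Named arguments form the header; the remaining arguments are cells
  header_pos = which(nzchar(nms))
  list(
    header  = nms[header_pos],
    n_cells = length(x) - length(header_pos)
  )
}

res = show_args(A=, B=, 1, "a", 2, "b")
res$header   # "A" "B"
res$n_cells  # 4
```

Because the arguments are never forced here, the empty `A=` and `B=` do not trigger a "missing argument" error; in `rowwiseDT()` only the body cells are later evaluated, in the caller's frame.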
22 changes: 22 additions & 0 deletions inst/tests/tests.Rraw
@@ -19063,3 +19063,25 @@ test(2280.3, foo(), error="Internal error in foo: broken")
# fwrite respects dec=',' for sub-second timestamps, #6446
test(2281.1, fwrite(data.table(a=.POSIXct(0.001)), dec=',', sep=';'), output="1970-01-01T00:00:00,001Z")
test(2281.2, fwrite(data.table(a=.POSIXct(0.0001)), dec=',', sep=';'), output="1970-01-01T00:00:00,000100Z")

# rowwisely creating a data.table
DT = rowwiseDT(
  A=, B=,
  1, "a"
)
test(2282.01, DT, data.table(A = 1, B = "a"))
# error if named argument is not empty
test(2282.02, rowwiseDT(A=,B=2,1,2), error="Named arguments must be empty")
# error if no header
test(2282.03, rowwiseDT(1,"a"), error = "Must provide at least one column")
# error if body is not multiple length of the header
test(2282.04, rowwiseDT(A=,B=,C=, 1,"a",2,"b",3), error = "There are 3 columns but the number of cells is 5, which is not an integer multiple of the columns")
# create list element automatically
test(2282.05, rowwiseDT(A=,list(1)), data.table(A=list(1)))
test(2282.06, rowwiseDT(A=,1:2), data.table(A=list(1:2)))
test(2282.07, rowwiseDT(A=,double()), data.table(A=list(double())))
# error if named argument is in the middle
test(2282.08, rowwiseDT(A=,B=,1,2,C=,4), error="Header must be the first N arguments")
# evaluate arguments in the correct frame
ncols = 1e6
test(2282.09, rowwiseDT(A=,ncols), data.table(A=ncols))
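
Outside the test harness, the validation errors exercised above can be observed with `tryCatch()`. This is a small sketch assuming a data.table build that includes this commit; `msg` is an illustrative variable name.

```r
library(data.table)

# Five cells cannot be arranged into rows of three columns,
# so rowwiseDT() should signal an error; capture its message
msg = tryCatch(
  rowwiseDT(A=, B=, C=, 1, "a", 2, "b", 3),
  error = function(e) conditionMessage(e)
)
msg
```

The captured message should match the text checked in test 2282.04, which reports the column count and the number of cells supplied.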
2 changes: 1 addition & 1 deletion man/data.table.Rd
@@ -241,7 +241,7 @@ column called \code{keep} containing \code{TRUE} and this is correct behaviour;
\code{POSIXlt} is not supported as a column type because it uses 40 bytes to store a single datetime. They are implicitly converted to \code{POSIXct} type with \emph{warning}. You may also be interested in \code{\link{IDateTime}} instead; it has methods to convert to and from \code{POSIXlt}.
}
\seealso{ \code{\link{special-symbols}}, \code{\link{data.frame}}, \code{\link{[.data.frame}}, \code{\link{as.data.table}}, \code{\link{setkey}}, \code{\link{setorder}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{J}}, \code{\link{SJ}}, \code{\link{CJ}}, \code{\link{merge.data.table}}, \code{\link{tables}}, \code{\link{test.data.table}}, \code{\link{IDateTime}}, \code{\link{unique.data.table}}, \code{\link{copy}}, \code{\link{:=}}, \code{\link{setalloccol}}, \code{\link{truelength}}, \code{\link{rbindlist}}, \code{\link{setNumericRounding}}, \code{\link{datatable-optimize}}, \code{\link{fsetdiff}}, \code{\link{funion}}, \code{\link{fintersect}}, \code{\link{fsetequal}}, \code{\link{anyDuplicated}}, \code{\link{uniqueN}}, \code{\link{rowid}}, \code{\link{rleid}}, \code{\link{na.omit}}, \code{\link{frank}} }
\seealso{ \code{\link{special-symbols}}, \code{\link{data.frame}}, \code{\link{[.data.frame}}, \code{\link{as.data.table}}, \code{\link{setkey}}, \code{\link{setorder}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{J}}, \code{\link{SJ}}, \code{\link{CJ}}, \code{\link{merge.data.table}}, \code{\link{tables}}, \code{\link{test.data.table}}, \code{\link{IDateTime}}, \code{\link{unique.data.table}}, \code{\link{copy}}, \code{\link{:=}}, \code{\link{setalloccol}}, \code{\link{truelength}}, \code{\link{rbindlist}}, \code{\link{setNumericRounding}}, \code{\link{datatable-optimize}}, \code{\link{fsetdiff}}, \code{\link{funion}}, \code{\link{fintersect}}, \code{\link{fsetequal}}, \code{\link{anyDuplicated}}, \code{\link{uniqueN}}, \code{\link{rowid}}, \code{\link{rleid}}, \code{\link{na.omit}}, \code{\link{frank}}, \code{\link{rowwiseDT}} }
\examples{
\dontrun{
example(data.table) # to run these examples yourself
25 changes: 25 additions & 0 deletions man/rowwiseDT.Rd
@@ -0,0 +1,25 @@
\name{rowwiseDT}
\alias{rowwiseDT}
\title{ Create a data.table row-wise }
\description{
\code{rowwiseDT} creates a \code{data.table} object by specifying a row-by-row layout. This is convenient and highly readable for small tables.
}
\usage{
rowwiseDT(...)
}
\arguments{
\item{...}{ Arguments that define the structure of a \code{data.table}. The column names come from named arguments (like \code{col=}), which must precede the data. See Examples. }
}
\value{
A \code{data.table}. The default is for each column to return as a vector. However, if any entry has a length that is not one (e.g., \code{list(1, 2)}), the whole column will be converted to a list column.
}
\seealso{
\code{\link{data.table}}
}
\examples{
rowwiseDT(
  A=,B=, C=,
  1, "a",2:3,
  2, "b",list(5)
)
}
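
The list-column behaviour described in the Value section can be checked directly. This is a sketch assuming a data.table build that includes this commit; `dt` is an illustrative name.

```r
library(data.table)

dt = rowwiseDT(
  A=, B=,
  1,  2:3,      # length-2 entry: wrapped in a list automatically
  2,  list(5)   # already a length-1 list: used as given
)

sapply(dt, class)  # A stays an atomic column, B becomes a list column
dt$B[[1]]          # the integer vector 2:3
```

Columns whose entries all have length one remain ordinary vectors, so only `B` is affected here.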
