- Fixed a regression where `NULL` elements were not being correctly dropped in `new_df()`.
- New factor functions `levels_rename`, `levels_add`, `levels_rm`, `levels_lump` and `levels_count`.
- `overview` columns are abbreviated to save visual space, and histograms are printed by default.
- `levels_drop` was not working correctly and has been fixed.
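
For readers more familiar with base R, the kind of level manipulation that the `levels_*` helpers cover can be sketched with base functions. This is only a base R illustration, not cheapr's implementation; the variable names are hypothetical and the `levels_*` signatures are deliberately not shown.

```r
# Base R sketch of common factor-level operations (not cheapr code).
f <- factor(c("a", "b", "a", "c"), levels = c("a", "b", "c", "d"))

# Drop unused levels ("d" never occurs in the data).
f_dropped <- droplevels(f)

# Add a new level without changing the data.
f_added <- factor(f, levels = c(levels(f), "e"))

# Rename a level: "a" becomes "A".
f_renamed <- f
levels(f_renamed)[levels(f_renamed) == "a"] <- "A"

# Count observations per level, including empty levels.
level_counts <- table(f)
```
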
- New functions `cheapr_var` and `cheapr_rev`.
- `get_breaks` has been improved and a few small bugs have been fixed.
- `as_discrete` gains a new argument `inf_label`.
- Safety improvements to `as_discrete`.
- Removed internal C++ functions as package installation was failing on some machines.
- New scalar functions have been added and some renamed. Most are now prefixed with 'val_', or with 'na_' in the case of `NA`-specific scalar functions.
- New cheap functions for binning continuous data into discrete bins. These include `get_breaks`, `as_discrete` and `bin`. `get_breaks` finds 'pretty' break-points of numeric data very quickly. `as_discrete` converts numeric data to discrete categories as a factor. `bin` is a low-level function for binning numeric data into the correct bins. It can also efficiently return the corresponding break values instead of the break indices through `codes = FALSE`.
- New function `na_insert` to randomly insert `NA` values into a vector.
- New function `vector_length` as a hybrid between `length` and `nrow`.
- `gcd` and `scm` now make use of 64-bit integers internally and can accept 'integer64' objects. `scm` used to return `NA` once the 32-bit integer limit of 2^31 - 1 was reached if the input was an integer vector. This limit has now been raised to the 64-bit integer limit, which is approximately 9.223372e+18, and `scm` errors if that limit is exceeded.
- 'integer64' objects are now lightly supported. They are not supported in any sequence functions or in the 'set_math' functions.
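
The binning workflow described above for `get_breaks`, `as_discrete` and `bin` has close base R analogues, which may help show what each step produces. The sketch below uses only base functions (`pretty`, `findInterval`, `cut`); it is not cheapr's implementation, and it makes no claim about cheapr's argument names beyond those listed in the entry.

```r
# Base R sketch of the binning workflow (not cheapr code).
x <- c(0.5, 2.2, 3.9, 7.1, 9.8)

# 'Pretty' break-points covering the range of x.
breaks <- pretty(x, n = 5)

# Low-level binning: index of the left-closed bin each value falls into.
bin_index <- findInterval(x, breaks)

# The corresponding break values rather than the indices.
bin_start <- breaks[bin_index]

# Discrete categories as a factor, using left-closed intervals.
x_discrete <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)
```

Returning break values rather than indices, as in `bin_start`, is the idea that the entry associates with `codes = FALSE`.
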
- New functions `new_df` and `named_list`.
- All factor levels utilities now begin with the prefix 'levels_'.
- New cheap factor functions `as_factor`, `levels_add_na`, `levels_drop_na`, `levels_drop` and `levels_reorder`.
- `lag_` now uses `memmove` where possible.
- Fixed an issue where `lag_(x)` was materialising `x` twice if `x` was an ALTREP integer sequence.
- Range-based subsetting, e.g. `sset(x, 1:10)`, should now be faster as `memmove` is used where possible.
- New functions `val_count` and `which_val` for common scalar operations.
- Some functions gain a 'names' argument.
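
To make "common scalar operations" concrete, here is what counting and locating a single value looks like in base R. This is a base R sketch, not cheapr code; the signatures of `val_count` and `which_val` are not assumed here.

```r
# Base R equivalents of two common scalar operations (not cheapr code).
x <- c(3L, NA, 3L, 5L, 3L)
val <- 3L

# Counting matches: the comparison allocates a full logical vector first,
# and NA elements must be excluded explicitly.
n_matches <- sum(x == val, na.rm = TRUE)

# Locating matches: which() ignores NA in the logical vector.
match_pos <- which(x == val)
```

Without `na.rm = TRUE`, `sum(x == val)` returns `NA` for this input, which is an easy detail to miss in the base idiom.
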
- Replaced calls to `STRING_PTR` with `STRING_PTR_RO` to satisfy R package check results.
- `lag_` should now be somewhat faster.
- Fixed a small bug in `lag2_` that would produce incorrect results when supplying a vector of lags and an order vector.
- A signed integer overflow bug in `lag2_` has been fixed. This occurred when supplying `NA` lags.
- `lag2_` no longer fills the names of named vectors when the `fill` value is supplied.
- New function `recycle` to help recycle R objects to a common size.
- The `set` functions that update by reference are now ALTREP aware and take a copy when the input is an ALTREP object.
- New function `lag2_` as a generalised solution for complex lags. It supports dynamic lag vectors, lags using an order vector, and custom run lengths. It doesn't support updating by reference or long vectors.
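
A "dynamic lag vector" means each element can be lagged by its own amount. The idea can be sketched in plain R as below. `lag_dynamic` is a hypothetical helper written for illustration only; it is not `lag2_`, and it does not cover order vectors or run lengths.

```r
# Hypothetical helper: lag each element by its own amount (not cheapr code).
lag_dynamic <- function(x, lags) {
  stopifnot(length(lags) == length(x))
  # Source position for each element; negative lags act as leads.
  idx <- seq_along(x) - lags
  # Positions that are missing or fall outside the vector have no source value.
  idx[is.na(idx) | idx < 1L | idx > length(x)] <- NA
  x[idx]
}

x <- c(10, 20, 30, 40, 50)
lagged <- lag_dynamic(x, lags = c(1L, 1L, 2L, 0L, 3L))
led <- lag_dynamic(x, lags = rep(-1L, 5))
```
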
- New function `lag_` for very fast lags and leads on vectors and data frames. It includes a `set` argument allowing users to create a lagged vector by reference without copies.
- `set_round` has been amended to improve floating point accuracy.
- New 'set' Math operations inspired by 'data.table' and 'collapse' that transform data by reference.
- Fixed an inconsistency in when `sequence_()` would error if supplied with a zero-length size argument.
- Fixed a protection stack imbalance in `count_val(x)` when `x` is `NULL`.
- `sset` has been optimised for wide data frames with many variables. It is also faster when applied to a data frame with dates, date-times and factors.
- In `sset`, when `i` is a logical vector it must match the length of `x`.
- `sset` can now handle 'ALTREP' compact real sequences as well.
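
The length requirement on a logical `i` differs from base R, which silently recycles a short logical index. The sketch below shows the base behaviour alongside a strict check. `subset_strict` is a hypothetical helper for illustration, not cheapr's implementation, and its error message is invented.

```r
# Base R recycles a short logical index silently.
x <- 1:6
recycled <- x[c(TRUE, FALSE)]  # the index is reused three times

# Hypothetical strict helper that rejects mismatched lengths instead.
subset_strict <- function(x, i) {
  if (is.logical(i) && length(i) != length(x)) {
    stop("logical `i` must have the same length as `x`")
  }
  x[i]
}

kept <- subset_strict(x, x > 3L)
```
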
- `sset` is now parallelised when `i` is an 'ALTREP' compact integer sequence, e.g. `sset(x, 1:10)`.
- `sset` now has an internal range-based subset method for 'ALTREP' integer sequences made using `:`, for example.
- New function `count_val` as a cheaper alternative to e.g. `sum(x == val)`.
- Negative indexing in `sset` has been improved. It is also now partially parallelised.
- Setting `recursive` to `FALSE` should now be faster.
- 'overview' objects gain an additional list element "print_digits" which is passed to the print method in order to correctly round the summary statistics without affecting the 'cheapr.digits' option globally.
- `factor_` and `na_rm` now handle data frames.
- A bug in `sset.data.table` that caused further set calculations to produce warnings has been fixed.
- `is_na.POSIXlt` and `sset.POSIXlt` have been rewritten to handle unbalanced 'POSIXlt' objects.
- New function `sset` to consistently subset data frame rows and vectors in general.
- `overview` now always returns an object of class "overview". It also returns the number of observations instead of rows so that it makes sense for vector summaries as well as data frame summaries.
- `sequence_` has been optimised and rewritten in C++. It now only checks for integer overflow when both `from` and `by` are integer vectors.
- The internal function `list_as_df` has been rewritten in C++.
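
The overflow check mentioned for `sequence_` concerns 32-bit integer arithmetic. The sketch below uses base `sequence()` (whose `from` and `by` arguments need R >= 4.0.0) to show what a vectorised sequence looks like, and then shows how integer arithmetic overflows while double arithmetic does not. It assumes that `sequence_` generates sequences in a broadly similar way, which the entry itself does not state.

```r
# Base R sketch (not cheapr code): several arithmetic sequences in one call.
s <- sequence(c(3L, 2L), from = c(1L, 10L), by = c(2L, 5L))

# Integer arithmetic overflows to NA (with a warning) past 2^31 - 1,
# whereas double arithmetic carries on.
int_max <- .Machine$integer.max
overflowed <- suppressWarnings(int_max + 1L)
as_double <- int_max + 1
```
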
- New function `overview` as a cheaper alternative to `summary`.
- All of the `NA` handling functions now fall back to using `is.na` if an appropriate method cannot be found.
- More support has been added for all objects with an `is.na` method.
- `is_na` has been added as an S3 generic function which is parallelised and internally falls back on `is.na` if there are no suitable methods.
- Additional list utility functions have been added.
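
The fallback behaviour described for `is_na` follows a standard S3 pattern: a generic with specialised methods plus a default that defers to `is.na`. A minimal sketch of that pattern is below. The generic `is_missing` and the class `my_pair` are hypothetical names, and nothing here reflects cheapr's internals or its parallelisation.

```r
# Hypothetical generic illustrating the fallback pattern (not cheapr code).
is_missing <- function(x) UseMethod("is_missing")

# Fallback: defer to base is.na(), which dispatches on the object's class.
is_missing.default <- function(x) is.na(x)

# A specialised method for a hypothetical record-like class:
# an element is missing if either field is NA.
is_missing.my_pair <- function(x) is.na(x$left) | is.na(x$right)

pair <- structure(
  list(left = c(1, NA, 3), right = c(NA, 2, 3)),
  class = "my_pair"
)

pair_missing <- is_missing(pair)                          # specialised method
date_missing <- is_missing(as.Date(c("2024-01-01", NA)))  # default method
```
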
- Limited support for `vctrs_rcrd` objects has been added again.
- `num_na` and similar functions no longer treat empty data frame rows as single observations but instead return the total number of `NA` values in the data frame.
- Fixed a bug in `row_na_counts` and `col_na_counts` that would cause the session to crash when a column variable was a list.
- For the time being, vctrs 'vctrs_rcrd' objects are no longer supported though this support may be re-added in the future.
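
The distinction between a total `NA` count and per-row or per-column counts, as described for `num_na`, `row_na_counts` and `col_na_counts`, can be shown with base R. This is a base R sketch under the assumption that those functions compute the analogous quantities; it is not cheapr code.

```r
# Base R sketch of NA counting on a data frame (not cheapr code).
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"))

# is.na() on a data frame returns a logical matrix.
na_total <- sum(is.na(df))        # total number of NA values
na_per_row <- rowSums(is.na(df))  # one count per row
na_per_col <- colSums(is.na(df))  # one count per column
```
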
- CRAN submission accepted.