-
Notifications
You must be signed in to change notification settings - Fork 985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
List size for large data.tables #6609
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6609 +/- ##
=======================================
Coverage 98.60% 98.60%
=======================================
Files 79 79
Lines 14518 14518
=======================================
Hits 14316 14316
Misses 202 202 ☔ View full report in Codecov by Sentry. |
R/tables.R
Outdated
@@ -5,7 +5,7 @@ type_size = function(DT) { | |||
# for speed and ram efficiency, a lower bound by not descending into character string lengths or list items | |||
# if a more accurate and higher estimate is needed then user can pass object.size or alternative to mb= | |||
# in case number of columns is very large (e.g. 1e6 columns) then we use a for() to avoid allocation of sapply() | |||
ans = 0L | |||
ans = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ans = 0 | |
ans = 0.0 |
we like to really emphasize "this is not integer"
inst/tests/other.Rraw
Outdated
@@ -766,3 +766,10 @@ if (loaded[["nanotime"]]) { | |||
# respect dec=',' for nanotime, related to #6446, corresponding to tests 2281.* | |||
test(31, fwrite(data.table(as.nanotime(.POSIXct(0))), dec=',', sep=';'), output="1970-01-01T00:00:00,000000000Z") | |||
} | |||
|
|||
# test for bug#6607 | |||
local({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- no need for :::
- you filter to one NAME, somewhere redundant to then test that value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're using local() to reduce the scope of what gets included in tables() right?
if so I'd either
- emphasize that in a comment (using local is otherwise unusual in our suite)
- assign to a specific new.env() for clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I'll make these changes.
inst/tests/other.Rraw
Outdated
.e <- new.env() ## to not populate the .GlobalEnv | ||
.e[["DT"]] <- as.data.table(lapply(1:15,function(i) runif(20e6))) | ||
res <- tables(env=.e) | ||
test(32, res[, .(NAME,NROW,NCOL,MB)], data.table(NAME="DT",NROW=20000000L,NCOL=15L,MB=2288)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, 2288.0 again for emphasis
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great! please add a NEWS item fitting yourself and add yourself to DESCRIPTION
Modified NEWS and DESCRIPTION files to include details pertaining to the PR.
Please let me know if I've missed anything. |
Great suggestion by @ben-schwen!
LGTM, TY! I'll leave it to Michael or Toby to merge and invite you as project member. |
Thanks @vlulla! I sent you an invite to join as a project member. That will make collaboration easier going forward, in particular allowing you to create branches on this repo directly (no more fork needed). |
.e[["DT"]] <- as.data.table(lapply(1:15,function(i) runif(20e6))) | ||
res <- tables(env=.e) | ||
test(32, res[, .(NAME,NROW,NCOL,MB)], data.table(NAME="DT",NROW=20000000L,NCOL=15L,MB=2288.0)) | ||
rm(.e, res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI generally we don't do the rm()
clean-up. test.data.table()
will do so at the end of the suite, and interactions among tests are rare enough that we don't bother. You are right that strictly speaking it is better to have tests more hermetic where possible.
Fixes #6607