Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List size for large data.tables #6609

Merged
merged 7 commits into from
Nov 13, 2024
Merged

Conversation

vlulla
Copy link
Contributor

@vlulla vlulla commented Nov 9, 2024

Fixes #6607

Copy link

codecov bot commented Nov 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.60%. Comparing base (e308dcb) to head (40e4ac3).
Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6609   +/-   ##
=======================================
  Coverage   98.60%   98.60%           
=======================================
  Files          79       79           
  Lines       14518    14518           
=======================================
  Hits        14316    14316           
  Misses        202      202           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

R/tables.R Outdated
@@ -5,7 +5,7 @@ type_size = function(DT) {
# for speed and ram efficiency, a lower bound by not descending into character string lengths or list items
# if a more accurate and higher estimate is needed then user can pass object.size or alternative to mb=
# in case number of columns is very large (e.g. 1e6 columns) then we use a for() to avoid allocation of sapply()
ans = 0L
ans = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ans = 0
ans = 0.0

we like to really emphasize "this is not integer"

@@ -766,3 +766,10 @@ if (loaded[["nanotime"]]) {
# respect dec=',' for nanotime, related to #6446, corresponding to tests 2281.*
test(31, fwrite(data.table(as.nanotime(.POSIXct(0))), dec=',', sep=';'), output="1970-01-01T00:00:00,000000000Z")
}

# test for bug#6607
local({
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • no need for :::
  • you filter to one NAME, somewhere redundant to then test that value

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're using local() to reduce the scope of what gets included in tables() right?

if so I'd either

  • emphasize that in a comment (using local is otherwise unusual in our suite)
  • assign to a specific new.env() for clarity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I'll make these changes.

.e <- new.env() ## to not populate the .GlobalEnv
.e[["DT"]] <- as.data.table(lapply(1:15,function(i) runif(20e6)))
res <- tables(env=.e)
test(32, res[, .(NAME,NROW,NCOL,MB)], data.table(NAME="DT",NROW=20000000L,NCOL=15L,MB=2288))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, 2288.0 again for emphasis

Copy link
Member

@MichaelChirico MichaelChirico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! please add a NEWS item fitting yourself and add yourself to DESCRIPTION

Modified NEWS and DESCRIPTION files to include details pertaining to the PR.
@vlulla
Copy link
Contributor Author

vlulla commented Nov 9, 2024

Please let me know if I've missed anything.

NEWS.md Outdated Show resolved Hide resolved
inst/tests/other.Rraw Outdated Show resolved Hide resolved
@ben-schwen
Copy link
Member

ben-schwen commented Nov 10, 2024

LGTM, TY!

I'll leave it to Michael or Toby to merge and invite you as project member.

@MichaelChirico
Copy link
Member

MichaelChirico commented Nov 13, 2024

Thanks @vlulla! I sent you an invite to join as a project member. That will make collaboration easier going forward, in particular allowing you to create branches on this repo directly (no more fork needed).

.e[["DT"]] <- as.data.table(lapply(1:15,function(i) runif(20e6)))
res <- tables(env=.e)
test(32, res[, .(NAME,NROW,NCOL,MB)], data.table(NAME="DT",NROW=20000000L,NCOL=15L,MB=2288.0))
rm(.e, res)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI generally we don't do the rm() clean-up. test.data.table() will do so at the end of the suite, and interactions among tests are rare enough that we don't bother. You are right that strictly speaking it is better to have tests more hermetic where possible.

@MichaelChirico MichaelChirico merged commit 03c647f into Rdatatable:master Nov 13, 2024
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tables function displays NA for size of largish table
3 participants