Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

print.data.table warns when options(datatable.show.indices=TRUE) and an index exists #6806

Open
aitap opened this issue Feb 9, 2025 · 8 comments

Comments

@aitap
Copy link
Contributor

aitap commented Feb 9, 2025

Observed while reading the source("data.table-Ex") output for translation purposes; taken from example(print.data.table):

library(data.table)
options(warn = 2, datatable.show.indices=TRUE)
NN <- 200
DT = data.table(grp1 = sample(100, NN, TRUE), grp2 = sample(90, NN, TRUE), grp3 = sample(80, NN, TRUE))
setindex(DT, grp1, grp3) # will warn during printing after this but not before
DT
# Index: <grp1__grp3>
# Error in rbind(abbs, toprint) : 
#   (converted from warning) number of columns of result is not a multiple of vector length (arg 1)

The warning most likely originates here:

toprint = rbind(abbs, toprint)

May be a good exercise in finding the emergency exit from R Inferno. If not, will debug myself soon.

@PradeepFSTdhane123
Copy link

@aitap I am excited to get involved and help with this bug issue. Could you assign it to me? I am looking forward to contributing and collaborating with the team. Thank you!"

@aitap
Copy link
Contributor Author

aitap commented Feb 9, 2025

@PradeepFSTdhane123 You are absolutely welcome to try your hand at preventing the warning. Good luck! Start by reproducing the warning with options(warn=2, error=recover) set.

@aitap aitap changed the title print.data.table may warn when key and index are both present print.data.table may warn when an index is present Feb 9, 2025
@Mukulyadav2004
Copy link

Mukulyadav2004 commented Feb 10, 2025

Hi , i was checking out this issue , i am not able to reproduce it .

> NN = 200
> options(warn = 2, error = recover)
> DT = data.table(grp1 = sample(100, NN, TRUE),grp2 = sample(90, NN, TRUE),grp3 = sample(80, NN, TRUE))
> setkey(DT, grp1, grp2)
> setindex(DT, grp1, grp3) 
> DT
Key: <grp1, grp2>
Index: <grp1__grp3>
      grp1  grp2  grp3
     <int> <int> <int>
  1:     1    72    17
  2:     1    78    59
  3:     2    60    14
  4:     3    15    12
  5:     3    48     6
 ---                  
196:    98    57    25
197:    99    55    80
198:    99    65    70
199:   100     2    80
200:   100     9    46

can you please explain.

@aitap
Copy link
Contributor Author

aitap commented Feb 10, 2025 via email

@aitap aitap changed the title print.data.table may warn when an index is present print.data.table warns when options(datatable.show.indices=TRUE) and an index exists Feb 10, 2025
@aitap aitap changed the title print.data.table warns when options(datatable.show.indices=TRUE) and an index exists print.data.table warns when options(datatable.show.indices=TRUE) and an index exists Feb 10, 2025
@Mukulyadav2004
Copy link

Hi @aitap ,

I’ve been investigating the issue related to index printing in data.table, where setting an index causes a warning due to mismatched dimensions when printing.
Specifically, the error occurs in:
rbind(abbs, toprint) : number of columns of result is not a multiple of vector length (arg 1)

My Approach:
To handle this, I modified the way indices are appended to the printed output by ensuring they align with the number of columns in the data table. Padding each index line with empty string to have data column's count(index info in the first column and others empty).
And then combine padding indices with the main data using rbind.

I’d love to get your thoughts on this approach. If you think it’s viable, I can refine it further and submit a PR. Let me know if you have any feedback or suggestions!

Thanks,
Mukul

@aitap
Copy link
Contributor Author

aitap commented Feb 13, 2025

This is a viable approach. You can also save the column classes from toprint before it is formatted but after the indices are attached to it using cbind(), then later use this information when combining everything to print. Use whichever approach feels the simplest.

@Mukulyadav2004
Copy link

Hi @aitap ,

I have been working on improving the print.data.table function to address the issue with options(datatable.show.indices = TRUE). Initially, I attempted to modify the implementation by adding an empty string and then using rbind to integrate it with the main data. However, this approach didn’t yield the expected results.

I shifted to a new approach, where I created a header string and added the index information to it. Below is the code snippet that I removed from print.data.table:

if (show.indices) {
  if (is.null(indices(x))) {
    show.indices = FALSE
  } else {
    index_dt <- as.data.table(attributes(attr(x, 'index')))
    print_names <- paste0("index", if (ncol(index_dt) > 1L) seq_len(ncol(index_dt)) else "", ":", sub("^__", "", names(index_dt)))
    setnames(index_dt, print_names)
  }
}

And here is the updated code that I wrote to replace the above:

    indices <- names(attr(x, "index", exact = TRUE))  
    if (length(indices)) {
      cleaned_indices <- gsub("^__|_", ", ", indices)  
      cleaned_indices <- sub(", $", "", cleaned_indices)  
    
      # Create header string 
      header <- paste0("Indices: ", paste(cleaned_indices, collapse = ", "))
      if (exists("header", envir = parent.frame(), inherits = FALSE)) {
        # Match data.table's existing header handling
        assign("header", c(get("header", envir = parent.frame()), header), 
               envir = parent.frame())
      }
    }
    else {
      show.indices <- FALSE  
   }
 }

This new implementation successfully resolves the error related to options(datatable.show.indices = TRUE). However, when running the full test suite, I encountered 6 failing tests out of 11,759. These failing tests are all in tests/tests.Rraw and correspond to test numbers 2264.1, 2264.2, 2264.3, 2264.4, 2264.7, and 2264.8.

I am unsure what is causing these specific test failures. Could you kindly take a look and provide any insights or suggestions on what might be causing this issue?

Thank you for your time and guidance,
Mukul

@aitap
Copy link
Contributor Author

aitap commented Feb 14, 2025 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants