Allow a character string in the which argument of data.table:::[.data.table #6496

Open
Kamgang-B opened this issue Sep 14, 2024 · 2 comments

@Kamgang-B
Contributor

This is a feature request.

I find the which argument confusing and counterintuitive when joining and returning the row numbers of i in x[i, which=NA, ...].

A join and the which argument can interact in four different ways, as shown below:

library(data.table)
x = data.table(a=1:3, x=c(NA, 10, NA))
i = data.table(a=2:5, y=c(20, 10, 20, 30))

x
       a     x
   <int> <num>
1:     1    NA
2:     2    10
3:     3    NA

i
       a     y
   <int> <num>
1:     2    20
2:     3    10
3:     4    20
4:     5    30

x[i, on="a", which=TRUE]     # (a): ok
[1]  2  3 NA NA
x[!i, on="a", which=TRUE]    # (b): ok
[1] 1
x[i, on="a", which=NA]       # (c): counterintuitive
[1] 3 4
x[!i, on="a", which=NA]      # (d): counterintuitive
[1] 1 2

(a): row numbers of x that i matches to.
(b): row numbers of x that no i matches to.
(c): row numbers of i that have no match to x. Because i is not prefixed with !, one would expect matches to be returned, which makes this counterintuitive.
(d): row numbers of i that have a match to x. The ! prefix suggests that the rows with no match are of interest, while the opposite is actually returned.
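For a single-column equi-join on values with no duplicates, the four scenarios can be cross-checked with base R's match() and %in%, which makes explicit which table each result indexes. This is only a minimal sketch on plain data.frames under that no-duplicates assumption; it is not how data.table computes these results internally.

```r
# Plain data.frames mirroring the x and i tables above (no data.table needed)
x = data.frame(a = 1:3, x = c(NA, 10, NA))
i = data.frame(a = 2:5, y = c(20, 10, 20, 30))

# (a) for each row of i, the row of x it matches (NA when unmatched)
a_res = match(i$a, x$a)
# (b) rows of x that no row of i matches
b_res = which(!x$a %in% i$a)
# (c) rows of i that have no match in x
c_res = which(!i$a %in% x$a)
# (d) rows of i that have a match in x
d_res = which(i$a %in% x$a)

a_res  # [1]  2  3 NA NA
b_res  # [1] 1
c_res  # [1] 3 4
d_res  # [1] 1 2
```

Written this way, (c) and (d) differ only by the negation on the i side, which is the symmetry the proposal below tries to expose.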

I propose allowing which to take a character string with four possible values (other suggestions are very welcome): c("xmatch", "xnomatch", "imatch", "inomatch"), corresponding to scenarios (a), (b), (d), and (c), respectively. These values would work as follows:

x[i, on="a", which="xmatch"]     # row numbers of x that i matches to
x[i, on="a", which="xnomatch"]   # row numbers of x that no i matches to
x[i, on="a", which="imatch"]     # row numbers of i that have a match to x
x[i, on="a", which="inomatch"]   # row numbers of i that have no match to x

The character string would thus encode both the type of join (whether i needs to be prefixed with ! or not) and the data.table whose row numbers should be returned.

With this feature, data.table:::[.data.table would behave like the wrapper below:

fm = function(x, i, on, which){
  # emulate the proposed character values of which with existing arguments
  switch(which,
    xmatch   = x[i, on=on, which=TRUE],    # (a)
    xnomatch = x[!i, on=on, which=TRUE],   # (b)
    inomatch = x[i, on=on, which=NA],      # (c)
    imatch   = x[!i, on=on, which=NA],     # (d)
    stop("which must be one of 'xmatch', 'xnomatch', 'imatch', 'inomatch'"))
}

fm(x, i, on="a", which="xmatch")
[1]  2  3 NA NA
fm(x, i, on="a", which="xnomatch")
[1] 1
fm(x, i, on="a", which="imatch")
[1] 1 2
fm(x, i, on="a", which="inomatch")
[1] 3 4
@jangorecki
Member

Is there anything wrong with swapping the places of x and i?

x[i,...]
i[x,...]
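A sketch of this suggestion, assuming the x and i tables from the issue and unique join values: with the roles swapped, the existing which=TRUE covers the i-side scenarios (c) and (d) without the ! prefix behaving unexpectedly. nomatch=NULL drops rows of x that find no partner; the result is row numbers of i in x's row order, so sort(unique(...)) would be needed if either table's join values could repeat.

```r
library(data.table)

x = data.table(a = 1:3, x = c(NA, 10, NA))
i = data.table(a = 2:5, y = c(20, 10, 20, 30))

# (c) rows of i with no match in x: an anti-join with i as the outer table
i_nomatch = i[!x, on = "a", which = TRUE]
# (d) rows of i that have a match in x: unmatched rows of x are dropped
i_match = i[x, on = "a", which = TRUE, nomatch = NULL]

i_nomatch  # [1] 3 4
i_match    # [1] 1 2
```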

@AngelFelizR
Contributor

I don't see much benefit in adding a feature that is easy to build with the tools we already have.
