
I guess there is a problem with rolling joins bit #1642

Closed
kayagorur opened this issue Apr 2, 2024 · 4 comments

Comments

@kayagorur

Hello,
I am reading the second edition of R4DS and learning a great deal from it. I am grateful that you made it available online, since I cannot afford it, at least for now.

In section 19.5.3 (Rolling joins), Figure 19.16, the dots do not match the condition "closest(key <= key)".
They are correct for the opposite condition, "closest(key >= key)".
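To make the two directions concrete: dplyr's rolling joins keep, for each key x, only the nearest y that satisfies the inequality inside closest(). Written out below (the notation is assumed here, not taken from the book), with Y the set of keys in the second table. If the set is empty, the row has no match, which shows up as NA in a left join.

```latex
\begin{aligned}
\texttt{closest}(x \le y) &:\quad y^{*}(x) = \min\{\, y \in Y : y \ge x \,\} \\
\texttt{closest}(x \ge y) &:\quad y^{*}(x) = \max\{\, y \in Y : y \le x \,\}
\end{aligned}
```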

I am new to this, and of course I might be wrong, but I would appreciate it if you could check it once more. Since your book is such a valuable resource for people outside the field who are trying to learn data science on their own, I wanted to share an issue that confused me.

P.S. I have created a GitHub account just to write this message to you :)

With respect

[Image: Rolling_join_issue]

@kayagorur
Author

One more thing you may want to consider while you are at it: the same problem applies to the birthday parties example:

And for each employee we want to find the first party date that comes after (or on) their birthday. We can express that with a rolling join:

I think that to find the first party that comes after (or on) each birthday, the condition needs to be "closest(birthday <= party)".
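A minimal Python sketch (standard library only, not dplyr itself) of the two rolling-join directions, using the bisect module on a sorted list of party dates. The function names, employee names, and all dates are hypothetical, chosen only to illustrate the behaviour; None stands in for NA when an employee has no matching party.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical quarterly party dates; the list must be sorted for bisect.
parties = [date(2022, 1, 10), date(2022, 4, 4), date(2022, 7, 11), date(2022, 10, 3)]


def first_party_on_or_after(birthday, parties):
    """Analogue of closest(birthday <= party): smallest party >= birthday, else None."""
    i = bisect_left(parties, birthday)  # first index with parties[i] >= birthday
    return parties[i] if i < len(parties) else None


def last_party_on_or_before(birthday, parties):
    """Analogue of closest(birthday >= party): largest party <= birthday, else None."""
    i = bisect_right(parties, birthday)  # first index with parties[i] > birthday
    return parties[i - 1] if i > 0 else None


# Hypothetical employees and birthdays.
employees = {"ana": date(2022, 1, 5), "ben": date(2022, 4, 4), "cem": date(2022, 11, 20)}

for name, birthday in employees.items():
    print(name, birthday,
          first_party_on_or_after(birthday, parties),
          last_party_on_or_before(birthday, parties))
# ana 2022-01-05 2022-01-10 None
# ben 2022-04-04 2022-04-04 2022-04-04
# cem 2022-11-20 None 2022-10-03
```

Ben's birthday falls on a party date, so both directions return it; that is the "(or on)" part of the inequality. Only the first function answers "the first party on or after the birthday", which corresponds to closest(birthday <= party).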

I would really like to hear your response, since understanding this is very important for the analysis I am working on. I am trying to select the lab values closest to my patients' predetermined control (follow-up) visit dates; of course, the patients never show up on schedule and regularly miss their appointments by a week or so.

Thank you in advance for your response

@florisvdh
Contributor

Regarding the birthday parties example that @kayagorur posted above:

This specific post is a duplicate of #1610.

@florisvdh
Contributor

@kayagorur you're right about the error in Fig. 19.16. This issue is a duplicate of #1470, so I suggest closing this one.

@kayagorur
Author

Thank you for the response; this is very helpful. It solved my problems and fixed my understanding of the concept. I am closing the issue, then. :)
