document the use of rsubset to preserve missings #313

Open
vjd opened this issue Nov 29, 2021 · 3 comments

vjd commented Nov 29, 2021

I am trying to subset the rows where b == 0 while still keeping the rows where b is missing.

julia> dd = DataFrame(a = [1, 2, 3], b = [1, missing, 0])
3×2 DataFrame
 Row │ a      b       
     │ Int64  Int64?  
─────┼────────────────
   1 │     1        1
   2 │     2  missing 
   3 │     3        0

julia> @chain dd begin
           @rsubset :b == 0 | ismissing(:b)
       end
1×2 DataFrame
 Row │ a      b      
     │ Int64  Int64? 
─────┼───────────────
   1 │     3       0

However, I learned on Slack that :b == 0 returns missing and that | propagates missing. What we should do instead is

julia> @chain dd begin
           @rsubset ismissing(:b) || :b == 0
       end

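Here the order matters: ismissing(:b) has to come first so that || short-circuits before :b == 0 is evaluated. With the operands swapped, missing would reach || and we would get an error.

This is a subtle but important point, and it would be better to have it documented.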
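
The behaviour described above can be reproduced in plain Julia without a data frame. The sketch below is only an illustration: the vector `b` mirrors column b from the example, and the variable names are made up here. It also suggests that operator precedence may play a part, since `|` binds more tightly than `==`, so the first attempt is read as `:b == (0 | ismissing(:b))`. As far as I understand, `@rsubset` then drops the row whose condition is missing, which would explain the one-row result.

```julia
# Column b from the example, as a plain vector (illustrative only).
b = [1, missing, 0]

# `|` has higher precedence than `==`, so this is x == (0 | ismissing(x)).
unparenthesized = [x == 0 | ismissing(x) for x in b]

# With explicit parentheses, three-valued logic gives missing | true == true.
parenthesized = [(x == 0) | ismissing(x) for x in b]

# Short-circuit form: ismissing must come first so `x == 0` is never
# evaluated for a missing value.
shortcircuit = [ismissing(x) || x == 0 for x in b]

# Swapping the operands makes `missing` the left operand of `||`,
# which is not allowed in a boolean context and throws a TypeError.
swapped_errors = try
    [x == 0 || ismissing(x) for x in b]
    false
catch err
    err isa TypeError
end

println(unparenthesized)
println(parenthesized)
println(shortcircuit)
println(swapped_errors)
```

If this reading is right, both the parenthesized `|` form and the short-circuit `||` form keep the missing row, and only the unparenthesized form loses it.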

bkamins (Member) commented Nov 29, 2021

or use coalesce(:b == 0, true) which is exactly why coalesce exists.
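
A minimal sketch of the coalesce suggestion applied to the data frame from the original post, assuming DataFrames and DataFramesMeta are installed. The name `kept` is chosen here for illustration only.

```julia
using DataFrames, DataFramesMeta

dd = DataFrame(a = [1, 2, 3], b = [1, missing, 0])

# coalesce returns its first non-missing argument, so a missing
# comparison result is replaced by `true` and the row is kept.
kept = @rsubset dd coalesce(:b == 0, true)

println(kept)
```

This should keep the rows where b is 0 or missing, with no dependence on operand order.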

pdeffebach (Collaborator) commented

I think we should just have a doc section for how to handle missings. @vjd if you have any students who would like to write one, that would be awesome! I can also get to it eventually.

vjd (Author) commented Dec 1, 2021

Yes, we will add a section on handling missings.
