Are the ~18k zero-weight records in puf_weights.csv intentional? #385
Comments
The code at bottom gives:
@donboyd5, I wouldn't say it's intentional; rather, it's a result of the re-weighting process. One difference between the Aug. 2020 version and the new version is that we switched to the solvers in Julia for stage 2. It's possible that the translation introduced a bug that resulted in all of these zero-weight records, or it's possible that this is just what the solver gives us. I'll dig into it a bit and see if I find anything.
@andersonfrailey, I think it's happening sooner than that, because the LP solver in effect limits each new weight to within ±~55-70% (depending on year) of the initial year's weight. So a weight can only be zero if it started at zero. Before the solver is called, in stage2.py, it reads puf from
and a few lines later has this line:
Here is a screenshot of what is in puf.matched_weight right after reading the puf:

As you can see, the zero weights are at the bottom before we ever get to the solver, so it looks like it is occurring somewhere along the line in creating
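To make the bound argument concrete, here is a minimal sketch. It assumes the constraint has the form "new weight within ± tol of the initial weight"; the function name `weight_bounds` and the value of `TOL` are made up for illustration and are not taken from stage2.py or the Julia solver.

```python
def weight_bounds(initial_weight, tol):
    """Return the (lower, upper) range allowed for a reweighted record.

    Assumption: the solver keeps each new weight within +/- tol
    (as a fraction) of the record's initial weight.
    """
    return initial_weight * (1.0 - tol), initial_weight * (1.0 + tol)


# Illustrative value only: the loose end of the ~55-70% range.
TOL = 0.70

for w0 in (250.0, 1.5, 0.0):
    lo, hi = weight_bounds(w0, TOL)
    print(f"initial={w0:7.2f}  allowed=[{lo:.2f}, {hi:.2f}]")
```

With any tolerance below 100%, the lower bound is strictly positive whenever the initial weight is positive, so under this assumption a zero weight coming out of the solver has to have been zero going in.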
I stepped through the code in createpuf.py. Here are the key lines of code:

Everything seemed fine through line 168 - the relevant files all had positive s006 and positive matched_weight. Here is what I get for data.matched_weight and data.s006 right after line 166:

But after running line 169, here is what I get for them:

The next line replaces na with zero, and away we go. So the problem occurs when we add nonfilers. As you can see, they do not have a

I'm not sure what the fix is (maybe they should be given s006 as their matched_weight?), but that appears to be the problem.
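The mechanism described in this comment can be reproduced on toy data. This is only a sketch: the frames, values, and the use of `pd.concat` are assumptions for illustration, not the actual lines from createpuf.py. The last step shows the option floated above (use s006 as matched_weight), not necessarily the fix that was adopted.

```python
import pandas as pd

# Toy stand-ins: filers carry both weights, nonfilers only s006.
filers = pd.DataFrame({"s006": [120.0, 85.5], "matched_weight": [118.0, 90.2]})
nonfilers = pd.DataFrame({"s006": [300.0, 410.0]})  # no matched_weight column

# Stacking the frames leaves matched_weight missing for the nonfiler rows.
data = pd.concat([filers, nonfilers], sort=False, ignore_index=True)
n_missing = int(data["matched_weight"].isna().sum())

# Replacing missing values with zero turns those rows into zero-weight records.
data = data.fillna(0.0)
n_zero = int((data["matched_weight"] == 0).sum())
print(n_missing, n_zero)

# One possible alternative: fall back to s006 where matched_weight is missing.
fixed = pd.concat([filers, nonfilers], sort=False, ignore_index=True)
fixed["matched_weight"] = fixed["matched_weight"].fillna(fixed["s006"])
print(fixed)
```

The nonfiler rows land at the bottom of the stacked frame, which matches where the zero weights show up in the file.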
Thanks for looking into this, @donboyd5. I think I see what needs to be fixed. I'll get a PR up when I get it worked out.
After constructing puf weights using an alternative solver (#381), I noticed that approximately 18k records, all at the bottom of the file, appear to have zero weights for all years. I did not examine this formally - I just looked at the file with a log viewer - but a quick look suggests most/all of these bottom-of-file records have all zero weights.
Thinking I did something wrong, I looked at puf_weights.csv.gz in taxdata and it has the same thing.
I then looked back at puf_weights.csv.gz from Aug 2020 and it does not appear to have zero-weight records, again based on an informal look with a log viewer.
With that as background, would someone be able to tell me:
Many thanks.
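For anyone who wants to check this more formally than with a log viewer, here is a minimal sketch that lists the rows whose weights are zero in every column. It assumes the file is a plain CSV with a header row and one numeric weight column per year; the column names and values in the sample are made up.

```python
import csv
import io


def all_zero_rows(csv_file):
    """Return 0-based indices of data rows whose values are all zero."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    return [
        i
        for i, row in enumerate(reader)
        if row and all(float(value) == 0.0 for value in row)
    ]


# Tiny made-up sample standing in for the weights file.
sample = io.StringIO("WT2011,WT2012\n100,105\n0,0\n50,0\n0,0\n")
zero_rows = all_zero_rows(sample)
print(len(zero_rows), zero_rows)
```

For the gzipped file, the same function should work on a handle opened with `gzip.open(path, "rt", newline="")`.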