Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Tax-Calculator with latest CPS file and CPS weights #2444

Merged
merged 11 commits into from
Aug 19, 2020

Conversation

Peter-Metz
Copy link
Contributor

@Peter-Metz Peter-Metz commented Jul 29, 2020

This PR brings Tax-Calculator up to speed to the recent updates to the CPS file and CPS weights in PSLmodels/taxdata#314. In addition to changes to the files themselves, nu05 was replaced with nu06.

A few notes:

  • The aggregate benefits and tax information statistics change significantly. I'm not totally apprised of the changes made to the CPS, so I'd appreciate @andersonfrailey input about whether this is expected.
  • I've changed some testing thresholds here and here for the tests to pass. The thresholds change significantly but seem arbitrary in the first place.

cc @MattHJensen

Note: the puf.csv file (needed to pass tests locally) used for this PR was generated with the taxdata Makefile (i.e. make puf-files). As of 8/6, the updated puf.csv has not been distributed to users.

@MattHJensen
Copy link
Contributor

Also cc @MaxGhenis as a frequent user of the file.

@MattHJensen
Copy link
Contributor

I'd like to wait on merging this until @andersonfrailey has an opportunity to take a look at the change in results.

@andersonfrailey
Copy link
Collaborator

Missed when this was opened. I'll take a look later tonight

@andersonfrailey
Copy link
Collaborator

@Peter-Metz I wouldn't worry about the changes in benefit and tax information primarily because we have validation scripts in taxdata to ensure that all income, benefit, and tax unit composition information is accurately represented in the final CPS file. Also, one of the steps in producing the old CPS file was splitting up any tax units that had information that was top-coded into 10 or 15 separate units and randomly changed each tax unit's value for the top coded variable, which would affect unweighted totals. We no longer do this because now instead of putting a hard cap on the variables that are subject to top coding, Census just shuffles them between similar people.

As for any changes the weighted totals, we switched to a new solver for creating the weights, which would explain the differences there.

@MattHJensen MattHJensen changed the title Update Tax-Calculator with latest CPS file and CPS weights [WIP] Update Tax-Calculator with latest CPS file and CPS weights Aug 5, 2020
@MattHJensen MattHJensen closed this Aug 5, 2020
@MattHJensen MattHJensen reopened this Aug 5, 2020
@MattHJensen
Copy link
Contributor

@Peter-Metz, thanks for this PR and sorry that it now has a minor conflict. I added a WIP tag while we wait for that to be resolved. Afterwards, based on @andersonfrailey's feedback, it sounds like this is good to go.

@Peter-Metz
Copy link
Contributor Author

One more question before I resolve the merge conflicts -- several functions that calculate child tax credits in calcfunctions.py rely on nu05. Can @andersonfrailey or @MattHJensen confirm that we are good to switch this to nu06 on a conceptual level?

@MattHJensen
Copy link
Contributor

MattHJensen commented Aug 6, 2020

@Peter-Metz, this is a good point about nu05. Currently it is only used for the computation of ACTC_rt_bonus_under5family and CTC_new_c_under5_bonus, both of which are 0 under current law but are used in the Clinton2016.json reform file and may be used elsewhere by users.

I think it means we have two options:

  1. Keep nu05 and nu06 on both PUF and CPS, and leave the under5 bonus params as they are. This would ensure backwards compatibility for users of those params, but it would increase the size fo the file marginally and perhaps slow us down due to the extra work required to add nu05 back to this CPS file.
  2. Drop nu05, replace the under5_bonus params with new under6_bonus params while adding an error message for under5 usage like we have for cpi_offset, and update Clinton2016.json to use nu06 while noting the discrepancy from her stated policy proposal in the file notes. (This would also mean that the PUF and CPS file updates would likely need to be combined in the same PR.)

I am inclined towards (1), but we could get away w (2) given that we are coming up on a major release. I think (1) is better since the Clinton reform is important as an historical artifact, and, if she proposed a reform depending on nu05, it seems likely that others will do the same.

@MattHJensen
Copy link
Contributor

(I am seeing that I suggested (2) previously. PSLmodels/taxdata#329 (comment))

@MattHJensen
Copy link
Contributor

Kindergarten entrance is most typically based on whether you are 5 in September (3/4 of the way through the year), so I can see a stronger case for policies to use nu06 rather than nu05. https://nces.ed.gov/programs/statereform/tab5_3.asp

@MattHJensen
Copy link
Contributor

MattHJensen commented Aug 6, 2020

I'm also seeing that @martinholmer suggested replacing nu05 with nu06 and moving Clinton2016.json to taxcalc/reforms/archive. PSLmodels/taxdata#311 (comment)

Upon further consideration, I suggest we go ahead and do that as part of following (2) from #2444 (comment). So the difference from (2) is that we archive instead of amending the Clinton proposal.

If someone wants nu05 back in the future for a specific reason, we can add it back.

@Peter-Metz
Copy link
Contributor Author

@MattHJensen I agree with your reasoning to replace nu05 with nu06. The latest commits implement that change

@codecov
Copy link

codecov bot commented Aug 6, 2020

Codecov Report

Merging #2444 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #2444   +/-   ##
=======================================
  Coverage   99.92%   99.92%           
=======================================
  Files          13       13           
  Lines        2582     2582           
=======================================
  Hits         2580     2580           
  Misses          2        2           
Impacted Files Coverage Δ
taxcalc/policy.py 100.00% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ac6f56...6d20ebd. Read the comment docs.

@Peter-Metz
Copy link
Contributor Author

The GH action to build the jb docs is currently failing while trying to execute recipe05. Traceback is below. In a different PR, I hit this same error locally and was able to fix it by updating jupyter-book.

Is it possible we need to activate the taxcalc-dev environment in the GH action? Or perhaps install jupyter-book with pip install -U jupyter-book? Any other ideas?

cc @jdebacker @MaxGhenis @hdoupe

ValueError                                Traceback (most recent call last)
<ipython-input-2-b85009722bd6> in <module>
    113 
    114 # construct reform-vs-baseline difference table with results for income deciles
--> 115 diff_table = calc1.difference_table(calc2, 'weighted_deciles', 'iitax')
    116 assert isinstance(diff_table, pd.DataFrame)
    117 diff_extract = pd.DataFrame()

~/work/Tax-Calculator/Tax-Calculator/taxcalc/calculator.py in difference_table(self, calc, groupby, tax_to_diff, pop_quantiles)
    524         calc_var_dframe = calc.dataframe(DIFF_VARIABLES)
    525         diff = create_difference_table(self_var_dframe, calc_var_dframe,
--> 526                                        groupby, tax_to_diff, pop_quantiles)
    527         del self_var_dframe
    528         del calc_var_dframe
~/work/Tax-Calculator/Tax-Calculator/taxcalc/utils.py in create_difference_table(vdf1, vdf2, groupby, tax_to_diff, pop_quantiles)
    531         dframe = add_quantile_table_row_variable(
    532             df2, baseline_expanded_income, 10,
--> 533             pop_quantiles=pop_quantiles, decile_details=True)
    534     elif groupby == 'standard_income_bins':
    535         dframe = add_income_table_row_variable(

~/work/Tax-Calculator/Tax-Calculator/taxcalc/utils.py in add_quantile_table_row_variable(dframe, income_measure, num_quantiles, pop_quantiles, decile_details, weight_by_income_measure)
    229     labels = range(1, (num_bins + 1))
    230     dframe['table_row'] = pd.cut(dframe['cumsum_temp'], bin_edges,
--> 231                                  right=False, labels=labels)
    232     dframe.drop('cumsum_temp', axis=1, inplace=True)
    233     return dframe

/usr/share/miniconda/envs/taxcalc-dev/lib/python3.7/site-packages/pandas/core/reshape/tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates, ordered)
    271         # GH 26045: cast to float64 to avoid an overflow
    272         if (np.diff(bins.astype("float64")) < 0).any():
--> 273             raise ValueError("bins must increase monotonically.")
    274 
    275     fac, bins = _bins_to_cuts(

ValueError: bins must increase monotonically.

@MaxGhenis
Copy link
Contributor

Looks more likely to be a pandas upgrade issue.

@Peter-Metz
Copy link
Contributor Author

The latest commit changes recipe05 so that it groups the difference table by standard income bins instead of weighted deciles. More about why this change was necessary for the docs to build can be found in #2460.

cc @MattHJensen

@MattHJensen
Copy link
Contributor

@Peter-Metz, could you remove the WIP tag when this is ready for another review?

@Peter-Metz Peter-Metz changed the title [WIP] Update Tax-Calculator with latest CPS file and CPS weights Update Tax-Calculator with latest CPS file and CPS weights Aug 19, 2020
@Peter-Metz
Copy link
Contributor Author

@MattHJensen done

@MattHJensen
Copy link
Contributor

Thanks @Peter-Metz. Merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants