The PUF currently splits income between spouses by taking the average split from the CPS by income source.
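For concreteness, here is a minimal Python sketch of an average-split imputation. It is not the actual imputation code. The CPS records, source names, and the helpers `average_head_shares` and `split_with_average` are hypothetical. It also assumes that the "average split" means the pooled ratio of head income to combined income for each source; a mean of per-couple ratios would be another reading.

```python
# Hypothetical CPS married couples: (head, spouse) income for each source.
cps_couples = [
    {"wages": (50_000, 30_000), "interest": (1_000, 0)},
    {"wages": (90_000, 0), "interest": (200, 200)},
    {"wages": (20_000, 60_000), "interest": (0, 500)},
]


def average_head_shares(couples):
    """Head's share of combined income for each source, pooled across couples."""
    shares = {}
    for source in couples[0]:
        head_total = sum(c[source][0] for c in couples)
        combined_total = sum(c[source][0] + c[source][1] for c in couples)
        # Assumed fallback: split evenly if no couple reports this source.
        shares[source] = head_total / combined_total if combined_total else 0.5
    return shares


def split_with_average(joint_record, shares):
    """Split each joint PUF amount using the same head share for every record."""
    return {
        source: (amount * shares[source], amount * (1 - shares[source]))
        for source, amount in joint_record.items()
    }


shares = average_head_shares(cps_couples)
print(split_with_average({"wages": 100_000, "interest": 2_000}, shares))
```

Every PUF record gets the same head share for a given source, whatever its own circumstances. That is the compression this issue is about.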
Because this approach compresses the heterogeneity in income splits across couples, it significantly underestimates the impact of reforms that individualize tax programs. For example, we found that it understated the cost of Scott Winship's proposal to individualize the EITC by about 2/3: https://policyengine.org/us/blog/winship-individualized-eitc
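The toy example below shows why compression matters. The schedule in `toy_individual_credit` is made up and is not the EITC. The two couples are also made up. The only point is this: when a benefit depends nonlinearly on each spouse's own earnings, giving every couple the average split can change the total. Here both couples have the same combined earnings. In this toy, averaging understates the credit, but the direction in real data depends on the actual distribution of splits.

```python
def toy_individual_credit(earnings):
    """Hypothetical individual credit (NOT the EITC schedule): 30% phase-in
    capped at 3,000, reduced by 20% of earnings above 20,000."""
    phase_in = min(0.30 * earnings, 3_000)
    phase_out = 0.20 * max(0.0, earnings - 20_000)
    return max(0.0, phase_in - phase_out)


# Hypothetical couples: (head earnings, spouse earnings), same combined total.
couples = [(90_000, 10_000), (50_000, 50_000)]


def total_credit_actual(couples):
    """Credit computed from each couple's own split."""
    return sum(toy_individual_credit(h) + toy_individual_credit(s) for h, s in couples)


def total_credit_average_split(couples):
    """Credit computed after giving every couple the pooled average head share."""
    head_share = sum(h for h, _ in couples) / sum(h + s for h, s in couples)
    total = 0.0
    for h, s in couples:
        combined = h + s
        total += toy_individual_credit(head_share * combined)
        total += toy_individual_credit((1 - head_share) * combined)
    return total


print(round(total_credit_actual(couples)))         # 3000
print(round(total_credit_average_split(couples)))  # 2000
```

With their real splits, only the 10,000 earner in the first couple qualifies. With the pooled 70/30 split, each couple's secondary earner is pushed up to 30,000 and into the phase-out.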
I'd suggest adding some stochasticity to this imputation. Other imputations are stochastic in some way, e.g., drawing a value based on the mean and standard deviation of a distribution. PolicyEngine uses quantile regression forests instead, which I've found to be more accurate. But I'd expect stochasticity to matter more for this issue than, for example, slicing the data more granularly, which would still compress the distribution.
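One simple form of stochasticity is a random-donor (hot-deck) draw. Each PUF record takes its head share from a randomly chosen CPS couple, so the spread of splits survives. This is a swapped-in illustration, not PolicyEngine's quantile regression forest approach. The data and helper names are hypothetical, and the sketch handles only one income source. It also does not condition on record characteristics, which a real imputation would.

```python
import random

# Hypothetical CPS couples: (head wages, spouse wages).
cps_couples = [(50_000, 30_000), (90_000, 0), (20_000, 60_000), (40_000, 40_000)]

# Hypothetical joint wage totals on PUF records.
puf_joint_wages = [100_000, 60_000, 150_000]


def donor_head_shares(couples):
    """Each couple's own head share of wages (couples with no wages are skipped)."""
    return [h / (h + s) for h, s in couples if h + s > 0]


def split_by_random_donor(joint_amounts, couples, seed=0):
    """Split each joint amount using the share of a randomly drawn CPS donor,
    so the spread of splits across couples is preserved."""
    rng = random.Random(seed)  # fixed seed keeps the imputation reproducible
    shares = donor_head_shares(couples)
    splits = []
    for amount in joint_amounts:
        share = rng.choice(shares)
        head = amount * share
        splits.append((head, amount - head))
    return splits


splits = split_by_random_donor(puf_joint_wages, cps_couples)
for joint, (head, spouse) in zip(puf_joint_wages, splits):
    print(f"{joint:>8,}: head {head:>10,.0f}  spouse {spouse:>10,.0f}")
```

A natural next step would be to condition the draw on the record's characteristics, for example by drawing only from donors with similar combined income. A quantile regression forest does roughly this, but more flexibly: it draws from a conditional distribution instead of the pooled one.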
Interesting analysis and finding re the income splits @MaxGhenis!
In the article, "the Tax-Calculator project determines the average split of income between filer and spouse from the CPS, and applies that equally to PUF records." is not quite accurate: the taxdata project does this split. Tax-Calculator just computes things like tax liability given some data.