Merge pull request #2698 from jdebacker/taxsim_validation
TAXSIM-35 Validation, "a" files
jdebacker authored Nov 6, 2023
2 parents 497d8ea + dd979a0 commit 6798fba
Showing 16 changed files with 164 additions and 61 deletions.
2 changes: 1 addition & 1 deletion taxcalc/tests/test_stats_benchmark.csv
@@ -332,4 +332,4 @@
273,taxcalc/tests/test_utils.py::test_read_egg_json,<function test_read_egg_json at 0x7f213a16ddc0>,passed,0.2509480000298936,,,,,,,,,,,,,,,,,,,,,,,,,,-0.11947699999836908
274,taxcalc/tests/test_utils.py::test_create_delete_temp_file,<function test_create_delete_temp_file at 0x7f213a16de50>,passed,0.2693619999263319,,,,,,,,,,,,,,,,,,,,,,,,,,-0.07966200018927339
275,taxcalc/tests/test_utils.py::test_bootstrap_se_ci,<function test_bootstrap_se_ci at 0x7f213a16dee0>,passed,0.42486200004532293,,,,,,,,,,,,,,,,,,,,,,,,,,-0.19468099981168052
276,taxcalc/tests/test_utils.py::test_table_columns_labels,<function test_table_columns_labels at 0x7f213a16df70>,passed,0.11546600012479757,,,,,,,,,,,,,,,,,,,,,,,,,,-0.03554399995664423
276,taxcalc/tests/test_utils.py::test_table_columns_labels,<function test_table_columns_labels at 0x7f213a16df70>,passed,0.11546600012479757,,,,,,,,,,,,,,,,,,,,,,,,,,-0.03554399995664423
21 changes: 21 additions & 0 deletions taxcalc/validation/taxsim35/Differences_Explained.md
@@ -0,0 +1,21 @@
# Explanations of known differences between Tax-Calculator and TAXSIM-35

This document explains the sources of known differences (that exceed $1) between Tax-Calculator and TAXSIM-35. Numerical differences are noted in the {letter}{year}-taxdiffs-actual.csv files in this directory.

## 2017
* No differences greater than $1, though one observation has a marginal tax rate difference of 7.65 percentage points.

## 2018
* No differences greater than $1.


## 2019
* There is one record in the "a" file with a difference in the EITC amount of $196.22. This record is a single, 19-year-old filer. This person is below the age of 25 and therefore should receive $0 EITC, which is what Tax-Calculator reports. TAXSIM-35 does not recognize this age threshold and incorrectly assigns this person $196.22 in EITC.
* The same record has a marginal tax rate difference of 7.65 percentage points, which is the phase-in rate for the EITC and thus related to the above issue.
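
The age test behind this difference can be sketched as follows. This is a minimal illustration, not the code of either model: the function name, the `min_age` parameter, and the $2,565 earnings figure are hypothetical, and the sketch covers only the phase-in region (no maximum credit, phase-out, or upper age limit).

```python
def childless_eitc_phase_in(age, earnings, min_age=25, phase_in_rate=0.0765):
    """Phase-in EITC for a filer with no qualifying children (illustrative only)."""
    # Filers below the minimum age get no credit at all.
    if age < min_age:
        return 0.0
    # Otherwise the credit phases in at 7.65 percent of earnings.
    return round(phase_in_rate * earnings, 2)


# With the age test applied, a 19-year-old receives nothing.
print(childless_eitc_phase_in(19, 2565.0))
# With the age test switched off (hypothetical min_age=0), the phase-in applies.
print(childless_eitc_phase_in(19, 2565.0, min_age=0))
```

Because the credit moves from zero to 7.65 percent of earnings when the age test is dropped, the same record would also show a 7.65 point gap in the marginal tax rate.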

## 2020
* Numerous records in the test files differ in the recovery rebate credit (RRC) amount. The reasons TAXSIM-35 shows different results vary and include: TAXSIM-35 not counting qualifying children (e.g., file "a", id 7); TAXSIM-35 not differentiating single/head of household filing status (e.g., file "a", id 31); TAXSIM-35 not counting Economic Impact Payment 2 (e.g., file "a", id 33); and TAXSIM-35 counting the wrong number of children (e.g., file "a", id 59). Note that some of these are not errors per se, but can stem from different variable inputs in the two models.


## 2021
* In 2021, the Additional Child Tax Credit (ACTC), which historically was the refundable portion of the CTC, was subsumed by the broader refundability of the CTC under the ARPA. Tax-Calculator and TAXSIM-35 handle this differently in their model output. Tax-Calculator keeps only the ACTC amount in the variable `c11070`, which is $0 for all filers in 2021. TAXSIM-35, on the other hand, reports the refundable amount of the CTC (which is equivalent to the ACTC in most years, but not in 2021). Hence, we can expect differences between the two models due to different definitions of output variables in that year. The file `process_taxcalc_output.py` makes an adjustment for 2021 to make the output from both models more comparable.
26 changes: 0 additions & 26 deletions taxcalc/validation/taxsim35/a18-taxdiffs-expect.csv

This file was deleted.

26 changes: 0 additions & 26 deletions taxcalc/validation/taxsim35/a19-taxdiffs-expect.csv

This file was deleted.

@@ -0,0 +1,25 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
iitax,2,0.5099999999999909,787,459.49,460.0
statetax,0,0.0,no diff,no diff,no diff
payrolltax,0,0.0,no diff,no diff,no diff
mtr_inctax,1,-7.649999999999999,787,17.65,10.0
mtr_state,0,0.0,no diff,no diff,no diff
c00100,0,0.0,no diff,no diff,no diff
e02300,0,0.0,no diff,no diff,no diff
c02500,0,0.0,no diff,no diff,no diff
post_phase_out_pe,0,0.0,no diff,no diff,no diff
phased_out_pe,57,3.637978807091713e-12,806,16601.76,16601.760000000002
c21040,0,0.0,no diff,no diff,no diff
c04470,0,0.0,no diff,no diff,no diff
c04800,0,0.0,no diff,no diff,no diff
taxbc,6,-0.010000000009313226,217,93239.71,93239.7
exemption_surtax,0,0.0,no diff,no diff,no diff
gen_tax_credit,0,0.0,no diff,no diff,no diff
non_refundable_child_odep_credit,0,0.0,no diff,no diff,no diff
c11070,0,0.0,no diff,no diff,no diff
c07180,0,0.0,no diff,no diff,no diff
eitc,1,-0.51,787,0.51,0.0
c62100,0,0.0,no diff,no diff,no diff
amt_liability,0,0.0,no diff,no diff,no diff
iitax_before_credits_ex_AMT,1,-1.4551915228366852e-11,362,65620.21,65620.20999999999
recovery_rebate_credit,0,0.0,no diff,no diff,no diff
@@ -0,0 +1,25 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
iitax,1,-0.009999999999990905,499,-97.48,-97.49
statetax,0,0.0,no diff,no diff,no diff
payrolltax,0,0.0,no diff,no diff,no diff
mtr_inctax,0,0.0,no diff,no diff,no diff
mtr_state,0,0.0,no diff,no diff,no diff
c00100,0,0.0,no diff,no diff,no diff
e02300,0,0.0,no diff,no diff,no diff
c02500,0,0.0,no diff,no diff,no diff
post_phase_out_pe,0,0.0,no diff,no diff,no diff
phased_out_pe,0,0.0,no diff,no diff,no diff
c21040,0,0.0,no diff,no diff,no diff
c04470,0,0.0,no diff,no diff,no diff
c04800,0,0.0,no diff,no diff,no diff
taxbc,0,0.0,no diff,no diff,no diff
exemption_surtax,0,0.0,no diff,no diff,no diff
gen_tax_credit,0,0.0,no diff,no diff,no diff
non_refundable_child_odep_credit,0,0.0,no diff,no diff,no diff
c11070,0,0.0,no diff,no diff,no diff
c07180,0,0.0,no diff,no diff,no diff
eitc,1,0.009999999999990905,499,97.48,97.49
c62100,0,0.0,no diff,no diff,no diff
amt_liability,0,0.0,no diff,no diff,no diff
iitax_before_credits_ex_AMT,0,0.0,no diff,no diff,no diff
recovery_rebate_credit,0,0.0,no diff,no diff,no diff
@@ -0,0 +1,25 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
iitax,2,196.22,904,-196.22,0.0
statetax,0,0.0,no diff,no diff,no diff
payrolltax,0,0.0,no diff,no diff,no diff
mtr_inctax,1,-7.65,904,7.65,0.0
mtr_state,0,0.0,no diff,no diff,no diff
c00100,0,0.0,no diff,no diff,no diff
e02300,0,0.0,no diff,no diff,no diff
c02500,0,0.0,no diff,no diff,no diff
post_phase_out_pe,0,0.0,no diff,no diff,no diff
phased_out_pe,0,0.0,no diff,no diff,no diff
c21040,0,0.0,no diff,no diff,no diff
c04470,0,0.0,no diff,no diff,no diff
c04800,0,0.0,no diff,no diff,no diff
taxbc,0,0.0,no diff,no diff,no diff
exemption_surtax,0,0.0,no diff,no diff,no diff
gen_tax_credit,0,0.0,no diff,no diff,no diff
non_refundable_child_odep_credit,0,0.0,no diff,no diff,no diff
c11070,0,0.0,no diff,no diff,no diff
c07180,0,0.0,no diff,no diff,no diff
eitc,2,-196.22,904,196.22,0.0
c62100,0,0.0,no diff,no diff,no diff
amt_liability,0,0.0,no diff,no diff,no diff
iitax_before_credits_ex_AMT,0,0.0,no diff,no diff,no diff
recovery_rebate_credit,0,0.0,no diff,no diff,no diff
@@ -0,0 +1,25 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
iitax,135,-4000.0,828,25257.0,21257.0
statetax,0,0.0,no diff,no diff,no diff
payrolltax,0,0.0,no diff,no diff,no diff
mtr_inctax,62,5.0,14,24.0,29.0
mtr_state,0,0.0,no diff,no diff,no diff
c00100,0,0.0,no diff,no diff,no diff
e02300,0,0.0,no diff,no diff,no diff
c02500,0,0.0,no diff,no diff,no diff
post_phase_out_pe,0,0.0,no diff,no diff,no diff
phased_out_pe,0,0.0,no diff,no diff,no diff
c21040,0,0.0,no diff,no diff,no diff
c04470,0,0.0,no diff,no diff,no diff
c04800,0,0.0,no diff,no diff,no diff
taxbc,0,0.0,no diff,no diff,no diff
exemption_surtax,0,0.0,no diff,no diff,no diff
gen_tax_credit,0,0.0,no diff,no diff,no diff
non_refundable_child_odep_credit,0,0.0,no diff,no diff,no diff
c11070,0,0.0,no diff,no diff,no diff
c07180,0,0.0,no diff,no diff,no diff
eitc,1,0.009999999999990905,292,368.93,368.94
c62100,0,0.0,no diff,no diff,no diff
amt_liability,0,0.0,no diff,no diff,no diff
iitax_before_credits_ex_AMT,0,0.0,no diff,no diff,no diff
recovery_rebate_credit,133,4000.0,828,150.0,4150.0
@@ -0,0 +1,25 @@
,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
iitax,6,-1.5300000000006548,104,-9933.15,-9934.68
statetax,0,0.0,no diff,no diff,no diff
payrolltax,0,0.0,no diff,no diff,no diff
mtr_inctax,1,80.1,150,30.9,111.0
mtr_state,0,0.0,no diff,no diff,no diff
c00100,0,0.0,no diff,no diff,no diff
e02300,0,0.0,no diff,no diff,no diff
c02500,0,0.0,no diff,no diff,no diff
post_phase_out_pe,0,0.0,no diff,no diff,no diff
phased_out_pe,0,0.0,no diff,no diff,no diff
c21040,0,0.0,no diff,no diff,no diff
c04470,0,0.0,no diff,no diff,no diff
c04800,0,0.0,no diff,no diff,no diff
taxbc,0,0.0,no diff,no diff,no diff
exemption_surtax,0,0.0,no diff,no diff,no diff
gen_tax_credit,0,0.0,no diff,no diff,no diff
non_refundable_child_odep_credit,0,0.0,no diff,no diff,no diff
c11070,0,0.0,no diff,no diff,no diff
c07180,0,0.0,no diff,no diff,no diff
eitc,6,1.5300000000000011,136,56.15,57.68
c62100,0,0.0,no diff,no diff,no diff
amt_liability,0,0.0,no diff,no diff,no diff
iitax_before_credits_ex_AMT,0,0.0,no diff,no diff,no diff
recovery_rebate_credit,0,0.0,no diff,no diff,no diff
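
A summary table in this layout can be screened for differences above the $1 threshold with a few lines of pandas. The sketch below is only an illustration: the helper name `variables_over_threshold` is hypothetical, and the sample text holds three rows copied from one of the summary tables above rather than a full file.

```python
import io

import pandas as pd

# Three rows copied from a summary table above; real files have one row per output variable.
SAMPLE = """,# of differing records,max_diff,max_diff_index,max_diff_taxsim_val,max_diff_taxcalc_val
iitax,2,196.22,904,-196.22,0.0
statetax,0,0.0,no diff,no diff,no diff
eitc,2,-196.22,904,196.22,0.0
"""


def variables_over_threshold(csv_text, threshold=1.0):
    """Return the variable names whose largest absolute difference exceeds threshold."""
    df = pd.read_csv(io.StringIO(csv_text), index_col=0)
    over = df[df["max_diff"].abs() > threshold]
    return list(over.index)


print(variables_over_threshold(SAMPLE))
```

Only the `max_diff` column is used here, since the other value columns mix numbers with the text `no diff`.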
16 changes: 10 additions & 6 deletions taxcalc/validation/taxsim35/main_comparison.py
@@ -7,6 +7,11 @@
import pandas as pd
import tc_sims

CUR_PATH = os.path.abspath(os.path.dirname(__file__))
# check if directory exists, if not create it
if not os.path.isdir(os.path.join(CUR_PATH, "actual_differences")):
    os.mkdir(os.path.join(CUR_PATH, "actual_differences"))


def main(letter, year):
# (1) generate TAXSIM-35-formatted output using Tax-Calculator tc CLI
@@ -26,8 +31,6 @@ def main(letter, year):
# skipinitialspace=True,
index_col=0,
)
print("tax sim head = ", taxsim_df.head())
print("tax calc head = ", taxcalc_df.head())

taxsim_out_cols_map = {
"taxsimid": "RECID",
@@ -99,7 +102,7 @@ def main(letter, year):
# delim_whitespace=True,
index_col=False,
)
with pd.ExcelWriter(f"{letter}{year}differences.xlsx") as writer:
with pd.ExcelWriter(os.path.join(CUR_PATH, "actual_differences", f"{letter}{year}differences.xlsx")) as writer:
# use to_excel function and specify the sheet_name and index
# to store the dataframe in specified sheet
taxsim_df.to_excel(writer, sheet_name="taxsim", index=False)
@@ -135,8 +138,9 @@ def main(letter, year):
print(actual_df)

# (3) check for difference between LYY.taxdiffs-actual and LYY.taxdiffs-expect
if os.path.isfile(f"{letter}{year}-taxdiffs-expect.csv"):
expect_df = pd.read_csv(f"{letter}{year}-taxdiffs-expect.csv", index_col=0)
expected_file_name = os.path.join(CUR_PATH, "expected_differences", f"{letter}{year}-taxdiffs-expect.csv")
if os.path.isfile(expected_file_name):
expect_df = pd.read_csv(expected_file_name, index_col=0)

print(actual_df.eq(expect_df))

@@ -148,7 +152,7 @@ def main(letter, year):
print("This EXPECT file doesn't exist.")

# (4) Write the created df to *.taxdiffs-actual
actual_df.to_csv(f"{letter}{year}-taxdiffs-actual.csv")
actual_df.to_csv(os.path.join(CUR_PATH, "actual_differences", f"{letter}{year}-taxdiffs-actual.csv"))


if __name__ == "__main__":
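
The path handling in these hunks can be summarized in one small function. This is a sketch under stated assumptions, not the script itself: the function name `write_actual_and_compare` and the `base_dir` argument are hypothetical, and `os.makedirs(..., exist_ok=True)` is swapped in for the `os.path.isdir` check plus `os.mkdir` used in the diff.

```python
import os
import tempfile

import pandas as pd


def write_actual_and_compare(actual_df, base_dir, letter, year):
    """Write the actual-differences CSV; return True/False if an expect file exists, else None."""
    actual_dir = os.path.join(base_dir, "actual_differences")
    # Create the output directory if needed; no error if it already exists.
    os.makedirs(actual_dir, exist_ok=True)

    stem = f"{letter}{year}-taxdiffs"
    expect_path = os.path.join(base_dir, "expected_differences", f"{stem}-expect.csv")

    matches = None
    if os.path.isfile(expect_path):
        expect_df = pd.read_csv(expect_path, index_col=0)
        # Element-wise comparison, reduced to a single boolean.
        matches = bool(actual_df.eq(expect_df).all().all())

    actual_df.to_csv(os.path.join(actual_dir, f"{stem}-actual.csv"))
    return matches


# Usage with a throwaway directory and a made-up two-row summary.
with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame({"max_diff": [0.0, 1.5]}, index=["iitax", "eitc"])
    print(write_actual_and_compare(demo, tmp, "a", 19))
```

Building every path from a base directory, as the diff does with `CUR_PATH`, keeps the output location independent of the directory the script is launched from.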
2 changes: 2 additions & 0 deletions taxcalc/validation/taxsim35/process_taxcalc_output.py
@@ -118,5 +118,7 @@ def write_taxsim_formatted_output(filename, tcvar):
"recovery_rebate_credit"
]
]
# better mapping to how TAXSIM-35 handles refundable credits in 2021
tcvar.loc[tcvar["FLPDYR"] == 2021, "c11070"] = tcvar.loc[tcvar["FLPDYR"] == 2021, "non_refundable_child_odep_credit"]
tcvar.round(decimals=2)
tcvar.to_csv(filename)
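
The 2021 adjustment above relies on boolean-mask assignment with `.loc`. The sketch below reproduces that pattern on a toy frame; the two rows and their values are made up for illustration and are not taken from any test file.

```python
import pandas as pd

# Made-up values: one 2020 record and one 2021 record.
tcvar = pd.DataFrame(
    {
        "FLPDYR": [2020, 2021],
        "c11070": [1400.0, 0.0],
        "non_refundable_child_odep_credit": [600.0, 3000.004],
    }
)

# Overwrite c11070 only for 2021 records; other years are left untouched.
is_2021 = tcvar["FLPDYR"] == 2021
tcvar.loc[is_2021, "c11070"] = tcvar.loc[is_2021, "non_refundable_child_odep_credit"]

# DataFrame.round returns a new frame rather than modifying tcvar in place.
rounded = tcvar.round(decimals=2)
print(rounded)
```

Since `DataFrame.round` returns a new object, a bare `tcvar.round(decimals=2)` leaves `tcvar` unchanged; if rounded values are wanted in the written CSV, the result would need to be assigned before calling `to_csv`.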
7 changes: 5 additions & 2 deletions taxcalc/validation/taxsim35/tests_35.py
@@ -18,9 +18,12 @@
for year in years:
main_comparison.main(letter, year)

# clean up taxcalc files
# keep taxsim files to avoid download again
# clean up files
for file in CUR_PATH:
for file in glob.glob("*.out*") and glob.glob("*.in*"):
if file.endswith("taxcalc"):
os.remove(file)
if file.endswith("taxsim"):
os.remove(file)
for file in glob.glob("*.in"):
os.remove(file)
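
One detail in the cleanup loop above may be worth checking: in Python, `and` between two lists does not concatenate them. It returns the first list if that list is empty and the second list otherwise, so the loop would likely visit only the `*.in*` matches. The sketch below shows that behavior and one possible way to gather matches for several patterns; the helper name `matching_files` and the file names are hypothetical.

```python
import glob
import os
import tempfile

# `and` returns one of its operands, never a combined list.
print(["x.out-taxcalc"] and ["y.in"])  # second operand, because the first is non-empty
print([] and ["y.in"])                 # first operand, because it is empty


def matching_files(directory, patterns):
    """Collect files in directory that match any of the glob patterns."""
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(directory, pattern)))
    # Drop duplicates in case a file matches more than one pattern.
    return sorted(set(found))


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.out-taxcalc", "b.in"):
        open(os.path.join(tmp, name), "w").close()
    names = [os.path.basename(p) for p in matching_files(tmp, ["*.out*", "*.in*"])]
    print(names)
```

Extending a single list per pattern keeps the intent explicit and avoids relying on the truthiness of the first `glob.glob` result.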
