Commit
Split simulations into chunks (#1938)
* Add progress
* Add progress
* Progress
* Add progress in adding parallelisation!
* Add final fixes
* Add two workers
* Add error handling
* Fix bug
* Add error handling strength
* Fix bug
* Turn down memory to 32gb
* Fix bug
* Fix bugs
* Download microdata first
1 parent 4546ddb, commit 4301f2d
Showing 11 changed files with 193 additions and 18 deletions.
@@ -0,0 +1,4 @@
    - bump: minor
      changes:
        added:
        - Chunking and baseline/reform parallelisation.
@@ -0,0 +1,15 @@
    from policyengine_us_data import EnhancedCPS_2024, CPS_2024
    from policyengine_uk_data import EnhancedFRS_2022_23

    DATASETS = [EnhancedCPS_2024, CPS_2024, EnhancedFRS_2022_23]


    def download_microdata():
        for dataset in DATASETS:
            dataset = dataset()
            if not dataset.exists:
                dataset.download()


    if __name__ == "__main__":
        download_microdata()
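The download step above follows a check-then-fetch pattern: instantiate each dataset class, and download only when it is not already present. A minimal sketch of that pattern is below. `FakeDataset`, `DatasetA`, `DatasetB`, and `download_missing` are hypothetical stand-ins invented for illustration, not part of `policyengine_us_data` or `policyengine_uk_data`; the only things taken from the diff are the `exists` attribute and the `download()` call.

```python
class FakeDataset:
    """Hypothetical stand-in for a dataset class (not a PolicyEngine API)."""

    _store = set()  # names of datasets that are "already on disk"

    def __init__(self):
        self.name = type(self).__name__

    @property
    def exists(self):
        # Mirrors the `exists` check used in the diff.
        return self.name in FakeDataset._store

    def download(self):
        # Pretend to fetch the file by recording its name.
        FakeDataset._store.add(self.name)


class DatasetA(FakeDataset):
    pass


class DatasetB(FakeDataset):
    pass


def download_missing(dataset_classes):
    """Download only datasets that are absent; return the names fetched."""
    fetched = []
    for dataset_class in dataset_classes:
        dataset = dataset_class()
        if not dataset.exists:
            dataset.download()
            fetched.append(dataset.name)
    return fetched


first_run = download_missing([DatasetA, DatasetB])
second_run = download_missing([DatasetA, DatasetB])
print(first_run, second_run)
```

Because the check happens before each fetch, running the step twice is harmless: the second pass finds everything in place and fetches nothing.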
@@ -0,0 +1,51 @@
    import time
    from tqdm import tqdm
    import numpy as np


    def calc_chunks(variables=None, count_chunks=5, logger=None, sim=None):
        for i in range(len(variables)):
            if isinstance(variables[i], str):
                variables[i] = (variables[i], sim.default_calculation_period)
        variables = [
            (variable, time_period)
            for variable, time_period in variables
            if variable in sim.tax_benefit_system.variables
        ]
        if count_chunks > 1:
            households = sim.calculate("household_id", 2024).values
            chunk_size = len(households) // count_chunks + 1
            input_df = sim.to_input_dataframe()

            variable_data = {
                variable: np.array([]) for variable, time_period in variables
            }

            for i in tqdm(range(count_chunks)):
                if logger is not None:
                    pct_complete = i / count_chunks
                    logger(pct_complete)
                households_in_chunk = households[
                    i * chunk_size : (i + 1) * chunk_size
                ]
                chunk_df = input_df[
                    input_df["household_id__2024"].isin(households_in_chunk)
                ]

                subset_sim = type(sim)(dataset=chunk_df, reform=sim.reform)
                subset_sim.default_calculation_period = (
                    sim.default_calculation_period
                )

                for variable, time_period in variables:
                    chunk_values = subset_sim.calculate(
                        variable, time_period
                    ).values
                    variable_data[variable] = np.concatenate(
                        [variable_data[variable], chunk_values]
                    )

            for variable, time_period in variables:
                sim.set_input(variable, time_period, variable_data[variable])

        return sim
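The chunk boundaries in `calc_chunks` come from `chunk_size = len(households) // count_chunks + 1`. A small sketch of that arithmetic, using made-up household IDs and plain Python lists, shows one property worth knowing: when the household count divides evenly, the `+ 1` makes the early chunks larger and can leave the last chunk empty. `slice_chunks` mirrors the formula in the diff; `balanced_chunks` is a hypothetical alternative written for comparison, not something the commit contains. Whether an empty chunk matters depends on how the simulation class handles an empty dataset, which the diff does not show.

```python
def slice_chunks(ids, count_chunks):
    """Split ids with the same chunk-size formula as the diff."""
    chunk_size = len(ids) // count_chunks + 1
    return [
        ids[i * chunk_size : (i + 1) * chunk_size]
        for i in range(count_chunks)
    ]


def balanced_chunks(ids, count_chunks):
    """Hypothetical alternative: spread the remainder, skip empty chunks."""
    base, extra = divmod(len(ids), count_chunks)
    chunks, start = [], 0
    for i in range(count_chunks):
        size = base + (1 if i < extra else 0)
        if size:
            chunks.append(ids[start : start + size])
        start += size
    return chunks


household_ids = list(range(10))  # made-up IDs for illustration

sliced_sizes = [len(c) for c in slice_chunks(household_ids, 5)]
balanced_sizes = [len(c) for c in balanced_chunks(household_ids, 5)]

# Both schemes cover every household exactly once, in the original order.
rejoined_sliced = [h for c in slice_chunks(household_ids, 5) for h in c]
rejoined_balanced = [h for c in balanced_chunks(household_ids, 5) for h in c]

print(sliced_sizes)    # [3, 3, 3, 1, 0]
print(balanced_sizes)  # [2, 2, 2, 2, 2]
```

With NumPy arrays, `np.array_split` gives a similar balanced split. Either way, concatenating per-chunk results reproduces the original ordering only if the rows of each household are contiguous in the input dataframe, which is an assumption about the data rather than something the code enforces.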