Training models & generating data
Vladimir edited this page Feb 7, 2025 · 2 revisions
Now that we have the imputed dataset, training models and generating artificial data is straightforward. First place the file with imputed data, imputed_data.csv, in the working data folder, then run the following code:
import pandas as pd
from synthwave.synthesizer.postimputation.correction import correct_imputed_data
DATA_PATH = "~/Work/data/"
adults = pd.read_csv(DATA_PATH + "imputed_data.csv", dtype_backend="pyarrow")
adults = correct_imputed_data(adults)
This should correct inconsistencies introduced by imputation.
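Before loading, it can help to confirm that the file really is in the working data folder. The snippet below is a minimal sketch, not part of synthwave: `resolve_data_file` is a hypothetical helper introduced here for illustration, and it only assumes the same `DATA_PATH` as above.

```python
from pathlib import Path

DATA_PATH = "~/Work/data/"  # same working data folder as in the snippet above


def resolve_data_file(name: str, data_path: str = DATA_PATH) -> Path:
    """Hypothetical helper: expand `~` and join the folder with the file name."""
    return Path(data_path).expanduser() / name


imputed_file = resolve_data_file("imputed_data.csv")

# Report a misplaced file early instead of failing later inside read_csv
if not imputed_file.is_file():
    print(f"Expected imputed data at {imputed_file}, but no file was found")
```

The resulting path can be passed to `pd.read_csv` in place of the concatenated string.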
The next step is to set up the models and pass the data to them:
from synthwave.synthesizer.uk.generator import Syntets
generator = Syntets(adults)
generator.split_data()
generator.restructure_data()
# load dataset
children = pd.read_parquet(DATA_PATH + "children_non_imputed_middle_fidelity.parquet").drop(columns=["id_person"])
# convert data types
children[["ordinal_person_age", "category_person_ethnic_group"]] = children[["ordinal_person_age", "category_person_ethnic_group"]].astype("uint8[pyarrow]")
# drop households with incomplete records
crooked_records = pd.unique(children[children["category_person_ethnic_group"].isna()]["id_household"])
children = children[~children["id_household"].isin(crooked_records)]
# NOTE: never drop duplicates, identical rows may be twins and would be lost
generator.train_children(children, verbose=True)
# ids are dropped only now because they are needed to learn how children are formed
generator.drop_id_columns()
generator.locate_degenerate_distributions()
generator.convert_types()
generator.init_models(_epochs=5)
generator.attach_constraints()
generator.train()
This part produces the models used to generate new households and children when needed.
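The two cleaning rules applied to the children data (drop every household that has an incomplete record, and never drop duplicates) can be illustrated on a toy table. This is only a sketch: the values are made up, and pandas' nullable `UInt8` dtype stands in for `uint8[pyarrow]` so that the example runs without pyarrow.

```python
import pandas as pd

# Toy data (made up): household 1 contains twins,
# household 2 has one child with a missing ethnic group.
children = pd.DataFrame(
    {
        "id_household": [1, 1, 2, 2, 3],
        "ordinal_person_age": [4, 4, 7, 9, 2],
        "category_person_ethnic_group": [1, 1, None, 3, 2],
    }
).astype(
    {
        "ordinal_person_age": "UInt8",
        "category_person_ethnic_group": "UInt8",
    }
)

# Households with at least one incomplete record
incomplete = children.loc[
    children["category_person_ethnic_group"].isna(), "id_household"
].unique()

# Remove those households entirely, not just the incomplete rows
complete = children[~children["id_household"].isin(incomplete)]

# What drop_duplicates() would do: the twins collapse into a single row
deduplicated = complete.drop_duplicates()

print(len(complete), len(deduplicated))
```

Household 2 is removed as a whole, including its complete record, so no household reaches training with only some of its children. The twins in household 1 are identical once `id_person` is gone, which is why `drop_duplicates()` would silently remove one of them.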