srinify
changed the title
PAR can't fit if Range constraint includes a sequence_index column
Aug 12, 2024
If you have three datetime columns (e.g. FirstDate, EventDate, LatestDate) and want a Range constraint to keep synthetic EventDate values between the other two, you can instead replace FirstDate and LatestDate with date diff columns (day offsets from EventDate) and model those directly in the SDV, without using constraints at all.
Here's some example code that computes date diff columns:
# To replicate my sample data, use the first half of the code in the issue body above
import pandas as pd
from sdv.sequential import PARSynthesizer
# Compute date diff columns, one for the lower bound and one for the upper bound
df['EventDate'] = pd.to_datetime(df['EventDate'])
df['LowerDiff'] = (pd.to_datetime(df['FirstDate']) - pd.to_datetime(df['EventDate'])).dt.days
df['UpperDiff'] = (pd.to_datetime(df['LatestDate']) - pd.to_datetime(df['EventDate'])).dt.days
# Make sure these columns are tagged as numerical in metadata
metadata.update_column(column_name='s_key', sdtype='id') # Sequence Key column
metadata.update_column(column_name='LowerDiff', sdtype='numerical')
metadata.update_column(column_name='UpperDiff', sdtype='numerical')
metadata.set_sequence_index(column_name="EventDate")
metadata.set_sequence_key(column_name="s_key")
synthesizer = PARSynthesizer(metadata, verbose=True, epochs=5)
synthesizer.fit(df)
synthetic_data = synthesizer.sample(10)
# Rebuild FirstDate and LatestDate from the offsets; pd.to_datetime covers the case where EventDate is kept as an object / string dtype column
synthetic_data['FirstDate'] = pd.to_datetime(synthetic_data['EventDate']) + pd.to_timedelta(synthetic_data['LowerDiff'], unit='D')
synthetic_data['LatestDate'] = pd.to_datetime(synthetic_data['EventDate']) + pd.to_timedelta(synthetic_data['UpperDiff'], unit='D')
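As a sanity check on the approach above, here is a minimal sketch that runs the same offset arithmetic on a tiny hand-made table, with no SDV involved. The sample rows and the `bounds_hold` helper are invented for illustration and are not part of the issue's data or the SDV API. It only shows that the offsets round-trip back to the original dates and that rebuilt bounds keep EventDate between them as long as LowerDiff is non-positive and UpperDiff is non-negative.

```python
import pandas as pd

# Toy stand-in for the issue's data; these values are made up for illustration.
df = pd.DataFrame({
    's_key': ['a', 'a', 'b'],
    'FirstDate': ['2024-01-01', '2024-01-01', '2024-02-10'],
    'EventDate': ['2024-01-05', '2024-01-20', '2024-02-10'],
    'LatestDate': ['2024-01-31', '2024-01-31', '2024-03-01'],
})

event = pd.to_datetime(df['EventDate'])
first = pd.to_datetime(df['FirstDate'])
latest = pd.to_datetime(df['LatestDate'])

# Day offsets relative to EventDate: lower bound <= 0, upper bound >= 0.
df['LowerDiff'] = (first - event).dt.days
df['UpperDiff'] = (latest - event).dt.days

# Rebuild the bounds from the offsets, mirroring the post-sampling step.
rebuilt_first = event + pd.to_timedelta(df['LowerDiff'], unit='D')
rebuilt_latest = event + pd.to_timedelta(df['UpperDiff'], unit='D')


def bounds_hold(first, event, latest):
    """Hypothetical helper: True if first <= event <= latest on every row."""
    return bool(((first <= event) & (event <= latest)).all())


# The offsets should reproduce the original bound columns exactly.
round_trip_ok = bool(
    (rebuilt_first == first).all() and (rebuilt_latest == latest).all()
)
ordering_ok = bounds_hold(rebuilt_first, event, rebuilt_latest)
```

The same `bounds_hold` check could be run on the rebuilt synthetic columns to confirm that the sampled offsets kept their signs.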
Environment Details
SDV version: 1.15.0 (Latest)
Error Description
If you try to fit a PARSynthesizer model with a Range constraint that includes a sequence_index column in the logic, you will get a KeyError.
Steps to reproduce
Error:
Colab Notebook to Reproduce
Colab Link