Closed
Description
Environment Details
Please indicate the following details about the environment in which you found the bug:
- SDV version: 1.12.0
- Python version: 3.12
- Operating System: Linux
Error Description
As originally described by @Ng-ms in #2004: When a datetime column is used as a context column, the synthesized values for that column fall outside the min/max range observed in the real data. This causes the BoundaryAdherence score to be <1.0 for that context column.
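To make the symptom concrete, here is a minimal sketch of the kind of check the score reflects: the fraction of synthetic values that fall inside the real data's observed min/max range. This is not the SDMetrics implementation. The helper name `boundary_adherence` and the sample dates are hypothetical, chosen only for illustration.

```python
from datetime import datetime


def boundary_adherence(real_values, synthetic_values):
    """Fraction of non-missing synthetic values inside the real [min, max] range."""
    real = [v for v in real_values if v is not None]
    synthetic = [v for v in synthetic_values if v is not None]
    if not real or not synthetic:
        raise ValueError("both inputs need at least one non-missing value")
    low, high = min(real), max(real)
    in_bounds = sum(low <= v <= high for v in synthetic)
    return in_bounds / len(synthetic)


# Hypothetical values for illustration only, not taken from the reported dataset.
real_pre_date = [datetime(2020, 1, 1), datetime(2020, 6, 15), datetime(2020, 12, 31)]
synthetic_pre_date = [
    datetime(2019, 12, 30),  # earlier than the observed minimum
    datetime(2020, 3, 1),
    datetime(2020, 9, 9),
    datetime(2021, 1, 2),  # later than the observed maximum
]

score = boundary_adherence(real_pre_date, synthetic_pre_date)
print(score)
```

A score of 1.0 would mean every synthetic value stays inside the observed range, which is what one would expect for a context column when enforce_min_max_values=True.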
Steps to reproduce
Note that the dataset is not available for privacy reasons. The SDV team will try to replicate this with SDV demo data.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sdv.sequential import PARSynthesizer

# df, metadata, numeric_columns and date_columns come from the private dataset (not shown)
min_max_scaler = MinMaxScaler()
df[numeric_columns] = min_max_scaler.fit_transform(df[numeric_columns])
df[date_columns] = df[date_columns].apply(pd.to_datetime, format='%d/%m/%Y', errors='coerce')
# pre_date (the context column) is cast to an integer representation
df['pre_date'] = pd.to_datetime(df['pre_date'], unit='ns').astype(int)
metadata.set_sequence_index(column_name='visit_date')
synthesizer = PARSynthesizer(metadata, epochs=1000, context_columns=['pre_date', 'sex', 'Cod'], verbose=True, enforce_min_max_values=True, enforce_rounding=True, cuda=True)
synthesizer.fit(df)
synthetic_data = synthesizer.sample(num_sequences=4000, sequence_length=None)
For this issue, let's focus only on the fact that the context column pre_date has a score <1.0. There is a separate issue for the sequence index visit_date.