PAR Diagnostic is not 1.0 for datetime context columns #2018

Closed
@npatki

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: 1.12.0
  • Python version: 3.12
  • Operating System: Linux

Error Description

As originally described by @Ng-ms in #2004: when the data contains a datetime context column, the min/max values of the synthesized column fall outside the range observed in the real data. As a result, the BoundaryAdherence score for that context column is <1.0.
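
To illustrate why out-of-range values pull the score below 1.0, here is a minimal plain-Python sketch. It assumes that boundary adherence is the fraction of non-missing synthetic values that fall inside the real column's [min, max] range. The helper `boundary_adherence` and the sample dates are hypothetical and are not taken from the SDMetrics code or from the reporter's dataset.

```python
from datetime import datetime


def boundary_adherence(real_values, synthetic_values):
    """Fraction of non-missing synthetic values inside the real [min, max] range."""
    real = [v for v in real_values if v is not None]
    synthetic = [v for v in synthetic_values if v is not None]
    if not real or not synthetic:
        raise ValueError("need at least one non-missing value on each side")
    low, high = min(real), max(real)
    in_bounds = sum(low <= v <= high for v in synthetic)
    return in_bounds / len(synthetic)


# Hypothetical values for a datetime context column such as pre_date.
real_pre_date = [datetime(2020, 1, 5), datetime(2020, 6, 1), datetime(2021, 3, 15)]
synthetic_pre_date = [
    datetime(2020, 2, 1),    # inside the real range
    datetime(2021, 3, 15),   # exactly on the upper bound, still in range
    datetime(2019, 12, 31),  # before the real minimum
    datetime(2021, 4, 1),    # after the real maximum
]

score = boundary_adherence(real_pre_date, synthetic_pre_date)
print(score)  # 0.5
```

Under this assumption, a score of 1.0 for pre_date would require every synthesized value to lie between the earliest and latest real values, so any single out-of-range value is enough to lower it.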

Steps to reproduce

Note that the dataset is not available for privacy reasons. The SDV team will try to replicate this with SDV demo data.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sdv.sequential import PARSynthesizer

# df, metadata, numeric_columns and date_columns come from the private dataset (not shown)
min_max_scaler = MinMaxScaler()
df[numeric_columns] = min_max_scaler.fit_transform(df[numeric_columns])
df[date_columns] = df[date_columns].apply(pd.to_datetime, format='%d/%m/%Y', errors='coerce')
# cast pre_date to an integer representation
df['pre_date'] = pd.to_datetime(df['pre_date'], unit='ns').astype(int)
metadata.set_sequence_index(column_name='visit_date')
synthesizer = PARSynthesizer(metadata, epochs=1000, context_columns=['pre_date', 'sex', 'Cod'], verbose=True, enforce_min_max_values=True, enforce_rounding=True, cuda=True)
synthesizer.fit(df)
synthetic_data = synthesizer.sample(num_sequences=4000, sequence_length=None)

Diagnostic score output: [image]

For this issue let's just focus on the fact that context column pre_date has a score <1.0. There is a separate issue for the sequence index visit_date.
