Error when using a datetime column as a context column with PAR Synthesizer #2187

Open
MichaelG-Uke opened this issue Aug 15, 2024 · 2 comments
Labels: bug (Something isn't working), data:sequential (Related to timeseries datasets)

MichaelG-Uke commented Aug 15, 2024

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: 1.15
  • Python version: 3.12
  • Operating System: Linux

Error Description

Using a datetime column (date strings such as '2024-01-01') as a context column with PARSynthesizer results in the following error:

ValueError: Error: Sampling terminated. No results were saved due to unspecified "output_file_path".
could not convert string to float: '2006-01-01'

Steps to reproduce

!pip install sdv==1.15.0

import pandas as pd
import random
from datetime import datetime, timedelta
from sdv.sequential import PARSynthesizer
from sdv.metadata import SingleTableMetadata

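# Date range from which the sampling context dates are drawn
event_start_date = datetime(2024, 1, 1)
event_end_date = datetime(2024, 7, 1)
n = 10

# Training data: every row gets the same date string, so "Date" is constant within each sequence
start_dates = [(datetime(2023,9,1)).strftime('%Y-%m-%d') for _ in range(n)]
# Sampling context: n random date strings between event_start_date and event_end_date
context_dates = [(event_start_date + timedelta(days=random.randint(0, (event_end_date - event_start_date).days))).strftime('%Y-%m-%d') for _ in range(n)]

# Two sequences (keys 0 and 1) with five rows each
s_key = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
val = [51, 53, 54, 55, 56, 12, 13, 14, 15, 16]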

df = pd.DataFrame(
    {
        "Date": start_dates,
        "s_key": s_key,
        "val": val
    }
)

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=df)
metadata.update_column(column_name='s_key', sdtype='id')
metadata.set_sequence_key(column_name="s_key")

synthesizer = PARSynthesizer(metadata, verbose=True, epochs=5, context_columns=["Date"])

event_context = pd.DataFrame(data={
    "Date": context_dates
})

synthesizer.fit(df)
synthesizer.sample_sequential_columns(context_columns=event_context)
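
Since the error message shows a date string reaching a float conversion, one possible stopgap is to hand the synthesizer a numeric context column and convert it back to dates after sampling. The sketch below only shows that encoding step using the standard library. The helper names `encode_dates` and `decode_dates` are hypothetical, and whether PARSynthesizer then fits and samples without the error is an untested assumption.

```python
from datetime import date, datetime

# Same format as the date strings in the reproduction above.
DATE_FORMAT = "%Y-%m-%d"


def encode_dates(values, fmt=DATE_FORMAT):
    """Convert date strings into integer day ordinals (hypothetical helper)."""
    return [datetime.strptime(value, fmt).toordinal() for value in values]


def decode_dates(ordinals, fmt=DATE_FORMAT):
    """Convert day ordinals back into date strings (hypothetical helper).

    Values are rounded first because a synthesizer may return floats.
    """
    return [date.fromordinal(int(round(value))).strftime(fmt) for value in ordinals]


sample_dates = ["2023-09-01", "2024-01-01", "2024-07-01"]
encoded = encode_dates(sample_dates)
restored = decode_dates(encoded)
```

If this route is tried, the encoded values would replace the "Date" column in both `df` and `event_context`, and the metadata would presumably need to describe "Date" as a numerical column rather than a datetime one. This does not fix the underlying bug; it only avoids passing date strings as context.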

MichaelG-Uke added the bug (Something isn't working) and new (Automatic label applied to new issues) labels on Aug 15, 2024
srinify self-assigned this on Aug 15, 2024

srinify (Contributor) commented Aug 15, 2024

Thanks for raising this, @MichaelG-Uke. I ran into an error during the synthesizer.fit(df) step itself:

[Image: Screenshot 2024-08-15 at 11 38 04 AM]

Did you run into your error during fit or during sampling?
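
To answer that unambiguously, each step can be run separately and its outcome recorded. Below is a minimal sketch of such a wrapper. `run_stage` is a hypothetical helper, not part of SDV, and the commented-out calls assume the `synthesizer`, `df`, and `event_context` objects from the reproduction script.

```python
import traceback


def run_stage(name, func, *args, **kwargs):
    """Run one step and report whether it raised (hypothetical helper)."""
    try:
        result = func(*args, **kwargs)
    except Exception as exc:
        # Keep the exception type, message, and full traceback for the report.
        return {
            "stage": name,
            "ok": False,
            "error": f"{type(exc).__name__}: {exc}",
            "traceback": traceback.format_exc(),
            "result": None,
        }
    return {"stage": name, "ok": True, "error": None, "traceback": None, "result": result}


# Intended use with the reproduction script (not executed here):
# fit_report = run_stage("fit", synthesizer.fit, df)
# sample_report = run_stage(
#     "sample", synthesizer.sample_sequential_columns, context_columns=event_context
# )
# print(fit_report["ok"], fit_report["error"])
# print(sample_report["ok"], sample_report["error"])
```

Posting the `error` and `traceback` fields for whichever stage fails would show whether the string-to-float conversion happens during fit or during sampling.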

srinify added the under discussion (Issue is currently being discussed) and data:sequential (Related to timeseries datasets) labels and removed the new (Automatic label applied to new issues) label on Aug 15, 2024
srinify changed the title from "Using datetime as context column - cannot convert datetime correctly" to "Error when using a datetime column as a context column" on Aug 15, 2024
srinify changed the title from "Error when using a datetime column as a context column" to "Error when using a datetime column as a context column with PAR Synthesizer" on Aug 15, 2024
srinify removed the under discussion (Issue is currently being discussed) label on Aug 27, 2024

srinify (Contributor) commented Aug 27, 2024

I reproduced the error internally in this Colab Notebook: https://colab.research.google.com/drive/1SW5WxJgU5Y2ykmP0t793a5OE-LxKsw5H?authuser=1#scrollTo=sHSODwrsjwZ9

srinify removed their assignment on Oct 3, 2024