Fix Date/Datetime #113

Open
elmarx opened this issue Oct 20, 2023 · 1 comment

elmarx commented Oct 20, 2023

A user reported that something is wrong with date/datetime handling: either the generated schema is invalid, or type detection is off.

elmarx self-assigned this Oct 20, 2023
oemergenc commented

Hi, we observed something in schema2000's schema generation that causes us some trouble. What schema2000 does might still be correct, given the heterogeneous nature of the underlying data.
For one timestamp field, we got a schema like this:

{
    "name": "myDateTime2",
    "type": [
        {
            "type": "long",
            "logicalType": "timestamp-millis"
        },
        "null",
        "string"
    ]
}

The problematic part is the "string" branch, which complicates the resulting data structure when it is used in BI tools (in this case BigQuery). We wrote Parquet data with a schema generated by schema2000.
Once we load the data into BigQuery and let BigQuery derive the schema automatically, the result looks like the following screenshot:
[Image: screenshot, 2023-10-20 at 11:04:33]

The underlying schema looks like this:

...
{
    "name": "myDateTime",
    "type": [
        {
            "type": "long",
            "logicalType": "timestamp-millis"
        },
        "null"
    ]
},
{
    "name": "myDateTime2",
    "type": [
        {
            "type": "long",
            "logicalType": "timestamp-millis"
        },
        "null",
        "string"
    ]
}

For myDateTime2, BigQuery inserted new subfields (e.g. member0, member1). These subfields in turn make queries more complicated. It would be really great if we could avoid this.
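To find affected fields before loading, one option is to scan the generated schema for unions with more than one non-null branch, since those are the ones that showed up with member subfields here. This is a minimal sketch in Python, not part of schema2000; the function name `multi_branch_unions` is hypothetical, and the input is just the two field definitions quoted above.

```python
import json


def multi_branch_unions(fields):
    """Return the names of fields whose union has more than one non-null branch."""
    flagged = []
    for field in fields:
        field_type = field["type"]
        if not isinstance(field_type, list):
            continue  # not a union, nothing to check
        non_null = [branch for branch in field_type if branch != "null"]
        if len(non_null) > 1:
            flagged.append(field["name"])
    return flagged


# The two field definitions from the schema excerpt above.
fields = json.loads("""
[
    {"name": "myDateTime",
     "type": [{"type": "long", "logicalType": "timestamp-millis"}, "null"]},
    {"name": "myDateTime2",
     "type": [{"type": "long", "logicalType": "timestamp-millis"}, "null", "string"]}
]
""")

print(multi_branch_unions(fields))
```

With the excerpt above, only myDateTime2 is flagged; myDateTime is a plain nullable timestamp and stays a single column.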
Maybe schema2000 could offer a command line parameter to avoid adding the additional "string" type?
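Until such an option exists, a possible workaround is to post-process the generated schema and drop the "string" branch from unions that also contain a timestamp. The sketch below is an assumption about how that could look, not schema2000 behaviour; `drop_string_fallback` and `is_timestamp` are hypothetical helper names. It is only safe if the string values in the data are converted to timestamps (or nulled) before writing, because records holding a plain string would no longer match the schema.

```python
def is_timestamp(branch):
    """True for a union branch like {"type": "long", "logicalType": "timestamp-millis"}."""
    return isinstance(branch, dict) and branch.get("logicalType") == "timestamp-millis"


def drop_string_fallback(field):
    """Return a copy of the field with "string" removed from timestamp unions.

    Fields that are not unions, or unions without a timestamp branch,
    are returned unchanged.
    """
    field_type = field.get("type")
    if not isinstance(field_type, list):
        return field
    if not any(is_timestamp(branch) for branch in field_type):
        return field
    return {**field, "type": [branch for branch in field_type if branch != "string"]}


# The field from the report above.
my_date_time_2 = {
    "name": "myDateTime2",
    "type": [
        {"type": "long", "logicalType": "timestamp-millis"},
        "null",
        "string",
    ],
}

cleaned = drop_string_fallback(my_date_time_2)
print(cleaned["type"])
```

After this step myDateTime2 has the same union as myDateTime, a nullable timestamp-millis, so both fields should be handled the same way downstream.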
