You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Whether empty fields should be treated as NULL. Default depends on the kind of the source connection: false for database connections, true for storage connections.
2) The empty_as_null: true default does not make sense if a file format supports NULLs
Most reasonable formats (such as parquet) support missing values, just like databases.
For such formats it's unexpected that empty strings are silently loaded as NULLs.
Perhaps if empty_as_null is not specified explicitly in the configuration and defaults to true, sling could warn about this behavior (maybe after this transformation was applied at least once?) suggesting the user explicitly chooses either value:
Empty strings from file sources are loaded as NULL values by default. Set empty_as_null source option explicitly to silence this warning.
3) The empty_as_null: true default is surprising even for CSV
While makes some sense in general, consider this scenario:
First copy a table using sling from SRC_DB -> TGT_DB
** this preserves NULL / empty string distinction, as expected
Then (to take a snapshot to work with) change the pipeline to: SRC_DB -> CSV -> TGT_DB
** this exports both NULLs and empty strings as empty strings in CSV, and uploads them to TGT_DB as NULLs
This would work if there was a setting for a target acting as a reverse of nullif: instead of NULL, write the specified placeholder.
Then if the same non-empty placeholder would be specified for nullif and the new setting, both NULLs and empty strings would survive the load process.
For my use-cases I'll just use parquet.
The text was updated successfully, but these errors were encountered:
1)
empty_as_null
documentation is incompletehttps://docs.slingdata.io/concepts/replication/source-options#specification says
In reality this depends on the "kind" of a connection: database connections default to
false
, while file connections default totrue
(source).Perhaps the docs could be changed to:
2) The
empty_as_null: true
default does not make sense if a file format supports NULLsMost reasonable formats (such as parquet) support missing values, just like databases.
For such formats it's unexpected that empty strings are silently loaded as NULLs.
Perhaps if
empty_as_null
is not specified explicitly in the configuration and defaults totrue
, sling could warn about this behavior (maybe after this transformation was applied at least once?) suggesting the user explicitly chooses either value:3) The
empty_as_null: true
default is surprising even for CSVWhile makes some sense in general, consider this scenario:
** this preserves NULL / empty string distinction, as expected
** this exports both NULLs and empty strings as empty strings in CSV, and uploads them to TGT_DB as NULLs
This would work if there was a setting for a target acting as a reverse of
nullif
: instead of NULL, write the specified placeholder.Then if the same non-empty placeholder would be specified for
nullif
and the new setting, both NULLs and empty strings would survive the load process.For my use-cases I'll just use parquet.
The text was updated successfully, but these errors were encountered: