default empty_as_null behavior for non-database sources #520

nickolay · 2025-03-05T15:35:36Z

1) empty_as_null documentation is incomplete

https://docs.slingdata.io/concepts/replication/source-options#specification says

empty_as_null: Whether empty fields should be treated as NULL. Default is true.

In reality this depends on the "kind" of a connection: database connections default to false, while file connections default to true (source).

Perhaps the docs could be changed to:

Whether empty fields should be treated as NULL. Default depends on the kind of the source connection: false for database connections, true for storage connections.

2) The empty_as_null: true default does not make sense if a file format supports NULLs

Most reasonable formats (such as parquet) support missing values, just like databases.

For such formats it's unexpected that empty strings are silently loaded as NULLs.

Perhaps if empty_as_null is not specified explicitly in the configuration and defaults to true, sling could warn about this behavior (maybe after this transformation was applied at least once?) suggesting the user explicitly chooses either value:

Empty strings from file sources are loaded as NULL values by default. Set empty_as_null source option explicitly to silence this warning.

3) The empty_as_null: true default is surprising even for CSV

While makes some sense in general, consider this scenario:

First copy a table using sling from SRC_DB -> TGT_DB
** this preserves NULL / empty string distinction, as expected
Then (to take a snapshot to work with) change the pipeline to: SRC_DB -> CSV -> TGT_DB
** this exports both NULLs and empty strings as empty strings in CSV, and uploads them to TGT_DB as NULLs

This would work if there was a setting for a target acting as a reverse of nullif: instead of NULL, write the specified placeholder.

Then if the same non-empty placeholder would be specified for nullif and the new setting, both NULLs and empty strings would survive the load process.

For my use-cases I'll just use parquet.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

default empty_as_null behavior for non-database sources #520

default empty_as_null behavior for non-database sources #520

nickolay commented Mar 5, 2025

default empty_as_null behavior for non-database sources #520

default empty_as_null behavior for non-database sources #520

Comments

nickolay commented Mar 5, 2025