For tables with many columns (apparently more than 100), Soda does not check columns beyond the first 100.

This happens with the latest version of soda-spark-core.

Having so many columns is, of course, uncommon and not a best practice, but it does occur. In this case, we receive the data in this form from upstream and wish to check only the 3 or 4 columns that we actually use. Selecting only the relevant columns beforehand would preempt the check itself.
```python
# replicating soda error
import polars as pl

# faux_lars is a polars fake data generation library I wrote,
# something like numpy/faker would work just as well probably
from faux_lars import generate_lazyframe
from soda.scan import Scan

rows = 500
cols = {"col_" + str(i): "str" for i in range(200)}

df = spark.createDataFrame(
    generate_lazyframe(cols, rows, "en")
    .collect()
    .to_pandas(use_pyarrow_extension_array=False)
)

yaml_str = f"""checks for example_table:
  - row_count > 0
  - schema:
      name: Confirm that required columns are present
      fail:
        when required column missing: {str(list(cols.keys()))}
"""

scan = Scan()
check_name = "example_scan"
scan.set_data_source_name(check_name)
scan.add_spark_session(spark, check_name)
scan.add_sodacl_yaml_str(yaml_str)

df.createOrReplaceTempView("example_table")
scan.execute()

result = scan.build_scan_results()
print(result["hasFailures"])
result["logs"]
```
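If the cap really is the first 100 columns of the measured schema, you can predict which required columns will go unchecked directly from their position. A minimal self-contained sketch (the 100-column cutoff is an assumption based on the observed behaviour, and `required` is a hypothetical subset using the same faux column names as above):

```python
# Sketch: which required columns fall outside the first 100 columns
# that Soda appears to measure (assumption based on the behaviour above).
cols = {"col_" + str(i): "str" for i in range(200)}

measured = list(cols.keys())[:100]          # columns Soda appears to see
required = ["col_2", "col_50", "col_150"]   # hypothetical required subset

# Required columns past the cutoff are reported as "missing" even
# though they exist in the table.
unchecked = [c for c in required if c not in measured]
print(unchecked)  # only col_150 sits past the 100-column cutoff
```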
To recreate, run the snippet above.
Soda records the schema measured as:
And the DQ failure is: