Fixed:
- using `na_pct_below`
- `from_df` now includes metadata
- `from_df` now generates correct `na_pct_below` (0.01) for full datasets #63
Changed:
- bumped minimum python version to 3.8
- Support for Python 3.11 #54
- Pydantic migrated to v2
- Allows use of Pandas v2
- Version in metadata
  - adds `dfschema` and `pandas` versions to metadata upon generation (later, it will warn if a Schema is initialized from JSON generated by a later version)
- Renamed `na_limit` to `na_pct_below` to make it unambiguous (with backward support) #64
- Added `optional=True` flag for columns. If true, does not raise exception if column is not present
- added `dfschema update {existing_schema} {output_schema}` command to upgrade schemas
- relaxed Pydantic requirement to `>=1.9`
- Pydantic bumped to `1.10`
- Bug Fix: Categorical constraints (`exact_set`, `oneof`, `include`) now can keep `int` and `float` values. That expands to legacy schemas as well.
Legacy Schema Aliases (support for legacy schemas):
- `min_value` now also supports `min` alias
- `max_value` now also supports `max` alias
- `oneof` now also supports `one_of` alias
- `version` is now correctly moved to `metadata` from root on migration
- If column schema has both `oneof` and `includes` and they are identical, will replace with `exact_set`
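
The alias rules listed above can be pictured as a small key-rewriting pass over a legacy schema dictionary. The following is only an illustrative sketch, not the library's migration code: the flat column layout, the `"include"` key spelling, and the helper names `migrate_column` and `migrate_schema` are assumptions made for the example.

```python
from copy import deepcopy

# Legacy key -> current key, taken from the alias list above.
# The flat layout (keys sitting directly on the column dict) is an assumption.
COLUMN_ALIASES = {"min": "min_value", "max": "max_value", "one_of": "oneof"}


def migrate_column(column: dict) -> dict:
    """Return a copy of a legacy column spec with aliases resolved (hypothetical helper)."""
    migrated = {COLUMN_ALIASES.get(key, key): value for key, value in column.items()}
    oneof, include = migrated.get("oneof"), migrated.get("include")
    # Identical `oneof` and `include` value sets together mean "exactly this set".
    if oneof is not None and include is not None and set(oneof) == set(include):
        del migrated["oneof"], migrated["include"]
        migrated["exact_set"] = list(oneof)
    return migrated


def migrate_schema(schema: dict) -> dict:
    """Return a migrated copy of a legacy schema dict (hypothetical helper)."""
    migrated = deepcopy(schema)
    # A root-level `version` belongs under `metadata` after migration.
    if "version" in migrated:
        migrated.setdefault("metadata", {})["version"] = migrated.pop("version")
    migrated["columns"] = [migrate_column(c) for c in migrated.get("columns", [])]
    return migrated
```

Working on a deep copy keeps the legacy input untouched, so the original file can still be compared against the migrated result.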
Testing:
- conftest code improved to showcase bad json on Exception
- multiple v1 schemas were added for testing
- pre-commit setup was updated
- rename `DfSchema.validate_df` to `DfSchema.validate` (UNDONE: `validate` is reserved by Pydantic object)
- updated documentation
- `DfSchema.to_file`, `DfSchema.from_file` proper testing
- CLI command help texts
- added pre-commit install to the repo
- Some benchmarking
- renamed `dfs.validate_df` to `dfs.validate`
- fix column dtype generation/validation bug
- renamed `strict_column_set` to `additionalColumns`
- renamed `strict_column_order` to `exactColumnOrder`
- Metadata SubObject
- Summary Exception is now collected for specific DfSchema, not via Borg State
- Supports SubSets
- Support reading and writing schemas as yaml
- added `validate_sql` method (based on `pd.read_sql` for everything including dtype mapping)
- added cli support for schema generation or validation
- support for subsets in `from_df`
- support for `str_patterns` (string columns are matched against string prefix / regex patterns)
v1.1.0
- added support for "exact_set" (exact match of categorical values)
- better structure of tests and code
- added `summary` argument. If True, all tests will be run and errors will be summarized in a `DataFrameSummaryError` exception.
- re-enabled schema generation
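
The `summary` behaviour described above follows a common collect-then-raise pattern: run every check, store the failures, and raise once at the end. The sketch below illustrates that pattern only; `SummaryError` and `run_checks` are hypothetical names for this example and do not reproduce the library's `DataFrameSummaryError` or its internals.

```python
from typing import Callable, Iterable, List


class SummaryError(Exception):
    """Hypothetical summary exception that carries every collected failure."""

    def __init__(self, errors: List[Exception]):
        self.errors = errors
        details = "; ".join(str(error) for error in errors)
        super().__init__(f"{len(errors)} check(s) failed: {details}")


def run_checks(checks: Iterable[Callable[[], None]], summary: bool = False) -> None:
    """Run checks in order; fail fast by default, or collect all failures if summary=True."""
    errors: List[Exception] = []
    for check in checks:
        try:
            check()
        except ValueError as exc:
            if not summary:
                raise  # default mode: stop at the first failing check
            errors.append(exc)
    if errors:
        raise SummaryError(errors)
```

With `summary=True` a caller sees all failing checks in a single exception instead of fixing them one run at a time.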