pandas-validity

What is it?

pandas-validity is a Python library for validating pandas DataFrames. It provides a DataFrameValidator class that works as a context manager. Inside the context you can run multiple validations and checks; any errors they find are collected instead of being raised one at a time. When the context exits, DataFrameValidator raises a single ValidationErrorsGroup exception that summarizes all collected errors.

Installation

You can easily install the latest released version using binary installers from the Python Package Index (PyPI):

pip install pandas-validity

Usage

import datetime

import pandas as pd
from pandas_validity import DataFrameValidator

# Create a sample DataFrame
df = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": ["a", None, "c"],
        "C": [2.3, 4.5, 9.2],
        "D": [
            datetime.datetime(2023, 1, 1, 1),
            datetime.datetime(2023, 1, 1, 2),
            datetime.datetime(2023, 1, 1, 3),
        ],
    }
)

# Define your expectations and data type mappings
expected_columns = ["A", "B", "C", "E"]
data_types_mapping = {
    "A": "float",
    "D": "datetime",
}

# Use DataFrameValidator for validation
with DataFrameValidator(df) as validator:
    validator.is_empty()
    validator.has_required_columns(expected_columns)
    validator.has_no_redundant_columns(expected_columns)
    validator.has_valid_data_types(data_types_mapping)
    validator.has_no_missing_data()

Output:

Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) The dataframe has missing columns: ['E']
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) The dataframe has redundant columns: ['D']
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) Column 'A' has an invalid data type: 'int64'
Error occurred: (<class 'pandas_validity.exceptions.ValidationError'>) Found 1 missing value: [{'index': 1, 'column': 'B', 'value': None}]
  + Exception Group Traceback (most recent call last):
...
  | pandas_validity.exceptions.ValidationErrorsGroup: Validation errors found: 4. (4 sub-exceptions)
  +-+---------------- 1 ----------------
    | pandas_validity.exceptions.ValidationError: The dataframe has missing columns: ['E']
    +---------------- 2 ----------------
    | pandas_validity.exceptions.ValidationError: The dataframe has redundant columns: ['D']
    +---------------- 3 ----------------
    | pandas_validity.exceptions.ValidationError: Column 'A' has an invalid data type: 'int64'
    +---------------- 4 ----------------
    | pandas_validity.exceptions.ValidationError: Found 1 missing value: [{'index': 1, 'column': 'B', 'value': None}]
    +------------------------------------
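
The collect-then-raise behavior visible in the output above can be pictured with a small standard-library sketch. This is not the library's implementation: CollectingValidator, SketchValidationError and SketchErrorsGroup are hypothetical stand-ins, and the real ValidationErrorsGroup is reported as an exception group, whereas this stand-in keeps a plain list.

```python
class SketchValidationError(Exception):
    """Hypothetical stand-in for a single failed check."""


class SketchErrorsGroup(Exception):
    """Hypothetical stand-in for the summary exception."""

    def __init__(self, errors):
        super().__init__(f"Validation errors found: {len(errors)}.")
        self.errors = list(errors)


class CollectingValidator:
    """Hypothetical validator that works on column names only."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.errors = []

    def __enter__(self):
        return self

    def has_required_columns(self, expected):
        # Record the problem instead of raising right away.
        missing = [c for c in expected if c not in self.columns]
        if missing:
            self.errors.append(SketchValidationError(f"missing columns: {missing}"))

    def has_no_redundant_columns(self, expected):
        redundant = [c for c in self.columns if c not in expected]
        if redundant:
            self.errors.append(SketchValidationError(f"redundant columns: {redundant}"))

    def __exit__(self, exc_type, exc, tb):
        # Raise one summary error on exit, but only if the block itself succeeded.
        if exc_type is None and self.errors:
            raise SketchErrorsGroup(self.errors)
        return False


expected = ["A", "B", "C", "E"]
try:
    with CollectingValidator(["A", "B", "C", "D"]) as validator:
        validator.has_required_columns(expected)
        validator.has_no_redundant_columns(expected)
except SketchErrorsGroup as group:
    messages = [str(err) for err in group.errors]
    print(messages)
```

Both checks run even though the first one already failed, which is why a single run can report every problem at once.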

The library supports the following data types for validation:

Predefined names: "str", "int", "float", "datetime", "bool".
Any callable that accepts a data type (dtype) object and returns a boolean indicating whether the type is valid, for example pd.api.types.is_string_dtype.
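
To make the callable option more concrete, here is a minimal sketch that uses only pandas. The helper is_integer_or_float is a hypothetical name introduced for illustration. The code calls the predicates on each column's dtype directly; how the library invokes them internally is an assumption.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "C": [2.3, 4.5, 9.2]})


def is_integer_or_float(dtype) -> bool:
    """Hypothetical custom check: accept any integer or floating dtype."""
    return pd.api.types.is_integer_dtype(dtype) or pd.api.types.is_float_dtype(dtype)


# A mapping may mix predefined names and callables.
data_types_mapping = {
    "A": is_integer_or_float,
    "C": "float",
}

# Apply only the callable entries to the matching column dtype.
results = {
    column: check(df[column].dtype)
    for column, check in data_types_mapping.items()
    if callable(check)
}
print(results)

# A stricter float-only predicate rejects the integer column "A".
print(pd.api.types.is_float_dtype(df["A"].dtype))
```

According to the description above, a mapping like this could be passed to validator.has_valid_data_types in place of the string-only mapping from the Usage section.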

Development

Prerequisites: poetry for environment management

The source code is currently hosted on GitHub at ohmycoffe/pandas-validity. To get the development version:

git clone git@github.com:ohmycoffe/pandas-validity.git

To install the project and development dependencies:

make install

To run tests:

make test

To view all possible commands, use:

make help

License

This project is licensed under the terms of the MIT license.
