Pydantic is a Python library for data validation and settings management based on Python type annotations. It gives Koheesio strong typing and a high degree of type safety. In practice, it lets Koheesio treat the configuration of a pipeline (i.e. the settings used inside Steps, Tasks, etc.) as structured data that can be validated.
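As a rough illustration of treating configuration as validated data, the sketch below defines a small Pydantic model. `ReaderConfig`, `path`, and `batch_size` are hypothetical names chosen for this example; they are not part of Koheesio's API.

```python
from pydantic import BaseModel, Field, ValidationError


class ReaderConfig(BaseModel):
    """Hypothetical step settings, for illustration only (not a Koheesio class)."""

    path: str
    # Must be a positive integer; defaults to 1000 when omitted.
    batch_size: int = Field(default=1000, gt=0)


# Valid input is parsed and coerced to the annotated types:
# the numeric string "500" becomes the integer 500.
config = ReaderConfig(path="/data/in", batch_size="500")
print(config.batch_size)

# Invalid input is rejected at construction time, before any pipeline code runs.
try:
    ReaderConfig(path="/data/in", batch_size=-1)
except ValidationError as exc:
    print(exc)
```

Because the settings are checked when the model is created, a misconfigured value surfaces as a clear validation error rather than as a failure deep inside a running pipeline.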
PySpark is the Python API for Apache Spark, an open-source distributed data processing engine. It allows Koheesio to handle large-scale data processing tasks efficiently.