This workshop aims to teach users about Feast, an open-source feature store.
We explain concepts & best practices by example, and also showcase how to address common use cases.
Feast is an operational system for managing and serving machine learning features to models in production. It can serve features from a low-latency online store (for real-time prediction) or from an offline store (for batch scoring); a short online-retrieval sketch follows the scope notes below.
- Feast does not orchestrate data pipelines (e.g. batch / stream transformation or materialization jobs), but provides a framework to integrate with adjacent tools like dbt, Airflow, and Spark.
- Feast also does not solve other commonly faced issues like data quality, experiment management, etc.
See more details at "What Feast is not" in the Feast documentation.
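As a quick taste of the serving paths above, here is a minimal online-retrieval sketch. The feature reference (`driver_hourly_stats:conv_rate`) and entity key are illustrative placeholders, not features defined in this workshop:

```python
from feast import FeatureStore

# Assumes a feature repo with a feature_store.yaml in the working directory.
store = FeatureStore(repo_path=".")

# Low-latency lookup from the online store, e.g. inside a prediction service.
online_features = store.get_online_features(
    features=["driver_hourly_stats:conv_rate"],  # illustrative feature reference
    entity_rows=[{"driver_id": 1001}],           # illustrative entity key
).to_dict()
```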
Feast solves several common challenges teams face:
- Lack of feature reuse across teams
- Complex point-in-time-correct data joins for generating training data (sketched after this list)
- Difficulty operationalizing features for online inference while minimizing training / serving skew
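To make the point-in-time join concrete, below is a hedged sketch of generating training data with Feast; the feature view and column names are again illustrative:

```python
import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Each row requests feature values "as of" its event_timestamp; the
# point-in-time join keeps feature values from after that timestamp
# from leaking into the training set.
entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],  # illustrative entity keys
        "event_timestamp": pd.to_datetime(["2022-05-01 10:00", "2022-05-02 11:00"]),
    }
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate", "driver_hourly_stats:acc_rate"],
).to_df()
```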
This workshop assumes you have the following installed:
- A local development environment that supports running Jupyter notebooks (e.g. VSCode with the Jupyter plugin)
- Python 3.8+
- pip
- Docker & Docker Compose (e.g. `brew install docker docker-compose`)
- Module 0 pre-requisites:
  - Terraform (docs)
  - Either an AWS or GCP setup:
    - AWS:
      - AWS CLI
      - An AWS account set up with credentials via `aws configure` (e.g. see the AWS credentials quickstart)
    - GCP:
      - GCP account
      - `gcloud` CLI
- Module 1 pre-requisites:
  - Java 11 (for Spark, e.g. `brew install java11`)
Since we'll be learning how to leverage Feast in CI/CD, you'll also need to fork this workshop repository.
Caveats
- M1 MacBook development is untested with this flow. See also How to run / develop for Feast on M1 Macs.
- Windows development has only been tested with WSL. You will need to follow this guide to have Docker play nicely.
See also: Feast quickstart, Feast x Great Expectations tutorial
These modules are meant to be done mostly in order, with later examples building on earlier concepts.
| Time (min) | Description | Module |
| --- | --- | --- |
| 30-45 | Setting up Feast projects & CI/CD + powering batch predictions | Module 0 |
| 15-20 | Streaming ingestion & online feature retrieval with Kafka, Spark, Airflow, Redis | Module 1 |
| 10-15 | Real-time feature engineering with on demand transformations | Module 2 |
| 30 | Orchestrated batch/stream transformations using dbt + Airflow with Feast | Module 3 (Snowflake) |
| 30 | (WIP) Orchestrated batch/stream transformations using dbt + Airflow with Feast | Module 3 (Databricks) |
| 30 | Book recommender system with dbt + Airflow + Feast | Feast x Book Recommendations (on Databricks) |
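Module 2 in the table above centers on Feast's on demand transformations, which compute feature values at request time. As a rough, self-contained sketch (the decorator's exact signature varies across Feast versions, and every name below is illustrative, not from this workshop):

```python
import pandas as pd
from feast import Field, RequestSource
from feast.on_demand_feature_view import on_demand_feature_view
from feast.types import Float64, Int64

# A request-time input supplied by the caller at inference time (illustrative).
vals_to_add = RequestSource(
    name="vals_to_add",
    schema=[Field(name="val_to_add", dtype=Int64)],
)

# The transformation runs at retrieval time, on both the online and
# historical retrieval paths, rather than being precomputed.
@on_demand_feature_view(
    sources=[vals_to_add],
    schema=[Field(name="val_squared", dtype=Float64)],
)
def val_squared(inputs: pd.DataFrame) -> pd.DataFrame:
    df = pd.DataFrame()
    df["val_squared"] = inputs["val_to_add"].astype("float64") ** 2
    return df
```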