Empirical is the fastest way to test different LLMs and model configurations, across all the scenarios that matter for your application.
With Empirical, you can:
- Run your test datasets locally against off-the-shelf or custom models
- Compare model outputs on a web UI, and test changes quickly
- Score your outputs with scoring functions
- Run tests on CI/CD
Empirical bundles together a test runner and a web app. These can be used through the CLI in your terminal window.
Empirical relies on a configuration file, typically located at `empiricalrc.js`, which describes the test to run.

In this example, we will ask an LLM to extract entities from user messages and give us a structured JSON output. For example, "I'm Alice from Maryland" will become `{name: 'Alice', location: 'Maryland'}`.
Our test will succeed if the model outputs valid JSON.
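To make the pass/fail criterion concrete, here is a minimal sketch of the check this test implies — whether a model's output parses as JSON. This is an illustration only, not Empirical's scorer API; `isValidJson` is a hypothetical helper name.

```javascript
// Illustrative only: a model output "passes" if it parses as JSON.
function isValidJson(output) {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidJson('{"name": "Alice", "location": "Maryland"}')); // true
console.log(isValidJson("I'm Alice from Maryland")); // false
```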
- Use the CLI to create a sample configuration file called `empiricalrc.js`.

  ```sh
  npm init empiricalrun

  # For TypeScript
  npm init empiricalrun -- --using-ts
  ```
- Run the example dataset against the selected models.

  ```sh
  npx empiricalrun
  ```

  This step requires the `OPENAI_API_KEY` environment variable to authenticate with OpenAI. This execution will cost $0.0026, based on the selected models.

- Use the `ui` command to open the reporter web app and see side-by-side results.

  ```sh
  npx empiricalrun ui
  ```
Edit the `empiricalrc.js` file to make Empirical work for your use-case.
- Configure which models to use
- Configure your test dataset
- Configure scoring functions to grade output quality
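As a rough illustration of how those three pieces fit together, a config might look like the sketch below. The field names (`runs`, `dataset`, `scorers`) and the `is-json` scorer are assumptions based on this guide's description — verify the exact schema against the Empirical docs for your installed version.

```javascript
// empiricalrc.js — hypothetical sketch; field names are assumptions,
// not a verified schema. Check the Empirical docs before using.
export default {
  // Which models to run the test against
  runs: [
    {
      type: "model",
      provider: "openai",
      model: "gpt-3.5-turbo",
      prompt: "Extract name and location from this message as JSON: {{user_message}}",
    },
  ],
  // The test dataset: one sample per user message
  dataset: {
    samples: [{ inputs: { user_message: "I'm Alice from Maryland" } }],
  },
  // Grade output quality: pass if the output is valid JSON
  scorers: [{ type: "is-json" }],
};
```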
See development docs.