Offline evaluation via SDK
Introduction
We leverage existing testing frameworks like pytest and Jest so you don’t have to learn a new tool and can directly integrate your existing testing logic.
Features
- pytest, Jest, and Vitest integration
- Support for end-to-end testing
The Python SDK 2.0 does not currently support offline evaluation. If you are using the Python SDK, please install baserun==0.9.36.
Instructions
Install Baserun SDK
Generate an API key
Create an account at https://app.baserun.ai/sign-up. Then generate an API key for your project in the settings tab. Set it as an environment variable:
Or, if using Python, set baserun.api_key to its value:
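As a rough sketch of the Python option, the key can be read from the environment once and then assigned to baserun.api_key. The variable name BASERUN_API_KEY and the helper load_api_key are assumptions made for illustration and are not confirmed by this page, so check your project settings for the exact name.

```python
import os


def load_api_key(env_var: str = "BASERUN_API_KEY") -> str:
    """Return the API key stored in env_var, or raise if it is missing or blank."""
    # BASERUN_API_KEY is an assumed variable name, not confirmed by this page.
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running offline evaluations")
    return key


# With the SDK installed, the key would then be assigned as described above:
# import baserun
# baserun.api_key = load_api_key()
```

Failing early with a clear message tends to be easier to debug than a rejected request in the middle of a test run.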
Add evaluators
Baserun offers a number of pre-built evaluators, as well as the ability to perform custom evaluations with your own prompt or function.
To add an evaluator, simply give the evaluator a name, use baserun.eval.evaluator_name to select which evaluator to use, and pass the expected input variables for the evaluator.
Each evaluator has its own set of expected input variables, so be sure to check the automatic evaluator documentation for the specific evaluator you are using.
The following example uses the not_includes evaluator.
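To make the idea concrete without depending on the SDK, here is a minimal local stand-in for a not_includes style check inside a pytest-style test. It assumes the evaluator passes when none of the listed substrings appear in the output. The function below is hypothetical and is not the Baserun implementation; the real evaluator also takes a name and reports its result to Baserun, so consult the evaluator documentation for its actual signature.

```python
from typing import Iterable


def not_includes(output: str, forbidden: Iterable[str]) -> bool:
    """Local stand-in: True when none of the forbidden substrings appear in output."""
    return not any(phrase in output for phrase in forbidden)


def test_reply_does_not_apologize():
    # In a real test, `output` would come from your LLM call.
    output = "The Eiffel Tower is in Paris."
    # Assumed semantics: the check fails if any listed phrase is present.
    assert not_includes(output, ["Sorry", "I cannot"])
```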
Run test
Running multiple tests
It’s often helpful to exercise the same test over multiple examples. To do this:
In Python, we suggest using the parametrize decorator. For larger numbers of examples, you can read them from a file or other data structure.
In TS/JS, you can use a simple for loop to generate your tests. For larger numbers of examples, you can read them from a file or other data structure.
See the Testing section for more information.
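The Python suggestion might look like the sketch below, which uses pytest.mark.parametrize to run one test body over several examples. The EXAMPLES list and the answer function are hypothetical placeholders for your own data and application code, and no Baserun calls are shown.

```python
import pytest

# Hypothetical examples; a larger set could be loaded from a file instead.
EXAMPLES = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Japan?", "Tokyo"),
]


def answer(question: str) -> str:
    """Placeholder for the application under test (normally an LLM call)."""
    capitals = {"France": "Paris", "Japan": "Tokyo"}
    for country, capital in capitals.items():
        if country in question:
            return f"The capital is {capital}."
    return "I don't know."


@pytest.mark.parametrize("question,expected", EXAMPLES)
def test_answer_mentions_capital(question: str, expected: str):
    # Each tuple in EXAMPLES becomes its own test case in the pytest report.
    assert expected in answer(question)
```

Keeping the examples in a module-level list makes it straightforward to swap in data read from a JSON or CSV file later.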
Click on the link to open the offline evaluation run report in Baserun.
Demo projects
For your reference, we have example apps using LangChain to implement an agent: