When you run an evaluation on an LLM completion, Baserun automatically creates a trace to associate the evaluation with the completion.
We therefore recommend using the baserun.trace decorator when evaluating completions, so that each trace gets a proper name.
The following example uses the includes evaluator.
import baserun
import openai


@baserun.trace
def example():
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[{"role": "user", "content": "What are three activities to do in Paris?"}],
    )
    output = response.choices[0].message.content
    # Record whether the completion mentions the Eiffel Tower.
    baserun.eval.includes("include_eiffel_tower", output, "Eiffel Tower")
    return output


if __name__ == "__main__":
    baserun.api_key = YOUR_BASERUN_API_KEY_HERE
    openai.api_key = YOUR_OPENAI_API_KEY_HERE
    baserun.init()
    print(example())
You can use evaluations in combination with Advanced tracing features such as annotation. Together they let you automatically evaluate the output of your code and record the results in the Baserun dashboard alongside completions and custom logs.
When you run an eval within a trace, you will see it displayed in the Trace Details panel.
import baserun
import openai


def get_activities():
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[{"role": "user", "content": "What are three activities to do on the Moon?"}],
    )
    return response.choices[0].message.content


@baserun.trace
def find_best_activity():
    moon_activities = get_activities()
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[
            {
                "role": "user",
                "content": "Pick the best activity to do on the moon from the following, "
                f"including a convincing reason to do so.\n{moon_activities}",
            }
        ],
    )
    output = response.choices[0].message.content
    # Record whether the chosen activity mentions the Moon.
    baserun.eval.includes("includes_moon", output, "Moon")
    return output


if __name__ == "__main__":
    baserun.api_key = YOUR_BASERUN_API_KEY_HERE
    openai.api_key = YOUR_OPENAI_API_KEY_HERE
    baserun.init()
    print(find_best_activity())
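You can also attach evals to a completion through an annotation instead of calling them directly. The following is a minimal sketch assuming the annotation API from the Advanced tracing docs (baserun.annotate returning an annotation object with check_includes, log, and submit); the exact method names and argument order may differ in your SDK version, so treat it as an illustration rather than a reference.

import baserun
import openai


@baserun.trace
def annotated_example():
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[{"role": "user", "content": "What are three activities to do in Paris?"}],
    )
    output = response.choices[0].message.content

    # Assumed annotation API: attach a check and a custom log to this completion,
    # then submit so they appear in the Trace Details panel alongside the completion.
    annotation = baserun.annotate(response.id)
    annotation.check_includes("include_eiffel_tower", "Eiffel Tower", output)
    annotation.log("paris_activities", metadata={"source": "example"})
    annotation.submit()

    return output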