One major challenge in building products that automate more complex, nuanced tasks is that defining "good" and "bad" performance becomes a major bottleneck. While you may be able to define a few examples of clearly "bad" behavior (e.g., AVs should try to avoid collisions, customer service chatbots should not reference stale information, medical chatbots should not misdiagnose patients), most of the interactions users have with the system will be far more subjective.

Human evaluation enables AI teams to define their own evaluation criteria and collect feedback from human annotators. There are three types of human evaluators: