# Offline evaluation via UI

<Frame>
  <img src="https://mintcdn.com/baserun/RoOPWwfCH85oSKT_/images/offline_testing_report.gif?s=06dfd145ce16cd84c1b5a719ee8e2afb" alt="Offline evaluation run report" width="1220" height="720" data-path="images/offline_testing_report.gif" />
</Frame>

Offline evaluation via UI, also referred to as bulk testing, lets you evaluate
your prompts without writing any code. You create and trigger evaluation runs
through the UI, and they execute entirely within Baserun.

## Components of Offline evaluation via UI

An offline evaluation is composed of three components:

* One or more [prompt template](/templates/overview) versions
* One or more model configurations
* A dataset of test cases, created by uploading a CSV file
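
To get a feel for how these three components combine, here is a minimal sketch. It assumes that a run produces one completion for every combination of template version, model configuration, and test case; this page does not confirm that behavior. The names `plan_run`, `example-model`, and the sample values are hypothetical placeholders, not part of Baserun.

```python
from itertools import product

# Hypothetical placeholders standing in for the three components of a run.
template_versions = ["v1", "v2"]
model_configs = [{"model": "example-model", "temperature": 0.0}]
test_cases = [{"country": "France"}, {"country": "Japan"}, {"country": "Brazil"}]


def plan_run(template_versions, model_configs, test_cases):
    """Enumerate one (template, model config, test case) triple per completion.

    Assumption: every template version is paired with every model
    configuration and every test case.
    """
    return list(product(template_versions, model_configs, test_cases))


if __name__ == "__main__":
    planned = plan_run(template_versions, model_configs, test_cases)
    print(f"{len(planned)} completions planned")
```

Under that assumption, two template versions, one model configuration, and three test cases yield six completions, which is useful to keep in mind when sizing a dataset.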

## Creating an Evaluation Run

In the left menu bar, expand "Evaluation" and click "Evaluation Runs". Here you will see a list of previously created runs. To create a new run, click "+ New evaluation run".

<Frame>
  <img src="https://mintcdn.com/baserun/sfThaUjl8v9XCswy/images/evaluation-runs.png?fit=max&auto=format&n=sfThaUjl8v9XCswy&q=85&s=e9f0d02a276d72e942df6dc006012b58" alt="Evaluation Runs" width="3570" height="1584" data-path="images/evaluation-runs.png" />
</Frame>

You will then be presented with a wizard for creating your evaluation run. In this example, we compare the results of two prompt template versions.

First, you will select the prompt template versions to compare.

<Frame>
  <img src="https://mintcdn.com/baserun/sfThaUjl8v9XCswy/images/create-evaluation-prompt-version.png?fit=max&auto=format&n=sfThaUjl8v9XCswy&q=85&s=397ec216a2a140675ebabcc330c0142f" alt="Prompt Version" width="1618" height="1158" data-path="images/create-evaluation-prompt-version.png" />
</Frame>

Next, you will configure the model to be used in the evaluation.

<Frame>
  <img src="https://mintcdn.com/baserun/sfThaUjl8v9XCswy/images/create-evaluation-model-config.png?fit=max&auto=format&n=sfThaUjl8v9XCswy&q=85&s=58d1c1597fa21828b80e1b92feb55e82" alt="Model Config" width="1598" height="782" data-path="images/create-evaluation-model-config.png" />
</Frame>

Then, you will upload a CSV file containing your test cases.

<Frame>
  <img src="https://mintcdn.com/baserun/sfThaUjl8v9XCswy/images/create-evaluation-upload-dataset.png?fit=max&auto=format&n=sfThaUjl8v9XCswy&q=85&s=a4c1ebb967bdd7d5b30ebadcab1462fd" alt="Upload Dataset" width="1480" height="632" data-path="images/create-evaluation-upload-dataset.png" />
</Frame>
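
This page does not spell out the column layout the upload expects. As a minimal sketch, assuming each CSV column corresponds to a variable in your prompt template, the file could be generated with Python's standard `csv` module. The variable names `country` and `question`, the file name `test_cases.csv`, and the helper `build_csv` are hypothetical.

```python
import csv
import io

# Hypothetical template variables; replace with the ones your template uses.
FIELDNAMES = ["country", "question"]

# Each dictionary is one test case (one row in the CSV).
TEST_CASES = [
    {"country": "France", "question": "What is the capital?"},
    {"country": "Japan", "question": "What is the currency?"},
]


def build_csv(rows, fieldnames=FIELDNAMES):
    """Serialize the test cases as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # newline="" stops Python from translating the line endings on write.
    with open("test_cases.csv", "w", newline="", encoding="utf-8") as f:
        f.write(build_csv(TEST_CASES))
```

Using `csv.DictWriter` rather than manual string joining means values containing commas or quotes are escaped correctly.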

Finally, you will select the evaluators you wish to run. In this example, we use a simple evaluator that checks that the completion does not include the phrase "AI Language Model".

<Frame>
  <img src="https://mintcdn.com/baserun/sfThaUjl8v9XCswy/images/create-evaluation-select-evaluators.png?fit=max&auto=format&n=sfThaUjl8v9XCswy&q=85&s=86ee3ffb94e6f33f87fd9e054292b2ce" alt="Select Evaluators" width="1534" height="1058" data-path="images/create-evaluation-select-evaluators.png" />
</Frame>
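
The check itself is configured in the UI and requires no code. Purely to illustrate the logic of a "does not include" check, here is a minimal Python sketch. The function name `excludes_phrase` is hypothetical, and the case-insensitive comparison is an assumption; Baserun's evaluator may match differently.

```python
def excludes_phrase(completion: str, phrase: str = "AI Language Model") -> bool:
    """Return True when the completion does not contain the phrase.

    Assumption: matching ignores case. casefold() is used instead of
    lower() so the comparison also handles non-ASCII text sensibly.
    """
    return phrase.casefold() not in completion.casefold()


if __name__ == "__main__":
    samples = [
        "As an AI language model, I cannot answer that.",
        "The capital of France is Paris.",
    ]
    for sample in samples:
        verdict = "pass" if excludes_phrase(sample) else "fail"
        print(f"{verdict}: {sample}")
```

In this sketch the first sample fails the check and the second passes.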

The evaluation run then executes in the Baserun back end. Results are typically available within a few seconds, depending on the number of test cases and whether the evaluations are model-graded.

<Frame>
  <img src="https://mintcdn.com/baserun/sfThaUjl8v9XCswy/images/create-evaluation-results.png?fit=max&auto=format&n=sfThaUjl8v9XCswy&q=85&s=178d8a38361a5898322e8ebdf41db8da" alt="View Results" width="2558" height="374" data-path="images/create-evaluation-results.png" />
</Frame>
