To validate the behavior of your code, you can make assertions as you normally would to cause the test to pass or fail. In the following example, the test fails if the string “Eiffel Tower” is not present in the LLM output.
import openai


def test_paris_trip():
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[
            {
                "role": "user",
                "content": "What are three activities to do in Paris?"
            }
        ],
    )
    assert "Eiffel Tower" in response["choices"][0]["message"]["content"]
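Because assertions are plain Python, you can make them as strict or as lenient as your use case requires. The sketch below shows one possible variation: a case-insensitive check that passes if any of several landmarks appears in the output. The landmark list and the test name are illustrative choices, not part of Baserun's API.

import openai


def test_paris_trip_landmarks():
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[
            {
                "role": "user",
                "content": "What are three activities to do in Paris?"
            }
        ],
    )
    output = response["choices"][0]["message"]["content"].lower()
    # Illustrative: pass if at least one well-known landmark is mentioned,
    # regardless of capitalization.
    landmarks = ["eiffel tower", "louvre", "notre-dame"]
    assert any(landmark in output for landmark in landmarks)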
When working with LLMs, it is also helpful to gain more visibility into outputs by running a series of checks, or evaluations. An evaluation captures information about the output but does not cause the test to fail by default. Eval results are then aggregated and displayed in your Baserun dashboard.
In the following example, we add an eval that checks whether the output contains the phrase “AI language model”. See the evaluation documentation for more information.
import baserun
import openai


def test_paris_trip():
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[
            {
                "role": "user",
                "content": "What are three activities to do in Paris?"
            }
        ],
    )
    output = response["choices"][0]["message"]["content"]
    not_has_ai_language_model = baserun.evals.not_includes("AI Language Model", output, ["AI language model"])
    # Optional: will fail the test if "AI language model" is present in the output
    assert not_has_ai_language_model
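Assuming you have installed and initialized the Baserun pytest plugin as described in the setup documentation, tests like these are typically collected and run with pytest, for example pytest --baserun test_paris_trip.py (the module name here is illustrative, and your exact flags may differ depending on your setup). The eval results recorded during the run then appear alongside the test in your Baserun dashboard.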