When you run an evaluation on an LLM completion, Baserun automatically creates a trace to associate the evaluation with the completion.
We therefore recommend using the baserun.trace decorator when evaluating completions, so that each trace gets a proper name.
The following example uses the includes evaluator.
import baserun
import openai


@baserun.trace
def example():
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[{"role": "user", "content": "What are three activities to do in Paris?"}],
    )
    output = response.choices[0].message.content
    # Record whether the completion mentions the Eiffel Tower.
    baserun.eval.includes("include_eiffel_tower", output, "Eiffel Tower")
    return output


if __name__ == "__main__":
    baserun.api_key = YOUR_BASERUN_API_KEY_HERE
    openai.api_key = YOUR_OPENAI_API_KEY_HERE
    baserun.init()
    print(example())
You can use evaluations in combination with Advanced tracing features such as annotation. Together they let you automatically evaluate the output of your code and record the results in the Baserun dashboard alongside completions and custom logs.
When you run an eval within a trace, you will see it displayed in the Trace Details panel.
import baserun
import openai


def get_activities():
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[{"role": "user", "content": "What are three activities to do on the Moon?"}],
    )
    return response.choices[0].message.content


@baserun.trace
def find_best_activity():
    moon_activities = get_activities()
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[
            {
                "role": "user",
                "content": "Pick the best activity to do on the moon from the following, "
                f"including a convincing reason to do so.\n{moon_activities}",
            }
        ],
    )
    output = response.choices[0].message.content
    # Record whether the chosen activity mentions the Moon.
    baserun.eval.includes("includes_moon", output, "Moon")
    return output


if __name__ == "__main__":
    baserun.api_key = YOUR_BASERUN_API_KEY_HERE
    openai.api_key = YOUR_OPENAI_API_KEY_HERE
    baserun.init()
    print(find_best_activity())
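You can also attach evals to a completion through an annotation instead of calling them directly. The following is a minimal sketch assuming the annotation API from the Advanced tracing docs (baserun.annotate returning an annotation object with check_includes, log, and submit); the exact method names and argument order may differ in your SDK version, so treat it as an illustration rather than a reference.

import baserun
import openai


@baserun.trace
def annotated_example():
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.7,
        messages=[{"role": "user", "content": "What are three activities to do in Paris?"}],
    )
    output = response.choices[0].message.content

    # Assumed annotation API: attach a check and a custom log to this completion,
    # then submit so they appear in the Trace Details panel alongside the completion.
    annotation = baserun.annotate(response.id)
    annotation.check_includes("include_eiffel_tower", "Eiffel Tower", output)
    annotation.log("paris_activities", metadata={"source": "example"})
    annotation.submit()

    return output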