Update your Python SDK to version 2.0 for enhanced stability and easier integration. Learn more

In addition to automatic evaluations and human evaluations, user feedback is an excellent source for assessing the quality of an LLM application. In Baserun, feedback is collected either as a 0-1 score or in the form of comments and is attached to a trace or an individual LLM request.

Use case

  • Evaluate chatbot response: You might want to ask user rate the chatbot response with a ”👍” / ”👎” or on a numeric scale (e.g. “1-5”) as a way to evaluate customer experience.
  • Create fine-tuning datasets: You might want to collecting positive responses use for fine-tuning your model.
  • Create benchmark testing datasets: Consider collecting negative responses to identify use cases that were not covered in previous development cycles. These user requests can then be added to regression testing suites to evaluate the performance of your next release.

Features

  • Collect score rating
  • Collect comments
  • Support user feedback on traces and LLM requests

User feedback is attached to an LLM request or a trace. Please start by following ‘Get Started with Logging LLM Requests’ or ‘Get Started with Tracing a Workflow’ first. Once you have successfully logged LLM requests or traces in Baserun, continue with the following instructions.

Capturing Annotations

Annotations are a term we use to refer to events and metadata that can be attached to a trace or an individual LLM request. Examples are logs, user feedback, evals, and checks.

User feedback is a type of annotation, alongside checks, automatic evaluations, and human evaluations. To collect customer feedback, you can use the baserun.annotate function.

You can customize your UI in different ways to collect user feedback. Whether it’s through a ”👍” / ”👎” or on a numeric scale (e.g. “1-5”), the customer’s score will be displayed as a 0-1 score on the Baserun dashboard. The end user also has the option to add a comment.

@baserun.trace
def ask_question(question="What is the capital of the US?") -> str:
    completion = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": question}],
    )
    content = completion.choices[0].message.content

    # Create the annotation
    annotation = baserun.annotate()

    # Capture the user feedback as an annotation
    annotation.feedback(
        name="annotate_feedback", score=0.8, metadata={"comment": "This is correct but not concise enough"}
    )
    annotation.check_includes("openai_chat.content", "Washington", content)
    annotation.log("OpenAI Chat Results", metadata={"result": content, "input": question})

    # Make sure to submit the annotation
    annotation.submit()

To associate these feedback a particular LLM request, you simply need to pass the completion ID from your LLM request. To do so using OpenAI’s SDK, you can do the following:

@baserun.trace
def ask_question(question="What is the capital of the US?") -> str:
    completion = client.chat.completions.create(
        model="gpt-4-1106-preview",
        messages=[{"role": "user", "content": question}],
    )
    content = completion.choices[0].message.content

    # Create the annotation
    annotation = baserun.annotate(completion.id)
    # Pass in the completion ID

    # Capture the user feedback as an annotation
    annotation.feedback(
        name="annotate_feedback", score=0.8, metadata={"comment": "This is correct but not concise enough"}
    )
    annotation.check_includes("openai_chat.content", "Washington", content)
    annotation.log("OpenAI Chat Results", metadata={"result": content, "input": question})

    # Make sure to submit the annotation
    annotation.submit()

Here is how the user feedback will look like in Baserun dashboard:

User feedback