
Knowledge agents

EARLY ACCESS

Use the Quality center to evaluate and monitor your knowledge agents. Test agent responses with sample questions, analyze real end user traffic, and use the results to identify issues and improve agent performance.

To access, go to AI Agents > Quality center > Knowledge Agents.

[Screenshot: Knowledge agents Quality center]

Task types

  • Evaluate - Test the knowledge agent with a set of sample questions and check whether it responds as expected. Use ground truth evaluation to compare responses against expected answers.

Process overview

  1. Select the knowledge agent you want to evaluate.
  2. Create and run the task.
  3. Check the status of the task.
  4. When the task is complete, download and view the results.
  5. Update the knowledge agent as required.

Evaluate

Test the knowledge agent with sample questions to verify it responds as expected. Upload a .csv file with questions, run the task, and review the results.

Prepare test questions

Create a .csv file with sample questions that represent real end user queries.

Example questions for a car dealership agent:

  • What car models do you have available?
  • Do you sell new and used cars?
  • Can I trade in my old car?
  • What financing options do you provide?

To verify answer accuracy, use ground truth evaluation by adding a reference_answer column alongside each question.
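The test file can be generated with a short script. Here is a minimal sketch in Python: the question and reference_answer column names come from this page, while the file name and the sample answers are placeholders for illustration.

```python
import csv

# Sample questions for a hypothetical car dealership agent, paired with
# reference answers for ground truth evaluation. Only the column names
# "question" and "reference_answer" are prescribed; everything else here
# is placeholder content.
rows = [
    {"question": "What car models do you have available?",
     "reference_answer": "We currently stock sedans, SUVs, and pickup trucks."},
    {"question": "Do you sell new and used cars?",
     "reference_answer": "Yes, we sell both new and certified pre-owned cars."},
]

with open("evaluation_questions.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "reference_answer"])
    writer.writeheader()
    writer.writerows(rows)
```

If you are not using ground truth evaluation, omit the reference_answer column and keep only the question column.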

Run the task

  1. On the Infobip web interface, go to AI Agents > Quality center > Knowledge Agents.
  2. In the Select knowledge agent field, select the agent you want to test.
  3. Upload the .csv file that contains the set of questions.
  4. In the Task name field, enter a name for the task.
  5. Select Run task.

Ground truth evaluation (optional)

Ground truth evaluation compares agent responses against expected answers you provide. Add two columns to your .csv file: question and reference_answer.

When the test runs, each generated answer receives one of these labels:

Label - Meaning
GOOD - The answer matches the expected answer.
INCOMPLETE - The answer is correct overall but missing some details from the expected answer.
BAD - The answer contains incorrect or fabricated information.

The label and reasoning for each answer are included in the results file.
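Because the labels are included in the downloaded results file, you can tally them offline. A minimal Python sketch, assuming the results are a standard .csv with the answer_classification column described later on this page; the file path is a placeholder.

```python
import csv
from collections import Counter

def label_summary(results_path):
    """Count GOOD / INCOMPLETE / BAD labels in a downloaded results file."""
    with open(results_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return Counter(row["answer_classification"] for row in reader)
```

For example, `label_summary("results.csv")` returns a Counter such as `Counter({'GOOD': 8, 'INCOMPLETE': 2, 'BAD': 1})`, which gives a quick pass/fail picture before you drill into individual answers.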


View task status

You can view all tasks in the Tasks section, which shows up to the 20 most recent tasks of each type.

Status - Description
Completed - The task is complete and the results are ready to download.
Failed - The task failed. Hover over the information icon next to the status to see why.
In progress - The task is running.

View the results

When the task is complete, download the results .csv file from the Tasks section.

Results fields

Field - Description
question - The question from the test file.
answer - The response generated by the knowledge agent.
original_contexts - Relevant chunks retrieved from the knowledge sources, including the source document name.
reranked_contexts - Contexts after reranking, if reranking is enabled. Not available for very short messages.
content_filter_results - Content filter result for each message. Included only if the content filter is enabled. Contains the status (safe, annotated, blocked) and violation details per category, including severity, threshold, configured action, and whether the filter was triggered.
latency - Time taken by the knowledge agent to generate the response. This differs from the chatbot response time, which may include additional processing.
topic - The topic the agent determined from the question. Provides insight into end user interests.
is_question_answered - TRUE: the agent answered the question. FALSE: the agent could not answer (missing content or out of scope). N/A: the message was not a question.
is_answer_in_context - Indicates whether the answer is based on the retrieved context.
answer_classification - GOOD, INCOMPLETE, or BAD. Included only for Evaluate tasks when reference_answer is provided.
answer_classification_reasoning - Explanation of why the answer received its classification. Included only for Evaluate tasks when reference_answer is provided.
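These fields make it easy to triage a results file programmatically. A minimal Python sketch, assuming the boolean fields hold the string values TRUE/FALSE/N/A as described above and that latency is numeric; the exact value formats and the file path are assumptions.

```python
import csv

def triage_results(results_path):
    """Collect unanswered questions, answers not grounded in the retrieved
    context, and the average response latency from a downloaded results
    file. Field names come from the results table; the string values
    "TRUE"/"FALSE" and a numeric latency format are assumptions."""
    unanswered, ungrounded, latencies = [], [], []
    with open(results_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("is_question_answered") == "FALSE":
                unanswered.append(row["question"])
            if row.get("is_answer_in_context") == "FALSE":
                ungrounded.append(row["question"])
            try:
                latencies.append(float(row["latency"]))
            except (KeyError, ValueError):
                pass  # Skip rows with missing or non-numeric latency.
    avg_latency = sum(latencies) / len(latencies) if latencies else None
    return unanswered, ungrounded, avg_latency
```

Unanswered questions point to gaps in the knowledge sources, while answers that are not in the retrieved context deserve a closer look for hallucination.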

Update the agent based on results

Use the results to identify and fix issues. Investigate the root cause before making changes.

Issue - Solution
Agent answered correctly but the answer is not in the context - The prompt likely contains the answer. If the answer is wrong, check the knowledge source content.
Agent uses an inappropriate tone - Review and update the prompt in agent settings.
Agent responds in the wrong language - Review and update the prompt in agent settings.
Agent responses are cut off - Increase the output tokens setting.
Agent hallucinates - The knowledge source is likely missing relevant content. Add it.
Agent answers out-of-scope questions - Tighten the prompt in agent settings to define scope and limitations.
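To find where the knowledge sources are missing content, it can help to group the unanswered questions by topic. A minimal Python sketch over the downloaded results file, assuming the is_question_answered field holds the string "FALSE" for unanswered questions; the field names come from the results fields table, and the file path is a placeholder.

```python
import csv
from collections import defaultdict

def unanswered_by_topic(results_path):
    """Group unanswered questions by topic to spot gaps in the
    knowledge sources. Assumes is_question_answered is the string
    "FALSE" for unanswered questions."""
    gaps = defaultdict(list)
    with open(results_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row.get("is_question_answered") == "FALSE":
                gaps[row.get("topic", "unknown")].append(row["question"])
    return dict(gaps)
```

A topic with many unanswered questions is a strong hint that the corresponding content should be added to the knowledge sources.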

For an overview of testing approaches, see Test knowledge agent.
