---

copyright:

lastupdated: "2025-01-03"

keywords: evaluation

subcollection: watson-assistant

---
{{site.data.keyword.attribute-definition-list}}
# Evaluating the assistant
{: #evaluating-the-assistant}
[Plus]{: tag-green} [Enterprise]{: tag-purple} [IBM Cloud Pak for Data]{: tag-cp4d}
## Overview
{: #evaluating-the-assistant-overview}
You can evaluate and analyze the performance of your assistant by uploading a comprehensive, relevant collection of utterances on the Evaluate page of {{site.data.keyword.conversationshort}} and sending the utterances to your assistant in a single test run.
When a test run completes, you can view a comprehensive evaluation result. It includes the response routing metrics, conversational search scores (if conversational search is enabled), and response details for every utterance in the uploaded collection, along with the assistant settings that were relevant to the test run.
Evaluation is supported for the draft environment only.
## Before you begin
{: #evaluating-the-assistant-before}
To evaluate the conversational search performance, set the **Conversational search** toggle to **On** in the Search Integration window. For more information, see Enabling conversational search.
## Testing limits
{: #evaluating-the-assistant-testing-limits}
You can run a maximum of:
- 4 tests per rolling 7-day week per assistant.
- 40 tests per rolling 7-day week per instance.
- 250 messages per test.
## CSV file format
{: #evaluating-the-assistant-csv-format}
The CSV file that you upload must meet the following criteria:
- The file contains a single column with the text of each utterance to be sent.
- Each row is sent to your assistant sequentially.
- The file has no header row.
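For example, a minimal upload file with three hypothetical utterances looks like this (one utterance per row, single column, no header):

```csv
What are your opening hours
Where is the nearest branch
How do I reset my password
```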
## Writing utterances in the CSV file
{: #evaluating-the-assistant-csv-writing}
- If your utterance is plain text, write it as is. For example, `This is a test utterance` can be written as `This is a test utterance`.
- If your utterance contains a comma, wrap the line in quotation marks. For example, `Hi, this is a second utterance` must be written as `"Hi, this is a second utterance"`.
- If your quoted utterance contains quotation marks, each of those quotation marks must be escaped by prefixing it with another quotation mark. For example, `I have the "best" plan` must be written as `"I have the ""best"" plan"`.
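These escaping rules match standard CSV quoting, so you do not need to apply them by hand. As a sketch, Python's `csv` module with its default dialect produces a correctly escaped single-column file from a list of hypothetical utterances:

```python
import csv
import io

# Hypothetical sample utterances, one per row.
utterances = [
    "This is a test utterance",
    "Hi, this is a second utterance",  # contains a comma -> row gets quoted
    'I have the "best" plan',          # contains quotes -> doubled and quoted
]

buf = io.StringIO()
writer = csv.writer(buf)  # default dialect quotes as needed and doubles embedded quotes
for utterance in utterances:
    writer.writerow([utterance])  # single column, no header row

print(buf.getvalue())
# This is a test utterance
# "Hi, this is a second utterance"
# "I have the ""best"" plan"
```

Writing `buf.getvalue()` to a file (or passing a file object to `csv.writer` directly) gives you an upload-ready CSV.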
## Procedure
{: #evaluating-the-assistant-procedure}
To evaluate the response settings of your assistant, complete the following steps:

1. On the {{site.data.keyword.conversationshort}} home page, click **Evaluate** to open the Evaluate response settings page.
2. Click **Add file** and select your data. You can upload a test data set in .csv format.
3. Click **Start**.
## Conversational search scores
{: #evaluating-the-assistant-conversational-search-scores}
Under Conversational search scores, you can view the extractiveness, retrieval confidence, response confidence, average citations per response, and average response length scores for the whole data set. For more information, see Conversational search analytics.
## Settings
{: #evaluating-the-assistant-settings}
Under Settings, you can view your assistant's settings.
## Filtering the results
{: #evaluating-the-assistant-filter-results}
You can filter the results based on the type of response routing. Click the filter icon and choose the type of response that you want to display from the drop-down menu.
In the Response details table, the response confidence for each message is shown by default. Click the settings icon and choose from the drop-down menu to also display the extractiveness and retrieval confidence for each message.
## Exporting the results
{: #evaluating-the-assistant-export-results}
You can export and save the results of the evaluation. Click the export icon to export the evaluation result table to a .csv file.
The results of the latest test run are kept according to the same retention policy as your chat logs. You can also click the reset icon to delete the results at any time before they expire. The results are deleted for all users.