5 Improving weak understanding
This chapter covers
- Identifying the types of errors a classifier can make
- Establishing a baseline of current classifier performance
- Using data science methodologies to identify and prioritize improvements
- Infusing your traditional AI with generated content to enhance understanding
In this chapter, we will work through iterative improvement cycles to strengthen the understanding (that is, the classifier performance) of a conversational solution. Although data science techniques are used, you do not need to be a data scientist to extract meaningful insights from your data using the methodologies presented in this chapter.
5.1 Building your improvement plan
If you built a blind test set using a sample from your production logs, you should have a reliable "representative distribution" test set. This means that the topics that are most frequently asked by your users are represented with corresponding volume in your testing data. This will be a key factor in prioritizing any issues that are surfaced by your test results.
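One way to build such a representative test set is to sample from production logs in proportion to each topic's share of traffic. The following sketch illustrates the idea with hypothetical log data; the function name, topic labels, and utterances are invented for illustration:

```python
import random
from collections import Counter

def sample_representative(logs, test_size, seed=42):
    """Sample a blind test set whose topic mix mirrors production traffic.

    `logs` is a list of (utterance, topic) pairs pulled from production.
    """
    random.seed(seed)
    by_topic = {}
    for utterance, topic in logs:
        by_topic.setdefault(topic, []).append(utterance)

    total = len(logs)
    test_set = []
    for topic, utterances in by_topic.items():
        # Each topic's share of the test set matches its share of traffic.
        n = round(test_size * len(utterances) / total)
        test_set.extend((u, topic)
                        for u in random.sample(utterances, min(n, len(utterances))))
    return test_set

# Hypothetical production logs: reset_password dominates traffic.
logs = ([("I forgot my password", "reset_password")] * 60
        + [("Where is my order?", "order_status")] * 30
        + [("Close my account", "cancel_account")] * 10)
test = sample_representative(logs, test_size=20)
print(Counter(topic for _, topic in test))
# Counter({'reset_password': 12, 'order_status': 6, 'cancel_account': 2})
```

Because the sampled counts track production volume, a topic that accounts for 60% of traffic also accounts for roughly 60% of the test set, which is what lets the test results drive prioritization.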
If you are working with the results of a k-fold test (refer to Chapter 4), you won't know for certain which topics are the most important, so the intents with the worst accuracy scores are a logical starting point. In either case, it's now time to dig into those test results. An improvement plan starts with identifying the biggest problem spots in the bot's training.
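Finding those problem spots usually means computing per-intent accuracy from your test output and ranking the intents worst-first. A minimal sketch, using hypothetical (expected, predicted) pairs such as you might export from a blind or k-fold test run:

```python
from collections import defaultdict

def accuracy_by_intent(results):
    """Rank intents by accuracy, worst first.

    `results` is a list of (expected_intent, predicted_intent) pairs.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for expected, predicted in results:
        totals[expected] += 1
        correct[expected] += int(expected == predicted)
    scores = {intent: correct[intent] / totals[intent] for intent in totals}
    return sorted(scores.items(), key=lambda kv: kv[1])

# Hypothetical test results: expected vs. predicted intent.
results = [
    ("reset_password", "reset_password"),
    ("reset_password", "reset_password"),
    ("order_status", "reset_password"),   # misclassified
    ("order_status", "order_status"),
    ("cancel_account", "order_status"),   # misclassified
]
for intent, score in accuracy_by_intent(results):
    print(f"{intent}: {score:.0%}")
# cancel_account: 0%
# order_status: 50%
# reset_password: 100%
```

With a representative test set, you would weight this ranking by traffic volume as well; with a k-fold test, the raw accuracy ranking is your starting point.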