7 How accurate is your assistant?


This chapter covers

  • Collecting test data for your assistant
  • Assessing the accuracy of your assistant
  • Selecting the best accuracy metric(s) to use for your assistant

AI assistants make predictions based on the way they are trained. How can you tell whether this training is working well? You shouldn’t release an assistant into the wild if you don’t know how well it works. You need to be able to tell whether you are making the assistant smarter or dumber when you change the way you train it.

Fictitious Inc. wants to know how accurate its conversational AI assistant will be when it goes into production. The best way to test an assistant’s accuracy is to see how well it predicts intents on live production traffic, which poses an interesting conundrum: the company doesn’t want to go to production without reasonable accuracy, but it won’t know the assistant’s true accuracy until the assistant is in production.

The best way to handle this conundrum is to train and test your assistant iteratively, as shown in figure 7.1. The virtuous cycle (gather data, train, test, and improve) provides a repeatable methodology to build and improve your AI assistant. It’s OK for Fictitious Inc. to start with imperfect data because the data can be continually improved. You have to start somewhere!

Figure 7.1 The virtuous cycle of testing how accurate your assistant is
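
To make the "test" step of the cycle concrete, here is a minimal sketch of scoring an assistant against a set of labeled utterances. Everything in it is invented for illustration: classify() is a toy stand-in (in practice it would call your assistant's runtime API and return its top predicted intent), and the sample utterances and intent names are hypothetical.

# A minimal sketch of the "test" step in the virtuous cycle.

def classify(utterance: str) -> str:
    # Toy stand-in for the assistant. In practice, call your
    # assistant's runtime API here and return its top intent.
    text = utterance.lower()
    if "log in" in text or "password" in text:
        return "reset_password"
    if "hours" in text:
        return "store_hours"
    return "agent_transfer"

def test_accuracy(labeled_utterances) -> float:
    # Fraction of utterances whose predicted intent matches
    # the expected (ground-truth) intent.
    correct = 0
    for utterance, expected_intent in labeled_utterances:
        if classify(utterance) == expected_intent:
            correct += 1
    return correct / len(labeled_utterances)

# Example test data: (utterance, expected intent) pairs.
test_set = [
    ("I can't log in to my account", "reset_password"),
    ("What are your hours?", "store_hours"),
    ("I'd like to speak to a human", "agent_transfer"),
]

print(f"Accuracy: {test_accuracy(test_set):.0%}")

The single number this produces (the percentage of test utterances whose predicted intent matches the expected one) is the simplest accuracy measure. The rest of this chapter refines the two choices the sketch glosses over: how to select the test data (sections 7.1.3 and 7.2) and which accuracy metric to report (section 7.3).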

7.1 Testing an AI assistant for accuracy

7.1.1 Testing a single utterance

7.1.2 Testing multiple utterances

7.1.3 Selecting a test data set

7.2 Comparing testing methodologies

7.2.1 Blind testing

7.2.2 k-fold cross-validation test

7.3 Selecting the right accuracy metric for the job

Summary