concept golden value in category deep learning

appears as: golden values
Deep Learning with JavaScript: Neural networks in TensorFlow.js

This is an excerpt from Manning's book Deep Learning with JavaScript: Neural networks in TensorFlow.js.

12.1.2. Testing with golden values

In the previous section, we talked about the unit testing we can do without asserting on a threshold metric value or requiring stable or convergent training. Now let’s explore the types of testing people often want to run with a fully trained model, starting with checking predictions of particular data points. Perhaps there are some “obvious” examples that you want to test. For instance, for an object detector, an input image with a nice big cat in it should be labeled as a cat; for a sentiment analyzer, a text snippet that’s clearly a negative customer review should be classified as negative. These correct answers for given model inputs are what we refer to as golden values.

If you follow the mindset of traditional unit testing blindly, it is easy to fall into the trap of testing trained machine-learning models with golden values. After all, we want a well-trained object detector to always label the cat in an image with a cat in it, right? Not quite. Golden-value-based testing can be problematic in a machine-learning setting because we’re usurping our training, validation, and evaluation data split.
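To make the concern concrete, here is a minimal sketch that contrasts a golden-value assertion with a check on an aggregate metric. Everything in it is hypothetical: the example IDs, the labels, the two sets of predictions (standing in for two training runs of the same model), and the 0.8 accuracy target are made up for illustration and do not come from a real model.

```javascript
// Illustrative test set; ids and true labels are invented for this sketch.
const testSet = [
  {id: 'img0', label: 'cat'},
  {id: 'img1', label: 'cat'},
  {id: 'img2', label: 'dog'},
  {id: 'img3', label: 'dog'},
  {id: 'img4', label: 'cat'},
];

// Hypothetical predictions from two training runs (for example, with
// different initial weights). Each run gets exactly one example wrong.
const runA = {img0: 'cat', img1: 'cat', img2: 'dog', img3: 'cat', img4: 'cat'};
const runB = {img0: 'dog', img1: 'cat', img2: 'dog', img3: 'dog', img4: 'cat'};

// Golden-value check: asserts on a single hand-picked example.
function goldenValueCheck(predictions) {
  return predictions['img0'] === 'cat';
}

// Aggregate check: fraction of the whole test set predicted correctly.
function accuracy(predictions) {
  const correct =
      testSet.filter((ex) => predictions[ex.id] === ex.label).length;
  return correct / testSet.length;
}

// Assumed target metric for this sketch.
const TARGET_ACCURACY = 0.8;
const meetsTarget = (predictions) => accuracy(predictions) >= TARGET_ACCURACY;

console.log('run A:', goldenValueCheck(runA), meetsTarget(runA));
console.log('run B:', goldenValueCheck(runB), meetsTarget(runB));
```

In this sketch both runs reach the same accuracy, yet only one of them passes the golden-value check, so the outcome of that check depends on which single example happened to be misclassified rather than on overall model quality.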

Assuming you had a representative sample for your validation and test datasets, and you set an appropriate target metric (accuracy, recall, and so on), why is any one example required to be right more than another? The training of a machine-learning model is concerned with accuracy on the entire validation and test sets. The predictions for individual examples may vary with the selection of hyperparameters and initial weight values.

If there are some examples that must be classified correctly and are easy to identify, why not detect them before asking the machine-learning model to classify them, and use non-machine-learning code to handle them instead? This approach is used occasionally in natural language processing systems, where a subset of query inputs (such as frequently encountered and easily identifiable ones) is automatically routed to a non-machine-learning module for handling, while the remaining queries are handled by a machine-learning model. You’ll save on compute time, and that portion of the code is easier to test with traditional unit testing. While adding a business-logic layer before (or after) the machine-learning predictor might seem like extra work, it gives you the hooks to control overrides of predictions. It’s also a place where you can add monitoring or logging, which you’ll probably want as your tool becomes more widely used.

With that preamble, let’s explore the three common desires for golden values separately.
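Before moving on, the business-logic layer described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the rule table, the `normalize` helper, the `makeRoutedPredictor` wrapper, and the intent labels are all hypothetical names, and `mlPredict` is a plain synchronous stub standing in for a call into a trained model (a real TensorFlow.js model would involve tensors and likely asynchronous code).

```javascript
// Hypothetical rule table: frequent, easily identified queries mapped to
// fixed answers that must always be right.
const RULES = new Map([
  ['what are your hours', 'hours_info'],
  ['reset my password', 'password_reset'],
]);

// Normalize a query so trivial variations (case, trailing punctuation,
// surrounding whitespace) hit the same rule.
function normalize(query) {
  return query.trim().toLowerCase().replace(/[?.!]+$/, '');
}

// Wraps a predictor function with a rule-based layer in front of it.
// `mlPredict` stands in for the machine-learning model; `log` is an
// optional hook for monitoring or logging.
function makeRoutedPredictor(mlPredict, log = () => {}) {
  return function predict(query) {
    const key = normalize(query);
    const result = RULES.has(key) ?
        {label: RULES.get(key), source: 'rule'} :
        {label: mlPredict(query), source: 'model'};
    log(result);
    return result;
  };
}

// Usage with a stub in place of the real model.
const sources = [];
const predict =
    makeRoutedPredictor(() => 'other', (result) => sources.push(result.source));
console.log(predict('Reset my password!'));
console.log(predict('My order arrived damaged'));
```

The rule path is deterministic, so it can be covered by ordinary unit tests with exact expected values, while the model path is left to be judged on aggregate metrics. The `source` field is one possible way to record which path produced each answer for later monitoring.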
