
5 Detecting patterns with unsupervised learning

This chapter covers

Getting the virtual assistant to find patterns in the data
Proactively detecting anomalies in application logs
Removing noise from the data to reduce its size

In the previous chapter, we built a basic virtual assistant application that uses supervised shallow learning techniques to perform simple but useful tasks. In this chapter, we will expand its capabilities with new features powered by unsupervised learning, which showcases the unsupervised learning capabilities of ML.NET.

One scenario where unsupervised learning is appropriate is detecting patterns in unstructured data. This is done with a technique known as clustering, which we briefly covered in chapter 3, in which records are assigned to clusters based on their similarity to one another.
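To make the idea of assigning records to clusters by similarity concrete, here is a minimal k-means sketch in plain Python. k-means is one common clustering algorithm, and ML.NET provides a k-means trainer for clustering, but this is not ML.NET code. The function name `kmeans`, the hand-picked starting centroids, the fixed iteration count, and the sample points are illustrative assumptions.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Minimal k-means: repeatedly assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    centroids = [list(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        assignments = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Two obvious groups of 2-D points, starting from hand-picked centroids.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
labels, centers = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [[1.25, 1.5], [8.5, 8.25]]
```

No labels are supplied anywhere: the cluster ids 0 and 1 emerge purely from how close the points are to each other, which is what makes clustering an unsupervised technique.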

One use of clustering in a virtual software development assistant is finding similarities between software errors based on their stack traces or error messages. This lets us quickly determine whether the error we are investigating is likely to be related to other errors in the system that we have already solved. Another use is preprocessing training data for a machine learning task: for example, we can group similar records and use those groups to turn unlabeled data into a labeled training dataset for multiclass classification. In this chapter, we will cover scenarios similar to these.
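The error-grouping idea can be illustrated without any ML library. The sketch below is a deliberately simple stand-in, not ML.NET's API and not the approach this chapter builds: it swaps in greedy "leader" clustering over word-token sets, compared with Jaccard similarity. The helper names (`tokens`, `jaccard`, `group_errors`), the 0.5 threshold, and the sample error messages are all hypothetical. The group ids it returns are the kind of labels that could be attached to each record to build a multiclass training set.

```python
import re


def tokens(message):
    """Lower-cased word tokens. Digits and punctuation are dropped, so line
    numbers and IDs do not make otherwise identical errors look different."""
    return set(re.findall(r"[a-z_]+", message.lower()))


def jaccard(a, b):
    """Fraction of tokens two token sets share, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_errors(messages, threshold=0.5):
    """Greedy 'leader' clustering (hypothetical helper): a message joins the
    first group whose leader (first message) is at least `threshold` similar;
    otherwise it starts a new group. Returns (message, group_id) pairs."""
    leaders = []  # token set of the first message in each group
    labeled = []
    for msg in messages:
        toks = tokens(msg)
        for gid, leader in enumerate(leaders):
            if jaccard(toks, leader) >= threshold:
                break
        else:  # no similar group found: this message leads a new one
            leaders.append(toks)
            gid = len(leaders) - 1
        labeled.append((msg, gid))
    return labeled


# Hypothetical error messages used only for illustration.
errors = [
    "NullReferenceException at OrderService.Save line 42",
    "NullReferenceException at OrderService.Save line 97",
    "TimeoutException calling PaymentGateway.Charge",
    "TimeoutException calling PaymentGateway.Refund",
]
for msg, gid in group_errors(errors):
    print(gid, msg)
```

Lowering `threshold` merges more messages into fewer, broader groups; raising it produces more, narrower ones. In practice you would probably review each group and give it a meaningful name before using it as a class label for a classifier.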

5.1 Detecting patterns in the data

5.1.1 Adding clustering code

5.1.2 Adding the model consumption code

5.1.3 Adding the model to the virtual assistant

5.1.4 Testing clustering functionality

5.2 Detecting anomalies

5.2.1 Adding anomaly detection code

5.2.2 Adding the anomaly detector to the virtual assistant

5.2.3 Testing our anomaly detection logic

5.3 Removing noise in the data

5.4 Project: building our own clustering models

5.4.1 Building a real estate categorization model

5.4.2 Building a code smells categorization model

5.5 Summary