9 Supervised machine learning with Random Forest and XGBoost

 

This chapter covers

  • Introducing supervised machine learning (ML) and how it relates to threat hunting
  • Applying supervised ML for threat hunting
  • The importance of training data sets in supervised ML
  • Acquiring and processing reliable training data sets
  • Practicing threat hunting with supervised ML
  • Evaluating and comparing supervised ML models
  • Comparing supervised and unsupervised ML

Chapter 8 introduced unsupervised ML and used a k-means clustering model to group similar data points. Investigating the events mapped to the small clusters led us to uncover malicious activities. In this chapter, we introduce supervised ML and compare it with unsupervised ML in the context of threat hunting. We also identify the prerequisites for operating supervised ML effectively, some of which translate into operational challenges that threat hunters should be aware of.

9.1 Hunting DNS tunneling

9.2 Supervised machine learning

9.2.1 Acquiring the training data set

9.2.2 Analyzing the data set

9.2.3 Extracting the features

9.2.4 Analyzing the features

9.2.5 Reducing features

9.3 Random Forest

9.3.1 Generating the Random Forest model

9.3.2 Testing the Random Forest model

9.3.3 Hunting with the Random Forest model

9.3.4 Downloading DNS events and extracting features

9.3.5 Engaging the model

9.3.6 Investigating events

9.4 XGBoost

9.4.1 Generating the XGBoost model

9.4.2 Testing the XGBoost model

9.4.3 Hunting with the XGBoost model

9.5 Exercises

9.6 Answers to exercises

Summary