This chapter covers
- What’s the real question you’re trying to answer?
- A machine learning scenario without labeled training data
- The difference between supervised and unsupervised machine learning
- Taking a deep dive into anomaly detection
- Using the Random Cut Forest algorithm
Brett works as a lawyer for a large bank. He is responsible for checking that the law firms hired by the bank bill the bank correctly. How tough can this be, you ask? Pretty tough is the answer. Last year, Brett’s bank used hundreds of different firms across thousands of different legal matters, and each invoice submitted by a firm contains dozens or hundreds of lines. Tracking this using spreadsheets is a nightmare.
In this chapter, you’ll use SageMaker and the Random Cut Forest algorithm to create a model that highlights the invoice lines that Brett should query with a law firm. Brett can then apply this process to every invoice to keep the lawyers working for his bank on their toes, saving the bank hundreds of thousands of dollars per year. Off we go!