11 Adversarial robustness


Imagine that you are a data scientist at a (fictional) new player in the human resources (HR) analytics space named HireRing. The company creates machine learning models that analyze resumes and metadata in job application forms to prioritize candidates for hiring and other employment decisions, training its algorithms on each corporate client's historical data. As a major value proposition, HireRing's executives have paid extra attention to robustness to distribution shift and to fairness in their machine learning pipelines, and they are now starting to focus their problem specification efforts on securing models against malicious acts. You have been entrusted to lead the charge in this new area of machine learning security. Where should you begin? What are the different threats you need to worry about? What can you do to defend against potential adversarial attacks?

Adversaries are people who pursue their own goals to the detriment of HireRing and its clients, usually in a secretive way. For example, they may simply want to make the accuracy of an applicant prioritization model worse. Or they may be more sophisticated and want to trick the machine learning system into putting some small group of applicants at the top of the priority list, irrespective of the employability expressed in their features, while leaving the model's behavior unchanged for most applicants.
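As a minimal, hypothetical sketch of the first adversary goal, degrading a model's accuracy, consider an adversary who flips labels in part of the training data before the model is fit: a simple form of poisoning attack, which section 11.2 takes up in detail. The synthetic applicant features, the scikit-learn logistic regression model, and the 10% flipping rate below are all illustrative assumptions, not anything from HireRing's actual pipeline.

```python
# Toy illustration (assumed setup): an adversary flips labels on a fraction of
# the training data so that the deployed model is less accurate on clean data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic "applicant" features and a prioritize / don't-prioritize label.
X = rng.normal(size=(2000, 5))
y = (X @ np.array([1.0, -0.5, 0.8, 0.0, 0.3]) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Model trained on clean data.
clean_model = LogisticRegression().fit(X_train, y_train)

# Model trained on poisoned data: labels of 10% of training rows are flipped.
y_poisoned = y_train.copy()
flip = rng.choice(len(y_poisoned), size=len(y_poisoned) // 10, replace=False)
y_poisoned[flip] = 1 - y_poisoned[flip]
poisoned_model = LogisticRegression().fit(X_train, y_poisoned)

print("clean test accuracy:   ", clean_model.score(X_test, y_test))
print("poisoned test accuracy:", poisoned_model.score(X_test, y_test))
```

With randomly chosen flips the accuracy drop is usually modest; a determined adversary would choose the flipped points strategically, which is exactly the kind of behavior the defenses against poisoning attacks in section 11.2 aim to counter.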

11.1  The different kinds of adversarial attacks

11.1.1    Target

11.1.2    Capability

11.1.3    Goal

11.2  Defenses against poisoning attacks

11.2.1    Data sanitization

11.2.2    Smoothing

11.2.3    Patching

11.3  Defenses against evasion attacks

11.3.1    Denoising input data

11.3.2    Adversarial training

11.3.3    Evaluating and certifying robustness to evasion attacks

11.4  Summary