6 Who you gonna call? Syscall-busters!

This chapter covers

  • Observing syscalls of a running process by using strace and BPF
  • Working with black-box software
  • Designing chaos experiments at the syscall level
  • Blocking syscalls by using strace and seccomp

It’s time to take a deep dive—all the way to the OS—to learn how to do chaos engineering at the syscall level. I want to show you that even in a simple system, like a single process running on a host, you can create plenty of value by applying chaos engineering and learning just how resilient that system is to failure. And, oh, it’s good fun too!

This chapter starts with a brief refresher on syscalls. You’ll then see how to do the following:

  • Understand what a process does without looking at its source code
  • List and block the syscalls that a process can make
  • Experimentally test your assumptions about how a process deals with failure

If I do my job well, you’ll finish this chapter with the realization that it’s hard to find a piece of software that can’t benefit from chaos engineering, even if it’s closed source. Whoa, did I just say closed source? The same guy who always goes on about how great open source software is and who maintains some himself? Why would you work with closed source? Well, sometimes it all starts with a promotion.

6.1 Scenario: Congratulations on your promotion!

6.1.1 System X: If everyone is using it, but no one maintains it, is it abandonware?

6.2 A brief refresher on syscalls

6.2.1 Finding out about syscalls

6.2.2 Using the standard C library and glibc

6.3 How to observe a process’s syscalls

6.3.1 strace and sleep

6.3.2 strace and System X

6.3.3 strace’s problem: Overhead

6.3.4 BPF

6.3.5 Other options

6.4 Blocking syscalls for fun and profit part 1: strace

6.4.1 Experiment 1: Breaking the close syscall

6.4.2 Experiment 2: Breaking the write syscall

6.5 Blocking syscalls for fun and profit part 2: Seccomp

6.5.1 Seccomp the easy way with Docker

Summary