chapter four

4 Reliability

 

4.1 Fundamentals

4.1.1 Reliability

Reliability is a term that is widely used today. Yet, the more a term is used, the more ambiguous it can become.

In this section, I’m not going to formally define reliability. Instead, I’m going to share a story that changed my whole career.

A few years ago, I joined a new company in a safety-critical domain: air traffic management. My first day there is one I will probably remember for the rest of my life. All the newcomers had been gathered for a training session, and we were seated in a large conference room, casually waiting for it to begin. In front of us was the trainer.

After a brief introduction, the trainer asked us to go around the table and explain where we came from. People took turns describing their backgrounds. When it was my turn, I mentioned that I came from the insurance industry. Once everyone had shared their experience, the trainer paused for a moment and then said:

“There’s something important that you should all realize by joining our company: if we have a problem, we may not lose money, we may not lose customers, but we may lose lives.”

That very moment was a revelation for me. It made me understand how absolutely crucial reliability can be in some domains. From that day forward, I was captivated by reliability topics.

4.1.2 Graceful Degradation

4.1.3 Adaptive LIFO

4.1.4 Resilient, Fault-tolerant, Robust, or Reliable?

4.1.5 Fail Open vs. Fail Closed

4.1.6 Soft vs. Hard Dependency

4.2 Analysis

4.2.1 Post Hoc Ergo Propter Hoc

4.2.2 Lurking Variables