Why do modern systems fail despite their incredible capabilities? Understanding the causes of failure can help us prevent the next disaster and even improve our personal and organizational decision-making.
1. Complexity and Tight Coupling Lead to Failure
Modern systems have evolved with advanced capabilities, but these advances also bring complexity and reduced flexibility, making systems more prone to breakdowns. When multiple small errors interact, they cause a domino effect, as seen in notable crises like the Fukushima disaster or the 2008 financial meltdown.
The issue lies in tightly coupled systems, where small issues quickly escalate because there’s minimal buffer between parts. Cooking a Thanksgiving dinner is a small-scale example of tight coupling: if one dish's timing goes off, the entire meal is impacted. Similarly, financial markets and nuclear power plants can experience chain reactions when critical components fail.
Complexity, on the other hand, makes systems hard to understand. Just as it’s difficult to gauge whether a turkey is fully cooked without cutting into it, opaque systems hide potential risks. When complexity and tight coupling combine, failure becomes almost inevitable unless handled wisely; sociologist Charles Perrow called such breakdowns "normal accidents."
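Perrow's framework can be read as a simple two-by-two classification: how complex is the system, and how tightly coupled? As a minimal illustrative sketch (the numeric scores, thresholds, and quadrant labels here are assumptions for illustration; Perrow's original framework is qualitative):

```python
# Illustrative sketch of Perrow's complexity/coupling matrix.
# Scores and labels are hypothetical; the real framework is qualitative.

def perrow_quadrant(complexity: float, coupling: float) -> str:
    """Classify a system by complexity and coupling, each scored 0-1."""
    is_complex = complexity >= 0.5  # hard to see inside the system
    is_tight = coupling >= 0.5      # little slack between components

    if is_complex and is_tight:
        return "danger zone: failures are hard to foresee and spread fast"
    if is_complex:
        return "complex but loose: surprises happen, but there is time to recover"
    if is_tight:
        return "tight but simple: failures spread fast, but causes are visible"
    return "simple and loose: most failures stay small"

# A nuclear plant scores high on both dimensions:
print(perrow_quadrant(complexity=0.9, coupling=0.9))
```

The point of the matrix is diagnostic: systems landing in the high-complexity, high-coupling quadrant deserve extra buffers and transparency before anything goes wrong.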
Examples
- BP’s oil spill stemmed from multiple operational and design failures that spiraled out of control.
- The Fukushima disaster occurred when an unforeseen tsunami overwhelmed inadequately planned defenses.
- The complexity of global financial systems caused hidden risks to collide during the 2008 crisis.
2. Simplify Systems and Add Buffers to Prevent Failure
Failures often occur because systems lack transparency or flexibility. Simplifying mechanisms and adding buffers can provide much-needed safeguards. Anton Yelchin’s fatal accident with his Jeep stemmed from a confusing electronic gear shifter that gave no clear feedback about whether the car was in park; a more transparent design could have prevented it.
When systems are inherently opaque, like mountaineering or space exploration, buffers like better preparation and flexible scheduling can mitigate risks. Managing the unexpected becomes easier when you troubleshoot early on rather than react during a crisis.
Think of a bakery chain with an overly complex supply system. Since the complexity couldn’t simply be engineered away, the chain built a flexible launch schedule as a buffer against inevitable hiccups, ensuring smoother operations. Approaching problems with Perrow’s matrix of complexity and coupling provides a framework for identifying vulnerabilities before they snowball.
Examples
- Anton Yelchin’s Jeep accident could have been averted with a simpler, intuitive gear indicator.
- Mountaineers avoid delays and dangers by solving logistical issues well in advance of the final climb.
- A bakery chain avoided systemic breakdown by introducing adaptable launch timelines.
3. Use Structured Decision-Making Tools
We often depend on instinct to make decisions, but this approach falters in complex systems. At Fukushima, engineers underestimated tsunami risks because they were overconfident in their probability estimates. Such forecasts can be improved with methods like SPIES (Subjective Probability Interval Estimates), which forces estimators to weigh the full range of potential outcomes rather than just the likeliest ones.
Structured criteria also help by focusing attention on important factors rather than irrelevant distractions. The Ottawa Ankle Rules, for example, helped reduce unnecessary X-rays in Canadian hospitals by creating fixed criteria to determine medical needs.
When human judgment is unreliable, structured tools like SPIES can overcome biases, leading to safer and more accurate decisions, even in unpredictable systems.
Examples
- SPIES tools could have prevented underestimations in Fukushima’s tsunami defenses.
- Ottawa Ankle Rules reduced unnecessary X-rays and medical errors based on clear criteria.
- Structured assessments during hurricane evacuation planning save lives by prioritizing key risks.
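Fixed criteria like the Ottawa Ankle Rules are effective precisely because they can be written down as a short checklist. A simplified paraphrase of the ankle-series portion of the rule (the real clinical rule also covers the midfoot; this sketch is illustrative only, not medical guidance):

```python
# Simplified paraphrase of the Ottawa Ankle Rules for the ankle series.
# Illustrative only; the actual clinical rule has additional criteria.

def ankle_xray_indicated(malleolar_pain: bool,
                         lateral_malleolus_tenderness: bool,
                         medial_malleolus_tenderness: bool,
                         can_bear_weight_four_steps: bool) -> bool:
    """X-ray is indicated only with malleolar-zone pain plus at least
    one objective finding (bone tenderness or inability to bear weight)."""
    if not malleolar_pain:
        return False
    return (lateral_malleolus_tenderness
            or medial_malleolus_tenderness
            or not can_bear_weight_four_steps)

# A patient with pain but no tenderness who can walk four steps
# needs no X-ray under the rule:
print(ankle_xray_indicated(True, False, False, True))  # → False
```

Because every input is an observable yes/no finding, the rule leaves no room for the irrelevant cues (anxiety, insistence, gut feel) that otherwise drive unnecessary imaging.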
4. Pay Attention to Warning Signs
Systems often exhibit small errors before full-scale failure. Ignoring these signs can lead to disasters, as in Washington DC’s 2009 Metro crash, where a neglected track-circuit fault escalated into a fatal collision.
Airlines have mastered the art of "anomalizing"—studying every near-miss to identify systemic weaknesses. They share insights, fix underlying faults, and normalize discussions about errors. This proactive approach has dramatically reduced aviation accidents over the years.
Analyzing warning signals is essential in any system. Businesses, for instance, can use feedback loops or consultants to pinpoint potential industry disruptions or internal risks.
Examples
- DC Metro ignored warning signs after a near-miss in 2005, with disastrous results later.
- Airlines turned near-misses into learning opportunities, cutting fatal crash rates significantly.
- Businesses use advisors to identify competitive and technological risks before they escalate.
5. Encourage Dissent in Teams
Hierarchy and social conformity often silence dissent, even when speaking up could prevent mistakes. In airlines, junior officers rarely challenged senior pilots, contributing to avoidable accidents. Crew Resource Management reversed this by promoting open communication and shared responsibility.
Open leadership can similarly foster innovation and reduce error. For example, allowing team members to offer diverse viewpoints before the leader states their opinion results in more ideas and better solutions.
Encouraging dissent isn’t just good for safety—it results in stronger, more collaborative decisions, as team members feel freer to offer honest feedback without fear.
Examples
- Crew Resource Management training empowered airline crews to prevent crashes.
- Open leadership discussions lead to more innovative and thorough solutions.
- Businesses implementing feedback-friendly cultures avoid groupthink-related pitfalls.
6. Diversity Strengthens Decision-Making
Homogeneous teams often trust each other’s judgments too much, amplifying errors. A study on stock-market trading found diverse teams performed better by questioning each other’s decisions. They made fewer mistakes and avoided market crashes or price bubbles caused by blind trust.
The 2008 financial crash highlights the perils of uniform thinking among decision-makers. Diversity, whether in perspective, experience, or demographics, helps teams approach problems more critically and avoid echo chambers.
Voluntary mentoring programs build healthier workplace diversity than mandated initiatives: by showcasing new talent and creating personal ties rather than forcing inclusivity, they reduce bias while promoting mutual growth within organizations.
Examples
- Diverse stock market groups avoided severe crashes compared to homogenous ones.
- Sallie Krawcheck attributed the financial crisis to homogeneous leadership and groupthink.
- Companies with mentoring programs attract and retain more diverse talent through positive reinforcement.
7. Reflection Prevents Plan Continuation Bias
When under pressure, people tend to stick to their initial plans even when conditions make those plans unwise. This plan continuation bias shows up in aviation accidents and in missed project deadlines alike. Pilots call it "get-there-itis": the pressure to reach the destination outweighs attention to changing conditions.
Brian Schiff’s refusal to fly Steve Jobs due to dangerous flight conditions demonstrates the importance of pausing and recalculating. Systems benefit when workers are trained to reassess plans under evolving circumstances, prioritizing safety and effectiveness.
Iteration, like that used by hospitals or families, allows for constant re-evaluation and improvement, making it an effective technique across high-stakes systems.
Examples
- "Get-there-itis" leads to airline accidents when pilots ignore updated risks.
- Brian Schiff prioritized safety over client demands, avoiding a risky takeoff.
- Families using iterative discussions resolved chaotic routines effectively.
8. Iteration Ensures Adaptation to Change
Modern systems require flexibility to navigate evolving circumstances. ER teams, for example, regularly adapt and iterate based on patients’ changing conditions, moving seamlessly between caregiving and diagnosis.
This process of trial, evaluation, and re-implementation also works in families or businesses. For example, the Starr family’s Agile practices turned chaotic mornings into efficient routines through regular reflection and updates.
Iteration is vital for adaptive systems, ensuring that approaches are continually refined rather than rigidly sticking to outdated plans.
Examples
- ER iteration cycles improve patient outcomes in high-pressure environments.
- Agile practices helped one family revolutionize their weekly routines.
- Feedback loops in tech development create better, more user-friendly products.
9. Thinking About Failure Helps Avoid It
Pre-mortems, where a team imagines failure and backtracks to find causes, can reveal vulnerabilities overlooked in optimistic planning. This approach leverages hindsight to improve foresight and anticipate potential setbacks.
Research shows that pre-mortems often uncover more risks than traditional planning methods. The exercise encourages more precise reasoning about projects, increasing the likelihood of success in complex systems.
Leaders across industries can use pre-mortems to turn risky projects into safer, well-calculated ventures by addressing weak spots proactively.
Examples
- Pre-mortems prevent overconfidence in high-stakes projects.
- Imagining failure helped identify potential glitches in tech startups during design stages.
- Disaster response teams use pre-mortems to speed up recovery planning.
Takeaways
- Practice pre-mortems to uncover vulnerabilities during planning phases by imagining project failures and their causes.
- Introduce diverse perspectives in teams to improve decision-making and challenge group biases.
- Use iteration regularly to refine tasks, focusing on adaptability over rigid adherence to plans.