Managing risk involves addressing the possibility of more than one failure at a time.
Cascading or multiple failures must be considered during risk assessments to ensure hazard controls are adequate. STS-90 Neurolab, the final Spacelab mission, experienced several on-orbit animal habitat hardware failures and problems. Launched aboard the Space Shuttle Columbia in April 1998, the mission focused on the effects of microgravity on one of the most complex and least understood parts of the human body – the nervous system. Primary goals were to conduct basic research in neurosciences and expand understanding of how the nervous system develops and functions in space.
The mission was an overall success, but multiple hardware issues and failures presented challenges. The risks associated with each failure were assessed as if the failures were isolated and unrelated. However, the primary mitigation for each of these risks relied on a common control: the crew’s intervention. The risk of multiple simultaneous failures was not considered, and the mitigation approach proved inadequate because the crew’s time was already limited by the complexity of the mission.
Ames Research Center Chief Knowledge Officer (CKO) Donald Mendoza on the importance of this lesson learned:
Widely considered to be one of the most ambitious scientific missions NASA has ever flown, Neurolab was an extraordinary 16-day flight dedicated to researching the brain and nervous system. It also served as a prime example of how a series of “minor, acceptable” risks have the potential to cascade to “unacceptable” levels.
Risk assessments that consider only individual failures frequently rely on human intervention as a mitigation. However, in complex, time-intensive missions, humans can quickly become overloaded. The lessons learned from Neurolab are broadly applicable in nearly every failure scenario that requires human intervention.
- Multiple independent projects that come together for flight are typically treated as independent subsystems, each with its own risks. Those subsystems must subsequently be re-evaluated as a single, integrated system.
- When evaluating potential hardware failures, it is important to run multiple scenarios, including combinations of “high probability” failures. Consider what would happen if these were compounded by a “low probability, high severity” failure.
- Simulate whenever possible. Include extenuating circumstances such as fatigue, uncomfortable physical positions and even persistent, blaring alarms.
- When planning missions, strategically combine time-consuming, physically taxing experiments with those that require little or no intervention.
- Prioritize failure modes before they happen, and prepare appropriate checklists. Encourage reliance on automation when practical to lighten cognitive load.
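
The point about running combinations of failures against a shared mitigation can be illustrated with a small sketch. It assumes a simplified model in which every failure is mitigated by crew time drawn from a single pool. The failure names, the minutes assigned to each, and the spare-time budget are all hypothetical values chosen for illustration; none of them are Neurolab data.

```python
from itertools import combinations

# Hypothetical failure modes mapped to the crew minutes needed to mitigate each.
# These values are illustrative assumptions, not data from the Neurolab mission.
FAILURES = {
    "water_line_leak": 45,
    "fan_fault": 30,
    "feeder_jam": 60,
    "sensor_dropout": 20,
}

# Assumed spare crew time available in one planning window, in minutes (hypothetical).
SPARE_CREW_MINUTES = 90


def overloaded_combinations(failures, spare_minutes, max_simultaneous=3):
    """Return (combination, demand) pairs whose total crew demand exceeds spare time."""
    overloaded = []
    for size in range(1, max_simultaneous + 1):
        # Sort names so the combinations come out in a stable, repeatable order.
        for combo in combinations(sorted(failures), size):
            demand = sum(failures[name] for name in combo)
            if demand > spare_minutes:
                overloaded.append((combo, demand))
    return overloaded


if __name__ == "__main__":
    for combo, demand in overloaded_combinations(FAILURES, SPARE_CREW_MINUTES):
        print(f"{' + '.join(combo)}: {demand} min needed, "
              f"{SPARE_CREW_MINUTES} min available")
```

With these assumed numbers, every failure fits within the spare crew time when assessed on its own, yet one pair and all four triples exceed it. That is the pattern the lesson describes: risks that look acceptable in isolation can become unacceptable when they share a single control. A real assessment would also need to weigh probabilities, severity, and factors such as crew fatigue, which this sketch leaves out.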