Spotlight on Lessons Learned: Risk Management – Cascading and Worst-case Events

Mission Specialist Dave Williams, wearing a sleep cap that measures electrical impulses from the brain, muscles, eyes and heart, participates in the Sleep Studies experiment during STS-90.

Photo Credit: NASA

Managing risk involves addressing the possibility of more than one failure at a time.

Cascading or multiple failures must be considered during risk assessments to ensure hazard controls are adequate. STS-90 Neurolab, the final Spacelab mission, experienced several on-orbit animal habitat hardware failures and problems. Launched aboard the Space Shuttle Columbia in April 1998, the mission focused on the effects of microgravity on one of the most complex and least understood parts of the human body – the nervous system. Primary goals were to conduct basic research in neurosciences and expand understanding of how the nervous system develops and functions in space.

The mission was an overall success, but multiple hardware issues and failures presented challenges. The risks associated with these failures had been assessed as if each were isolated and unrelated. However, the primary mitigation approach for these risks relied on a common control: crew intervention. The risk of multiple simultaneous failures was not considered, and the mitigation approach proved inadequate because crew time was already limited by the complexity of the mission.

Lesson Number: 1615
Lesson Date: August 29, 2005
Submitting Organization: Ames Research Center

 

HIGHLIGHTS

LESSONS LEARNED

  • Systemwide risk, the total risk resulting from the accumulation of several individually acceptable risks, will be masked if a formal, integrated risk management plan is not in place.
  • Aggregated and worst-case scenarios should be considered since the crew or another control mechanism may not be able to respond adequately to cascading or multiple failures.

RECOMMENDATIONS

  • All missions must incorporate a formal, systematic risk management plan that is compatible with the processes identified in NASA Procedural Requirements (NPR) to identify, analyze, plan, track, control and communicate risks. (Applicable NPRs include 7120.5 and 8000.4, which address project management and risk management, respectively.)
  • Specific attention should be given to how risks associated with individual pieces of hardware/software are aggregated with risks of other subsystems to reflect total mission risk.

Consult the lesson learned for complete lists.
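
The aggregation concern in the recommendations above can be illustrated with a small calculation. The following is a minimal sketch, not part of the lesson learned: the function name `prob_at_least_k`, the number of risks and the 5% per-risk probability are all hypothetical, and the failures are assumed to be statistically independent, which real failures that share a common control may not be.

```python
from itertools import combinations
from math import prod


def prob_at_least_k(probs, k):
    """Probability that at least k of the given independent failures occur.

    probs: per-failure probabilities (hypothetical inputs).
    Enumerates every combination of k or more failures, which is
    practical only for a small number of risks.
    """
    n = len(probs)
    total = 0.0
    for r in range(k, n + 1):
        for occurring in combinations(range(n), r):
            chosen = set(occurring)
            # Probability that exactly this set of failures occurs.
            total += prod(
                p if i in chosen else 1.0 - p for i, p in enumerate(probs)
            )
    return total


if __name__ == "__main__":
    # Hypothetical: six risks, each judged "acceptable" at 5%.
    risks = [0.05] * 6
    print("P(at least one failure): ", round(prob_at_least_k(risks, 1), 4))
    print("P(two or more together): ", round(prob_at_least_k(risks, 2), 4))
```

Under these assumptions, the chance of at least one failure is several times larger than any single 5% figure, which is the masking effect the lesson describes when risks are only reviewed one at a time.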

 

Ames Research Center CKO Donald Mendoza on the importance of this lesson learned:

Donald Mendoza. Photo Credit: NASA.


Widely considered to be one of the most ambitious scientific missions NASA has ever flown, Neurolab was an extraordinary 16-day flight dedicated to researching the brain and nervous system. It also served as a prime example of how a series of “minor, acceptable” risks have the potential to cascade to “unacceptable” levels.

Risk assessments that consider only individual failures frequently rely on human intervention as a mitigation. However, in complex, time-intensive missions, humans can quickly become overloaded. The lessons learned from Neurolab are broadly applicable in nearly every failure scenario that requires human intervention.

  • Multiple independent projects that come together for flight are typically treated as independent subsystems, each with its own risks. Those subsystems must subsequently be re-evaluated as a single, integrated system.
  • When evaluating potential hardware failures, it is important to run multiple scenarios, including combinations of “high probability” failures. Consider what would happen if these were compounded by a “low probability, high severity” failure.
  • Simulate whenever possible. Include extenuating circumstances such as fatigue, uncomfortable physical positions and even persistent, blaring alarms.
  • When planning missions, strategically combine time-consuming, physically taxing experiments with those that require little or no intervention.
  • Prioritize failure modes before they happen, and prepare appropriate checklists. Encourage reliance on automation when practical to lighten cognitive load.
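
The advice above to run multiple scenarios against a shared human control can be sketched as a simple enumeration. This is an illustrative example only: the failure labels, the crew-minute costs, the slack figure and the function name `overloaded_scenarios` are all hypothetical and are not taken from the Neurolab report.

```python
from itertools import combinations

# Hypothetical crew time (minutes per day) needed to mitigate each failure manually.
MITIGATION_MINUTES = {
    "failure_a": 45,
    "failure_b": 30,
    "failure_c": 60,
    "failure_d": 20,
}

# Hypothetical unscheduled crew time available per day.
CREW_SLACK_MINUTES = 90


def overloaded_scenarios(costs, slack, max_simultaneous):
    """Return (failures, demand) for each combination that exceeds crew slack.

    Every failure here relies on the same control (crew time), so the
    demands of simultaneous failures are summed rather than judged alone.
    """
    flagged = []
    for size in range(1, max_simultaneous + 1):
        for combo in combinations(sorted(costs), size):
            demand = sum(costs[name] for name in combo)
            if demand > slack:
                flagged.append((combo, demand))
    return flagged


if __name__ == "__main__":
    for limit in (1, 2, 3):
        hits = overloaded_scenarios(MITIGATION_MINUTES, CREW_SLACK_MINUTES, limit)
        print(f"up to {limit} simultaneous failure(s): {len(hits)} overloaded scenario(s)")
        for combo, demand in hits:
            print("   ", " + ".join(combo), "->", demand, "min")
```

With these made-up numbers, every failure fits within crew slack when assessed in isolation, while some pairs and every triple do not. A fuller analysis would also weight each scenario by likelihood and severity rather than treating all combinations equally.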

 

Read the full lesson learned

Neurolab: Final Report for the Ames Research Center Payload

 

Spotlight on Lessons Learned is a monthly series of articles that feature a valuable lesson along with perspective from NASA’s knowledge management community on why the lesson is important. The full lessons are publicly available in NASA’s Lessons Learned Information System (LLIS).

If you have a favorite NASA lesson learned that belongs in the spotlight, please contact us and be sure to include the LLIS Lesson Number.
