By Fayssal M. Safie
After the Columbia accident, I was asked to lead the statistical data analysis team for the external tank foam in support of the Space Shuttle external tank return-to-flight team.
Two weeks later I gave my first presentation to one of the external tank return-to-flight engineering boards. My initial findings clearly indicated that the manual foam spray process had inadequate process control. As a result, an astonishing number of defects (such as voids) existed at many critical locations on the external tank. The frequency and size of the defects were hard to characterize statistically because of the extreme variability of the process. The results shocked me and the engineering community. It was even more shocking to hear one of the lead engineers say to me, “Dr. Safie, it looks like you are not going to be able to help us.” My quick response was, “No, I am here to help you, and I am helping you as we speak.” After some discussion, I did get the message across. Everybody understood that a process control problem existed, that more data needed to be collected, and that a safe return to flight would depend on process-control improvements.
The difficulties and sensitivities of the manual spray process for the Space Shuttle external tank thermal protection system that contributed to the Columbia accident are a dramatic and tragic example of the potential negative impact of inadequate process control on component reliability and system safety. The thermal protection system is a foam-type material applied to the external tank to maintain cryogenic propellant quality, minimize ice and frost formation, and protect the structure from ascent, plume, and reentry heating. (Although the tank is not reused, the thermal protection system is important during reentry because structural overheating after separation from the orbiter could result in a premature tank breakup with debris landing outside the predicted footprint.)
Integrated Process Control
The reliability of the thermal protection system is broadly defined as its strength versus the stress put on it in flight. High reliability in the thermal protection system means less debris released and fewer hits to the orbiter, reducing system risk. Process control is a critical factor in achieving high reliability and low system risk. In simple terms, the aim of process control is process uniformity and process capability. Adequate process uniformity is critical for valid characterization of the protection system material, and high process capability is critical for producing material that meets the specifications. Good process uniformity and high process capability yield fewer process defects, smaller defect sizes, and good material properties that meet the engineering specifications: the critical ingredients of high reliability.
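The link between process variability, capability, and strength-versus-stress reliability can be made concrete with a small calculation. The sketch below is illustrative only: it assumes strength and stress are independent and normally distributed, and every number in it (means, standard deviations, specification limits) is hypothetical rather than taken from external tank data. The function names are likewise invented for this example.

```python
from math import sqrt
from statistics import NormalDist


def stress_strength_reliability(mu_strength, sd_strength, mu_stress, sd_stress):
    """P(strength > stress), assuming independent normal strength and stress."""
    margin_mean = mu_strength - mu_stress
    margin_sd = sqrt(sd_strength ** 2 + sd_stress ** 2)
    return NormalDist().cdf(margin_mean / margin_sd)


def cpk(mean, sd, lsl, usl):
    """Process capability index: distance to the nearer spec limit in units of 3 sigma."""
    return min(usl - mean, mean - lsl) / (3 * sd)


# Hypothetical values in arbitrary units: same mean strength, different process spread.
tight = stress_strength_reliability(mu_strength=30, sd_strength=2, mu_stress=20, sd_stress=2)
loose = stress_strength_reliability(mu_strength=30, sd_strength=6, mu_stress=20, sd_stress=2)

print(f"reliability, tight process: {tight:.4f}")
print(f"reliability, loose process: {loose:.4f}")
print(f"Cpk, tight process: {cpk(30, 2, lsl=24, usl=40):.2f}")
print(f"Cpk, loose process: {cpk(30, 6, lsl=24, usl=40):.2f}")
```

With the mean strength unchanged, widening the process spread alone lowers both the capability index and the computed reliability, which is the sense in which process control drives reliability.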
Engineers frequently think about process control only in terms of statistical process control, which mainly involves control charts with upper and lower limits intended to keep the process within those limits. That is only part of what is needed to ensure process quality and reliability. In response to the Columbia accident, the external tank project team formulated an integrated process control plan for the tank’s thermal protection system to ensure that consistent processes would be employed. In addition to statistical process control, the plan involved manufacturing-material control, contamination control, supplier process control, process-change verification control, process monitoring, training and operator certification, and configuration management control. The plan called for standardization of spray techniques, early detection of changes in materials, video reviews, control of process parameters (for example, temperature and humidity), data recording, quality control inspection, and comprehensive training for technicians, operators, and quality-control engineers and technicians.
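For readers unfamiliar with control charts, the following is a minimal sketch of one common variant, an individuals chart whose limits are set at three estimated standard deviations from the baseline mean. It is not the external tank project's actual procedure. The readings are hypothetical, and the function names are invented for this example; the constant 1.128 is the standard factor for converting an average moving range of two consecutive points into a sigma estimate.

```python
from statistics import mean


def individuals_chart_limits(baseline):
    """Return (LCL, center line, UCL) for an individuals control chart."""
    center = mean(baseline)
    # Moving range: absolute difference between consecutive readings.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / 1.128  # d2 constant for ranges of size 2
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def out_of_control(values, lcl, ucl):
    """Indices of readings that fall outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical readings of a monitored property (arbitrary units).
baseline = [2.1, 2.2, 2.0, 2.1, 2.3, 2.2, 2.1, 2.0, 2.2, 2.1]
new_readings = [2.2, 2.1, 2.6, 2.0, 1.7]

lcl, center, ucl = individuals_chart_limits(baseline)
flagged = out_of_control(new_readings, lcl, ucl)
print(f"LCL={lcl:.3f}  center={center:.3f}  UCL={ucl:.3f}")
print("out-of-control indices:", flagged)
```

A chart like this only signals that something has changed. Finding and removing the cause is the job of the other controls in the plan, such as material control, operator certification, and process monitoring.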
Implementation of the integrated process control plan was not an easy task for the external tank project. No contractual requirements for the plan were in place at that time, additional skills and resources would be required to execute the plan, and many external tanks had been built and sprayed prior to the plan’s creation. Even with these challenges, however, the external tank project successfully implemented the plan to the extent possible.
Redesigned foam applications were performed within more tightly controlled process environments. Process validation and verification activities determined the optimal temperature and humidity ranges that would produce foam that minimized both the size and number of voids. All spray technicians and quality inspectors performing redesigned thermal protection system operations on complex geometries were trained and certified for spray applications to specific parts and locations. Quality-control inspections were increased to ensure independent verification of critical process steps. Quality personnel either witnessed or verified that an operation had been proficiently performed within a specified time prior to applying foam on a flight article.
Because the external tank project applied the integrated process control plan to the redesigned foam areas, the quality of the sprayed foam improved significantly. The applied foam had fewer and smaller voids and greater strength and density.
Broader Implications: A Systems Approach
After spending two years analyzing external tank thermal protection system foam data and working with the return-to-flight engineering community, I realized that, in addition to the impact on reliability and system safety, lack of adequate process control could have a devastating impact on our engineering understanding of the failure physics and the validity of our engineering analyses across the board. Engineering models and engineering analyses based on highly variable and unstable data (that is, high sample-to-sample variability) due to lack of adequate controls could lead to erroneous conclusions and poor decisions. Lack of process control could also reduce engineers’ ability to characterize their engineering parameters with a high probability of accuracy to validate their requirements. On many occasions during my support of the external tank thermal protection system return-to-flight team, engineering models did not hold, engineering data could not be characterized, and engineering specs could not be evaluated. A significant source of these difficulties was the inadequate process control of the external tank thermal protection system foam. We simply did not have the consistent, reliable data needed to make these analyses and judgments.
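One simple way to see why high sample-to-sample variability undermines characterization is to compare the variation between samples with the variation within them. The sketch below computes the one-way analysis-of-variance F ratio for equal-sized samples. The data are hypothetical and the function name is invented for this example. A ratio far above 1 suggests the samples do not come from one stable process, in which case a single pooled mean and standard deviation would be a misleading description.

```python
from statistics import mean, variance


def between_within_ratio(samples):
    """One-way ANOVA F ratio for equal-sized samples (between / within mean squares)."""
    sizes = {len(s) for s in samples}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes all samples have the same size")
    n = sizes.pop()
    sample_means = [mean(s) for s in samples]
    ms_between = n * variance(sample_means)            # spread of the sample means
    ms_within = mean(variance(s) for s in samples)     # average spread inside samples
    return ms_between / ms_within


# Hypothetical measurements from three production runs (arbitrary units).
stable_runs = [[1.0, 1.2, 0.9, 1.1], [1.1, 1.0, 1.0, 1.2], [0.9, 1.1, 1.2, 1.0]]
unstable_runs = [[1.0, 1.2, 0.9, 1.1], [2.1, 1.9, 2.0, 2.2], [0.4, 0.6, 0.5, 0.3]]

f_stable = between_within_ratio(stable_runs)
f_unstable = between_within_ratio(unstable_runs)
print(f"F ratio, stable runs:   {f_stable:.2f}")
print(f"F ratio, unstable runs: {f_unstable:.2f}")
```

In the unstable case each run looks internally consistent, yet the runs disagree with one another, so a model fitted to one run would not hold for the next.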
The clearest lesson of the Columbia accident and the external tank thermal protection system foam experience is that understanding the relationship between process control, component reliability, and system safety is critical. This systems approach needs to be taken at the beginning of the design process, ensuring that we are designing for manufacturability, that is, that the vehicle can be built with the required level of quality and consistency.
Our experience with external tank foam issues has provided critical lessons for the Ares I design community. The Ares I Upper Stage project team has given extensive attention to process design and process control and has involved quality engineers in the early phases of the design process.
It is equally critical to understand potential integrated system failures that start at the component level with no immediate catastrophic or even critical consequences, but propagate through the system across subsystem interfaces to cause a system failure. The Columbia accident showed the importance of integrated system failure analysis. Ares I has been expending significant effort on identifying and evaluating potential integrated system failures using physics-based modeling early in design and development. Examples include the thrust oscillation study and the first stage–second stage separation study, both of which provided critical information for management to seek optimum design solutions.
The Columbia accident is a devastating instance of a design problem made worse by a process control problem, ill-defined requirements, and lack of understanding of the external tank foam failure mechanism. Having a good and well-defined set of requirements, understanding the system capabilities and system interactions, understanding the failure physics, and—most importantly—putting in place the process controls that are relevant to the failure physics are critical for designing and manufacturing reliable and safe launch vehicles. Learning the lessons of Columbia is essential to making sure that our future launch vehicles and spacecraft are as safe as we can make them.