An international Earth-observing mission to study the salinity of the ocean surface ended in 2015 when an essential part of the spacecraft’s power and attitude control system stopped operating due to over-testing prior to launch.
The Aquarius/Satélite de Aplicaciones Científicas (SAC)-D satellite observatory, a collaboration between NASA and the Space Agency of Argentina with participation from Brazil, Canada, France and Italy, suffered a mission-ending failure related to power supply electronics approximately four years after launch. By then, the Aquarius instrument had completed its primary three-year mission and achieved its science objectives as a pathfinder mission, demonstrating that accurate, scientifically significant measurements of salinity could be made from space.
In preparation for the mission, commercial DC/DC converters were tested to the full levels of MIL-PRF-38534, the General Specification for Hybrid Microcircuits, without regard to the limits in the parts’ data sheets. The test levels not only greatly exceeded the data sheet limits but were orders of magnitude greater than any levels the parts could encounter during the mission. The converters passed the screens and were incorporated into flight hardware, but they had been overstressed during testing, which ultimately led to failures on orbit and ended the mission.
NASA Goddard Space Flight Center Chief Engineer for Safety and Mission Assurance Jesse Leitner on the importance of this lesson learned:
NASA engineers want a lot of margin and so we have a tendency to over-test things in general without thinking of the implications. And we tend to overly follow some of the rules without thinking about the risk they bring. We especially need to think about the implications of extreme levels of tests. For example, MIL-PRF-38534 demands a 3000g constant acceleration test, which was performed on the Aquarius parts as a screen, even when the part’s data sheet indicated a tolerance to 500g acceleration. 3000g is horrendous and should be challenged every time it’s required.
As we transition to the use of new parts and technologies and more use of commercial parts and other approaches to electronic parts, we really have to understand that different rules apply. Before we start to test the components or the parts that we’re using, we really should understand what their data sheets and specifications say and make sure that we’re not going to over-test them or test them outside of their bounds. There are times when it’s appropriate to test things outside of their bounds, but you have to understand the risks of doing so and you have to acknowledge that you’re doing that.
It’s happened a lot in new EEE parts where we can sometimes blindly follow the specifications when we’re testing parts just because it says in the rule that you’re going to test to these levels. We sometimes do the testing rather than look carefully at what the limitations are and whether the tests at that level are necessary at all. It doesn’t mean it actually is appropriate for your application and that you might not be causing damage to the part and removing its reliability. I like to use a phrase that, ‘You can’t screen reliability into a part, but you can certainly screen it out.’ That’s the danger of testing things outside of the limits of a component that you have and likewise unnecessarily testing things at levels that are not relevant to your application. And this doesn’t just go with parts; this goes with anything.
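The pre-test check Leitner describes, comparing each planned screening level with the part's data sheet limit before any testing begins, can be sketched in a few lines of Python. This is an illustrative sketch only, not a NASA tool or procedure: the `ScreenCheck` class and `review` function are hypothetical names introduced here, and the only values used are the 3000g and 500g figures quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenCheck:
    """One planned screening test compared against the part's data sheet."""
    test_name: str
    spec_level: float       # level demanded by the screening specification
    datasheet_limit: float  # manufacturer's rated limit, in the same units

    @property
    def overstress_ratio(self) -> float:
        # How many times the rated limit the planned test level represents.
        return self.spec_level / self.datasheet_limit

    @property
    def exceeds_datasheet(self) -> bool:
        return self.spec_level > self.datasheet_limit


def review(checks):
    """Return only the checks whose planned level exceeds the data sheet limit."""
    return [c for c in checks if c.exceeds_datasheet]


# Figures from the Aquarius case: a 3000g constant acceleration screen
# applied to a part whose data sheet indicated a tolerance of 500g.
accel = ScreenCheck("constant acceleration (g)", 3000.0, 500.0)

for c in review([accel]):
    print(
        f"{c.test_name}: planned {c.spec_level:g} vs rated {c.datasheet_limit:g} "
        f"({c.overstress_ratio:.1f}x the data sheet limit); "
        "needs an explicit, documented risk decision before testing"
    )
```

A flagged test is not automatically forbidden. As the quote notes, testing outside a part's bounds is sometimes appropriate, but the check forces that choice to be made knowingly rather than by default.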