A Stormy Situation
By Larry Goshorn
At ITT Aerospace, in Fort Wayne, Indiana, we build many different kinds of specialty payloads, including some of the workhorse instruments on NASA and NOAA’s meteorological satellites. These instruments provide many of the pictures that you see on the evening news and The Weather Channel. I like to think we’re not only in the aerospace business, but also in the business of protecting lives and property. We take our responsibility seriously, and that means sometimes we have to make tough decisions.
We’ve got a very good on-orbit history with our instruments. Like most folks in this business, however, we’ve had occasional difficulties during production due to various technical problems. Some years ago, we had a problem like this on the Polar-orbiting Operational Environmental Satellite (POES) instrument program. In this case, our schedules were slipping, threatening the prime spacecraft contractor’s schedule and exposing us to a potential cost overrun.
This program had been going on long enough that key personnel from the teams at both NASA Goddard and at our offices in Fort Wayne had changed many times. For a while, we had an incompatible mix of personalities, and there was a strained relationship between the project teams. NASA’s confidence in us was eroding, and that was showing in the award fees, which were dropping. The business implications here for a contractor are severe, because award fees can be the only profit on certain types of contracts.
At the time, I was ITT’s Director of NASA Programs and I knew that I needed to take action to improve the situation. I decided to make certain personnel changes in our program management office to provide a more compatible mix. I also assigned additional systems engineering expertise to our team. In short order, the relationship and performance started improving. Things were getting better. Then, the backslide began when a $10M instrument was damaged by electrical overstress during final acceptance testing.
Following the root cause investigation, we thought we understood the problem, and implemented appropriate corrective action. But when we resumed testing, another instrument showed damage. Now we were both confused and in trouble. We had two instruments that were damaged for reasons not understood, and we were uncertain where the overstress had occurred in the testing. Once again, our schedule was threatened.
The team faced internal pressures to hold schedule because ITT was involved in a competition for a new project. Past performance to schedule was a critical element of the competition. Should we try to “limp along” with instrument testing to make at least some level of schedule progress in parallel with troubleshooting the problem, or should we take the more radical approach and shut down all testing while we investigated? What would ITT senior management think if we shut ourselves down when they knew we were already in schedule trouble? What would NASA management think if we shut ourselves down? As the Eagles put it in their song “Hotel California,” “this could be heaven or this could be hell.” What do you do?
The skies begin to clear
A decision of this magnitude would affect the entire team, so everyone’s voice was important in making it. I assembled the project team, including technicians, engineers, scientists, and business management, and we discussed the situation. We all agreed that we needed to do the right thing, no matter what. The decision, as you would suspect, was unanimous. We would shut ourselves down while we investigated. We could not put additional flight hardware at risk. While everyone agreed it was the right thing to do, both NASA and ITT management hoped that the problems would be found and resolved quickly.
We worked many long days trying to understand the causes of the problem using a cooperative team of both ITT and NASA experts. What we found was not just one, but up to three potential causes of electrical overstress. Taking corrective action for one did not correct the others. All of these issues were caused by recent changes made in the test process. Misleading symptoms compounded the problems. The initial electrical overstress that we were subjecting the instruments to weakened the affected part. Once the instrument was powered on, its own power supplies then subjected the weakened part to greater stresses and caused further damage.
The investigation showed that we had recently “improved” our test labs to reduce the susceptibility to voltage transients. In keeping with the adage that “one of the biggest causes of problems is solutions,” we found that there were potential grounding issues with the new wiring. In addition, we found that a long interconnect cable on a new piece of test equipment could generate 200 volts of static charge when moved if we did not have an adequate bleed-off path. We also found that this cabling was susceptible to cross-coupling any damaging static charge on one wire to other wires in the cable, potentially causing further stress. All of these issues were factors in our damaged instruments.
After the first instrument was damaged, we stopped the investigation when we found conclusive evidence of a cause and corrected it. What we did not do was dig deeper to investigate the possibilities of multiple causes and eliminate them all. Following this last round of exhaustive troubleshooting and repair activities, which took over two months, the ITT team presented its findings to a NASA review board explaining the issues, the findings, and the corrective actions taken. Our final statement was, “We now feel that it’s safe to resume testing.”
The board agreed with us, and testing resumed successfully and has been trouble-free ever since. We resumed instrument deliveries and were able to recover the lost schedule in about ten months. Fortunately, we avoided impacting the spacecraft-level test schedule.
A forecast for success
Because all of us, the government and the contractor, were working together, we were able to take a synergistic approach to problem solving, even in a pressured environment, and to agree on what we were doing and why. Perhaps one of the biggest lessons for the team was that even some of the bleakest-looking situations can be overcome when you combine the right level of leadership, teamwork, and persistence with a few tools from your toolbox. Shutting ourselves down was not a comfortable decision to make, but it was the right one.
After this episode, our award fee started moving in the right direction, and has returned to the excellent range. The ITT and NASA/NOAA program teams continue to work diligently together in producing some of the best meteorological products in U.S. history.
- Leadership requires courage to make the right decision, even if it is a painful decision.
- Involve the entire team when making critical decisions. “Involvement” means open and honest communications that include internal and external customers.
Would you have shut down the project after the first instrument was damaged, the second one, or only after a third?