By Bill Townsend
As Deputy Director at NASA’s Goddard Space Flight Center, I was responsible for overseeing the launch of the Aura spacecraft atop a Boeing Delta II rocket from Vandenberg Air Force Base in July 2004.
Aura is an Earth-observing satellite developed to help us study the quality of the air we breathe. It will look at the state of the ozone and the atmospheric composition with regard to the Earth’s changing climate.
I headed to California on July 5, 2004. The plan was that the satellite would launch on the tenth, but we had a few problems getting it off the ground. This was the fifty-ninth launch of my career, and it was also a little different from most of my previous launches. Most of the time it’s weather that postpones a launch; there aren’t usually that many technical issues this late in the game. This time, however, we had several problems, equally split between the launch vehicle and the spacecraft. I remember a member of the crew asking me, “Is this normal?” And in my experience, it wasn’t.
A Wrench in the Works
We had three significant spacecraft issues during the launch campaign. These problems, together with the launch vehicle problems, ended up postponing the launch five days. During that time, the mission management team met 11 times at all hours of the day and night to try to sort things out. I myself held four special reviews.
The first problem was that some tools had been misplaced during final spacecraft closeout, which could present a problem if they were left on the spacecraft during launch. A wrench was lost and then found. Then we realized that we had also lost a flashlight.
The first step to solving this problem was interviewing everyone who had been involved with getting the spacecraft ready for launch. This was a massive effort, even extending to people overseas. Then we were able to make a timeline of activity based on photographic evidence, which was time-tagged and fairly easy to review. Through these measures, we were able to narrow the flashlight’s possible whereabouts to an area that made up about 10 percent of the spacecraft.
The Likely Story
We were able to determine that the flashlight had last been seen on a processing table. Something called “scrim,” which is a plastic covering that is taken off the fairing during its installation on the launch vehicle, had been put on the table. The flashlight was also on the table, and it was probably swept off with the trash. We even checked the dump, but the search was futile.
Based on all the evidence and interviews, we were able to put together a story that convinced us that the flashlight was not on the spacecraft. What was really frustrating, however, was that all of this investigation could have been avoided. I found out that there was someone along the way who had noticed immediately that the flashlight had been misplaced. He didn’t speak up at first because he was afraid to report it. It was a week before he finally came forward, and by that time, the trail was cold. I realized that there needs to be a clear message sent on this type of thing: No one is going to get in trouble for calling our attention to potential problems. It’s the kind of behavior that we need to encourage.
Shake, But No Rattle
The second problem had to do with a transistor failure in another program. A nickel-plated transistor can had been improperly cleaned, and this created the possibility of particles being generated inside the can. As long as they were smaller than about two thousandths of an inch, we wouldn’t have a problem. But particles were reported to be larger than that.
The screening technique for this type of problem is to vibrate the transistor and listen for particles rattling around. This process is able to detect particles down to one thousandth of an inch. But the parts reported as having particles larger than two thousandths of an inch had passed this test. One of the team members said that a particle that large should sound like “an acorn in a coffee cup,” but there was no noise.
I asked to push back the launch date in order to figure out the problem. It wasn’t a popular decision, but I felt it was a necessary one. I was able to facilitate a discussion between our parts and materials engineers and those from the program that had the problem. It turns out that there had been a miscommunication. The “particles” were actually very quiet flakes. That’s why all the parts passed testing based on an acoustic response — the flakes didn’t make much noise.
We then did a thorough risk assessment for every application of this part in both the spacecraft and the instruments. Thanks to redundancy, we convinced ourselves we were OK. Had we just gone ahead with the launch, it would have been successful. But we wouldn’t have known why it was successful. I felt we needed to take the extra time to figure it out.
The Minority Report
The last problem occurred during the countdown when there was a bit flip in the solid-state recorder. We had seen this occur occasionally on Aura’s sister spacecraft, Aqua, and it wasn’t mission-threatening. But then it happened again one hour from launch. I asked for a summary of the situation, and I was basically told that the spacecraft was ready to fly “as is.” I asked if there was anyone who disagreed with that, and I was told reluctantly that there was one person. His name was Michael.
I got Michael on the net and asked him to explain his opinion about the problem and his reservations. It turned out that he had seen the problem in the test program, but no one was worried because it hadn’t set off any alarms. While he was explaining, there were other people on the net who kept saying we needed to go ahead with the launch. As the conversation progressed, I could feel him getting pressure from the rest of the team and beginning to change his mind. I again made the unpopular decision to delay the launch until the issue could be ironed out.
So after speaking to Michael, I went to find out more about the problem, and to talk to my team about possible solutions. Since the problem was on both the A and B sides of the recorder, and in the same word and memory section on each side, we determined that if this became a frequently recurring problem, we would be able to bypass that section of memory and work around it. It sounded like a viable option.
But since Michael was the only one who had been concerned about the problem, I wanted to consult with him before moving forward. He was working the second shift and was asleep. I got him out of bed to hear the solution and to see what he thought of it. He believed that it was airtight. I really felt that this situation was very similar to what had been outlined in the Columbia Accident Investigation Board (CAIB) report: When minority opinion isn’t valued, people are afraid to speak up, and they end up giving in to conformance pressure even though they know there’s a problem. So I wanted to make sure that I took the time to hear the dissenting opinion.
And Patience Is Rewarded
In the end, we launched on July 15, 2004. We managed to work around each of these issues, mainly because we cracked them wide open each time. There’s a personal motto I’ve adopted from the wine industry, and I take it with me to each launch. That motto is “No launch before its time.” It’s the job of management not to get caught up in the “launch fever” that accompanies the last few hours before liftoff. If there is an issue, no matter how small, it needs to be brought to the table and dealt with. The problem needs to be investigated, the risk needs to be evaluated, and sometimes the best decision is to postpone. Better to hold off five days, or even longer if necessary, and be sure of success than to be on schedule with failure looming in the background.
Lessons
- In dealing with potential problems, it is essential to get to the bottom of technical questions and understand why things work, not just why they don’t work.
- It is important to hear, evaluate, and respect minority opinion, as well as to protect that minority from the conformance pressure of the majority.
Question
How can you foster a project environment in which people are not afraid to speak up immediately when they notice that there is a problem?