Fixing What’s Broken

By Frank Snow

Everything looked good as we started the first day of vibration tests on the High Energy Solar Spectroscopic Imager (HESSI). We chose to do our environmental testing at the Jet Propulsion Laboratory (JPL) in California, so we had brought our spacecraft there from Spectrum Astro in Arizona.

We planned to launch in July 2000. Heading into March that year we were on schedule, under budget, meeting all of our performance requirements, and ready for the final testing. I remember feeling proud of what the development team, led by the University of California at Berkeley and its project manager, Peter Harvey, had accomplished in the last two-and-a-half years. We were in the homestretch, or so I thought.

Near the end of the day, it was time for the sine burst test. For 200 milliseconds we would put a non-feedback force on our system, which meant we couldn’t adjust or halt the test in progress. Something went wrong, terribly wrong, during the sine burst test. As mission manager, I was standing just ten feet away from the spacecraft when this happened. It sounded like a clap of thunder. With the test stopped, we moved in closer to see what had happened, and we knew immediately that we had damaged our spacecraft. How much, we didn’t know.

Once they got our spacecraft off the table, it was fairly obvious what had caused the problem: One of the support bearings on the vibration test bed had failed. This caused an abnormally high level of static friction, which the computer read as mass. When it tried to compensate by increasing vibration, it shook the spacecraft ten times harder than we had planned.

If anyone knows Tom Gavin, Director of Flight Projects at JPL, they know that he likes to share a little piece of information with engineers during reviews: “If you have an anomaly, you’re going to meet a lot of important people.” Well, I started meeting a lot of important people as soon as word spread about our testing disaster.

Three days later, I stood in front of the Mishap Board to open the investigation. The Mishap Board concluded that two primary factors contributed to the accident: the absence of a scheduled maintenance program for the test equipment, and the lack of proper test procedures.

I don’t think that I was alone in thinking about Mishap Boards with trepidation. But I learned a lot of valuable lessons from this one. JPL doesn’t run this particular test very often, and we should have reviewed their test procedures thoroughly before allowing our spacecraft to undergo testing. Because this was a non-feedback test, it should have been standard procedure to run a mass simulation before running the test on our spacecraft, and if we had been thinking straight, we would have required that. (Now, I don’t care who tells me something, I insist on seeing it verified.) In the end, the Board concluded that my team was partially responsible for the accident, and I agreed with them.

Putting HESSI back together again
I didn’t just accept responsibility for our mishap; I accepted responsibility for getting the project back on track. And if I was going to do that, I couldn’t wait for someone to tell us what to do; we simply got to work. Our standard support structure (a machined aluminum main support ring) had broken in two places on each side; the test snapped it. So, the structure had to be replaced. But that was only the beginning of our problems.

Then there were the arrays. This was a solar mission designed to explore the physics of solar flares, and we wanted it up in July for the peak activity of the 11-year solar cycle. If we couldn’t get up in July, we wanted to get up as soon as possible. Solar arrays normally require a long lead time. How could we get new arrays in time? Well, we got Goddard engineering involved. They found some solar cells manufactured for the Iridium constellation, which was now bankrupt.

The next problem we faced was the instrument boxes. We had done a vibration that nobody expected these boxes to see. We went back to the vendors and asked, “If we do an ATP [Acceptance Test Plan], will you re-qualify?” They declined. “Buy another box” was their response. So, I had to fall back on another organization, the Quality Assurance Group, that I had previously seen as little more than an obstacle standing between me and my launch date. The Quality Assurance Group made me an offer: If they could get involved in the Acceptance Test Plan, they would accept the vibration and certify our boxes. That’s what we did.

But our problems weren’t over. Though it didn’t break during the vibration test, two months down the road, our flight cryocooler failed. This was a commercial product that we had flight qualified. We still had about four or five of them, but we had to flight qualify at least one of the remaining coolers. So, we put together a tiger team to do another ATP and get it done as quickly as possible. Although it was already clear that we wouldn’t make our launch date, that team worked miraculously, as far as I was concerned, and eventually they brought HESSI back to its original condition.

Of course, this is just the technical part handled by the team. As the mission manager, the person responsible for overseeing all the project’s facets, I had to be off doing other things — including reviews. For months we had operated under the maxim, “If no one tells you to stop, just keep going.” So, we had kept working all along, but if we were to complete our work on HESSI, I needed to have our Recovery Plan approved. So, while all the technical work was progressing, I made our case in front of several review panels.

After an independent panel gave us their stamp of approval in May, the Goddard Program Management Council held a Reconfirmation Readiness Review in June. An independent expert concluded that we probably only stood a 60 percent chance of surviving launch. When you take that to senior management, it’s likely to be considered too high a risk. We had to convince them that we understood the system better than anyone else did. And you know what? They accepted this risk. Here, again, was another organization for which I gained a new appreciation.

After that, we had a NASA Reconfirmation Review in August, led by Dr. Ed Weiler, then Associate Administrator for Space Science. I had to ask him for the money we needed to get to launch. I gave a presentation and when we got to the slide that showed HESSI before we started the repairs, he told me it was a good thing he hadn’t seen the slides back in March. “I would have cancelled you,” he said. But, in the end, he approved our plan and gave us our money for a February 2001 launch. All in all, I was astonished by the level of support from almost everywhere I turned at NASA when I asked for help in recovering this project.

And even more astonishing
A year after the mishap, we were ready. I remember giving myself a mental pat on the back as I thought about how well we were doing — all things considered. Then we ran into another series of problems.

HESSI was scheduled to be air-launched by a Pegasus rocket (dropped from the belly of an aircraft flying 39,000 feet over the ocean). The Pegasus started running into problems on other launches. Our launch date was pushed back to June. When the time came, we integrated our spacecraft with the Pegasus at Vandenberg Air Force Base in California and then flew across the country to the Kennedy Space Center. We were just four days from launch when there was another Pegasus failure — this one on a DoD mission. We were put on hold.

We pulled out, went back to Vandenberg to wait it out, and put HESSI in storage. But this time Mother Nature decided to test us. A major rainstorm swept through the area, and they had to call out troops to sandbag our facility because the floods were rising. The water kept rising — so, in the middle of the night, in the middle of the flood, in the middle of the rainstorm, we moved HESSI to another building across a swelling creek.

We got a launch date in February 2002. It took that long to resolve the various problems with the Pegasus and to get a new place in the launch queue. Finally, we brought HESSI back to Kennedy Space Center. Of course, with our luck, we came in the middle of another rainstorm. We were waved off the first time and couldn’t land. So we had to circle the landing strip with lightning flashing around us until, finally, we saw a gap in the weather. We were ready to land.

Then we got a radio call from our airstrip, “There’s an alligator out there on the strip. You can’t land.”

At this point, none of us could be astonished by much. We got someone on the ground to go out and escort the alligator off the skid strip. Finally, we landed — another crisis averted.

But then we had to wait for things to dry out, because our ground system control had been hit by the rainstorm. If I hadn’t wondered if HESSI was in some way cursed, this was enough to make me consider the possibility: Things began to dry up, but our ground support equipment had been inundated with toads. We had to go out there, of course, and get rid of all the toads and put plastic strips around everything so the toads wouldn’t come back. We finally got to our launch date, the fifth of February, and we were thinking, well, what’s going to happen today?

Countdown
I’ll tell you what happened that day. As they say, it was time to “open the book” four hours before launch. So, we opened the book, and we were red. One of our ground antennas had gone down. It was mandatory for launch. We started working that problem; at the same time, we had to work a series of battery temperature problems. We did all of this on the skid strip waiting to get our launch off.

Finally, we got the antenna back and got waivers on the battery. We got the plane up in the air. We were within two minutes of our drop zone, when I heard the launch manager give the abort command. Excessive static on voice communication with the drop plane caused the abort. After correcting the problem, we flew around and headed back to the drop zone. We had only one more opportunity.

If you’ve ever been involved in a situation like this, you’re listening to four or five different channels at once on your headset. You can hear everyone else talking about any problems they see. I was listening to all those voices as our plane was about four minutes from drop, and I looked back down at my telemetry and saw that the temperature on the battery had finally gone down to the right spec. All of a sudden everything went quiet on the net.

All I could hear then was the launch countdown. It went smoothly. The Pegasus was dropped with HESSI aboard, and in eleven minutes we were in orbit. The only thing I could think at that point was that the gods must have gotten tired of beating on us. They finally smiled on the little spacecraft that would not give up.

It’s been more than two years now since launch, and the scientists are extremely happy with their science. While they’ve studied solar flares and even taken a look at the Crab Nebula, I’ve had ample opportunity to reflect back on our trials with HESSI.

What saved us, time and again? We refused to give up. But besides tapping reservoirs of perseverance, I also learned to tap what I now like to call a project’s hidden resources. I learned to work with and get help from organizations that I usually didn’t think of as “resources.” I’m talking about Mishap and Failure Review boards, program management councils, and the like. Before HESSI, I tended to think of them as mountains in the road. But when I was in a deep enough hole with little margin to play with, I started to see them in a different light. I asked for help, and I got it.

Lessons

  • You can never say too much about the value of persistence in the face of adversity. All projects suffer setbacks. Sometimes the difference between succeeding and failing on a project is an inexhaustible supply of persistence.
  • When confronted by problematic situations, a project manager with the determination to succeed identifies and makes use of all available resources. That may include looking at governing organizations in a new light.

Question
In a crisis situation such as the one described at the beginning of the story, what would you say to a Mishap Board or Failure Review Board to gain their confidence that you could lead your team to overcome this setback?

About the Author

Frank Snow has been a member of the NASA Explorer Program at Goddard Space Flight Center since 1992. He was the Ground Manager for the Advanced Composition Explorer (ACE), and mission manager for the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI) and the Galaxy Evolution Explorer (GALEX). He began his career with NASA in 1980.

The HESSI project described here in Snow’s story was renamed after launch in honor of Dr. Reuven Ramaty, a Goddard Space Flight Center scientist until his death in April 2001. A pioneer in the fields of solar physics, nuclear astrophysics, cosmic rays, and gamma-ray astronomy, Ramaty served as co-investigator and a founding member of the HESSI team. “He was a genius,” Snow remembers. “And, the truth is, we wouldn’t be where we are today if it weren’t for Dr. Ramaty. He really believed in this project and he kept pushing and pushing to keep it alive.”

Now known as RHESSI, the mission continues to deliver solar flare data studied by scientists the world over. RHESSI was the first space mission to be named after a NASA scientist.
