By Don Cohen
COHEN: I want to talk about your take on the 7120.5D processes and requirements, but let’s start with the Mars program experience that has shown you how projects work.
MANNING: In late ’92 or early ’93, Brian Muirhead, the flight systems spacecraft manager of Mars Pathfinder, said, “I need a chief engineer who can deal with the software and electronics of this mission, because we’re doing new things.” Brian and Tony Spear, the Pathfinder project manager, were able to pull together a very young and energetic team peppered with old, wise people. Mars Pathfinder was among the first of the faster, better, cheaper missions. We modified the old way of doing business, trying to streamline it to make it faster and cheaper.
COHEN: It’s hard to do more than two of “faster, better, cheaper.”
MANNING: The trouble is, you need all three. Brian tried to get people very disciplined not just about cost control but scope control. You just can’t make things cheaper by whipping people; you have to adjust scope, make things as simple as you can, but not too simple. For a while we were too simple. For instance, we didn’t have a radar to detect when we were getting close to the ground. We had a fifty-meter cable with a little touchdown sensor that sent a signal up to inflate the airbags, but the lander might land before the tether did, so we said that wasn’t going to work. We added a low-budget radar we found. We tried to use a lot of commercial stuff, which was a mixed story. Sometimes you spend more money convincing yourself that the off-the-shelf instrument will work than you would have spent had you built a unit ten times more expensive. Mars Pathfinder landed the fourth of July 1997, the first airbag lander, with its little Sojourner rover. For a while, Sojourner was treated like a parasite.
COHEN: In what sense?
MANNING: It was this funny thing bolted to the inside of the lander; we didn’t see where it would take us. The faster, better, cheaper mission paradigm was an experiment; the landing system was an experiment; the science that you could do with a rover was experimental. In the grand scheme of things, Mars Pathfinder will not be counted as the greatest scientific mission, but it broke a logjam. We hadn’t been to Mars since the very successful but very expensive Viking missions of the late seventies. The notion was that you couldn’t land on Mars without a substantial budget, maybe in the billions of dollars. So a $270 million mission, which was what Pathfinder turned out to be—including the launch vehicle, the operations, the science, the spacecraft, and the rover—seemed unbelievably low. The view that you could have a mobile platform on the surface of Mars to bring rocks right up to your nose had never been tested or taken seriously by the scientific community. Likewise, airbags. Who in their right mind would land a spacecraft and have it bounce around on the surface of another planet?
COHEN: So it was a proof of concept?
MANNING: It’s hard for human beings to accept new paradigms without experiencing them. In 1995, the World Wide Web was just starting. We wouldn’t have dreamed that, a few years later, we couldn’t do our work without it. People have to have a little taste; you have to go out on the edge to try something out for people to take it seriously.
COHEN: So faster, better, cheaper worked.
MANNING: Yes, but after Mars Pathfinder, the faster, better, cheaper model led to the sad state of affairs in 1999, when the Mars Polar Lander and Mars Climate Orbiter failed. The notion was, if Mars Pathfinder can do this mission for $270 million, we can do it again for half that. People overestimated what was possible. They didn’t see how close to the edge we were financially and technically. The combined budget of the Mars Polar Lander and the orbiter equaled the Pathfinder budget. My view was, that team was probably better than we were, but I don’t know if they’re twice as good.
COHEN: Did they build on what you learned?
MANNING: Some. Mars Pathfinder was the first planetary mission to use a single-board computer with a commercial operating system. That’s since been repeated over and over. It was the first mission to use C programming constructs that have now been ubiquitous for more than a decade. Much of the software on Pathfinder went on to fly on Stardust, Genesis, and Odyssey.
COHEN: Were there particular mechanisms for passing along what was learned?
MANNING: We used the NASA lessons learned process to put particular lessons in the lessons learned database, but that doesn’t substitute for the people connection. You’ve got to connect with and talk to individuals who have gone through these experiences, either as review board members or team members or leaders of the follow-on mission. That’s the only method I know that really works. Frankly, one of the things faster, better, cheaper dropped was a paper record of what you did and why you did it. In the sixties and seventies, people had the time and financial resources to write reports. Even today, unfortunately, once you’ve landed, you’ve got to be off the job, because there are other projects ready to go.
COHEN: Some companies use peer assist—conversations with people who have done similar work—to pass along project knowledge.
MANNING: We do that. In the case of Pathfinder, we hadn’t landed on Mars in almost a quarter of a century, but the people who did it were still around. You go to Israel Taback, the chief engineer working for Jim Martin, and to Jim, the project manager of Viking. You go to Paul Siemers of the Viking project. They’re worth their weight in gold. They’ll say, “There’s a paper written by so-and-so. Call that person. That’s what I would do.” Imagine having Jim Martin, Iz Taback, Gentry Lee, Duncan MacPherson, and John Casani all in the same review board staring you down. Jim Martin saying, “If you can’t show me this entry-descent-landing system is going to work in the next four hours, this project is going to be over by noon. We shouldn’t be wasting taxpayers’ money if we don’t know how to pull this off.” The good news is that he and others had prepared us.
COHEN: You convinced them.
MANNING: We were the first in twenty-five years, and I think they wanted us to try. Convincing ourselves that it would work was touch and go. We had so many airbag failures, so many drop test failures, so many software problems. Literally two months before launch we were doing a full-up test of the entry-descent-landing system with the spacecraft and our test bed vehicle, and it crashed. We launched knowing that the software on board had a slim chance of working. We like to say, get our software done by launch, but it’s never really done. That seven months after launch has paid off multiple times on almost all our missions.
COHEN: You couldn’t delay the launch?
MANNING: You can launch to the outer planets with some regularity because you can fly by Venus and Earth a few times. But with Mars, you’re stuck in a two-week launch window every twenty-six months.
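Editorial aside: the twenty-six-month spacing follows from how often Earth and Mars return to the same relative alignment. The sketch below is a minimal illustration, assuming circular, coplanar orbits and approximate sidereal periods of 365.256 and 686.980 days. Those values, the function name, and the constants are assumptions added for illustration and are not from the interview.

```python
def synodic_period_days(inner_period_days: float, outer_period_days: float) -> float:
    """Time between successive identical alignments of two planets.

    Assumes circular, coplanar orbits, so the inner planet gains on the
    outer one at a constant angular rate.
    """
    return 1.0 / (1.0 / inner_period_days - 1.0 / outer_period_days)


# Approximate sidereal orbital periods (assumed values, not from the interview).
EARTH_YEAR_DAYS = 365.256
MARS_YEAR_DAYS = 686.980
AVERAGE_MONTH_DAYS = 365.25 / 12

period_days = synodic_period_days(EARTH_YEAR_DAYS, MARS_YEAR_DAYS)
period_months = period_days / AVERAGE_MONTH_DAYS
print(f"{period_days:.0f} days, about {period_months:.1f} months")
```

Under these assumptions the alignment repeats a little under every twenty-six months, which matches the launch cadence described above.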
COHEN: That will be true for manned missions to Mars.
MANNING: The pressure will be phenomenal. The good news is, you can go into Earth orbit and wait it out there. But you’re right. When we develop missions to go to Mars with people, you’re going to see the same two-week window. All the launch pads are going to be in incredible use. One launch error or disaster potentially knocks the whole armada off.
COHEN: Do you see the Mars mission failures of 1999 as reality checks?
MANNING: In some respects, they were the best thing that could have happened. They reminded us that we were on the edge. Had they not occurred, others would have. You can dance a long time on the edge of the cliff, but if you’re that close, you’re going to fall. Dan Goldin was encouraging us to do more for less and saying, “A failure or two won’t hurt us,” but two failures within two months is painful. They reminded us what about faster, better, cheaper is good and what is bad. When you cut a project down, it acts almost like an incompressible fluid. The pressure goes up astronomically as you reduce the volume. We squished those missions down until the risk squeezed way up.
COHEN: Do you think the experience taught both technical and management lessons?
MANNING: They’re almost interchangeable in my mind. I learned on Pathfinder that when you engineer something, you have to engineer the whole story. You don’t only engineer what it’s going to look like and how it’s going to work; you have to engineer the person who’s going to design it and build it. You have to think about how they work with everybody else and make sure they have the tests, the resources, the space, the test time, the schedule. It all has to go together. You might have to change the design of the system to match the capability of the people who do the work. You think, the design’s got to be what the design’s got to be, but it turns out there’s a lot of variability. You want to select a design approach that best uses the skills you have at your disposal.
COHEN: On Mars Pathfinder you had a lot of uncertainty to deal with.
MANNING: In the case of all landing systems on Mars so far, you don’t know as you’re designing it whether it’s going to work because there are so many unknowns in the Mars environment and in the systems interactions. You discover that the software doesn’t get along with the radar that you bought. The airbags, which we thought would weigh twenty to thirty kilograms, turn out to be 125. That’s not the traditional twenty or thirty percent mass growth. There’s just no way you can tell a review board, “I need 500 percent margin in my mass.” They’ll say, “You don’t know what the heck you’re doing.” That’s correct; we don’t know because no one has done this before. Project managers at NASA want to stay on the road. If you’re going cross-country, you stay on the road and you name the cities you’re going to visit along the route. Entry, descent, and landing comes along and suddenly the road stops. The whole team is driving across a field or a river valley that wasn’t on the map. You end up taking the project off road because the road that you thought would take you from here to there has a big gap in it.
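Editorial aside: the airbag figures quoted above can be turned into a growth percentage for comparison with the "traditional twenty or thirty percent." This is a rough sketch using only the numbers in the answer (an estimate of twenty to thirty kilograms and a final mass of 125 kilograms). The function name is hypothetical, and growth is assumed to be measured relative to the original estimate.

```python
def growth_percent(estimate_kg: float, actual_kg: float) -> float:
    """Mass growth relative to the original estimate, in percent."""
    return (actual_kg - estimate_kg) / estimate_kg * 100.0


ACTUAL_AIRBAG_MASS_KG = 125.0  # final airbag mass quoted in the interview

# Low and high ends of the original estimate quoted in the interview.
for estimate_kg in (20.0, 30.0):
    growth = growth_percent(estimate_kg, ACTUAL_AIRBAG_MASS_KG)
    print(f"estimate {estimate_kg:.0f} kg -> growth of about {growth:.0f}%")
```

Either end of the estimate gives growth of several hundred percent, an order of magnitude beyond the customary allowance, which is the gap the "500 percent margin" remark points at.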
COHEN: I assume the Pathfinder experience laid the groundwork for the Mars Exploration Rover (MER) program.
MANNING: The whole MER premise was to take the Mars Pathfinder entry-descent-landing system, make the minimum necessary modifications in that detailed design, and fly a rover that’s designed to fit. That lasted about three months as a paradigm. It’s June 2000, and the launch date is June 2003. Projects need four years: one to do preliminary design, another to do detailed design, another to do fabrication and assembly, and the fourth year to test and launch. Three years is not enough, if you design from scratch. Even before we started seeing these changes, we got a phone call from Dan Goldin’s office saying, “Why aren’t you doing two?” We said, “No one asked. We don’t know that we can’t, and it might help us.” It turned out that it did. We wouldn’t have launched any had we only done one.
COHEN: How did it help to do two?
MANNING: When you’re building an assembly line of aircraft, you typically build one and put it through its paces to qualify that system design. Would you do that same lengthy test program for the tenth aircraft you build? No. You put it through an acceptance test program to certify that it matches the first one. With two vehicles, we put one through the set of qualifications for its cruise and entry-descent-landing phases and the other one, in parallel, through the surface phase qualification, and split the acceptance testing. That knocked a couple of months off our schedule, which allowed us to launch on time. Building two adds people in the assembly and extends the fabrication and assembly phase, test, and operations side, but the design phase isn’t affected much.
COHEN: What were some of the design challenges you faced on MER?
MANNING: We quickly discovered we needed bigger rockets and bigger parachutes because the system grew in mass. I said to myself, the volume’s the same, so we packed that tetrahedral lander for Mars Pathfinder to the brim. Turns out I was wrong. Mass grew by 50 percent in the same volume. It’s very dense. Because of that, we realized the airbags weren’t strong enough to handle the landing loads without ripping. So we had to change the airbags; we had multiple airbag designs we went through. A year and a half before launch I started to think, we won’t be allowed to launch this thing. We put these three little rockets in the back shell and a little inertial sensor that allowed us to figure out which way was up and correct itself, so when the three big rockets fired, it forced itself to be pretty much vertical. The problem is, winds could be pushing along horizontally. I’m thinking, I’ve got to get a horizontal velocity sensor to figure out how fast the thing’s moving on the way down. If there’s a big, steady wind pushing it along horizontally, right now the vehicle has no idea that’s happening because it can’t measure the velocity in respect to the ground. If the spacecraft knew the velocity, it could use the small rockets to adjust for that, too. I told my friend Miguel San Martin, “I need to get Doppler radar on the vehicle to measure velocity.” He puts two fingers up and says, “Give me two pictures.” I said, “Oh my God, what a brilliant idea. Who should I talk to?” He says, “Call Andrew Johnson. He does two-dimensional image correlation algorithms.” I knew this was not going to go over well with the project management. Emergency systems engineering, adding new subsystems at the last minute, is a sign of weakness. Luckily, it turns out we built rover electronics with ten camera ports but only nine were needed. We wanted to modify one of the existing science cameras and put it looking down and have it take pictures on the way down. It could compare two pictures. 
If they shifted by a certain amount and if you knew the time between them, you’d know how fast you were moving. We took three pictures—to double-check. Within six months it was in the design. Had we not used it, we would have ended up bouncing at 60 mph right toward the southern rim of Bonneville crater, where those sharp, wind-carved rocks called ventifacts lived.
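Editorial aside: the idea described above (measure how far the ground shifts between two pictures, then divide by the time between them) can be sketched in a few lines. This is a simplified illustration, not the flight algorithm. It assumes the image-correlation step has already produced a pixel shift, that the camera looks straight down at flat ground, and that the ground scale in meters per pixel is known for each image pair. All function names and numbers are hypothetical.

```python
from math import hypot


def horizontal_velocity(shift_px, meters_per_pixel, dt_s):
    """Convert a ground-feature shift between two images into velocity.

    shift_px: (dx, dy) shift of ground features between the images, in pixels.
    meters_per_pixel: assumed ground distance covered by one pixel.
    dt_s: time between the two images, in seconds.
    Returns (vx, vy) in meters per second.
    """
    if dt_s <= 0:
        raise ValueError("dt_s must be positive")
    vx = shift_px[0] * meters_per_pixel / dt_s
    vy = shift_px[1] * meters_per_pixel / dt_s
    return vx, vy


def estimates_agree(v_a, v_b, tolerance_m_s):
    """Double-check: do two independent velocity estimates match?"""
    return hypot(v_a[0] - v_b[0], v_a[1] - v_b[1]) <= tolerance_m_s


# Hypothetical values: three pictures give two image pairs.
v_first_pair = horizontal_velocity((20.0, -4.0), meters_per_pixel=1.5, dt_s=3.0)
v_second_pair = horizontal_velocity((30.0, -6.0), meters_per_pixel=1.0, dt_s=3.0)

if estimates_agree(v_first_pair, v_second_pair, tolerance_m_s=0.5):
    speed = hypot(*v_first_pair)
    print(f"horizontal speed of about {speed:.1f} m/s")
else:
    print("estimates disagree; do not trust the measurement")
```

The third picture is what makes the cross-check possible: a single pair yields one estimate with nothing to compare it against, while two pairs that disagree flag a measurement that should not be trusted.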
COHEN: You had two vehicles to test, but then you had to launch two.
MANNING: You’re trying to get two vehicles up from the same launch area in a two-week window. If the first one blew up on the pad, you’re not going to do either one. You also want to maximize the separation of arrival times on Mars in case you have a problem with the first one. Not long after launch, we pointed out that if we had a failure in the first one, we wouldn’t have much time to set up a failure review board. So we said, “Let’s establish a failure review board and review board process in advance. We can rehearse both a successful landing and an unsuccessful landing with the failure review board present.”
COHEN: Did you learn from that?
MANNING: A ton. From the moment the first one landed until two and a half weeks later, we dissected the heck out of the first vehicle with the failure review board team present to watch us and comment. We found anomalies. Nothing serious, but for instance, we saw that the descent rate limiter lowered the lander from the backshell too slowly as it descended under the parachute. There were subtle things that we needed to get to the bottom of. We still had some elbow room to tweak things right up to the last day before the second landing. That turned out to be a very successful process. The review board learned a lot about how our team worked.
COHEN: Is there a key lesson this MER story teaches?
MANNING: In the case of these supercomplicated systems, give yourself a test program that gets you the answers you don’t want to hear early and have the team get into the test mode as soon as possible. Things that you build often don’t work to specifications, and oftentimes the environment doesn’t operate to specification. We wrote requirements on Mars; she failed to live up to them. You have to be willing to accept new information: new discoveries from the scientific team that say, for instance, “Hey, Rob, winds are much worse than you thought.” The systems engineer, the chief engineer, and management need to be aware that surprises can come downstream that will knock your socks off. But once you know about a problem, it becomes remarkably easy to fix it if you’ve given yourself the time. When people say, “I’m doing my testing at the end,” they’re asking for trouble unless it’s a system they already know very well. There’s no crime in being wrong. The crime is not giving yourself elbow room to fix the problems.
COHEN: How did you find experts to help you, and how do newcomers to NASA find you?
MANNING: It’s still ad hoc. If you call the people who are running the project, they will direct you to all the right people. But we have not coupled all the disciplines to individuals with their telephone numbers. We have an expertise Web site at JPL, JPL Know Who, but it’s still tough to make that happen. It’s constantly changing, and one person’s problem may not exactly match others’. I keep a list of people and what their specialties are. If NASA Headquarters asks, “Who are the experts in entry, descent, and landing technology,” I send them that Excel file. We have a relatively small, tight-knit community across NASA, and any new project is going to use different mixes of the same crowd. From the upper levels of NASA, it may look like the organizations are in competition, but we all rely on each other to get the work done.
COHEN: In my experience, supporting what’s happening informally pays more dividends than heavy investment in lessons learned databases.
MANNING: The trouble with lessons learned is they say, “I made this mistake, try not to make it again.” They don’t teach you process or tell you who the experts are. They don’t tell you, for instance, how to determine margin for a system. My computer says I need this much, but I have uncertainties that say I need this much. Figuring out what to do turns out to involve an extraordinarily tricky and long-winded discussion involving people from many different disciplines.
COHEN: Let’s talk about 7120.5D and how it reflects some of your experience.
MANNING: There are three legs of program management. Program-level management focuses on the program/project objectives and the resources needed to get the job done. Mission assurance provides an independent view of safety and quality assurance and makes sure we follow the approved processes and get certification for the systems that we build. The third leg, which has always been there—we just haven’t made it official before—is the engineering leg. It’s the technical authority processes that start with the Office of the Chief Engineer and work all the way down to the projects and through the engineering line organizations at the NASA Centers. The line organizations and the lead engineers and project systems engineers have an independent technical say, almost a technical ombudsman role, going all the way up to the chief engineer in the event programs deviate from good engineering practice. We’ve been doing much of it unofficially for many years. 7120.5D makes it official. It allows people—especially new people who come to work for NASA—to understand how the processes work and the right methods for talking about engineering quality.
COHEN: Can 7120.5D do that without introducing a lot of bureaucratic paperwork?
MANNING: We have developed a lot of checklists; there are new processes involved. But some of the older processes have been streamlined. In the past, who was on your review board and how many different review boards you had was unclear. You might have a preliminary design review with one group and then a month later an independent review with a different group of people going through the same material. We said, “Let’s combine them.” Many of the processes and procedures we’ve been doing in an ad hoc way are now being codified: this is specifically what you need for this review; you only need to do it once. There’s new terminology—for example, the term “key decision point” describes the gate to get from one phase of the mission to another. That’s also been murky in the past. Is the critical design review really a gate to the next phase, or is there something else that has to happen? The new version attempts to clarify that.
COHEN: Do you think it will be equally appropriate for different projects?
MANNING: It’s tuned to different classes of projects. Constellation is a collection of projects that are very closely coupled. There’s a certain review process for that kind of program, versus the Mars program, which is a more continuous program that has projects that are somewhat coupled, but not as closely as Constellation. If we find a process is cumbersome, we’ll tune it. This is a work in progress. The first projects to use it will find holes in it. We’ll try to fix them in version E. Some people may argue that it’s great for big projects, but what about little projects? We’ll spend all our time doing documents and writing certification and flight readiness reports. We say, “Yes, you have to do it, but appropriately for your project.” That’s part of the balancing act that’s still to come.
COHEN: You’ve mentioned new terminology. Tom Gavin has talked about 7120.5D helping to standardize shared vocabulary.
MANNING: We still are not consistent about how we define terms and rules of engagement across the Agency. Having things written down really makes a difference.
COHEN: Will 7120.5D require people to document project learning?
MANNING: I don’t think that’s its intent. It’s an attempt to define the minimum requirements of projects to ensure that the system being built will meet its objective on budget and on schedule with the appropriate level of quality, mission assurance, and safety. Because budgets are tight, it’s still a problem to write down what you’ve done and why you did it.
COHEN: Which could be a problem for future missions.
MANNING: We have relied on Viking documentation to an extraordinary extent. Because there was a twenty-five-year hiatus between Mars missions, we could never have done Pathfinder and MER without the Viking documentation. The same thing is happening with Constellation and Apollo. The Apollo documentation has really helped people understand what happened in the 1960s. It’s helped them get a dose of experience they would not have gotten at this phase of the program otherwise. It has been healthy for the Agency to study the knowledge that was developed at that time.
COHEN: What do you think is the best way to present new documents to make clear it’s not just a bureaucratic annoyance?
MANNING: It would be useful for somebody to create a presentation that explains where it came from and why it’s the way it is. We tried to put as much information in as small an amount of space as possible. 7120.5D has a long, rich history. Laying down rules and making a list of them is important, but when you first get them, you shouldn’t get them as rules, you should get them as stories so you can understand the context behind the rules: what is the problem being solved here? If you’re just following the rules without being aware of why you’re doing what you’re doing and why it’s important to your success, you’re being an automaton. Reading the document will probably confuse a lot of people unless they’re steeped in the stories behind it. There’s a logic behind the document that’s very deep and rich. If you read it without the context, it looks bureaucratic, but it’s based on crisp and well-thought-out project and program management issues: How is money assigned? How does NASA avoid throwing good money after bad? How is programmatic and technical risk communicated? How do we make sure that our cost estimates will be close to being right? A lot of it has to do with controlling the future, which is a notoriously difficult thing to do. It represents the best we’ve got to so far.