Developing the Orbiter Boom Sensor System (OBSS) was a prime example of a highly critical, highly visible, fast-turnaround project. When the work was authorized in September 2003, we were asked to complete it in six months, in time for a projected March 2004 shuttle Discovery launch date. After the Columbia accident, no shuttle was going to fly until we had the capability to examine it for damage after launch, so any significant delay in building the boom would keep the shuttle program and the work that depended on it — notably the completion of the International Space Station — on hold.
Managing such a project has special challenges and pressures and a few advantages, too. The clearest benefit of such high-profile, critical work is the ability it gave us to recruit top people to what, at one point, was a 500-member team. (And having high-quality team leads was one essential source of success.) We didn’t have to convince anyone that the work mattered to the space program and to the safety of our astronauts. And the importance of returning to flight and preventing future catastrophes gave us a defining and unifying goal that inspired hard work and cooperation, although, as with any project, it was important to help team members keep the goal in view as they dealt with the details, complexities, and inevitable frustrations of their parts of the work.
Successfully meeting the technical and organizational challenges of the project required not only team dedication but outstanding communication and openness, constant vigilance to detect and correct problems that could delay development, and clarity about what we needed to accomplish.
Reality Check
The feasibility assessment of inspection options that began weeks after the Columbia accident concluded that a boom sensor system to examine the shuttle’s thermal protection system in orbit could be developed using previously flown hardware and existing NASA spares in six months for under $40 million. The system requirements review we held within a month of forming the project management team clearly showed how unrealistic that assessment was. The available hardware was not “criticality-1” rated and therefore not acceptable to use on a system judged essential for astronaut safety. Also, required structural supports and vehicle modifications were not included in the initial assessment. The plans called for two sensor packages to meet the requirement for redundancy, but our initial timetable limited us to one because of the vehicle and boom modifications needed to provide enough power for two sensor packages. Many components — especially electronics — that worked properly in the relatively protected environment of the payload bay would need to be tested for survivability in the harsher conditions they would face at the end of a boom. We would likely have to develop new shielding and heaters to protect them.
One of my first jobs as project manager was to report to the Program Requirements Control Board that we could not meet their proposed cost or schedule. I said that the requirements the program had set for the project would cost $100 million and that we had less than a 10 percent chance of completing the project in the next ten months. Assuming no serious technical problems — a risky assumption — we estimated that the project would take about twice that long. This was not an easy message to deliver, but clarity and honesty were important to our success. I wanted management to support our actual cost and schedule, to recognize the risks, and know the project’s real needs. I’ve sometimes said, jokingly, “We were working so hard we didn’t have time to do anything but tell the truth.” But the truth in that joke is that telling the Board anything less would have made the project much harder — depleting time, energy, and good will — when we inevitably would have had to go back to management to ask for more time and resources. As it happened, the development cost came within 5 percent of our estimate.
Communication and Being There
Full and honest communication, with management and especially within the team, was a hallmark of the project and a major factor in our success. Weekly project meetings with core team leaders to share information and solve problems were not enough. Frequent teleconferences helped keep information flowing, but they were no substitute for meeting face to face. Travel, travel, travel was the most important part of our communication strategy. Groups in California, Texas, Florida, New Mexico, and Canada worked on the OBSS. Regular travel to those sites was absolutely essential to the work. Only actually being there makes it possible to understand issues fully and provide the necessary support and encouragement. Having the customer on site helps focus the work of even the best contractors. One important lesson we learned was that we should have spent more time earlier in the project with all our contractors. We had assumed it would not be necessary to track or visit experienced contractors that had been reliable in the past, but that turned out not to be true. Most critically, being there is sometimes the only way to identify problems before they threaten project cost and schedule.
In one instance, one of our main partners did not report a manufacturing problem it thought it could handle alone until weeks before a major delivery milestone, leaving no time to adjust the schedule in other parts of the project to compensate for the resulting delay. A lead team member went to their facility and stayed until he was sure they were back on track. Their reluctance to report the problem as soon as it arose is not surprising. NASA engineers and contractors usually try to solve problems before they elevate them to the next level. “Never show up without a potential solution” is part of the culture. But we needed to change that behavior, to encourage people to bring up every concern to the project level as soon as it occurred so the best resources from the whole team could be applied to solve it. Over time, we established a we-have-a-problem attitude rather than a they-have-a-problem attitude. Having people travel from site to site contributed to this change. As people got to know and trust each other and recognize that we were all working toward the same goal, information about problems became just data for the team to work with, not indications of failure.
Having a single repository for all project documents was another valuable contributor to collaboration. The systems engineer assigned to OBSS was our “documents guru.” Even though International Traffic and Arms Regulations meant there was some information our Canadian partners could not see and therefore added a management chore, that central repository saved time and effort by organizing documents and making them easily accessible.
Another aspect of our communication strategy dealt with communicating with outside assessment groups. The OBSS project was subject to a lot of scrutiny. Independent assessments were conducted by the Inspector General’s Office, by numerous safety and financial organizations, and by the Stafford Covey Task Group. Some assessments were helpful and some were not. We found it essential to have people specifically assigned to handle these outside requests for information, to act as a buffer for a technical team that was already stressed by the demanding work and could not afford to be distracted from their project tasks. The central repository also helped with assessments. As requests for information rolled in, we could send the link and let the requesting organizations pull the data they needed themselves.
Managing Risk
Open communication and our emphasis on identifying and dealing with potential problems as soon as possible were important parts of our efforts to reduce risks that could threaten our schedule or the successful performance of the boom. I initially resisted devoting time and resources to a formal risk-management system, but it proved well worth the investment and our top-level risk matrix became a valuable tool when providing updates on the project’s status. Because our schedule was so critical, we used multiple vendors for long-lead components and multiple shops for critical-path manufactured parts to ensure that a serious problem with one item would not delay the project. And we worked serially on three units — two flight units and a “spare” to give us some additional insurance against unforeseen manufacturing problems.
Fully integrated testing of the system on the flight vehicle revealed problems that would otherwise not have been discovered until the boom was in orbit. The risk of technical failure is high in a fast-turnaround project given that requirements development, design, and build phases come in rapid succession, so integrated testing is all the more important.
Moving Deadline, Changing Requirements
The return-to-flight launch date changed several times during our project, for reasons unrelated to progress on the boom. Early in our work, the original six-month target became nine months and then a year. Ultimately, Discovery launched in July 2005, sixteen months after we began work. At first glance, this sounds like good news for us — who can object to having more time to complete their project?
But the repeatedly changing date created its own problems; the postponements created an expectation that we would increase the quality, performance, and safety of the product without, of course, adding to the budget. The biggest example of changing requirements increasing cost was the decision to use some of that “extra” time to develop the originally specified two-sensor packages instead of the one that earlier deadlines seemed to require. In effect, we had multiple release dates for the OBSS, having the single-sensor version certified, tested, and ready to fly while we worked on the two-sensor boom. We followed that same pattern with software development, making sure one version was ready to fly while the team worked on enhancements that might or might not be tested, certified, and ready to go by the launch date.
As anyone who has managed a NASA development project knows, even without launch delays, it is hard to get team members to stop improving the product. They will want to fill any additional available time with tweaks and enhancements. One of the toughest tasks of a project manager is to decide when good is good enough and call a halt to further improvements.
Success
The Space Shuttle Discovery took off July 26, 2005, approximately two and a half years after theColumbia accident. On the 27th, the hard work of the OBSS project team paid off when the Orbiter Boom Sensor System successfully deployed and examined Discovery‘s thermal protection system. While we were confident the sensor would work and believed the mechanical elements of the system would work well, we breathed a collective sigh of relief when the boom was successfully re-stowed in the shuttle’s payload bay.