Brian Thomas, Agency Data Scientist at NASA Headquarters, discusses artificial intelligence.
Thomas provides examples of how NASA uses artificial intelligence to bring value to the agency, increase efficiencies, and support mission success.
In this episode of Small Steps, Giant Leaps, you’ll learn about:
- Specific applications of artificial intelligence across NASA
- Ways the agency is benefitting from machine learning
- Strategic efforts to improve data sharing
Brian Thomas is the Agency Data Scientist at NASA and is responsible for understanding challenges and capabilities in data science, data analytics and big data. Thomas conducts research in software engineering, data mining and computing, and applied data science. Projects include machine learning, deep learning, cloud-based analytics and game-based analysis. He has more than 25 years’ experience supporting or leading scientific research, data analysis and scientific programming at a number of institutions and facilities, including NASA, the National Optical Astronomy Observatory and the University of Maryland. Thomas holds a bachelor’s in physics from the University of Georgia and a doctoral degree in astronomy and astrophysics from Penn State University.
Brian Thomas: The upshot is that the machine in this case can actually do a better job than an individual records manager can do.
In terms of boots on the ground and practical projects that bring value to the agency today, I think it’s definitely on the soft AI end. It’s the machine learning end, which is very exciting, potential new capability for the agency.
It shouldn’t be viewed as some sort of scary thing that’s going to get rid of people. Rather, to my mind, it looks more like a bionic arm. This is going to help people execute tough, large jobs.
Deana Nunley (Host): You’re listening to Small Steps, Giant Leaps – a NASA APPEL Knowledge Services podcast featuring interviews and stories, tapping into project experiences in order to unravel lessons learned, identify best practices and discover novel ideas.
I’m Deana Nunley.
Artificial intelligence takes center stage today. Our conversation is with Brian Thomas, the Agency Data Scientist for NASA’s Office of the Chief Information Officer.
Thomas: I work in the area of trying to provide strategic insight to senior leadership in the areas of data science, big data, and data analytics. I have a team of data scientists and we take on data science challenges across the agency in order to better understand technology, but we also like to go around and talk to folks at the agency, to understand their challenges and capabilities. Clearly, if there’s someone out there at the agency who has already solved a challenge, we like to connect them with the challenge owner, so that we can facilitate in this way better understanding of analytics and better intra-agency communication.
Host: As you’re addressing data science challenges and helping people solve problems across the agency, how does artificial intelligence fit in?
Thomas: Artificial intelligence, generally speaking, you can think of it in two different ways. There’s hard AI, which is kind of the classic sci-fi approach of we’re going to have a machine that thinks for itself, is self-aware. That’s not what our team does.
It’s on the other end of things or so-called soft AI, where there are a number of procedures, approaches, which allow one to leverage a machine in new and novel ways that essentially boil down to teaching the machine through data rather than a priori coming up with an algorithm or an approach. So machine learning is essentially soft AI. It’s not all of soft AI, but it’s definitely a large chunk of it, and machine learning is basically what I’ve described. It’s a set of algorithms that one can apply a training set to that allows you to train this function approximator, if you will, to carry out and do work for you.
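To make "teaching the machine through data" concrete, here is a minimal sketch of a function approximator trained from a training set rather than programmed with a hand-written rule. It assumes a classic perceptron and a made-up toy training set (logical AND); the function names and the data are illustrative assumptions, not anything described in the interview.

```python
# Minimal perceptron: the decision rule is learned from labeled examples
# instead of being written by hand. All names and data are illustrative.

def train_perceptron(examples, epochs=20, lr=1.0):
    """examples: list of (features, label) pairs, with label in {0, 1}."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            activation = bias + sum(w * x for w, x in zip(weights, features))
            prediction = 1 if activation > 0 else 0
            error = label - prediction
            # When the prediction is wrong, nudge the weights toward the label.
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias


def predict(model, features):
    weights, bias = model
    activation = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if activation > 0 else 0


# Toy training set (an assumption for illustration): logical AND.
training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
model = train_perceptron(training_set)
predictions = [predict(model, features) for features, _ in training_set]
```

Nothing in the code states the AND rule explicitly. The behavior comes entirely from the examples, which is the sense in which the training set, not the programmer, defines what the function does.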
Host: Is NASA’s use of artificial intelligence primarily on the soft side?
Thomas: There probably are folks interested in hard AI at the agency, but in terms of boots on the ground and practical projects that bring value to the agency today, I think it’s definitely on the soft AI end. It’s the machine learning end, which is very exciting, potential new capability for the agency.
Host: What are some of the ways NASA is integrating AI into projects to help provide value?
Thomas: So, there are a lot of projects. We could probably talk for hours in terms of the applications, but they span from science and our engineering areas through operations and down into our business units. The value proposition here is that there are many problems where having a human or a team of humans sit down, understand the data discretely, and convert that into a heuristic or algorithm that then does what you want is impractical because it would take too much time. The data have such complexities that it would be very difficult to get it done accurately. So, the resourcing that would be required prevents us from solving these problems.
So we have problems in this class, where if we could create a training set that sort of describes the behavior we know needs to be encompassed by the algorithm, then we can actually now tackle that problem. So, in the areas of engineering, I’ve heard of folks applying machine learning to understand aircraft wing flutter. Flutter is essentially when the wing will become unstable. Now you could do discrete engineering to understand all the struts and the structure of the wing itself, but this becomes more and more a complex exercise, especially with modern material science playing into things and engineering techniques.
When one can essentially take a look at test data of how that particular design has behaved, one can actually then approach it from a machine learning standpoint in a much cheaper and perhaps even more accurate way than one would by deriving the discrete heuristic from engineering first principles.
In the business areas that our group focuses on, another example is trying to understand records retention for senior leadership e-mail. It’s not exactly obvious what is a record and what isn’t. Sometimes it is. “Hey, let’s do lunch.” That’s not a record. But then there are other things where there may be a discussion between a vendor and a senior leader, and it may initially start out as sort of a, “Hey, why don’t you look at my technology?” call, but may evolve into something more strategic and have value to the agency to retain that record.
So our group basically used machine learning to classify records. We had records managers sit down with something like 10,000 of these e-mails and then classify them. It turns out that a human probably has an accuracy of somewhere in the neighborhood of 85 to 95 percent. We were able to achieve 95 percent accuracy by taking the combined or aggregate training set between several records managers, and applying a voting mechanism in order to reject ones that fell out. The upshot is that the machine in this case can actually do a better job than an individual records manager can do.
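The interview does not spell out the voting mechanism, so the following is only a sketch of one plausible reading: each e-mail is labeled by several records managers, the majority label is kept, and items without enough agreement are rejected from the training set. The function name, the two-thirds agreement threshold, and the sample labels are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_labels(annotations, min_agreement=2 / 3):
    """Combine labels from several annotators by majority vote.

    annotations: dict mapping item id -> list of labels, one per annotator.
    Returns (accepted, rejected): accepted maps item id -> majority label;
    rejected lists item ids whose agreement fell below min_agreement.
    """
    accepted, rejected = {}, []
    for item_id, labels in annotations.items():
        label, votes = Counter(labels).most_common(1)[0]
        if votes / len(labels) >= min_agreement:
            accepted[item_id] = label
        else:
            rejected.append(item_id)
    return accepted, rejected


# Hypothetical labels from three records managers.
annotations = {
    "email-001": ["record", "record", "record"],
    "email-002": ["not_record", "not_record", "record"],
    "email-003": ["record", "not_record", "unsure"],
}
accepted, rejected = aggregate_labels(annotations)
```

Here the first two e-mails enter the training set with their majority label, while the third is dropped because no label reached the threshold. Filtering out contested items this way is one reason a model trained on the aggregate can end up more consistent than any single annotator.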
Furthermore, this task, if we had left it to a human, would have taken something like 23 human years of work. We clearly can’t wait that long, nor are we resourced to do so. Once it’s trained, which takes a matter of 10 to 20 hours of effort for a team to do, the machine can then process all of these records, terabytes’ worth, in a matter of a few hours. So, it’s a huge savings for the agency.
Host: When you start talking about machines delivering this much improvement in accuracy, quality and efficiency, is this cause for people to get a bit anxious about job security?
Thomas: I would like to say that even though we call it AI, even though it’s something similar to the machine – it appears the machine is thinking for itself, at the end of the day, it’s essentially a function approximator that’s approximating what a human would do, and it can encapsulate very complex behavior. I want to point out that when I talk with folks about applying machine learning to their various problems, it’s not a fantasy. If the training data aren’t there, if you don’t have enough data, then you can’t use it.
Furthermore, it shouldn’t be viewed as some sort of scary thing that’s going to get rid of people. Rather, to my mind, it looks more like a bionic arm. This is going to help people execute tough, large jobs that fall in this space. So, it is definitely along the lines of efficiencies. That’s how I would look at it. Without the human to provide the training data, the machine learning algorithm cannot go.
An important aspect of these things, along these lines, is that they’re bespoke. That is, you can’t just take any old algorithm off the shelf and expect it to work magic. You tend to cycle through a variety of possible machine learning approaches until you find the one that gives you the best return on value. Again, you have to have a human that’s providing these test data, a domain expert whose behavior you’re trying to encapsulate.
The best systems work in tandem with people. We have a machine learning-based document tagging system that we developed for our STI unit, where it will offer a probabilistic choice of metadata tags that might apply to any one document, but a human records manager then would review that and say, “Yeah. I agree with that. That looks great,” and put a checkmark. Then that feedback goes back into retraining the algorithm to make it that much more accurate in the future.
Sometimes a human records manager would say, “No. This tag suggestion doesn’t apply. In fact, you’re missing a tag that’s a new tag that we previously haven’t considered, but we want to include.” So working in tandem with the solution, you can not only improve the accuracy and the efficiency, but, if you structure the interaction between the human and the machine algorithm correctly, in many cases it’s a self-reinforcing, virtuous cycle where it gets better and better over time.
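The feedback loop described here can be sketched as follows. This is not the STI tagging system itself, whose design is not described in the interview. It is a toy stand-in that scores tags by simple word co-occurrence counts, with a hypothetical `TagSuggester` class and made-up documents, meant only to show how a reviewer's accept, reject, and add decisions can flow back into the model.

```python
from collections import defaultdict


class TagSuggester:
    """Toy tag suggester: scores tags by how often a document's words
    co-occurred with each tag in reviewer-confirmed examples."""

    def __init__(self):
        self.word_tag_counts = defaultdict(lambda: defaultdict(int))

    def learn(self, text, tags):
        for word in set(text.lower().split()):
            for tag in tags:
                self.word_tag_counts[word][tag] += 1

    def suggest(self, text):
        scores = defaultdict(int)
        for word in set(text.lower().split()):
            for tag, count in self.word_tag_counts.get(word, {}).items():
                scores[tag] += count
        total = sum(scores.values())
        # Normalize to rough probabilities; empty if nothing is known yet.
        return {tag: s / total for tag, s in scores.items()} if total else {}

    def review(self, text, accepted, added=()):
        """Feed the reviewer's decision back in. Accepted and newly added
        tags are reinforced; rejected tags are simply not reinforced."""
        self.learn(text, list(accepted) + list(added))


# Hypothetical documents and tags, for illustration only.
suggester = TagSuggester()
suggester.learn("wing flutter test data", ["aeronautics"])
suggestions = suggester.suggest("flutter analysis report")

# The reviewer accepts the suggestion and adds a tag the system had not seen.
suggester.review("flutter analysis report",
                 accepted=suggestions.keys(), added=["structures"])
later = suggester.suggest("analysis report")
```

After the review step, the suggester proposes the new tag for similar documents, which is the self-reinforcing part: each human correction becomes training data for the next round of suggestions.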
Host: As you’re talking about training or retraining an algorithm, one thing that comes to mind is “NASA-speak”—a lot of technical terms and acronyms and abbreviations. How do you work that into machine learning?
Thomas: Specifically, our group, and our brethren group at Langley, the data science group down there under Ed McLarney and Jeremy Yagle, have worked with speech detection solutions. Those are machine learning-based solutions, where you train it to recognize certain words. Certainly, you may ask the question, “Why can’t we just take something off the shelf from, say, Google or Facebook, who have very elegant solutions?” The answer is twofold.
Number one, oftentimes we’re applying this to data which are sensitive. So, we cannot utilize an off-premise solution that a vendor might be supplying. So we have to bring our own, as it were.
Secondly, Google’s solution, while it’s very good, it doesn’t recognize technical jargon, which I think we’ll all appreciate, working here at NASA, exists in abundance. So many acronyms and many technical terms will be missed by an off-the-shelf solution. So, it really requires working with domain experts and folks inside the agency to train up speech models that can then be applied.
It can be applied to many things. I know our fellow Langley colleagues were working on outbriefs for NASA personnel to do knowledge retention. So that’s a direct example to what you asked there.
Host: What trends are you seeing in the field of machine learning or artificial intelligence that could provide more value to NASA?
Thomas: I guess I’ll dial back to answer your question with a small story. Machine learning has been around since probably, arguably, the late ’50s. Certainly the idea of the perceptron is, I think, 1958. So the real question is why haven’t we used this earlier? This technology has been around for a long time. I myself, when I was a contractor at Goddard, executed a project in the early 2000s using neural nets, deep learning as we might now refer to it.
The answer is that previous to today, we haven’t had the two important things that are necessary. That is the processing power – these things are very thirsty for computational power – and the appropriate data. Without these two things, the machine learning solution cannot function.
Now we are definitely in an era, especially with the advent of cloud computing at the agency, which my group and others in the OCIO are working hard to bring to our frontline users, but there is also the need for data. So, we need to be able to overcome essentially both technical and functional problems at the agency. So from a technical standpoint, we are missing overarching platforms across the agency and outside of mission boundaries that provide access to necessary data and metadata that help folks at the agency understand the data. And a new program, called the Information Management Program, is seeking to, at least in part, tackle this problem. But there are also functional problems in terms of just cultural issues.
We’re very mission-driven here at the agency. It’s one of our strengths. But one of the, I would guess, negative outcomes of being mission-driven is that we create cylinders of excellence, both in terms of team excellence and data. It becomes, for cultural reasons, even if we have the platforms in place, hard to get at the data. So we need to tackle both of these issues, so that we have sufficient data that we can apply these new classes of algorithms to, to solve problems for the agency and bring value. So that is really where we’re at today.
Also, from a programming standpoint, bringing vendor solutions into the NASA space is an important component, but it really is not the tall pole here. It really is about how do we get folks access to the necessary data that they can form up their training sets and apply these solutions. That’s where the real challenge is today, in my opinion.
Host: How do you overcome that challenge? What are the first steps for a program or project?
Thomas: I tend to look at this as a data scientist. So the most important thing to do upfront is to not identify any technical solution whatsoever. For example, you don’t know that you need machine learning. What you really need to do is identify the questions you want to answer. This is the most crucial first step.
Having done that, then start examining in tandem with somebody who is a domain expert and might understand the data sources. What are the data that actually have the information that might answer this question? Then what form are they or what form can we put them in under the resourcing we have? Sometimes that means working with a single data source that’s a little messy. Sometimes it means merging or combining data sources. They all have various difficulties.
These two things should take up a significant chunk of your upfront time. Then the actual production of the dataset may then take up somewhere in the neighborhood of 30 to 50 percent of your time. During the time that you’re producing the datasets, you can be identifying algorithms that you might try, and it may be that you can apply something that’s not machine learning. You should do that.
If the heuristic is easy enough to understand and solves the question for you, that’s exactly what you should do. You should deterministically program it in whatever favorite paradigm and language you have at hand and solve the problem that way, analytically. Or if that’s not easy and the resourcing is out of scope, then machine learning might be a go-to for you. So, a lot depends on the question you’re answering and the type of data you have to hand to answer that question. Then that would determine the actual approach.
How we compare to industry, we’re kind of the opposite. If you take a look at the agency in terms of job description titles, which is not the most accurate way, but is certainly one of the easiest to hand ways of looking at the experience that exists at the agency, you’ll find that between the roles or job titles of program analysts, folks who have engineer in their title and scientist, you’ll come to a population of roughly somewhere in the neighborhood of just south of 30,000 folks.
That’s a huge population. That’s arguably almost half of the folks working onsite at NASA. And those are clearly folks that, if they’re not doing it now, at one point did some kind of data analytics or some very sophisticated algorithm production in their past. This compares to industry, where most organizations are rather thin in this area.
On the flip side, because of our mission-oriented and other history of the agency, we have thousands of very rich, large data sources that are not tied together, not well described outside of the mission, and are difficult to get at for folks who are outside of the mission. So, we’re kind of in the converse position strategically to our commercial brethren.
Again, to me, the problems we have are hard for the wrong reasons. These are solved problems in the commercial space. We should seek commercial solutions largely to solve these problems. Once we do, we will leapfrog past certainly many others in the federal space and I would argue in the commercial space as well.
When folks at NASA are able to get at the data and produce these training sets, I have no fear that we lack the expertise necessary to leverage these solutions. It’s there. It’s really getting at the data that’s our core problem.
Host: Do you see progress being made here?
Thomas: I think many of our technical folks already get it and they see value here when they can get at their data. I think bringing the understanding to our strategic leadership is a next step. I think there are strategic leaders that already get it. One of the things that our office is doing is attempting to create a data strategy working group for the agency. We’ll start small, within the OCIO, but we hope to grow this so that we’re engaging strategic and thought leaders across the mission and mission support space, so that they can see in front of them many of the technical solutions and issues that the agency is facing, raise the awareness, so that they themselves can help lead initiatives and formulate policies, which will be beneficial towards the uptake of the necessary technologies to support sharing of data where we can leverage these solutions.
Host: Really interesting. Brian, thanks for taking time to talk with us today.
Thomas: OK. It was my pleasure. Thank you so much for giving me the opportunity.
Host: Sure. Any closing thoughts?
Thomas: I don’t want to leave any doubt that machine learning is a viable, productive, innovative technology that the agency must take advantage of. There are many problems that we just simply have not been able to get to, that we’ve solved approximately with traditional approaches because of lack of this tool. This is a very useful tool for the agency. It is not an experimental tool in the sense of we don’t understand if this is ever going to bring value, stand back at arm’s length and let somebody else play around with this. This is absolutely something that can bring value to the agency today. If you can get access to the necessary data and processing power, it absolutely is a viable solution for many problems that we haven’t been able to solve at the agency until now.
Host: Thanks again to Brian Thomas for sharing his insights on artificial intelligence and how it fits into the NASA landscape. For Brian’s bio and pertinent links as well as a transcript of today’s episode, please visit APPEL.NASA.gov/podcast.
We’d like to hear your suggestions of guests and topics for upcoming shows. Let us know on Twitter at NASA_APPEL and use the hashtag SmallStepsGiantLeaps.
If you haven’t already, please take a moment and subscribe to the podcast, and tell your friends and colleagues about it.
Thanks for listening.
Angelo Conner, Millennium Engineering and Integration Company, contributed to the development of this episode.