DeepMind’s David Silver on games, beauty, and AI’s potential to avert human-made disasters

DeepMind’s David Silver speaks to the Bulletin of the Atomic Scientists about games, beauty, and AI’s potential to avert human-made disasters. Photo provided by David Silver and used with permission.

David Silver thinks games are the key to creativity. After competing in national Scrabble tournaments as a kid, he went on to study at Cambridge and co-found a video game company. Later, after earning his PhD in artificial intelligence, he led the DeepMind team that developed AlphaGo, the first program to beat a world champion at the ancient Chinese game of go. But he isn’t driven by competitiveness.

The ancient Chinese game of go. Photo credit: Marco Rubens. Used with permission.

That’s because for Silver, now a principal research scientist at DeepMind and computer science professor at University College London, games are playgrounds in which to understand how minds—human and artificial—learn on their own to achieve goals.

Silver’s programs use deep neural networks (machine learning algorithms inspired by the brain’s structure and function) to achieve results that resemble human intuition and creativity. First, he gave the program examples of the moves human players chose in various positions so that it could learn to imitate them, a learning style known as “supervised” learning. Eventually, he let the program improve by playing against itself, a form of “reinforcement” learning.
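For readers curious about the distinction, the toy Python sketch below contrasts the two learning styles. It is a hypothetical illustration, not DeepMind’s code: `imitate` learns from labeled examples by copying the move experts chose most often, while `learn_from_reward` is a simple epsilon-greedy multi-armed bandit that learns only from a win/loss signal. AlphaGo itself combined deep neural networks, tree search, and self-play; none of that is modeled here, and the positions, moves, and win probabilities are invented.

```python
import random
from collections import Counter, defaultdict


def imitate(examples):
    """Supervised learning in miniature: map each position to the move
    that human experts played most often there (hypothetical data)."""
    counts = defaultdict(Counter)
    for position, move in examples:
        counts[position][move] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in counts.items()}


def learn_from_reward(moves, win_prob, episodes=5000, epsilon=0.1, seed=0):
    """Reinforcement learning in miniature (an epsilon-greedy bandit):
    no examples, only a reward of 1 for a win and 0 for a loss."""
    rng = random.Random(seed)
    value = {m: 0.0 for m in moves}  # running estimate of each move's win rate
    tries = {m: 0 for m in moves}
    for _ in range(episodes):
        if rng.random() < epsilon:
            move = rng.choice(moves)  # explore: try something new
        else:
            move = max(moves, key=value.get)  # exploit: best estimate so far
        reward = 1.0 if rng.random() < win_prob[move] else 0.0
        tries[move] += 1
        value[move] += (reward - value[move]) / tries[move]  # incremental mean
    return max(moves, key=value.get), value


if __name__ == "__main__":
    experts = [("opening", "corner"), ("opening", "corner"), ("opening", "side")]
    print(imitate(experts))  # {'opening': 'corner'}

    # Invented win probabilities; the learner never sees them directly,
    # only the wins and losses they produce.
    odds = {"corner": 0.3, "side": 0.5, "center": 0.8}
    best, estimates = learn_from_reward(list(odds), odds)
    print(best, {m: round(v, 2) for m, v in estimates.items()})
```

The imitation learner can never do better than the examples it is given, whereas the reward learner’s exploration step lets it find the strongest option on its own, a small-scale echo of the trial-and-error process Silver describes later in the interview.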

Then, during a pivotal match between AlphaGo and the world champion, he had an epiphany: Perhaps the machine should have no human influence at all. That idea became AlphaGo Zero, the successor to AlphaGo that received “zero” human knowledge about how to play well. Instead, AlphaGo Zero relies only on the game’s rules and reinforcement learning. It beat AlphaGo 100 games to zero.

David Silver led the DeepMind team that developed AlphaGo—the first program to beat a world champion at the ancient Chinese game of go. Photo credit: Marco Rubens. Used with permission.

I first met Silver at the Heidelberg Laureate Forum—an invitation-only gathering of “the most exceptional mathematicians and computer scientists of their generations.” In Heidelberg, he was recognized for having received the Association for Computing Machinery’s prestigious Prize in Computing for breakthrough advances in computer game-playing.

“Few other researchers have generated as much excitement in the AI field as David Silver,” Association for Computing Machinery President Cherri M. Pancake said at the time. “His insights into deep reinforcement learning are already being applied in areas such as improving the efficiency of the UK’s power grid, reducing power consumption at Google’s data centers, and planning the trajectories of space probes for the European Space Agency.” Silver is also an elected Fellow of the Royal Society and was the first recipient of the Mensa Foundation Prize for the best scientific discovery in the field of artificial intelligence.

Silver’s stardom contrasts with his quiet, unassuming nature. In this condensed, edited, from-the-heart interview, I talk with Silver about games, the meaning of creativity, and AI’s potential to avert disasters such as climate change, human-made pathogens, mass poverty, and environmental catastrophe.

As a kid, did you play games differently from other kids?

I had some funny moments playing in National School Scrabble competitions. In one event, at the end of the final game, I asked my opponent, “Are you sure you want to play that? Why not play this other word which scores more points?” He changed his move and won the game and the championship, which made me really happy.

More than winning, I am fascinated with what it means to play a game really well.

How did you translate that love of games into a real job?

Later on, I played junior chess, where I met [fellow DeepMind co-founder] Demis Hassabis. At that time, he was the strongest boy chess player of his age in the world. He would turn up in my local town when he needed pocket money, play in these tournaments, win the 50-pound prize money, and then go back home. Later, we got to know each other at Cambridge and together we set up Elixir, our games company. Now we’re back together at DeepMind.

What did this fascination with games teach you about problem solving?

Humans want to believe that we’ve got this special capacity called “creativity” that our algorithms don’t or won’t have. It’s a fallacy.

We’ve already seen the beginnings of creativity in our AIs. There was a moment in the second game of the [2016] AlphaGo match [against world champion Lee Sedol] where it played a particular move called “move 37.” The go community certainly felt that this was creative. It tried something new which didn’t come from examples of what would normally be done there.

But is that the same kind of broad creativity that humans can apply to anything, rather than just moves within a game?

The whole process of trial-and-error learning, of trying to figure out for yourself, or asking AI to figure out for itself, how to solve the problem is a process of creativity. You or the AI start off not knowing anything. Then you or it discover one new thing, one creative leap, one new pattern or one new idea that helps in achieving the goal a little bit better than before. And now you have this new way of playing your game, solving your puzzle, or interacting with people. The process is a million mini discoveries, one after the other. It is the essence of creativity.

If our algorithms aren’t creative, they’ll get stuck. They need an ability to try out new ideas for themselves—ideas that we’re not providing. That has to be the direction of future research, to keep pushing on systems that can do that for themselves.

If we can crack [how self-learning systems achieve goals], it’s more powerful than writing a system that just plays go. Because then we’ll have an ability to learn to solve a problem that can be applied to many situations.

Many thought that computers could only ever play go at the level of human amateurs. Did you ever doubt your ability to make progress?

When I arrived in South Korea [for the 2016 AlphaGo match] and saw row upon row of cameras set up to watch and heard how many people [over 200 million] were watching online, I thought, “Hang on, is this really going to work?” It was scary. The world champion is unbelievably versatile and creative in his ability to probe the program for weaknesses. He would try everything in an attempt to push the program into weird situations that don’t normally occur.

I feel lucky that we stood up to that test. That spectacular and terrifying experience led me to reflect. I stepped back and asked, “Can we go back to the basics to understand what it means for a system to truly learn for itself?” To find something purer, we threw away the human knowledge that had gone into it and came up with AlphaZero.

Humans have developed well-known strategies for go over millennia. What did you think as AlphaZero quickly discovered, and rejected, these in favor of novel approaches?

We set up board positions where the original version of AlphaGo had made mistakes. We thought if we could find a new version that gets them right, we’d make progress. At first, we made massive progress, but then it appeared to stop. It seemed there were 20 or 30 positions it still wasn’t getting right.

Fan Hui, the professional player [and European champion] we were working with, spent hours studying the moves. Eventually, he said that the professional players were wrong in these positions and AlphaZero was right. It found solutions that made him reassess what was in the category of being a mistake. I realized that we had an ability to overturn what humans thought was standard knowledge.

After go, you moved on to a program that mastered StarCraft—a real-time strategy video game. Why the jump to video games?

Go is one narrow domain. Extending from that to the human brain’s breadth of capabilities requires a huge number of steps. We’re trying to add the dimensions of complexity where humans can do things but our agents can’t.

AlphaStar moves toward things which are more naturalistic. Like human vision, the system only gets to look at a certain part of the map. It’s not like playing go or chess where you see all of your opponent’s pieces. You see only what is nearby and have to scout to find out more. These aspects bring it closer to what happens in the real world.

What’s the end goal?

I think it’s AI agents that are as broadly capable as human brains. We don’t know how to get there yet, but we have an existence proof in the human brain.

Replicating the human brain? Do you really think that’s realistic?

I don’t believe in magical, mystical explanations of the brain. At some level, the human brain is an algorithm which takes inputs and produces outputs in a powerful and general way. We’re limited by our ability to understand and build AIs, but that understanding is growing fast. Today we have systems that are able to crack narrow domains like go. We’ve also got language models which can understand and produce compelling language. We’re building things one challenge at a time.

So, you think there’s no ceiling to what AI can do?

We’re just at the beginning. Imagine if you ran evolution for another 4 billion years. Where would we end up? Maybe we would have much more sophisticated intelligences which could do a much better job. I see AI a little bit like that. There is no limit to this process because the world is essentially infinitely complex.

And so, is there a limit? At some point, you hit physical limits, so it’s not that there are no bounds. Eventually you use up all of the energy in the universe and all of the atoms in the universe in building your computational device. But relative to where we are now, that’s essentially limitless intelligence. The spectrum beyond human intelligence is vast, and that’s an exciting thought.

Stephen Hawking, who served on the Bulletin’s Board of Sponsors, worried about unintended consequences of machine intelligence. Do you share his concern?

I worry about the unintended consequences of human intelligence, such as climate change, human-made pathogens, mass poverty, and environmental catastrophe. The quest for AI should result in new technology, greater understanding, and smarter decision making. AI may one day become our greatest tool in averting such disasters. However, we should proceed cautiously and establish clear rules prohibiting unacceptable uses of AI, such as the development of autonomous weapons.

You’ve had many successes meeting these grand challenges through games, but have there been any disappointments?

Well, supervised learning, this idea that you learn from examples, has had an enormous mainstream impact. Most of the big applications that come out of Google use supervised learning somewhere in the system. Machine translation systems from English to French, for example, in which you want to know the right translation of a particular sentence, are trained by supervised learning. It is a very well-understood problem and we’ve got clear machinery now that is effective at scaling up.

One of my disappointments at the moment is that we haven’t yet seen that level of impact with self-learning systems through reinforcement learning. In the future, I’d love to see self-learning systems which are interacting with people, in virtual worlds, in ways that are really achieving our goals. For example, a digital assistant that’s learning for itself the best way to accomplish your goals. That would be a beautiful accomplishment.

What kinds of goals?

Maybe we don’t need to say. Maybe it’s more like we pat our AI on the back every time it does something we like, and it learns to maximize the number of pats on the back it gets and, in doing so, achieves all kinds of goals for us, enriching our lives and helping us do things better. But we are far from this.

Do you have a personal goal for your work?

During the AlphaGo match with Lee Sedol, I went outside and found a go player in tears. I thought he was sad about how things were going, but he wasn’t. In this domain in which he had invested so much, AlphaGo was playing moves he hadn’t realized were possible. Those moves brought him a profound sense of beauty.

I’m not enough of a go player to appreciate that at the level he could. However, we should strive to build intelligence where we all get a sense of that.

If you look around—not just in the human world but in the animal world—there are amazing examples of intelligence. I’m drawn to say, “We built something that’s adding to that spectrum of intelligence.” We should do this not because of what it does or how it helps us, but because intelligence is a beautiful thing.