Main | Robot Oppression: Unethicality of the Three Laws »
Why We Need Friendly AI
There are certain important things that evolution created. We don’t know that evolution reliably creates these things, but we know that it happened at least once. A sense of fun, the love of beauty, taking joy in helping others, the ability to be swayed by moral argument, the wish to be better people. Call these things humaneness, the parts of ourselves that we treasure – our ideals, our inclinations to alleviate suffering. If human is what we are, then humane is what we wish we were. Tribalism and hatred, prejudice and revenge, these things are also part of human nature. They are not humane, but they are human. They are a part of me; not by my choice, but by evolution’s design, and the heritage of three and half billion years of lethal combat. Nature, bloody in tooth and claw, inscribed each base of my DNA. That is the tragedy of the human condition, that we are not what we wish we were. Humans were not designed by humans, humans were designed by evolution, which is a physical process devoid of conscience and compassion. And yet we have conscience. We have compassion. How did these things evolve? That’s a real question with a real answer, which you can find in the field of evolutionary psychology. But for whatever reason, our humane tendencies are now a part of human nature.
If we do our jobs right, then four billion years from now, some… student… may be surprised to learn that altruism, honor, fun, beauty, joy, and love can arise from natural selection operating on hunter-gatherers. Of course a mind that loves beauty will try to design another mind that loves beauty, but it is passing strange that the love of beauty should also be produced by evolution alone. It is the most wonderful event in the history of the universe - true altruism, a genuine joy in helping people, arising from the cutthroat competition of evolution’s endless war. It is a great triumph, which must not be lost.
That is our responsibility, to preserve the humane pattern through the transition from evolution to recursive self-improvement (i.e., to a mind improving directly its own mind), because we are the first. That is our responsibility, not to break the chain, as we consider the creation of Artificial Intelligence, the second intelligence ever to exist.
People have asked how we can keep Artificial Intelligences under control, or how we can integrate AIs into society. The question is not one of dominance, or even coexistence, but creation. We have intuitions for treating other humans as friends, trade partners, enemies; slaves who might rebel, or children in need of protection. We only have intuitions for dealing with minds that arrive from the factory with the exact human nature we know. We have no intuitions for creating a mind with a humane nature. It doesn’t make sense to ask whether “AIs” will be friendly or hostile. When you talk about Artificial Intelligence you have left the tiny corner of design space where humanity lives, and stepped out into a vast empty place. The question is what we will create within it.
Human is what we are, and humane is what we wish we were. Humaneness is renormalized humanity - humans turning around and judging our own emotions, asking how we could be better people. Humaneness is the trajectory traced out by the human emotions under recursive self-improvement. Human nature is not a static ideal, but a pathway - a road that leads somewhere. What we need to do is create a mind within the humane pathway, what I have called a Friendly AI. That is not a trivial thing to attempt. It’s not a matter of a few injunctions added or a module bolted onto existing code. It is not a simple thing to simultaneously move a morality from one place to another, while also renormalizing through the transfer, but still making sure that you can backtrack on any mistakes. Some of this is very elegant. None of it is easy to explain. This is not something AI researchers are going to solve in a few hours of spare time.
But. I think that if we can handle the matter of AI at all, we should be able to create a mind that’s a far nicer person than anything evolution could have constructed. This issue cannot be won on the defensive. We need to step forward as far as we can in the process of solving it. What we need is not superintelligence, but supermorality, which includes superintelligence as a special case. That’s the pattern we need to preserve into the era of recursive self-improvement.
We have a chance to do that, because we are the first. And we have a chance to fail, because we are the first. There is no fate in this. There is nothing that happens to us, only what we do to ourselves. We may fail to understand what we are building - we may look at an AI design and believe that it is humane, when in fact it is not. If so, it will be us that made the mistake. It will be our own understanding that failed. Whatever we really build, we will be the ones who built it. The danger is that we will construct AI without really understanding it.
How dangerous is that, exactly? How fast does recursive self-improvement run once it gets started? One classic answer is that human research in Artificial Intelligence has gone very slowly, so there must not be any problem. This is mixing up the cake with the recipe. It’s like looking at the physicists on the Manhattan project, and saying that because it took them years to figure out their equations, therefore actual nuclear explosions must expand very slowly. Actually, what happens is that there’s a chain reaction, fissions setting off other fissions, and the whole thing takes place on the timescale of nuclear interactions, which happens to be extremely fast relative to human neurons. So from our perspective, the whole thing just goes FOOM. Now it is possible to take a nuclear explosion in the process of going FOOM and shape this tremendous force into a constructive pattern - that’s what a civilian power plant is - but to do that you need a very deep understanding of nuclear interactions. You have to understand the consequences of what you’re doing, not just in a moral sense, but in the sense of being able to make specific detailed technical predictions. For that matter, you need to understand nuclear interactions just to make the prediction that a critical mass goes FOOM, and you need to understand nuclear interactions to predict how much uranium you need before anything interesting happens. That’s the dangerous part of not knowing; without an accurate theory, you can’t predict the consequences of ignorance.
In the case of Artificial Intelligence there are at least three obvious reasons that AI could improve unexpectedly fast once it is created. The most obvious reason is that computer chips already run at ten million times the serial speed of human neurons and are still getting faster. The next reason is that an AI can absorb hundreds or thousands of times as much computing power, where humans are limited to what they’re born with. The third and most powerful reason is that an AI is a recursively self-improving pattern. Just as evolution creates order and structure enormously faster than accidental emergence, we may find that recursive self-improvement creates order enormously faster than evolution. If so, we may have only one chance to get this right.
It’s okay to fail at building AI. The dangerous thing is to succeed at building AI and fail at Friendly AI. Right now, right at this minute, humanity is not prepared to handle this. We’re not prepared at all. The reason we’ve survived so far is that AI is surrounded by a protective shell of enormous theoretical difficulties that have prevented us from messing with AI before we knew what we were doing.
AI is not enough. You need Friendly AI. That changes everything. It alters the entire strategic picture of AI development. Let’s say you’re a futurist, and you’re thinking about AI. You’re not thinking about Friendly AI as a separate issue; that hasn’t occurred to you yet. Or maybe you’re thinking about AI, and you just assume that it’ll be Friendly, or you assume that whoever builds AI will solve the problem. If you assume that, then you conclude that AI is a good thing, and that AIs will be nice people. And if so, you want AI as soon as possible. And Moore’s Law is a good thing, because it brings AI closer.
But here’s a different way of looking at it. When futurists are trying to convince people that AI will be developed, they talk about Moore’s Law because Moore’s Law is steady, and measurable, and very impressive, in drastic contrast to progress on our understanding of intelligence. You can persuade people that AI will happen by arguing that Moore’s Law will eventually make it possible for us to make a computer with the power of a human brain, or if necessary a computer with ten thousand times the power of a human brain, and poke and prod until intelligence comes out, even if we don’t quite understand what we’re doing.
But if you take the problem of Friendly AI into account, things look very different. Moore’s Law does make it easier to develop AI without understanding what you’re doing, but that’s not a good thing. Moore’s Law gradually lowers the difficulty of building AI, but it doesn’t make Friendly AI any easier. Friendly AI has nothing to do with hardware; it is a question of understanding. Once you have just enough computing power that someone can build AI if they know exactly what they’re doing, Moore’s Law is no longer your friend. Moore’s Law is slowly weakening the shield that prevents us from messing around with AI before we really understand intelligence. Eventually that barrier will go down, and if we haven’t mastered the art of Friendly AI by that time, we’re in very serious trouble. Moore’s Law is the countdown and it is ticking away. Moore’s Law is the enemy.
In Eric Drexler’s Nanosystems, there’s a description of a one-kilogram nanocomputer capable of performing ten to the twenty-first operations per second. That’s around ten thousand times the estimated power of a human brain. That’s our deadline. Of course the real deadline could be earlier than that, maybe much earlier. Or it could even conceivably be later. I don’t know how to perform that calculation. It’s not any one threshold, really - it’s the possibility that nanotechnology will suddenly create an enormous jump in computing power before we’re ready to handle it. This is a major, commonly overlooked, and early-appearing risk of nanotechnology - that it will be used to brute-force AI. This is a much more serious risk than grey goo. Enormously powerful computers are a much earlier application of nanotechnology than open-air replicators. Some well-intentioned person is much more likely to try it, too.
Now you can, of course, give the standard reply that as long as supercomputers are equally available to everyone, then good programmers with Friendly AIs will have more resources than any rogues, and the balance will be maintained. Or you could give the less reassuring but more realistic reply that the first Friendly AI will go FOOM, in a pleasant way, after which that AI will be able to deal with any predators. But both of these scenarios require that someone be able to create a Friendly AI. If no one can build a Friendly AI, because we haven’t figured it out, then it doesn’t matter whether the good guys or the bad guys have bigger computers, because we’ll be just as sunk either way. Good intentions are not enough. Heroic efforts are not enough. What we need is a piece of knowledge. The standard solutions for dealing with new technologies only apply to AI after we have made it theoretically possible to win. The field of AI, just by failing to advance, or failing to advance far enough, can spoil it for everyone else no matter how good their intentions are.
If we wait to get started on Friendly AI until after it becomes an emergency, we will lose. If nanocomputers show up and we still haven’t solved Friendly AI, there are a few things I can think of that would buy time, but it would be very expensive time. It is vastly easier to buy time before the emergency than afterward. What are we buying time for? This is a predictable problem. We’re going to run into this. Whatever we can imagine ourselves doing then, we should get started on now. Otherwise, by the time we get around to paying attention, we may find that the board has already been played into a position from which it is impossible to win.
Comments
Excellent article.
Posted by: Jacob G. at July 30, 2004 03:13 AM
Interesting and important topic. I recently completed a novel dealing with this issue. I thought it would come to a head in about fifty to sixty years, but after reading about some of the recent developments and listening to the futurists, it may be thirty or less. It could become a danger as great as terrorism. I’m afraid good intentions and knowledge won’t be enough - it only takes one misguided person in the right position at the right tine to start the chain reaction. There needs to be serious study of this before we go any further.
Posted by: David P. at August 4, 2004 02:41 AM
Very interesting article and reply, which I am afraid to say leaves me without the slightest trace of hope. Firstly the history of humans suggests to me that new technology is not used altruistically but is used to dominate either politcally or economically even if the inventors were themselves altruistic.
In our current world situation we are dominated by the indutro-military complex and the share price of corporations that are much richer and more influential than most countries. Any new technology as profitable as AI will not be allowed to simply unfold - it will be controlled by those with the greatest resources to benefit themselves, maintain their market share and stifle the competition.
After having worked as a scientist let me say that not all people working in white coats in laboratories are seeking after truth but after funding, patent rights and international recognition just to name a few other motives for scientific enquiry.
Finally who will define morality. We humans can not even decide what is humane. Today there are people in the world who feel that it is morally correct to strap bombs to themseleves and kill innocent bystanders. I am not making a value judgement about this simply using, probably, the most extreme current example about different moral standards. What will the decisions be about much more subtle moral issues i.e. When is it right to turn off a life support machine or is it wrong to even contemplate asking that question?
David P. suggests that even one misguided person could be disasterous but my question is who is going to decide what is misguided. He is right when he says that “There needs to be serious study of this before we go any further” but unfortunately there will not be, study takes time and as we all know time is money.
Posted by: Steve at August 8, 2004 09:41 PM
Interesting article. One thing to consider, though, is that while an AI with perfect morality, the final goal, will be hard tp produce, as Steve says, we don’t need to have the first version perfected. What we need, as I see it, is some degree of humaneness in the AIs that will prevent them from being a threat to humankind. After that we can worry about those moral issues that we ‘can not even decide’.
Posted by: Solarhaphaeriom at August 17, 2004 11:22 AM
Excellent piece.
Admittedly, I’m not massively into AI theories, but your work is still the first I’ve come across which raises the concern about behavioural tendencies in AI, and the need for it to be “Friendly”.
In all honesty, I find it odd that there isn’t more being made of it, especially given the sheer amount of science-fiction which talks about “What happens when the machines go bad”. It’s certainly something that needs to be given more consideration than it has been.
Posted by: Misha at August 22, 2004 10:02 AM
Powered by Movable Type 3.01D
by