Too Simple to Be Safe

by Anders Sandberg

That the Three Laws are insufficient to guarantee robot behavior should be obvious to anyone who has read Asimov’s stories. Usually the main plot is about misbehaving robots and the mystery is why – rather than being “whodunits” they are “howthinkits.” But how complex do the rules of robot behavior have to be before we can consider them safe?

Is the “3 Laws Unsafe” site necessary? To some extent it is just timed advertising, with the film based on I, Robot arriving. The real goal is not to push the thesis that the 3 Laws are bad, but to interest a wider public in discussions of AI ethics. That is very laudable in itself. But I think one should not underestimate the misconceptions about AI programming, and that pointing out the complex problems of simple solutions may be necessary.

To many people “computers just do what they are told” is an article of faith. As personal computers become more widespread, the fallacy of this statement becomes apparent: computers crash, refuse to print documents, update their software and occasionally exhibit remarkably strange behavior. Often these actions are the result of complex interactions between pre-existing software modules, where human ingenuity couldn’t predict the outcome of the particular combination in a particular computer at a given time. But despite this, people often seem to think that creating artificial intelligence would produce something that one could give a set of rules and have it follow slavishly.

In Asimov’s stories this is the case, and the result is chaos. In reality things will be even worse: besides ambiguities in the rules and how they are to be applied, there will be errors in cognition, perception and execution of actions. And of course low-level crosstalk and software bugs too. And this is just in the case of an ordinarily intelligent machine. The self-enhancing AIs envisioned by the Singularity Institute will have far more degrees of freedom to train against reality (and hence to get wrong), and the number of potential non-obvious interactions increases exponentially.

So what to do about it? The idea I really like about Friendly AI is the attempt to formulate a goal architecture that is robust. If something goes wrong the system tries to adjust itself to make things better. There is no guarantee that it works, but experience shows that some systems are far less brittle than others.

Real intelligence exhibits several important traits: it interacts with its world, it is able to learn new behaviors (or unlearn old), and it can solve new problems using earlier information, heuristics and strategies. The learning aspect enables us to speak about the ethics of an AI program: how does it live up to its own goals, the goals of others and perhaps universal virtues? Asimovian AI was limited to interaction and problem-solving in most situations involving the Three Laws. It was in a very strong sense amoral: it could not act “immorally” and was hence no better than the protagonist of A Clockwork Orange after being treated. A learning agent on the other hand might have less strong barriers against dangerous behaviors, but would be able to learn to act well (under the right circumstances) and generalize these experiences anew.

If the AIs communicate with each other they might even transfer these moral experiences, enabling AIs not exposed to the critical situations to handle them as they arrive. We humans do it all the time through our books, films and stories: I may not have encountered a situation where I discover that my government is acting immorally and I have to choose between remaining comfortably silent or taking possibly illegal action to change things, but I have read numerous fictional and real versions of the scenario that have given me at least some crude moral training.

Learning is also the key to robustness. Software that adapts to an uncertain outer and inner environment is more likely to function when an error actually occurs (as witnessed by the resiliency of neural network architectures) than software built on fixed rules. To some extent, this is again the difference between laws (fixed rules) and moral principles (goals).

But learning never occurs in a vacuum. The “Bias-Variance Dilemma” shows that any learning system has a tradeoff between being general (no bias) and requiring as little training as possible (low variance). A “pure AI” that has no preconceptions about anything would require a tremendous amount of training examples (upbringing) to become able to think usefully. A heavily biased AI with many built-in assumptions (reality is 3+1 dimensional, gravity exists, it is bad to bump into things and especially humans…) would need far less upbringing but would likely exhibit many strange or inflexible behaviors when the biases interacted. In many ways, Asimovian AI is a pure AI with heavy “moral” biases, which is why learning or adaptation is so irrelevant to the intended use of the Three Laws.
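The tradeoff can be made concrete with a small numerical sketch. Everything in it is an illustrative assumption rather than anything from the essay: the sine-shaped "world", the noise level, the ten training examples, and the two toy learners. One learner is heavily biased (it assumes the world is a single constant), the other is close to "pure" (it just repeats the nearest example it has seen). Averaging over many independent upbringings estimates how far each learner is off on average (squared bias) and how much its answers depend on which examples it happened to get (variance).

```python
import math
import random
import statistics


def true_signal(x):
    """The 'reality' both learners try to capture (an arbitrary choice)."""
    return math.sin(2 * math.pi * x)


def sample_training_set(rng, n, noise_sd):
    """Draw n noisy observations of the true signal on [0, 1)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_signal(x) + rng.gauss(0, noise_sd)) for x in xs]


def fit_constant(train):
    """Heavily biased learner: assumes the world is one constant value."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_nearest_neighbour(train):
    """Nearly assumption-free learner: repeat the closest example seen."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def bias_variance(fit, n_train=10, trials=500, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance over many independent training sets."""
    rng = random.Random(seed)
    test_xs = [i / 20 for i in range(21)]
    preds = [[] for _ in test_xs]
    for _ in range(trials):
        model = fit(sample_training_set(rng, n_train, noise_sd))
        for i, x in enumerate(test_xs):
            preds[i].append(model(x))
    # Squared bias: gap between the average prediction and the truth.
    bias2 = statistics.fmean(
        (statistics.fmean(p) - true_signal(x)) ** 2
        for p, x in zip(preds, test_xs)
    )
    # Variance: how much predictions move from one training set to the next.
    variance = statistics.fmean(statistics.pvariance(p) for p in preds)
    return {"bias2": bias2, "variance": variance}


biased = bias_variance(fit_constant)
flexible = bias_variance(fit_nearest_neighbour)
print("constant learner:         ", biased)
print("nearest-neighbour learner:", flexible)
```

With these settings the constant learner should come out with the larger squared bias and the nearest-neighbour learner with the larger variance. Raising `n_train` is the toy analogue of a longer upbringing, and it mainly helps the flexible learner.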

Living beings have solved the bias-variance dilemma by cheating: we get a lot of pre-packaged biases that are the result of evolutionary learning. When the baby cries when it is hungry, it automatically signals the mother to come rather than try to learn what actions would produce relief from hunger. When the baby wrinkles its face against bitter tastes and enjoys sweetness, it uses a bias laid down by countless generations encountering often poisonous bitter alkaloids and energy-rich (and hence fitness enhancing) sweet fruits. We benefit from the price paid by trillions of creatures that were selected away by evolution’s ruthless hand.

A robot will likely benefit a bit from this too, as we humans try to act as its evolutionary past and throw in useful biases. But balancing the prior information with the ability to re-learn as conditions change is a challenge. It requires different levels of flexibility in different environments, and meta-flexibility to detect what kind of new environment one has entered and how to change the flexibility. It seems likely that it is not possible to find an optimal level of flexibility in general (as a proof sketch, consider that the environment might contain undecidable aspects that determine how fast it will change).

We humans have a range of flexibility both as individuals and as a species; we benefit from having at least some people more adapted than others when things change. It might be the same thing among AIs: rather than seeking an optimal design and then copying it endlessly, we create a good design and create a large number of slightly different variants. The next generation of AIs would be based on the most successful variants, as well as the knowledge gained by the experiences of the AIs themselves. This approach enables AIs to develop along divergent tracks in different circumstances – the kind of intelligence and personality useful for programming is different from what is useful as an entertainer or diplomat.
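The variant-and-select idea can be sketched as a very small evolutionary loop. This is a toy under stated assumptions, not a proposal for how real AIs would be bred: each "design" is reduced to a single hypothetical `flexibility` number between 0 and 1, and each environment is reduced to a hypothetical `target` value that rewards designs close to it. The population size, mutation spread and number of survivors are likewise arbitrary.

```python
import random


def fitness(flexibility, target):
    """Hypothetical score: designs closer to what the environment rewards do better."""
    return -abs(flexibility - target)


def next_generation(population, target, rng, keep=5, spread=0.05):
    """Keep the most successful variants and refill with slightly altered copies."""
    ranked = sorted(population, key=lambda f: fitness(f, target), reverse=True)
    parents = ranked[:keep]
    children = [
        min(1.0, max(0.0, rng.choice(parents) + rng.gauss(0, spread)))
        for _ in range(len(population) - keep)
    ]
    return parents + children


def evolve(target, size=30, generations=20, seed=1):
    """Run the variant-and-select loop; return the final population and best scores."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(size)]
    best_scores = [max(fitness(f, target) for f in population)]
    for _ in range(generations):
        population = next_generation(population, target, rng)
        best_scores.append(max(fitness(f, target) for f in population))
    return population, best_scores


# Two environments rewarding different levels of flexibility (made-up values).
programmers, programmer_scores = evolve(target=0.2)
diplomats, diplomat_scores = evolve(target=0.8)
print(min(programmers), max(programmers))
print(min(diplomats), max(diplomats))
```

Because the surviving parents are carried over unchanged, the best score in each run can never get worse from one generation to the next. Running the same loop against two different targets is a crude picture of the divergent tracks described above: the two populations drift toward different designs.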

But what about the guarantees of keeping these devices moral? The Three Laws promise guarantees but at most produce safety railings (which is nothing to sneeze at; even the above flexible AIs will likely have a few built-in limiters and biases of a similar kind – the fact that most humans are emotionally unable to kill other humans has not prevented some from doing it or training others to do it, but the overall effect is quite positive). Setting up a single master goal that is strongly linked to the core value system of the AI might be more robust to experience, reprogramming and accidents. But it would still be subject to the bias-variance dilemma, and the complexities of interpreting that goal might make it rather unstable in individual AIs. Having a surrounding “AI community” and an AI-human shared society moderates these instabilities; moral experiences and values are shared, webs of trust and trade integrate different kinds of agents, and a multitude of goals and styles of thinking co-exist. Rogue agents can be inhibited both by their interactions with the society and, in extreme cases, by the combined resources and coercive power of the others. While morality in the end resides in the individual actions of agents, it can be sustained by collective interaction.

This is the multi-layered approach to creating “safe” AI (and humans). At the bottom level are built-in biases and inhibitions. Above it are goals and motivational structures that are basically “good.” (It is an interesting subject for another essay to analyze how different motivation architectures affect ethics; cf. Aristotle’s ethics, the effect of temporal-difference learning in dopamine signals and naturalistic decision-making for some ideas.) Above these goals are the experiences and schemas built by the agent, as well as what it has learned from others. Surrounding the agent is a social situation, further affecting its behavior even when it is rationally selfish by giving incentives and disincentives to certain actions. And finally there are society-level precautions against misbehavior.

This is far from the neatness of the Three Laws. It is a complex mess, with no guarantees on any level. But it is also a very resilient yet flexible mess: it won’t break down if there is a problem on one level, and multi-level problems are less likely. If the situation changes, the participants can change.
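A back-of-the-envelope calculation shows why stacked imperfect layers can still be resilient. The miss rates below are invented purely for illustration, and the calculation assumes the layers fail independently of one another, which real layers would not fully do; correlated failures would make the true figure worse.

```python
import math

# Hypothetical chance that each layer fails to stop a given bad action.
# The numbers are made up; only the layer names come from the text.
LAYER_MISS_RATES = {
    "built-in inhibitions": 0.2,
    "goal structure": 0.2,
    "learned experience": 0.3,
    "social incentives": 0.3,
    "society-level precautions": 0.4,
}


def joint_miss_rate(layers):
    """Chance that every layer misses, assuming independent failures."""
    return math.prod(layers.values())


def miss_rate_without(layers, broken):
    """Same calculation with one layer knocked out entirely."""
    return math.prod(rate for name, rate in layers.items() if name != broken)


all_layers = joint_miss_rate(LAYER_MISS_RATES)
one_broken = miss_rate_without(LAYER_MISS_RATES, "goal structure")
print(f"all layers working:    {all_layers:.5f}")
print(f"goal structure broken: {one_broken:.5f}")
```

Under these assumptions no single layer is better than 80% reliable, yet the stack as a whole lets through only about one bad action in 700, and losing one layer entirely degrades that to roughly one in 140 rather than to collapse.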

But to most people this complexity is unappealing: give us the apparent certainty of the Three Laws! There is a strong tendency to distrust complex spontaneous orders (despite our own bodies and minds being examples!) and to prefer apparent simplicity. This is where I think the “3 Laws Unsafe” website is necessary: to remind people that simplicity isn’t to be trusted unconditionally, and to show the fascinating array of possibilities AI ethics can offer.

Comments

Unlike many of the other opinion essays listed at “3 Laws Unsafe”, this one has a much more balanced and realistic view on the implications of Asimov’s vision. The 3 laws, or a similar moral model variant, are essential building blocks upon which AI needs to be built. It is this beginning moral standard that, in conjunction with the theoretical cognitive learning of AI, would make the artificial intelligence of the future work. The complexity beyond the three rules is immense and eminently necessary, but their grounding cannot be ignored. A robot built entirely on the 3 laws would fall upon the same problems Asimov himself brings up in his many stories (and more), and he would have been the first to admit that though the three laws may be among the only necessary LAWS, in formulating the programming and “brains” of AI, they are only the beginning.

Posted by: vtblind at July 19, 2004 09:58 PM

Well-written essay. I merely wonder whether these AIs will be functionally equivalent to an almost certainly moral Asenion machine, or a less certainly moral machine that may turn on us.

One thing, however, I would very much like to point out is how in many of Asimov’s stories the technical details are often irrelevant compared to the concepts.

In one of Asimov’s first stories, for example, about the first trip to the moon: the ship was built in a rich man’s backyard, as Asimov figured anyone who could build such a ship, could fly it, and that computer calculations would only be needed prior to leaving the atmosphere. As he put it, “When you get above the clouds, can you, or can you not see the moon?”

The technical details are almost all wrong, but what was important was the conceptual: that there would be social opposition to such a flight. That there was social opposition to all technology. And his message was that even though there is, we must resist it, because there is no turning the clock back. And because the good old days weren’t so good anyways.

Posted by: Jacob Guevara at July 22, 2004 05:34 AM

Copyright © 2004 Singularity Institute for Artificial Intelligence, Inc. All rights reserved.