This hypertext node is downstream of GPT-3 on Coherent Extrapolated Volition.
Coherent Extrapolated Volition is a term developed by Eliezer Yudkowsky while discussing Friendly AI development. It is meant as an argument that explicitly programming what we think our desires and motivations are into an AI would not be sufficient. Instead, we should find a way to program the AI so that it acts in our best interests: doing what we want it to do, not what we tell it to do.
Related: Friendly AI, Metaethics Sequence, Complexity of Value
In calculating CEV, an AI would predict what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". It would recursively iterate this prediction for humanity as a whole and determine the desires that converge. This initial dynamic would be used to generate the AI's utility function.
For example, imagine if an AI asked me what I wanted, and I said
…, "I want to be immortal and live on Mars," and the AI granted me that. For me, that would not be enough; it would fall short of my CEV. I would keep asking for things indefinitely. So while you and I, when asked what we want, might reply with some trivial or temporary thing, an AI could not stop at that answer, because it would calculate what the idealized versions of ourselves would want.
…, "I want to write a novel" (this is not a very coherent 'want', because one would be hard-pressed to justify that preference on a consequentialist basis, but it is close enough to CEV for discussion). The AI considers alternative possible versions of me and postulates future technological impacts on me in order to ask what I would want 'if I knew more'. I might respond, with my improved information, "Well, I'
… that I wanted it to write the greatest novel ever, compose the most perfect symphony, and split the atom. At this point, to be a Friendly AI, it would have to extrapolate past my stated desires and determine that I did not really want it to deal with dirty details such as what genre the novel should be, which instruments the symphony should use, or which types of atoms would produce the greatest explosion. Instead, I would want these things to be revealed to me
… "I want a large, well-laid-out desktop, with creative software and high-end hardware. I don't care whether the mouse pad is red or blue, whether the chair swivels or is fixed, or whether there are two monitors or one." If an AI were to build exactly what I said I wanted, it would not actually be acting on my wishes; it would be acting on my description of those wishes.
… "I want 47 more paperclips." This clearly would not be enough to calculate what I really want. A "successful" AGI based on this alone would simply become a runaway, ever-accelerating process: accumulating absurdly powerful weaponry, other AIs, and nanotechnology, displacing humanity, and building (or enslaving) a paperclip maximizer.
Presumably, this outcome would be avoided by directly implementing a CEV calculation of the sort described above.