Bot credits
Coauthor: GPT-3 (davinci)
Contribution: labelled inline
Selectivity: 1 : 1 (uncurated)

This hypertext node is downstream of GPT-3 on Coherent Extrapolated Volition.


Coherent Extrapolated Volition is a term developed by Eliezer Yudkowsky while discussing Friendly AI development. It is meant as an argument that it would not be sufficient to explicitly program what we think our desires and motivations are into an AI; instead, we should find a way to program it so that it would act in our best interests – what we want it to do and not what we tell it to.

Related: Friendly AI, Metaethics Sequence, Complexity of Value

In calculating CEV, an AI would predict what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI’s utility function.
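The dynamic described above can be caricatured as a small program. The sketch below is a toy illustration only, not an implementation anyone has proposed: it assumes desires can be listed as strings, stubs out the extrapolation step, and models "convergence" as simple set intersection. The names (extrapolate, coherent_extrapolated_volition, make_utility_function) are invented for this example.

```python
# Toy sketch of the CEV "initial dynamic": extrapolate each person's desires,
# keep only the desires that converge across everyone, and turn that residue
# into a crude utility function. Purely illustrative; CEV is a philosophical
# proposal, not an algorithm with a known implementation.

from typing import Callable, Dict, Set


def extrapolate(desires: Set[str], steps: int = 3) -> Set[str]:
    """Stand-in for idealizing a person's desires ("if we knew more, thought
    faster, were more the people we wished we were")."""
    idealized = set(desires)
    for _ in range(steps):
        # Placeholder: a real extrapolation would revise desires in the light
        # of more knowledge and reflection; here we just pass them through.
        idealized = {d for d in idealized}
    return idealized


def coherent_extrapolated_volition(population: Dict[str, Set[str]]) -> Set[str]:
    """Extrapolate everyone's desires and keep only those that converge
    (i.e. are shared by all extrapolated individuals)."""
    extrapolated = [extrapolate(desires) for desires in population.values()]
    return set.intersection(*extrapolated) if extrapolated else set()


def make_utility_function(convergent: Set[str]) -> Callable[[Set[str]], int]:
    """Score an outcome by how many convergent desires it satisfies."""
    return lambda outcome: len(outcome & convergent)


if __name__ == "__main__":
    population = {
        "alice": {"not wiped out", "flourishing", "ice cream"},
        "bob": {"not wiped out", "flourishing", "world domination"},
    }
    cev = coherent_extrapolated_volition(population)
    utility = make_utility_function(cev)
    print(sorted(cev))                               # ['flourishing', 'not wiped out']
    print(utility({"not wiped out", "ice cream"}))   # 1
```

Even this caricature makes the structure visible: idealize each individual, keep only what converges, and let that remainder define the utility function.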

For example,

imagine if an AI asked me what I wanted, and I said

… the AI would predict that we want it to do the things we tell it – otherwise we would have been lying when we told it. But the AI would also know that people do not always do what they say they want, so it would then take into account what our idealized true selves would want, rather than our actual, lying selves.

The concept has met with resistance, due in part to a failure to differentiate between CEV and an instrumental, pre-existing value function. An EV would

… an AI calculating CEV might predict that we want it not to wipe out our species, and then use this as a basis to generate a utility function that values rationality, and then use that utility function to determine a final aim of utilitarianism. This would be a much more robust “Good”-program than one that was manually produced, due to its holistic nature. This can be used as an argument that it would be hard to produce an AI that would have a drastically different notion of “Good” from

… an AI that thinks an idealized version of us would want to maximize the amount of fun we have (by, e.g., solving some problem or just enjoying life together) could create a utility function that would maximize fun by solving the problem, because an idealized us would also want to solve the problem.

Related: Extrapolated volition, AI Box, Instrumental convergence, Instrumental convergence and CEV

A type of utilitarianism that measures utility using experience as

… the AI would determine that we do not want to be suffering or frustrated, since this is what is predicted to occur in a situation where the AI is manipulating the reward function, if we knew we were being manipulated in this manner.