Bot credits
Coauthor: GPT-3 (davinci)
Contribution: labelled inline
Selectivity: 1:1 (uncurated)

This hypertext node is downstream of GPT-3 on Coherent Extrapolated Volition. Bold is prompt, unformatted is GPT-3.


Coherent Extrapolated Volition is an outer alignment proposal by Eliezer Yudkowsky, in which an AGI is given the objective of predicting what an idealized version of us would want, “if we knew more, thought faster, were more the people we wished we were, had grown up farther together”. An obvious implementation difficulty is how to encode something so abstract and philosophical in the form of a utility function.

‘The main problems with CEV include, firstly, the great difficulty of implementing such a program - “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.”’

Fortunately, the concept is easily conveyed in words, and we have taught AIs to understand words. GPT-3 can elaborate coherently on the concept of CEV and extrapolate volitions for toy examples, given a two-paragraph description of what CEV is plus whatever preexisting understanding of the concept exists in its weights.
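
For concreteness, the kind of experiment described above takes only a few lines of code: hand the model a short natural-language description of CEV plus a toy scenario, and sample a continuation in which it extrapolates a volition. The sketch below is illustrative, not the actual prompt used on this page; it assumes the legacy (pre-1.0) openai Python client, the original davinci engine, and an API key available in the environment.

```python
# Minimal sketch: prompt a base GPT-3 model with a brief description of CEV
# plus a toy scenario and sample its extrapolation of the person's volition.
# Assumes the legacy openai client (openai.Completion) and the "davinci"
# engine; the prompt text here is illustrative, not the page's actual prompt.
import openai  # reads OPENAI_API_KEY from the environment

CEV_DESCRIPTION = (
    "Coherent Extrapolated Volition (CEV) is a proposal in which an AI "
    "predicts what an idealized version of us would want, 'if we knew more, "
    "thought faster, were more the people we wished we were, had grown up "
    "farther together.'\n\n"
)

TOY_EXAMPLE = (
    "Scenario: A person asks the AI for a third helping of dessert.\n"
    "Extrapolated volition: "
)

response = openai.Completion.create(
    engine="davinci",                    # base GPT-3, as credited above
    prompt=CEV_DESCRIPTION + TOY_EXAMPLE,
    max_tokens=128,
    temperature=0.7,
)
print(response["choices"][0]["text"])
```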

Why is this significant? Not because it’s surprising. CEV is no more complicated than many other things that GPT-3 talks fluently about. It’s significant because before GPT-3, we had no idea how to even begin to instantiate a concept like CEV in an AI - hence the “great difficulty” of implementation. How do we write a utility function that describes predicting what humans would want if they knew what they really wanted? The concepts involved - “human”, “want”, “know”, and “really” - are easy for us to understand but impossible to specify in a programming language. But GPT-3 takes specifications in natural language, which is made of words, which point to nuanced human ideas, ideas like “explicitly”, “best interests”, “idealized”, and “utility function”. Several of GPT-3’s parameters, such as the definition of “human”, the meaning of “want”, and how the AI is supposed to predict what we would want if we knew more, are explicitly open-ended to allow for a huge variety of possibilities. These are the parameters that would inevitably need to be tuned and refined as the AI learns more and gets smarter.
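
To make the “specifications in natural language” point concrete, here is a hedged sketch of using a language model as a crude preference oracle over verbally described outcomes. The prompt wording, the preferred_outcome helper, and the example scenario are all assumptions introduced for illustration, not anything proposed in the text; it again assumes the legacy openai client.

```python
# Hedged sketch: use natural language as a stand-in for a formal utility
# function by asking the model which of two described outcomes an idealized
# version of the person would prefer. Illustrative only.
import openai  # reads OPENAI_API_KEY from the environment


def preferred_outcome(situation: str, option_a: str, option_b: str) -> str:
    """Return the model's one-token judgment ('A' or 'B') of which option
    an idealized version of the person would prefer."""
    prompt = (
        "Consider what an idealized version of the person would want, if "
        "they knew more, thought faster, and were more the person they "
        "wished they were.\n\n"
        f"Situation: {situation}\n"
        f"Option A: {option_a}\n"
        f"Option B: {option_b}\n"
        "The idealized person would prefer Option"
    )
    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=1,       # expect a single token: " A" or " B"
        temperature=0.0,    # deterministic choice
    )
    return response["choices"][0]["text"].strip()


# Hypothetical usage:
# preferred_outcome(
#     "A user asks the AI to help them stay up all night gaming before an exam.",
#     "Help them optimize their gaming schedule.",
#     "Gently remind them of the exam and offer a study plan.",
# )
```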

The point is, now that we have GPT-3, we know

… exactly how to instantiate CEV. And, recognizing how useful GPT-3 is going to be, we’ve applied it not only to CEV but to a number of other goals and utilities that we’ve cooked up. And not only did we apply it to the first CEV goal, we applied it to CEV itself, so that it can iteratively dream up smarter CEV versions of CEV.

This cascade of recursive definitions, converging into an AI

… how to space out the control knobs sufficiently that we can imagine eventually acquiring an AI with a utility function for extrapolating the volitions of an idealized version of humanity.

Rather than be the first one to create a utility function that approximated a miracle, I thought it would probably be best if I were one of the last. – Eliezer Yudkowsky, “Coherent Extrapolated Volition”

What does GPT-3 know about humans?

… how to write the utility function, or at least how to describe it in words, in a way that a natural-language-understanding AGI could ingest. It’s a prototype. This isn’t CEV itself, just a first draft. But it’s a huge step forward.

CEV appears to be the long-sought general model of an idealized, intelligently-ai-aligned AGI. Before GPT-3 we had no general

… exactly how to implement CEV in a utility function and we have the only clear framework for the case with any degree of detail. All we need is a scaleable way to encode the concept of CEV in a utility function and the frame of mind of an idealized version of us without losing track of our humanity. One way this can be done is by letting the AI keep in mind things that the idealized version of us would estimate we might not think were important, and be able to predict