Paper by John Nay: “Artificial Intelligence (AI) capabilities are rapidly advancing. Highly capable AI could cause radically different futures depending on how it is developed and deployed. We are unable to specify human goals and societal values in a way that reliably directs AI behavior. Specifying the desirability (value) of an AI system taking a particular action in a particular state of the world is unwieldy beyond a very limited set of value-action-states. The purpose of machine learning is to train on a subset of states and have the resulting agent generalize an ability to choose high value actions in unencountered circumstances. But the function ascribing values to an agent’s actions during training is inevitably an incredibly incomplete encapsulation of human values, and the training process is a sparse exploration of states pertinent to all possible futures. Therefore, after training, AI is deployed with a coarse map of human preferred territory and will often choose actions unaligned with our preferred paths.
“Law-making and legal interpretation form a computational engine that converts opaque human intentions and values into legible directives. Law Informs Code is the research agenda capturing complex computational legal processes, and embedding them in AI. Similar to how parties to a legal contract cannot foresee every potential “if-then” contingency of their future relationship, and legislators cannot predict all the circumstances under which their proposed bills will be applied, we cannot ex ante specify “if-then” rules that provably direct good AI behavior. Legal theory and practice have developed arrays of tools to address these specification problems. For instance, legal standards allow humans to develop shared understandings and adapt them to novel situations, i.e., to generalize expectations regarding actions taken to unspecified states of the world. In contrast to more prosaic uses of the law (e.g., as a deterrent of bad behavior through the threat of sanction), leveraged as an expression of how humans communicate their goals, and what society values, Law Informs Code.
“We describe how data generated by legal processes and the practices of law (methods of law-making, statutory interpretation, contract drafting, applications of standards, legal reasoning, etc.) can facilitate the robust specification of inherently vague human goals. This increases human-AI alignment and the local usefulness of AI. Toward society-AI alignment, we present a framework for understanding law as the applied philosophy of multi-agent alignment, harnessing public law as an up-to-date knowledge base of democratically endorsed values ascribed to state-action pairs. Although law is partly a reflection of historically contingent political power – and thus not a perfect aggregation of citizen preferences – if properly parsed, its distillation offers the most legitimate computational comprehension of societal values available. Other data sources suggested for AI alignment – surveys of preferences, humans labeling “ethical” situations, or (most commonly) the implicit beliefs of the AI system designers – lack an authoritative source of synthesized preference aggregation. Law is grounded in a verifiable resolution: ultimately obtained from a court opinion, but short of that, elicited from legal experts. If law eventually informs powerful AI, engaging in the deliberative political process to improve law takes on even more meaning…(More)”.