Description and outline

The tutorial will cover major ideas from economics, law, political science, and cultural evolutionary theory about how societies align individual actions with overall or group social welfare. We’ll then introduce some promising avenues by which these ideas could be integrated into new approaches to AI safety and alignment.

  1. General equilibrium theory, welfare economics
    • Fundamental theorems of welfare economics: the first theorem shows that perfectly competitive markets maximize a (utilitarian) social welfare function, achieving the same distribution of resources and goods as an omniscient benevolent social planner; the second shows that any desired final distribution can be reached by redistributing initial endowments. How these theorems guide the analysis of real (i.e., imperfectly competitive) markets; limitations and critiques of utilitarian welfare functions
    • The role of legal rules in structuring markets (property, contract) and correcting market failures (tort, regulation) so as to realign individual incentives with social welfare (internalizing externalities)
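As a concrete illustration of the first welfare theorem, the sketch below computes the competitive equilibrium of a two-agent, two-good Cobb-Douglas exchange economy and verifies that it is Pareto efficient (both agents' marginal rates of substitution equal the price ratio, so no mutually beneficial trade remains). The utility parameters and endowments are illustrative assumptions, not examples from the tutorial.

```python
def equilibrium(a, b):
    """Agents 1 and 2 have utilities x^a * y^(1-a) and x^b * y^(1-b);
    endowments are (1, 0) and (0, 1).  Normalize the price of y to 1."""
    p = b / (1 - a)                 # price of x that clears the x market
    x1, y1 = a, (1 - a) * p        # agent 1 spends share a of wealth p on x
    x2, y2 = b / p, 1 - b          # agent 2 spends share b of wealth 1 on x
    return p, (x1, y1), (x2, y2)

def mrs(share, x, y):
    """Marginal rate of substitution for Cobb-Douglas utility x^share * y^(1-share)."""
    return (share / (1 - share)) * (y / x)

p, (x1, y1), (x2, y2) = equilibrium(a=0.3, b=0.6)

# Markets clear ...
assert abs(x1 + x2 - 1) < 1e-9 and abs(y1 + y2 - 1) < 1e-9
# ... and both MRS equal the price ratio, so the equilibrium
# allocation is Pareto efficient (first welfare theorem).
assert abs(mrs(0.3, x1, y1) - p) < 1e-9
assert abs(mrs(0.6, x2, y2) - p) < 1e-9
```

The second welfare theorem shows up here too: rescaling the two endowments traces out different Pareto-efficient equilibria, so a planner can pick the final distribution by redistributing endowments rather than by setting prices.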
  2. Democratic/political theory
    • Alternative mechanisms for aggregating preferences
    • Substantive normative commitments (e.g., equality, dignity, autonomy) and the limitations of the concept of preferences
    • Democratic institutions beyond voting (courts, administration)
    • Overlapping consensus and pluralism (Rawls’s political liberalism)
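A classic result motivating the study of alternative aggregation mechanisms is the Condorcet paradox: individually rational rankings can aggregate into a collective cycle. The sketch below demonstrates this with three illustrative ballots (the preference orders are assumptions chosen to produce the cycle, not data from the tutorial).

```python
# Three voters' strict rankings over alternatives A, B, C
# (earlier in the tuple = more preferred).
ballots = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def majority_prefers(x, y):
    """True if a strict majority of ballots ranks x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2

# Pairwise majority voting yields a cycle A > B > C > A,
# so no alternative beats every other: there is no Condorcet winner,
# and the "group preference" is not a coherent ranking.
assert majority_prefers("A", "B")
assert majority_prefers("B", "C")
assert majority_prefers("C", "A")
```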
  3. Principal-agent theory and mechanism design
    • How agents’ incentives are aligned with their principals’ in light of asymmetric information and self-interest, using contracts and mechanisms (structures that implement optimal payoff structures, as in auctions); core mathematical results
    • How the incompleteness of contracts shapes optimal mechanism design
    • Designing mechanisms for multiple principals
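The flavor of these incentive-compatibility results can be seen in the second-price (Vickrey) auction, where bidding one's true value is a weakly dominant strategy. The sketch below checks this by brute force over a grid of candidate bids and rival bids; the valuation and grid are illustrative assumptions for the example.

```python
def utility(value, my_bid, rival_bids):
    """Second-price auction: the winner pays the highest rival bid;
    ties are resolved in favor of the rival."""
    top_rival = max(rival_bids)
    if my_bid > top_rival:
        return value - top_rival   # win, pay the second-highest bid
    return 0.0                     # lose, pay nothing

value = 7.0
grid = [i / 2 for i in range(0, 21)]   # candidate bids 0.0 .. 10.0
for top_rival in grid:
    truthful = utility(value, value, [top_rival])
    # Whatever the rivals do, no deviation from truthful bidding
    # ever earns strictly more: truthfulness is weakly dominant.
    assert all(utility(value, b, [top_rival]) <= truthful for b in grid)
```

The key design choice is that the winner's payment does not depend on the winner's own bid, which severs the link between misreporting and payoff; this decoupling is the template for incentive-compatible mechanisms more generally.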
  4. The evolution of human cooperation
    • Evidence of human ultra-sociality and cooperation relative to other mammals
    • Does evolution of human cooperation require particular human psychological traits such as desire for prestige or conformity?
    • Evolutionary game theoretic models for the evolution of cooperation based on third-party punishment
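A toy model in the spirit of these third-party-punishment results can be run as replicator dynamics over cooperators, defectors, and punishers in a public goods game. The payoff parameters and initial shares below are illustrative assumptions, not a calibrated model from the literature.

```python
import math

def replicator(shares, steps=100, b=3.0, g=1.0, fine=4.0, k=0.5):
    """Replicator dynamics for shares = (cooperators, defectors, punishers).
    Everyone receives benefit b * (contributing fraction); contributing
    costs g; each defector is fined in proportion to the punisher share,
    and punishing is itself costly in proportion to the defector share."""
    c, d, q = shares
    for _ in range(steps):
        x = c + q                        # fraction contributing
        pay_c = b * x - g
        pay_d = b * x - fine * q         # punishers fine defectors
        pay_q = b * x - g - k * d        # punishment is a second-order cost
        wc = c * math.exp(pay_c)         # exponential fitness weighting
        wd = d * math.exp(pay_d)
        wq = q * math.exp(pay_q)
        total = wc + wd + wq
        c, d, q = wc / total, wd / total, wq / total
    return c, d, q

# With third-party punishers present, defection is driven out ...
_, d_with, _ = replicator((0.3, 0.2, 0.5))
# ... without them, defection takes over the population.
_, d_without, _ = replicator((0.5, 0.5, 0.0))
assert d_with < 0.01 and d_without > 0.99
```

The run also exhibits the second-order free-rider problem discussed in this literature: once defectors are gone, punishers pay no punishment cost and are selectively neutral relative to cooperators, so something beyond payoffs (e.g., conformity or prestige, as in the previous bullet) is needed to keep punishment stable.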
  5. Applications to AI
    • Examples of how the above ideas have been, or could be, integrated into AI design, deployment, and governance:
      • Multi-agent contracting
      • Consensus models
      • Classification institutions for MARL and generative AI agents

Goals

Participants will come away with a sophisticated understanding of how other disciplines in the social sciences analyze alignment in human societies, with a focus not on aligning individuals but on aligning group behavior with overall group welfare. Participants will leave with an understanding of how theories of institutions, cooperation, norms, social dilemmas, and contracts have been used to understand mechanisms, such as regulated markets, that encourage self-interested individuals to generate wealth without imposing excessive costs (externalities) on others, and how institutions like markets can harness individual-level misalignment to advance social welfare.

Participants will also be exposed to the idea that different alignment problems call for different solutions: managing a forest is different from managing a fishery, and both differ from social media and AI. In all domains, governance must be tailored to the problem at hand, and AI is no different.

Our expected learning outcomes are that participants will gain a more sophisticated understanding of concepts such as preferences, pluralism, and institutions, along with knowledge of core results and references from the literatures in economics, law, political theory, and human cultural evolutionary theory. We aim to ground the next generation of alignment work solidly in these more sophisticated concepts and to encourage further advances through collaboration and engagement between these disciplines and computer science.