Resources
References and recommended readings
Introduction to Alignment Problem
- Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 679-684.
- Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies. Oxford University Press.
- Christian, B. (2021). The alignment problem: How can machines learn human values? Atlantic Books.
- Goertzel, T. (1994). Belief in conspiracy theories. Political Psychology, 731-742.
- Kerr, S. (1975). On the folly of rewarding A, while hoping for B. Academy of Management Journal, 18(4), 769-783.
- Knox, W. B., Allievi, A., Banzhaf, H., Schmitt, F., & Stone, P. (2023). Reward (mis)design for autonomous driving. Artificial Intelligence, 316, 103829.
- Mullainathan, S., & Obermeyer, Z. (2021, May). On the inequity of predicting A while hoping for B. In AEA Papers and Proceedings (Vol. 111, pp. 37-42). American Economic Association.
- OpenAI. (2016). Faulty reward functions in the wild.
- Russell, S. (2019). Human compatible: AI and the problem of control. Penguin UK.
- Sorensen, T., Moore, J., Fisher, J., Gordon, M., Mireshghallah, N., Rytting, C. M., … & Choi, Y. (2024). A roadmap to pluralistic alignment. arXiv preprint arXiv:2402.05070.
- Stiennon, N., Ouyang, L., Wu, J., Ziegler, D., Lowe, R., Voss, C., … & Christiano, P. F. (2020). Learning to summarize with human feedback. Advances in Neural Information Processing Systems, 33, 3008-3021.
- Wiener, N. (1960). Some Moral and Technical Consequences of Automation: As machines learn they may develop unforeseen strategies at rates that baffle their programmers. Science, 131(3410), 1355-1358.
- Williams, M., Carroll, M., Narang, A., Weisser, C., Murphy, B., & Dragan, A. (2024). Targeted Manipulation and Deception Emerge when Optimizing LLMs for User Feedback. arXiv preprint arXiv:2411.02306.
- Zhuang, S., & Hadfield-Menell, D. (2020). Consequences of misaligned AI. Advances in Neural Information Processing Systems, 33, 15763-15773.
Alignment Problem in Human Societies
- Acemoglu, D., Johnson, S., & Robinson, J. A. (2005). Institutions as the Fundamental Cause of Long-Run Growth. Handbook of Economic Growth.
- Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.
- Arrow, K. J., & Debreu, G. (1954). Existence of an equilibrium for a competitive economy. Econometrica: Journal of the Econometric Society, 265-290.
- Arrow, K. J. (1951, January). An extension of the basic theorems of classical welfare economics. In Proceedings of the second Berkeley symposium on mathematical statistics and probability (Vol. 2, pp. 507-533). University of California Press.
- Arrow, K. J. (1951). Social Choice and Individual Values.
- Boyd, R., Richerson, P. J., & Henrich, J. (2011). The cultural niche: Why social learning is essential for human adaptation. Proceedings of the National Academy of Sciences, 108(supplement_2), 10918-10925.
- Boyd, R., & Richerson, P. J. (1985). Culture and the Evolutionary Process. University of Chicago Press.
- Boyd, R., & Richerson, P. J. (1995). Why does culture increase human adaptability? Ethology and Sociobiology, 16(2), 125-143.
- Boyd, R., & Richerson, P. J. (1992). Punishment allows the evolution of cooperation (or anything else) in sizable groups. Ethology and Sociobiology, 13(3), 171-195.
- North, D. C. (1992). Institutions, Ideology, and Economic Performance. Cato Journal, 11(3), 477-496.
- González-Ruibal, A., Hernando, A., & Politis, G. (2011). Ontology of the self and material culture: Arrow-making among the Awá hunter–gatherers (Brazil). Journal of Anthropological Archaeology, 30(1), 1-16.
- Hadfield, G. K. (2016). Rules for a Flat World. Oxford University Press.
- Hadfield, G. K., & Weingast, B. R. (2012). What is law? A coordination model of the characteristics of legal order. Journal of Legal Analysis, 4(2), 471-514.
- Hadfield, G. K., & Weingast, B. R. (2014). Microfoundations of the Rule of Law. Annual Review of Political Science, 17(1), 21-42.
- Hadfield-Menell, D., & Hadfield, G. K. (2019). Incomplete contracting and AI alignment. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 417-422).
- Hadfield-Menell, D., Andrus, M., & Hadfield, G. (2019, January). Legible normativity for AI alignment: The value of silly rules. In Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 115-121).
- Tooby, J., & DeVore, I. (1987). The reconstruction of hominid behavioral evolution through strategic modeling. In W. Kinzey (Ed.), Primate Models of Hominid Behavior (pp. 183-237). New York: SUNY Press.
- Murdock, G. P., & Provost, C. (1973). Factors in the division of labor by sex: A cross-cultural analysis. Ethnology, 12(2), 203-225.
- North, D. C. (1991). Institutions. The Journal of Economic Perspectives, 5(1), 97–112.
- OpenAI. (2023). Democratic inputs to AI.
- Pareto, V. (1906). Manual of political economy.
- Pinker, S. (2010). The cognitive niche: Coevolution of intelligence, sociality, and language. Proceedings of the National Academy of Sciences, 107(supplement_2), 8993-8999.
- Rawls, J. (1993). Political Liberalism.
- Robbins, L. (1932). An essay on the nature and significance of economic science.
- Smith, J. M., & Szathmáry, E. (1997). The major transitions in evolution. OUP Oxford.
- Smith, A. (1776). An inquiry into the nature and causes of the wealth of nations.
- Szathmáry, E. (2015). Toward major evolutionary transitions theory 2.0. Proceedings of the National Academy of Sciences, 112(33), 10104-10111.
- West, S. A., Fisher, R. M., Gardner, A., & Kiers, E. T. (2015). Major evolutionary transitions in individuality. Proceedings of the National Academy of Sciences, 112(33), 10112-10119.
Social Dilemmas in Multi-Agent Settings
- Acheson, J. M. (2003). Capturing the commons: Devising institutions to manage the Maine lobster industry. UPNE.
- Acheson, J. M., & Gardner, R. J. (2005). Spatial strategies and territoriality in the Maine lobster industry. Rationality and Society, 17(3), 309-341.
- Acheson, J. M. (2011). Coming up empty: management failure of the New England groundfishery. Maritime Studies, 10(1), 57-86.
- Baggio, J. A., Barnett, A. J., Perez-Ibarra, I., Brady, U., Ratajczyk, E., Rollins, N., … & Janssen, M. A. (2016). Explaining success and failure in the commons: the configural nature of Ostrom’s institutional design principles. International Journal of the Commons, 10(2), 417-439.
- Barfuss, W., Flack, J., Gokhale, C. S., Hammond, L., Hilbe, C., Hughes, E., Leibo, J. Z., Lenaerts, T., Leonard, N., Levin, S., Madhushani, U., McAvoy, A., Meylahn, J. M., & Santos, F. P. (in press). Collective cooperative intelligence. Proceedings of the National Academy of Sciences of the United States of America.
- Binmore, K. (2007). Making decisions in large worlds.
- Cox, M., Arnold, G., & Tomás, S. V. (2010). A review of design principles for community-based natural resource management. Ecology and Society, 15(4).
- Hertz, U., Köster, R., Janssen, M. A., & Leibo, J. Z. (2025). Beyond the matrix: Experimental approaches to studying cognitive agents in social-ecological systems. Cognition, 254, 105993.
- Huang, S., & Siddarth, D. (2023). Generative AI and the digital commons. arXiv preprint arXiv:2303.11074.
- Janssen, M. A., Holahan, R., Lee, A., & Ostrom, E. (2010). Lab experiments for the study of social-ecological systems. Science, 328(5978), 613-617.
- Leibo, J. Z., Zambaldi, V., Lanctot, M., Marecki, J., & Graepel, T. (2017, May). Multi-agent reinforcement learning in sequential social dilemmas. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems (pp. 464-473).
- Macy, M. W., & Flache, A. (2002). Learning dynamics in social dilemmas. Proceedings of the National Academy of Sciences, 99(suppl_3), 7229-7236.
- Ostrom, E. (1990). Governing the commons: The evolution of institutions for collective action. Cambridge University Press.
- Perolat, J., Leibo, J. Z., Zambaldi, V., Beattie, C., Tuyls, K., & Graepel, T. (2017). A multi-agent reinforcement learning model of common-pool resource appropriation. Advances in Neural Information Processing Systems, 30.
- Rapoport, A. (Ed.). (1974). Game Theory as a Theory of Conflict Resolution (Vol. 2). Springer Science & Business Media.
- Savage, L. J. (1954). The foundations of statistics.
- Schelling, T. C. (1960). The Strategy of Conflict: with a new Preface by the Author. Harvard University Press.
- Wilson, D. S., Ostrom, E., & Cox, M. E. (2013). Generalizing the core design principles for the efficacy of groups. Journal of Economic Behavior & Organization, 90, S21-S32.
Institutions in Multi-Agent Settings
- Bergman, S., Marchal, N., Mellor, J., Mohamed, S., Gabriel, I., & Isaac, W. (2024). STELA: a community-centred approach to norm elicitation for AI alignment. Scientific Reports, 14(1), 6616.
- Christoffersen, P. J., Haupt, A. A., & Hadfield-Menell, D. (2023, May). Get It in Writing: Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems (pp. 448-456).
- Feng, S., Sorensen, T., Liu, Y., Fisher, J., Park, C. Y., Choi, Y., & Tsvetkov, Y. (2024). Modular pluralism: Pluralistic alignment via multi-LLM collaboration. arXiv preprint arXiv:2406.15951.
- Huang, S., Siddarth, D., Lovitt, L., Liao, T. I., Durmus, E., Tamkin, A., & Ganguli, D. (2024, June). Collective Constitutional AI: Aligning a Language Model with Public Input. In The 2024 ACM Conference on Fairness, Accountability, and Transparency (pp. 1395-1417).
- Köster, R., Hadfield-Menell, D., Everett, R., Weidinger, L., Hadfield, G. K., & Leibo, J. Z. (2022). Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents. Proceedings of the National Academy of Sciences, 119(3), e2106028118.
Other suggested readings
- Anderson, E. (1995). Value in ethics and economics.
- Boyd, R., & Richerson, P. J. (2004). Not by Genes Alone. University of Chicago Press.
- Boyd, R., & Mathew, S. (2021). Arbitration supports reciprocity when there are frequent perception errors. Nature Human Behaviour, 5(5), 596-603.
- Coase, R. (1960). The Problem of Social Cost. The Journal of Law & Economics.
- Cohen, J. (1986). An epistemic conception of democracy. Ethics, 97(1), 26-38.
- Cohen, J. (2005). Deliberation and democratic legitimacy. In Debates in contemporary political philosophy (pp. 352-370). Routledge.
- Coleman, J., & Ferejohn, J. (1986). Democracy and social choice. Ethics, 97(1), 6-25.
- Demsetz, H. (1967). Toward a Theory of Property Rights. American Economic Review.
- Duéñez-Guzmán, E. A., Sadedin, S., Wang, J. X., McKee, K. R., & Leibo, J. Z. (2023). A social path to human-like artificial intelligence. Nature Machine Intelligence, 5(11), 1181-1188.
- Friedman, M. (1953). The Methodology of Positive Economics. Essays in Positive Economics/University of Chicago Press.
- Fuller, L. (1964). The morality of law.
- Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3), 411-437.
- Goodman, A. (1999). An Economic Theory of the Evolution of the Common Law.
- Greenwald, B. C., & Stiglitz, J. E. (1986). Externalities in Economies with Imperfect Information and Incomplete Markets. The Quarterly Journal of Economics, 101(2), 229–264.
- Greif, A. (2012). Institutions and the Path to the Modern Economy.
- Hadfield, G. K. (1999). A coordination model of the sexual division of labor. Journal of Economic Behavior & Organization, 40(2), 125-153.
- Hands, D. W. (2012). The positive-normative dichotomy and economics. Handbook of the Philosophy of Science, 13, 219-239.
- Hardin, R. (1999). Liberalism, constitutionalism, and democracy.
- Hart, H. L. A. (1961). The concept of law.
- Hart, O. (1989). Incomplete contracts. In J. Eatwell, M. Milgate, & P. Newman (Eds.), Allocation, Information and Markets.
- Henrich, J., & Gil-White, F. J. (2001). The evolution of prestige: Freely conferred deference as a mechanism for enhancing the benefits of cultural transmission. Evolution and Human Behavior, 22(3), 165-196.
- Henrich, J. (2004). Cultural group selection, coevolutionary processes and large-scale cooperation. Journal of Economic Behavior & Organization, 53(1), 3-35.
- Henrich, J. (2016). The secret of our success: How culture is driving human evolution, domesticating our species, and making us smarter. Princeton University Press.
- Heyes, C. (2024). Rethinking norm psychology. Perspectives on Psychological Science, 19(1), 12-38.
- Jagiello, R., Heyes, C., & Whitehouse, H. (2022). Tradition and invention: The bifocal stance theory of cultural evolution. Behavioral and Brain Sciences, 45, e249.
- Landemore, H. (2013). Democratic reason: Politics, collective intelligence, and the rule of the many.
- Hong, L., & Page, S. E. (2004). Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences.
- Muthukrishna, M., & Henrich, J. (2016). Innovation in the collective brain. Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1690), 20150192.
- Myerson, R. (1989). Mechanism design. In J. Eatwell, M. Milgate, & P. Newman (Eds.), Allocation, Information and Markets.
- Nilsson, N. J. (2009). The quest for artificial intelligence. Cambridge University Press.
- Ostrom, E. (2009). Understanding institutional diversity. Princeton University Press.
- Ostrom, E. (2009). A general framework for analyzing sustainability of social-ecological systems. Science, 325(5939), 419-422.
- Shepsle, K. A. (2010, November). The rules of the game: What rules? Which game? Paper prepared for the Conference on the Legacy and Work of Douglass C. North, St. Louis.
- Weidinger, L., McKee, K. R., Everett, R., Huang, S., Zhu, T. O., Chadwick, M. J., … & Gabriel, I. (2023). Using the Veil of Ignorance to align AI systems with principles of justice. Proceedings of the National Academy of Sciences, 120(18), e2213709120.
- Williamson, O. (1998). The Economic Institutions of Capitalism.
- Zhi-Xuan, T., Carroll, M., Franklin, M., & Ashton, H. (2024). Beyond Preferences in AI Alignment. Philosophical Studies, 1-51.