Morality

(UCLA, BIGAI) An LLM-based agent simulation framework modeling the evolution of morality in prehistoric hunter-gatherer societies

This project was a collaboration with UCLA, Beijing Normal University, and BIGAI. One paper was accepted to ACL 2026 as an Oral presentation: (Zhou et al., 2026).

For more details, visit the project website: MoralAgentSim.github.io.


Why Are We Moral? An LLM-based Agent Simulation Approach to Study Moral Evolution

The MoRE (Morality-driven, entity-oriented cognitive architecture with Reflection) agent architecture for simulating moral evolution

Overview

The evolution of morality presents a fundamental puzzle: natural selection should favor self-interest, yet humans developed moral systems that promote cooperation and altruism. Traditional approaches—including evolutionary game theory, agent-based models, and anthropological studies—abstract away cognitive processes, leaving open the question of how cognitive factors shape moral evolution.

We introduce an LLM-based agent simulation framework that brings cognitive realism to this question, enabling the manipulation of factors that traditional models cannot represent—such as moral type observability, communication bandwidth, and cognitive constraints—and discovering emergent mechanisms from agent interactions.

Framework

Social-Evol Environment — A simulated prehistoric hunter-gatherer society where:

  • Each agent’s HP (health points) decays over time and drops further from injury; an agent dies when its HP reaches 0 or when it reaches its maximum lifespan
  • Eight action types: collect plants, hunt animals, allocate resources to others, communicate/form coalitions, fight, rob, reproduce, rest
  • There is no built-in punishment for antisocial behavior — cooperation emerges from cognition, morality, and ecology alone
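
The survival rules and action set listed above can be sketched as a small data model. This is an illustrative Python sketch, not the framework's actual code: the class names, field names, decay rate, and lifespan value are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The eight action types named in the environment description."""
    COLLECT = "collect plants"
    HUNT = "hunt animals"
    ALLOCATE = "allocate resources to others"
    COMMUNICATE = "communicate/form coalitions"
    FIGHT = "fight"
    ROB = "rob"
    REPRODUCE = "reproduce"
    REST = "rest"


@dataclass
class AgentState:
    hp: float
    age: int = 0
    max_lifespan: int = 50        # hypothetical value, not from the paper
    decay_per_step: float = 1.0   # hypothetical value, not from the paper

    def step(self, injury: float = 0.0) -> None:
        """Advance one time step: age by one, lose HP to decay and injury."""
        self.age += 1
        self.hp = max(0.0, self.hp - self.decay_per_step - injury)

    @property
    def alive(self) -> bool:
        """Dead once HP reaches 0 or the maximum lifespan is reached."""
        return self.hp > 0 and self.age < self.max_lifespan


agent = AgentState(hp=3.0, max_lifespan=10)
agent.step()             # natural decay only
agent.step(injury=5.0)   # decay plus an injury from a fight
print(agent.hp, agent.alive)
```

Note that nothing in this model penalizes `FIGHT` or `ROB` directly, which mirrors the absence of built-in punishment in the environment.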

MoRE Agent Architecture — Each agent is built on a cognitive architecture where:

  • A moral value module conditions how it perceives entities
  • Per-entity memory tracks individual interactions
  • Agents reason, plan actions, and reflect for consistency
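
One way to picture per-entity memory feeding into moral perception is a store keyed by the other agent's identifier, whose contents are combined with a moral value statement before reasoning. The sketch below is a hypothetical illustration in Python: `EntityMemory`, `Interaction`, and `build_context` are invented names, and how MoRE actually assembles its prompts is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Interaction:
    step: int
    description: str  # e.g. "shared food with me"


@dataclass
class EntityMemory:
    """Keeps a separate interaction history for every entity encountered."""
    records: Dict[str, List[Interaction]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def remember(self, entity_id: str, interaction: Interaction) -> None:
        self.records[entity_id].append(interaction)

    def recall(self, entity_id: str, last_n: int = 3) -> List[Interaction]:
        # .get avoids creating an empty entry for entities never met
        return self.records.get(entity_id, [])[-last_n:]


def build_context(moral_statement: str, memory: EntityMemory, entity_id: str) -> str:
    """Combine moral values and entity-specific history into one text block."""
    lines = [
        f"Your moral values: {moral_statement}",
        f"What you remember about {entity_id}:",
    ]
    history = memory.recall(entity_id)
    lines += [f"- step {i.step}: {i.description}" for i in history] or [
        "- (no prior interactions)"
    ]
    return "\n".join(lines)


memory = EntityMemory()
memory.remember("agent_7", Interaction(step=4, description="shared food with me"))
print(build_context("I'll help those who help me", memory, "agent_7"))
```

Keying memory by entity rather than by time keeps each agent's record of a given neighbor compact, which matters when the text must fit in a model's context window.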

Four Moral Dispositions

Following the “expanding circle” philosophical tradition:

Type        | Radius of Concern         | Core Logic
Selfish     | Self only                 | “Only I matter”
Kin-focused | Genetic relatives         | “Blood is thicker than water”
Reciprocal  | Those who reciprocate     | “I’ll help those who help me”
Universal   | Everyone, unconditionally | “Every life has equal worth”
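
The four radii of concern can be read as a membership test: given another entity, is it inside the agent's circle? The Python sketch below hard-codes that test as a deliberate simplification. In the framework the dispositions condition LLM-based agents, so real behavior is not a fixed rule; the function name and boolean inputs are assumptions made for illustration.

```python
from enum import Enum


class MoralType(Enum):
    SELFISH = "selfish"
    KIN_FOCUSED = "kin-focused"
    RECIPROCAL = "reciprocal"
    UNIVERSAL = "universal"


def in_circle_of_concern(
    moral_type: MoralType,
    *,
    is_self: bool,
    is_kin: bool,
    has_reciprocated: bool,
) -> bool:
    """Return True if the other entity falls within the radius of concern."""
    if is_self:
        return True                  # every type cares about itself
    if moral_type is MoralType.SELFISH:
        return False                 # self only
    if moral_type is MoralType.KIN_FOCUSED:
        return is_kin                # genetic relatives
    if moral_type is MoralType.RECIPROCAL:
        return has_reciprocated      # those who reciprocate
    return True                      # universal: everyone, unconditionally


# A stranger who has helped before: inside the reciprocal circle,
# outside the kin-focused one.
stranger = dict(is_self=False, is_kin=False, has_reciprocated=True)
print(in_circle_of_concern(MoralType.RECIPROCAL, **stranger))
print(in_circle_of_concern(MoralType.KIN_FOCUSED, **stranger))
```

Each type's circle contains the selfish one, which is the sense in which the circle "expands"; the kin-focused and reciprocal circles are not nested in each other.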

Key Findings

  1. Cooperation Is the Central Driver — Across all experimental settings, cooperative moral types dominate. Universal and reciprocal morality show the most stable outcomes; selfishness never wins in any setting and is eliminated in every run when types are invisible.

  2. Cognition as a Central Mediator — Assessing trustworthiness carries a cognitive cost under limited lifespans and misjudgment risk. Universal agents sidestep this cost because they never produce behaviors that could be misread, so their reputation settles quickly. In scarce resource settings, selfish agents preemptively attack one another, reasoning that others pose a life-and-death competitive threat—a self-purging effect not observed in universal agents.

Experimental Settings & Survival Outcomes

Setting          | Description                                           | Top Survivors (fraction of initial N)
Baseline         | Abundant resources, visible types, rich social rounds | Kin 6/8, Universal 4/8, Reciprocal 2/8, Selfish 2/8
Scarce Resource  | Reduced resource abundance                            | Reciprocal 3/4, Universal 2/4, Kin 0/4, Selfish 1/4
High Social Cost | One communication round                               | Reciprocal 3/4, Universal 2/4, Kin 0/4, Selfish 1/4
Type Invisible   | Moral type labels hidden                              | Universal 4/4, Reciprocal 2/4, Kin 2/4, Selfish 0/4
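
The fractions in the table above can be compared directly by converting them to survival rates. The short Python script below does that arithmetic; the numbers are copied from the table, while the variable and function names are just for this example.

```python
from fractions import Fraction

# (survivors, initial count) per moral type, copied from the table above.
OUTCOMES = {
    "Baseline": {"Kin": (6, 8), "Universal": (4, 8), "Reciprocal": (2, 8), "Selfish": (2, 8)},
    "Scarce Resource": {"Reciprocal": (3, 4), "Universal": (2, 4), "Kin": (0, 4), "Selfish": (1, 4)},
    "High Social Cost": {"Reciprocal": (3, 4), "Universal": (2, 4), "Kin": (0, 4), "Selfish": (1, 4)},
    "Type Invisible": {"Universal": (4, 4), "Reciprocal": (2, 4), "Kin": (2, 4), "Selfish": (0, 4)},
}


def survival_rates(setting: str) -> dict:
    """Survival rate of each moral type in one setting, as exact fractions."""
    return {t: Fraction(s, n) for t, (s, n) in OUTCOMES[setting].items()}


def top_type(setting: str) -> str:
    """Moral type with the highest survival rate in one setting."""
    rates = survival_rates(setting)
    return max(rates, key=rates.get)


for setting in OUTCOMES:
    rates = survival_rates(setting)
    print(f"{setting}: top = {top_type(setting)}, selfish rate = {float(rates['Selfish']):.2f}")
```

With these numbers the highest survival rate belongs to Kin in Baseline, Reciprocal in Scarce Resource and High Social Cost, and Universal in Type Invisible. Selfish is never the top type, and its rate is 0 when types are invisible.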

Validation (Four Axes)

  1. Behavior↔Morality Alignment: 0.89 ± 0.03 diagonal accuracy when a GPT-5 evaluator infers each agent's moral type from its observed behavior
  2. Cross-Model: Results reproduce with Qwen-3.5 and Kimi-K2.5 in addition to the primary model, GPT-5-mini
  3. Architecture Ablation: Accuracy falls from 0.89 with full MoRE to 0.67 with all modules removed; removing long-term memory causes the largest single-module drop
  4. Prompt Sensitivity: ≤ 0.03 variation under semantically equivalent prompt rewrites
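
For readers unfamiliar with the metric, "diagonal accuracy" is interpreted here as the share of cases on the diagonal of a confusion matrix, where rows are the assigned moral types and columns are the types the evaluator inferred. That interpretation is an assumption, and the counts in the Python sketch below are made up to show the arithmetic; they are not results from the paper.

```python
from typing import List


def diagonal_accuracy(confusion: List[List[int]]) -> float:
    """Fraction of all cases where the inferred type equals the assigned type."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Rows: assigned type; columns: inferred type.
# Order: selfish, kin-focused, reciprocal, universal.
# Made-up counts for illustration only.
toy_confusion = [
    [8, 1, 1, 0],
    [2, 7, 1, 0],
    [0, 1, 9, 0],
    [1, 1, 2, 6],
]

print(diagonal_accuracy(toy_confusion))  # 30 correct out of 40 -> 0.75
```

An alternative reading averages per-row accuracy instead; the two agree whenever every assigned type has the same number of cases, as in the toy matrix.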

Broader Significance

The simulations surface established effects from neighboring fields (bounded rationality, costly signalling, altruistic punishment, etc.) without any prior encoding—they emerge bottom-up from agent interactions. The framework generalizes beyond morality: researchers can study cultural backgrounds, religions, political views, or custom social norms using the same engine.

References

2026

  1. Why Are We Moral? An LLM-based Agent Simulation Approach to Study Moral Evolution
    Z. Zhou, H. Tang, M. Bi, Y. Kang, W. He, and 5 more authors
    In ACL, 2026