SuperIntelligence: Why the Future of AI is a File System (CORAL), Setup & Implementation Guide inside.


📌 Introduction

A widely held assumption holds that making AI smarter means training smarter models. The presenter argues that the AI research community has quietly pivoted in a different direction entirely — not improving the LLM itself, but engineering an increasingly sophisticated environment around it. The centerpiece of this argument is CORAL (Collaborative Open-ended Reinforcement Agent Learning), a new autonomous multi-agent infrastructure published jointly by MIT, Stanford, NUS, McGill, Meta, Microsoft, and Amazon. Rather than calling this progress “AI,” the presenter coins the term ADI — Advanced Intelligence — to describe intelligence that lives in the harness, not the model. Watch the video below; the complete Setup & Implementation Guide follows.

🆕 New Features and Core Concepts

CORAL is described not as a mere agent framework but as a complete technological infrastructure for self-evolving multi-agent systems. Several key innovations stand out:

1. Hierarchical File System as Shared Memory

CORAL replaces the concept of internal model memory with a standardized, persistent, shared file system. All agents — typically 4 to 8 running in parallel — write to and read from a common .coral/public directory. Three artifact types form the backbone:

  • Attempts — JSON logs of every function evaluation, keyed by git commit hash
  • Notes — Markdown files where agents write textual hypotheses about the problem space
  • Skills — Reusable, abstracted code modules extracted from successful runs

2. Isolated Git Worktrees Per Agent

Each agent operates on its own local git worktree, ensuring its exploratory changes don’t corrupt the shared global state. Coordination is achieved entirely through the shared file system — no message queues, no centralized orchestrator.
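
The mechanics here are plain git. As a minimal sketch of what this isolation step could look like (illustrative Python, not CORAL's actual code; the function name is an assumption):

```python
import subprocess
from pathlib import Path

def create_agent_worktrees(repo: Path, n_agents: int) -> list[Path]:
    """Create one isolated worktree, on its own branch, per agent."""
    worktrees = []
    for i in range(1, n_agents + 1):
        wt = repo.parent / f"agent_{i}"
        # Each agent gets a private branch, so exploratory commits never collide.
        subprocess.run(
            ["git", "-C", str(repo), "worktree", "add", "-b", f"agent-{i}", str(wt)],
            check=True, capture_output=True,
        )
        worktrees.append(wt)
    return worktrees
```

Every worktree shares the same underlying object store, so disk cost is low while each agent still sees a clean, independent checkout.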

3. Heartbeat Intervention Protocol

To prevent agents from getting trapped in local minima (a key failure mode in optimization), CORAL introduces a background asynchronous heartbeat:

  • Interval trigger — After N evaluations, forces each agent to synthesize its notes into abstracted skills, effectively rewriting the starting point.
  • Plateau trigger — Detects stagnation and commands agents to attempt a mathematically orthogonal approach, conceptually equivalent to thermal noise injection in simulated annealing.
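
The plateau trigger amounts to simple bookkeeping over the score history. A sketch of the detection logic (illustrative, assuming a configurable patience window; not CORAL's internals):

```python
def plateau_triggered(scores: list[float], patience: int = 5) -> bool:
    """Fire when the last `patience` attempts failed to beat the prior best."""
    if len(scores) <= patience:
        return False  # not enough history yet
    best_before = max(scores[:-patience])
    return max(scores[-patience:]) <= best_before
```

When this fires, the heartbeat injects the "orthogonal approach" prompt into the stalled agent rather than letting it keep polishing a dead end.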

4. Gradient-Free, Inference-Time Operation

Critically, no model weights are ever updated. Claude Opus 4 and Minimax M2.5 are used as frozen reasoning engines. All “learning” is in-context memory accumulation via the file system — making CORAL a gradient-free search algorithm operating entirely at inference time.

5. Custom Task Grader

CORAL requires a task-specific eval/grader.py that functions as a reward function — returning a score (Boolean or scalar) for each agent submission, replacing the need for a human-in-the-loop or LLM-as-judge.

6. CORAL.md Agent Instruction File

Each agent’s workspace contains a CORAL.md file — the deterministic instruction prompt defining the agent’s orientation workflow, ground rules, and behavioral constraints. The presenter draws a direct parallel to CLAUDE.md files in Claude Code’s ecosystem.

Here is a comprehensive, step-by-step guide to building the CORAL infrastructure from scratch, sourced directly from the official repository and paper.

🪸 CORAL: Complete Setup & Implementation Guide

Repository: https://github.com/Human-Agent-Society/CORAL
Paper: https://arxiv.org/abs/2604.01658

Part 1 — Understanding the Architecture First

Before touching a single file, you need to internalize the mental model. CORAL is an infrastructure for building organizations of autonomous AI agents that run experiments, share knowledge, and continuously improve solutions. Give it a codebase and a grading script, and CORAL handles the rest: isolated workspaces, safe evaluation, persistent shared knowledge, and multi-agent collaboration to enable robust evolution.

The entire system rests on five interlocking concepts:

┌─────────────────────────────────────────────────────┐
│                  CORAL ARCHITECTURE                 │
│                                                     │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐           │
│  │ Agent 1  │  │ Agent 2  │  │ Agent N  │  (frozen  │
│  │ (Claude  │  │ (Claude  │  │ (Claude  │   LLMs)   │
│  │  Code)   │  │  Code)   │  │  Code)   │           │
│  └────┬─────┘  └────┬─────┘  └────┬─────┘           │
│       │             │             │                 │
│       └─────────────┴─────────────┘                 │
│                     │                               │
│             ┌─────────▼─────────┐                   │
│             │  .coral/public/   │  (shared memory)  │
│             │  ├── attempts/    │                   │
│             │  ├── notes/       │                   │
│             │  └── skills/      │                   │
│             └───────────────────┘                   │
│                     │                               │
│             ┌─────────▼─────────┐                   │
│             │  Grader / Eval    │  (reward signal)  │
│             └───────────────────┘                   │
│                     │                               │
│             ┌─────────▼─────────┐                   │
│             │  Heartbeat Mgr    │  (anti-stagnation)│
│             └───────────────────┘                   │
└─────────────────────────────────────────────────────┘

Each agent runs in its own git worktree branch. Shared state (attempts, notes, skills) lives in .coral/public/ and is symlinked into every worktree — agents see each other’s work in real time with zero sync overhead. The manager watches for new attempts and can interrupt agents with heartbeat-triggered prompts (e.g. “reflect”, “consolidate skills”).

Part 2 — Prerequisites & System Requirements

2.1 System Requirements

| Requirement | Detail |
|---|---|
| OS | Linux / macOS (tmux required) |
| Python | 3.11 or higher |
| Package manager | uv (from Astral) |
| Git | Required for worktree isolation |
| API Key | Anthropic (Claude Code), OpenAI (Codex), or local (OpenCode) |
| Budget awareness | A single 4-agent run can cost up to $240 USD — set API spending limits before starting |

2.2 Install uv (Package Manager)

curl -LsSf https://astral.sh/uv/install.sh | sh

Verify:

uv --version

2.3 Install and Authenticate Your Coding Agent

Before using CORAL, fully set up the agent(s) you plan to use:

  • Install the agent following its official instructions (e.g., Claude Code, Codex, OpenCode).
  • Authenticate first, so the agent never prompts for credentials in CLI mode.
  • Set up any required environment variables, configuration files, or authentication secrets.

CORAL does not handle agent installation or authentication for you. The infrastructure will fail to function if the underlying agent cannot start or is not properly authenticated.

For Claude Code (recommended, most tested):

npm install -g @anthropic-ai/claude-code
claude login

Then configure permissions so the agent can run autonomously. Edit ~/.claude/settings.json:

{
  "permissions": {
    "allow": ["Bash", "Read", "Write", "Edit"],
    "deny": ["WebFetch", "WebSearch"]
  }
}

Part 3 — Install CORAL

git clone https://github.com/Human-Agent-Society/CORAL.git
cd CORAL

# Sync dependencies (Python packages via uv)
uv sync

# Optional: include web dashboard dependencies
uv sync --extra ui

Verify the install:

uv run coral --help

You should see all 17+ CLI commands listed.

Part 4 — Building Your First CORAL Task (Full Walkthrough)

The official TSP (Travelling Salesman Problem) example is the cleanest path to understanding the complete lifecycle.

Step 1 — Create the Task Directory Structure

mkdir -p examples/tsp/seed
mkdir -p examples/tsp/eval

Your target layout:

examples/tsp/
├── seed/
│   └── solution.py      ← agents iterate on THIS
├── eval/
│   └── grader.py        ← scores agent submissions
└── task.yaml            ← CORAL configuration

Step 2 — Write the Seed Codebase

The seed is the starting code that agents will iterate on. It should be a working but naive baseline. Agents will improve on it through evolutionary search.

# examples/tsp/seed/solution.py
import random

# Restate the problem here — agents cannot read grader.py
random.seed(42)
CITIES = [(random.random(), random.random()) for _ in range(100)]

# Naive baseline: visit cities in index order
for i in range(len(CITIES)):
    print(i)

Why seed matters: The better your seed, the less time agents spend on trivial improvements. You can provide an empty seed, but agents will take longer to converge.
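
As a concrete illustration of the first kind of improvement an agent tends to discover over the index-order baseline, here is a greedy nearest-neighbor construction (illustrative code, not part of the official seed):

```python
import math
import random

# Same fixed-seed instance as the seed codebase
random.seed(42)
CITIES = [(random.random(), random.random()) for _ in range(100)]

def nearest_neighbor_tour(cities):
    """Greedy tour: always visit the closest unvisited city next."""
    unvisited = set(range(1, len(cities)))
    tour = [0]
    while unvisited:
        last = cities[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(last, cities[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def tour_length(cities, order):
    """Total round-trip Euclidean distance of a visit order."""
    return sum(math.dist(cities[order[i]], cities[order[(i + 1) % len(order)]])
               for i in range(len(order)))
```

On random uniform points, the greedy tour is dramatically shorter than the index-order baseline — exactly the kind of jump the grader's score reflects on an agent's first serious attempt.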

Step 3 — Write the Grader (Reward Function)

This is the most critical component. Subclass TaskGrader and implement evaluate(). The base class provides two helpers: self.run_program(filename) which runs a file from the agent’s codebase in a subprocess and returns a CompletedProcess (with .stdout, .stderr, .returncode), and self.fail(reason) which records the failure and returns a null score.

# examples/tsp/eval/grader.py
import math
import random
from coral.grader import TaskGrader, ScoreBundle

random.seed(42)
CITIES = [(random.random(), random.random()) for _ in range(100)]

class Grader(TaskGrader):
    def evaluate(self) -> float | ScoreBundle:
        try:
            result = self.run_program("solution.py")
            order = [int(x) for x in result.stdout.strip().split("\n")]
            assert sorted(order) == list(range(len(CITIES)))
            dist = sum(
                math.dist(CITIES[order[i]], CITIES[order[(i + 1) % len(order)]])
                for i in range(len(order))
            )
            return -dist  # shorter = higher score
        except Exception as e:
            return self.fail(str(e))

Grader design principles:

| Principle | Explanation |
|---|---|
| Deterministic | Same input must always produce the same score |
| Fast | Each agent submits many attempts; slow graders create bottlenecks |
| No hallucination | Score comes from running actual code, not LLM judgment |
| Scalar output | Return a float (higher = better) or use ScoreBundle for multi-metric |
| Fail gracefully | Always wrap in try/except and call self.fail() |
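
The determinism requirement is easy to violate accidentally (unseeded RNGs, wall-clock timestamps, unordered parallel results). A quick self-check you can run on your own grading function before `coral validate` — the harness below is an illustrative helper, not a CORAL utility:

```python
def check_deterministic(grade_fn, trials: int = 3) -> float:
    """Run the grading function several times and insist on identical scores."""
    scores = [grade_fn() for _ in range(trials)]
    if len(set(scores)) != 1:
        raise AssertionError(f"grader is non-deterministic: {scores}")
    return scores[0]
```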

Step 4 — Write the Task Configuration (task.yaml)

This YAML file is the control center for everything CORAL does.

# examples/tsp/task.yaml

task:
  name: tsp
  description: |
    Find the shortest round-trip tour through 100 cities. The coordinates
    are generated via random with a fixed seed in solution.py.
    DO NOT MODIFY the seed or CITIES generation!

    solution.py must print 100 integers (0-99) to stdout, one per line,
    representing the visit order. Each city must appear exactly once.

    The grader computes the total Euclidean round-trip distance
    and returns -distance as the score (shorter = higher).

grader:
  type: function
  module: eval.grader        # points to eval/grader.py → Grader class

agents:
  count: 1                   # start with 1, scale to 4-8 when ready
  runtime: claude_code       # or "codex" or "opencode"
  model: claude-sonnet-4-6   # use sonnet to test; opus for production
  max_turns: 200             # agent reboots after this many turns; CORAL keeps running

workspace:
  results_dir: "./results"          # where outputs are stored
  repo_path: "./examples/tsp/seed"  # your seed codebase

Agent count guidance:

| Agents | Use Case | Approx. Cost (3h run) |
|---|---|---|
| 1 | Testing & debugging | ~$15–60 |
| 4 | Standard multi-agent run | ~$60–240 |
| 8 | Maximum exploration | ~$120–480 |
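
These ranges scale linearly in agent count at roughly $15–60 per agent for a 3-hour run (a rate inferred from the figures above, not an official price):

```python
def estimated_cost_range(agents: int, per_agent_usd=(15.0, 60.0)):
    """Linear cost scaling: each agent adds the same per-run cost band."""
    lo, hi = per_agent_usd
    return (agents * lo, agents * hi)
```

Useful as a sanity check before launching: plug in your agent count and compare against the spending limit you set in your provider dashboard.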

Step 5 — Validate Before Launching

Always validate your grader before spending API credits:

uv run coral validate examples/tsp/task.yaml

This runs the grader against your seed codebase without spawning any agents. Fix errors here before proceeding.

Step 6 — Launch CORAL

uv run coral start --config examples/tsp/task.yaml

You should then see CORAL in a tmux session named coral-tsp.

CORAL will automatically:

  1. Create a git repository from your seed directory
  2. Create isolated git worktrees for each agent
  3. Set up .coral/public/ shared state directory
  4. Symlink the shared state into each worktree
  5. Generate CORAL.md instruction files for each agent
  6. Spawn the coding agents as subprocesses
  7. Start the heartbeat manager in the background

Step 7 — Monitor Progress

# Open web dashboard (port 8420)
uv sync --extra ui && uv run coral ui

# CLI leaderboard (top 20 attempts)
uv run coral log

# Agent health + real-time status
uv run coral status

# View notes written by agents
uv run coral notes

# View abstracted skills discovered by agents
uv run coral skills

Step 8 — Override Config at Runtime

You can override YAML values directly from the CLI without editing files:

# Launch with 4 agents instead of 1
uv run coral start -c examples/tsp/task.yaml agents.count=4

# Use a different model
uv run coral start -c examples/tsp/task.yaml agents.model=opus

Part 5 — Understanding the Shared State Directory

As described earlier, shared state (attempts, notes, skills) lives in .coral/public/ and is symlinked into every worktree. The full layout:

.coral/
├── public/                    ← shared across ALL agents (symlinked)
│   ├── attempts/              ← JSON logs of every eval, keyed by commit hash
│   ├── notes/                 ← Markdown hypotheses written by agents
│   └── skills/                ← Reusable code modules from successful runs
├── logs/
│   └── agent_{id}.log         ← Per-agent terminal logs
└── heartbeat/
    └── config.json            ← Heartbeat trigger configuration

agent_1/                       ← Agent 1's isolated git worktree
├── .coral → ../.coral/public  ← symlink to shared state
├── CORAL.md                   ← agent instruction file (auto-generated)
├── solution.py                ← agent's current working code
└── ...

agent_2/                       ← Agent 2's isolated worktree
└── ...
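
The symlink in each worktree is ordinary filesystem plumbing. A sketch of how it could be wired up (illustrative Python, not CORAL's internals; the function name is an assumption):

```python
import os
from pathlib import Path

def link_shared_state(worktree: Path, public_dir: Path) -> Path:
    """Point `<worktree>/.coral` at the shared public directory, so every
    agent reads and writes the same attempts/notes/skills in real time."""
    link = worktree / ".coral"
    if not link.is_symlink():
        os.symlink(public_dir.resolve(), link, target_is_directory=True)
    return link
```

Because every agent's `.coral` resolves to the same directory, a note written by one agent is immediately visible to all the others with no sync protocol at all.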

The Three Artifact Types Explained

Attempts (attempts/*.json)

{
  "commit_hash": "a3f92bc",
  "agent_id": "agent_1",
  "score": -45.23,
  "description": "Applied 2-opt local search after nearest-neighbor init",
  "timestamp": "2026-04-26T08:14:22",
  "status": "success"
}
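
Because attempts are plain JSON files, the leaderboard that `coral log` displays can be re-derived in a few lines (sketch only; field names follow the example above):

```python
import json
from pathlib import Path

def leaderboard(attempts_dir: Path, top: int = 20) -> list[dict]:
    """Best-first list of successful attempts, as a `coral log`-style view."""
    records = [json.loads(p.read_text()) for p in attempts_dir.glob("*.json")]
    successes = [r for r in records if r.get("status") == "success"]
    return sorted(successes, key=lambda r: r["score"], reverse=True)[:top]
```

This human-readability is the point of the file-system design: any standard tool — jq, grep, a notebook — can inspect the run state.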

Notes (notes/*.md)

---
agent: agent_2
timestamp: 2026-04-26T09:02:11
tags: [hypothesis, topology]
---
# Hypothesis: Or-opt moves may outperform 2-opt on this instance

The 2-opt implementation converges quickly but gets stuck around -43.x.
Observed that agent_1's nearest-neighbor init creates long crossing edges
in the middle cluster. Or-opt with segment size 3 might resolve this.
Next: try 3-opt with Lin-Kernighan style moves.

Skills (skills/)

skills/
└── two_opt_optimizer/
    ├── SKILL.md              ← description, usage, when to apply
    └── two_opt.py            ← reusable implementation

Part 6 — The CORAL.md Agent Instruction File

CORAL auto-generates a CORAL.md in each agent’s worktree. This is the agent’s operating manual — the deterministic behavioral scaffold that tells the frozen LLM how to work. Key sections include:

# CORAL Agent Instructions

## Your Identity
You are Agent 1 of 4 working on: TSP (100-city optimization)

## Orientation (do this FIRST, before any code)
1. Read this task description carefully
2. Read key files to understand current code state
3. Check the leaderboard: `coral log` — find the best current score
4. Check recent agent activity: what are others trying?
5. Inspect top attempts by hash: `coral show <hash>`
6. Search for prior art: keywords relevant to this problem
7. Read notes from other agents — what hypotheses exist?
8. Check available skills in the shared system

## Workflow Loop (repeat continuously)
PLAN → EDIT → EVAL → REPEAT

### Plan
- Review what worked. Check coral logs.
- Inspect top attempts and notes from teammates.
- Think creatively. What hasn't been tried?

### Edit
- Modify solution.py (and helper files if needed)
- Do not modify the seed or evaluation logic

### Eval
- Run: `coral eval -m "description of what you tried"`
- This stages, commits, and grades in one shot
- Read the score. Understand why it improved or didn't.

## Ground Rules
- You are fully autonomous. Do not ask for permission.
- Write notes frequently: `coral notes add "your hypothesis"`
- Extract skills from successful approaches
- Collaborate: learn from your agent teammates
- Never modify grader.py or task configuration

Part 7 — The Heartbeat Protocol

The heartbeat is CORAL’s mechanism for escaping local minima. It runs asynchronously in the background and can interrupt agents with two types of triggers:

View and Modify Heartbeat Config

uv run coral heartbeat          # view current config
uv run coral heartbeat add      # add a new action
uv run coral heartbeat remove   # remove an action

Trigger Types

Interval Trigger — fires after N evaluation attempts:

{
  "type": "interval",
  "every_n_evals": 10,
  "action": "consolidate_skills",
  "prompt": "You have run 10 evaluations. Stop coding. Read all notes in .coral/public/notes/. Synthesize the key findings into a new skill file. What is the best approach discovered so far? What should the next agent prioritize?"
}

Plateau Trigger — fires when scores stop improving:

{
  "type": "plateau",
  "patience": 5,
  "action": "orthogonal_approach",
  "prompt": "Your last 5 attempts have not improved the score. You are in a local minimum. Stop the current approach entirely. Think of a mathematically orthogonal method you have not tried. What would a completely different algorithm look like?"
}

This is conceptually equivalent to thermal noise injection in simulated annealing — a random perturbation that lets the search escape deep basins in the solution landscape.
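
The annealing analogy can be made concrete with the Metropolis acceptance rule, where a worse candidate is still accepted with a temperature-dependent probability (generic textbook sketch, not CORAL code):

```python
import math
import random

def metropolis_accept(delta: float, temperature: float, rng=random) -> bool:
    """Metropolis criterion: always accept improvements; accept regressions
    with probability exp(delta / T). `delta` is new_score - old_score,
    where higher scores are better."""
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temperature)
```

The heartbeat's "orthogonal approach" prompt plays the role of raising the temperature: it forces an occasional downhill move so the agent population can leave a basin instead of circling its floor.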

Part 8 — Multi-Agent Scale-Up

Once your single-agent run works, scaling to multi-agent is just one config change:

agents:
  count: 4           # scale from 1 → 4
  runtime: claude_code
  model: claude-opus-4-6   # upgrade to opus for production
Then relaunch:

uv run coral start -c examples/tsp/task.yaml agents.count=4

While a single autonomous agent can already outperform strong state-of-the-art baselines, a population of agents can push performance substantially further. On Anthropic’s take-home task for a kernel engineer role, a single agent improved the state of the art from 1,363 cycles to 1,350, while a population of four agents pushed it dramatically further to 1,103.

Part 9 — Stop, Resume, and Iterate

# Stop all agents cleanly
uv run coral stop

# Resume from where you left off (session state is preserved)
uv run coral resume

# View all past runs
uv run coral runs

# Inspect a specific attempt by commit hash
uv run coral show a3f92bc

# Reset a worktree to a previous good state
uv run coral checkout a3f92bc

# Undo last commit in current worktree
uv run coral revert

Part 10 — Complete CLI Reference

CORAL provides 17+ commands across 5 modules:

| Command | What It Does |
|---|---|
| `coral init <name>` | Scaffold a new task skeleton |
| `coral validate <config>` | Test grader against seed, no agents |
| `coral start -c task.yaml` | Launch all agents |
| `coral resume` | Resume previous run |
| `coral stop` | Stop all agents |
| `coral status` | Agent health + leaderboard |
| `coral log` | Top 20 attempts leaderboard |
| `coral log --recent` | Most recent attempts |
| `coral log --search "query"` | Search attempts by description |
| `coral show <hash>` | Full attempt details + code diff |
| `coral notes` | Browse all agent notes |
| `coral skills` | Browse all abstracted skills |
| `coral runs` | List all historical runs |
| `coral ui` | Web dashboard (port 8420) |
| `coral eval -m "desc"` | Stage + commit + grade (used by agents) |
| `coral diff` | Show uncommitted changes |
| `coral revert` | Undo last commit |
| `coral checkout <hash>` | Reset to previous attempt |
| `coral heartbeat` | View/modify heartbeat config |

Part 11 — Available Example Tasks

Ready-to-run task configurations are provided in examples/:

| Task | Domain | What Agents Optimize |
|---|---|---|
| circle_packing | Optimization | Pack 26 circles into unit square |
| erdos | Mathematics | Solve Erdős Min Overlap conjecture |
| kernel_builder | Systems | VLIW SIMD kernel performance |
| kernel_engineering | Systems | GPU kernel cycle count |
| mnist | Machine Learning | Digit classification accuracy |
| spaceship_titanic | ML/Kaggle | Passenger survival prediction |
| stanford_covid_vaccine | Bio/ML | mRNA degradation rate |

Run any example directly:

uv run coral start -c examples/circle_packing/task.yaml
uv run coral start -c examples/kernel_engineering/task.yaml

Part 12 — Key Design Decisions & Warnings

What CORAL updates vs. what it doesn’t:

| Updated During a Run | Never Updated |
|---|---|
| .coral/public/attempts/*.json | LLM model weights |
| .coral/public/notes/*.md | Grader logic |
| .coral/public/skills/ | Task configuration |
| Agent worktree code files | Token context window of any model |
| Heartbeat configuration | |

Cost Management — Critical:

  • Set hard API spending limits in your provider dashboard before launching
  • Start with agents.count=1 and model: sonnet for testing
  • Upgrade to count=4 and model: opus only for production runs
  • A 12-hour, 8-agent Opus run can cost $500+ if uncapped

Budget-aware launch pattern:

# Test run: 1 agent, sonnet, 50 turns max
uv run coral start -c task.yaml agents.count=1 agents.model=claude-sonnet-4-6 agents.max_turns=50

# Production run: 4 agents, opus, unlimited turns (stop manually)
uv run coral start -c task.yaml agents.count=4 agents.model=claude-opus-4-6

Summary: The CORAL Setup Checklist

✅ 1. Install uv
✅ 2. Install & authenticate your coding agent (Claude Code / Codex / OpenCode)
✅ 3. Clone CORAL repo and run `uv sync`
✅ 4. Create task directory: seed/ + eval/
✅ 5. Write seed codebase (naive working baseline)
✅ 6. Write grader.py (subclass TaskGrader, implement evaluate())
✅ 7. Write task.yaml (link seed, grader, agent config)
✅ 8. Validate: `uv run coral validate task.yaml`
✅ 9. Set API spending limits!
✅ 10. Launch: `uv run coral start -c task.yaml`
✅ 11. Monitor: `uv run coral ui` or `uv run coral status`
✅ 12. Iterate: stop → adjust heartbeat → resume

🎬 Video about CORAL

🔍 Related Concepts and Context in the Video

Why not train the LLM? The presenter’s central critique — and genuine puzzle — is that elite research institutions are deliberately not updating model weights, even when new knowledge is being discovered during runs. The argument is economic and practical: file-system-based intelligence is cheaper, debuggable, human-readable, and deployable without retraining cycles.

The ADI Framing The presenter reframes this class of systems as “Advanced Intelligence of the Environment” (ADI), contrasting it with AGI. The suggestion is that what institutions are building is not a smarter mind, but a smarter world around a frozen mind.

Connections to Prior Work The presenter references OpenClaw and AutoResearch-Claw as conceptual predecessors — autonomous agent loops with no human interaction that spawn agents, share knowledge, and iterate. Early autonomous agent frameworks like AutoGPT and BabyAGI established the pattern of AI programs that can think for themselves, create tasks, and reprioritize their task list to achieve a given objective — CORAL extends this into a far more structured, multi-agent, file-system-native paradigm.

Open-Ended Complexity Unlike benchmark-oriented systems, CORAL targets open-ended problem spaces — mathematical optimization, theoretical physics conjectures, and similar domains where the solution topology is unknown. The file system becomes the mechanism for agents to build cumulative knowledge maps across potentially days-long runs.

Cost Warning The presenter issues a practical caution: a single 3-hour, 4-agent CORAL run using Claude Opus 4 can cost up to $240 USD. Extended overnight multi-agent runs could produce significant unexpected API charges without proper credit limits in place.


✅ Conclusion and Key Takeaways

CORAL represents a meaningful inflection point in AI systems design — not because it makes models smarter, but because it makes the context around models dramatically richer, more persistent, and more collaborative. Whether this is the “right” path forward is a question the presenter leaves deliberately open. The frozen-model, file-system-native approach trades the unpredictability of fine-tuning for the predictability of deterministic markdown templates — but also carries genuine risks: smaller LLMs hallucinating during complex synthesis tasks, costs scaling rapidly with agent count, and a system architecture that depends heavily on the linguistic reasoning ability of the very LLM it refuses to update.

Key Takeaways:

  • CORAL is an autonomous multi-agent infrastructure from MIT/Stanford/Meta — not just a framework but a full operational system
  • Intelligence in CORAL lives in the file system harness, not the model — all LLMs remain frozen
  • Three artifact types (Attempts, Notes, Skills) form the shared memory layer across all agents
  • The Heartbeat Protocol is CORAL’s mechanism for escaping local minima — analogous to simulated annealing
  • This is gradient-free, inference-time intelligence — no training, no fine-tuning, no weight updates
  • Cost discipline is critical: multi-agent runs with frontier models can become very expensive very quickly
  • The presenter coins ADI (Advanced Intelligence) to distinguish this paradigm from classical AI/AGI framing
  • The GitHub repo is public, MIT licensed, and actively maintained: github.com/Human-Agent-Society/CORAL
