UC Santa Barbara

GEA

Group-Evolving Agents — Open-Ended Self-Improvement via Experience Sharing

arXiv:2602.04837 | Feb 2026 | AI / Multi-Agent Systems

Abstract

What the paper is about

Group-Evolving Agents (GEA) proposes a new paradigm for open-ended self-improvement in AI systems. Rather than treating an individual agent as the unit of evolution (the "lone wolf" approach that dominates current self-improving agent frameworks), GEA treats a group of agents as the fundamental evolutionary unit. Agents within the group can autonomously modify their own structural designs to advance their capabilities and overcome limitations. A shared pool of experiences is inherited by every future agent as established fact, so the group's initial exploratory diversity is retained and built upon as it evolves. On challenging coding benchmarks, GEA achieves 71.0% compared to 56.7% for prior self-evolving methods, and it matches human-engineered AI systems on SWE-bench with zero additional inference cost at deployment.

Core Concepts

The three pillars of GEA

Group as Unit

Evolution operates at the group level, not the individual. Agents are generated, evaluated, and replaced collectively — like a population rather than a lone optimizer.

Experience Sharing

A shared memory pool accumulates successful strategies and failures. Each new generation inherits the distilled wisdom of all prior agents in the group.

Open-Ended Growth

No predefined target capability. The system continuously discovers new strategies and incorporates them, enabling unbounded self-improvement over time.

A Paradigm Shift

From solitary evolution to collective intelligence

Before

Lone Agent

Single agent optimizes in isolation

Limited exploration diversity

Knowledge lost between iterations

Converges to local optimum

GEA

Group Evolution

Population evolves collectively

Diverse exploration across agents

Shared experience pool persists

Open-ended continuous improvement

Architecture

How GEA works — the evolution loop

Initialize

Create a group of diverse agents with varied prompts and strategies

Execute

Each agent independently attempts the target task

Evaluate

Score agents on task performance and solution quality

Extract successful strategies into the shared experience pool

Evolve

Replace weakest agents with new ones inheriting from the pool
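The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: `SketchAgent`, `run_task`, and `evolve` are hypothetical names, the scoring function is a random stand-in for real task execution, and only the single best strategy per generation is recorded.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SketchAgent:
    """Hypothetical agent record: a prompt plus the strategies it inherited."""
    prompt: str
    score: float = 0.0
    strategies: list = field(default_factory=list)


def run_task(agent, rng):
    """Stand-in for executing the target task (assumption, not a real benchmark).

    Toy rule: each inherited strategy adds 0.1 to the score, plus some noise.
    """
    return min(1.0, 0.2 + 0.1 * len(agent.strategies) + 0.2 * rng.random())


def evolve(group_size=4, generations=3, seed=0):
    rng = random.Random(seed)
    pool = []  # shared experience pool, persists across generations
    # Initialize: a group of agents with varied prompts
    group = [SketchAgent(prompt=f"strategy-{i}") for i in range(group_size)]
    for gen in range(generations):
        # Execute and Evaluate: every agent attempts the task and is scored
        for agent in group:
            agent.score = run_task(agent, rng)
        group.sort(key=lambda a: a.score, reverse=True)
        # Extract: record the best agent's strategy in the shared pool
        pool.append(f"gen{gen}:{group[0].prompt}")
        # Evolve: replace the weakest agent with one that inherits the pool
        # (the newcomer is scored at the start of the next generation)
        group[-1] = SketchAgent(prompt=f"child-of-gen{gen}", strategies=list(pool))
    return group, pool


group, pool = evolve()
print(len(group), len(pool))
```

The group size stays constant while the pool grows by one entry per generation, so each newcomer starts with more inherited experience than the one before it.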

Experience Pool

A persistent, growing repository of strategies, lessons learned, and successful patterns. Every agent contributes; every new agent inherits.
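One way to picture such a pool is as an append-only log that hands each new agent a read-only snapshot. The sketch below is an assumption about a plausible data structure, not the paper's design; `Experience`, `ExperiencePool`, and their fields are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Experience:
    """One hypothetical pool entry: who tried what, and whether it worked."""
    agent_id: str
    task_id: str
    summary: str
    succeeded: bool


@dataclass
class ExperiencePool:
    entries: list = field(default_factory=list)

    def contribute(self, experience):
        """Every agent contributes: entries are only ever appended."""
        self.entries.append(experience)

    def inherit(self):
        """Every new agent inherits: return an immutable snapshot of the pool."""
        return tuple(self.entries)

    def successes(self):
        """Convenience view of the entries that describe successful attempts."""
        return [e for e in self.entries if e.succeeded]


shared = ExperiencePool()
shared.contribute(Experience("agent-1", "task-1", "ran tests first", True))
snapshot = shared.inherit()
shared.contribute(Experience("agent-2", "task-2", "patched wrong file", False))
print(len(snapshot), len(shared.inherit()))
```

Returning a tuple means an agent spawned earlier keeps the view it was born with, while agents spawned later see everything contributed since.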

Agent Diversity

Initial agents are deliberately diverse — different system prompts, reasoning strategies, and tool-use patterns — ensuring broad initial exploration of the solution space.

Selection Pressure

Underperforming agents are replaced. New agents are spawned using mutated prompts informed by the experience pool, creating a Darwinian improvement dynamic.
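A minimal sketch of this replacement step, assuming scores are kept in a dictionary keyed by agent name and that "mutation" simply means appending one lesson from the pool to a base prompt. The function name `replace_weakest` and the prompt format are hypothetical; the actual mutation procedure is not described here.

```python
import random


def replace_weakest(scores, pool, k=1, rng=None):
    """Drop the k lowest-scoring agents and draft prompts for their replacements."""
    rng = rng or random.Random(0)
    ranked = sorted(scores, key=scores.get)  # agent names, lowest score first
    losers = ranked[:k]
    survivors = {name: s for name, s in scores.items() if name not in losers}
    newcomers = {}
    for i in range(len(losers)):
        # Toy "mutation": attach one lesson drawn from the shared pool
        lesson = rng.choice(pool) if pool else "no prior experience"
        newcomers[f"new-{i}"] = f"Base prompt. Lesson from pool: {lesson}"
    return survivors, newcomers


scores = {"a": 0.9, "b": 0.2, "c": 0.5}
survivors, newcomers = replace_weakest(scores, ["run tests before patching"])
print(sorted(survivors), newcomers["new-0"])
```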

Zero Inference Overhead

At deployment time, only the best evolved agent runs. The evolution process is offline — no extra inference cost during production use.
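The deployment claim can be illustrated with a small sketch: after evolution finishes, one agent is selected, and each production request triggers exactly one agent call. `CountingAgent` and `deploy` are hypothetical names, and the scores are made-up placeholders.

```python
class CountingAgent:
    """Hypothetical agent that counts how many times it is invoked."""

    def __init__(self, name):
        self.name = name
        self.calls = 0

    def solve(self, task):
        self.calls += 1
        return f"{self.name} handled {task}"


def deploy(group, scores):
    """Pick the single best-scoring agent; the rest of the group is not shipped."""
    return max(group, key=lambda agent: scores[agent.name])


group = [CountingAgent("a"), CountingAgent("b"), CountingAgent("c")]
offline_scores = {"a": 0.4, "b": 0.7, "c": 0.6}  # placeholder evaluation results
best = deploy(group, offline_scores)

for task in ["issue-1", "issue-2"]:
    best.solve(task)

# Two requests, two calls in total: the group adds no per-request cost
print(best.name, sum(agent.calls for agent in group))
```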

Performance

Significant gains over prior self-evolving methods

71.0%

Coding Benchmark

GEA performance

56.7%

Previous SOTA

Self-evolving methods

+14.3

Absolute Gain

Percentage points

Zero

Inference Overhead

At deployment time

GEA (Group-Evolving Agents) 71.0%

Self-Evolving Methods (SOTA) 56.7%

Human-Engineered Systems ~71%
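For reference, the gain figure follows directly from the two reported scores. The relative improvement computed below is not stated in the source; it is derived here from the same two numbers.

```python
gea_score = 71.0    # reported GEA score, in percent
prior_score = 56.7  # reported score of prior self-evolving methods, in percent

# Absolute gain is a difference of percentages, so its unit is percentage points
absolute_gain = gea_score - prior_score

# Relative gain expresses the same difference as a fraction of the prior score
relative_gain = absolute_gain / prior_score * 100

print(f"absolute gain: {absolute_gain:.1f} percentage points")
print(f"relative gain: {relative_gain:.1f}% over the prior score")
```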

Experience Sharing

The mechanism that makes GEA work

The core innovation of GEA is the shared experience pool. Each agent in the group, when it successfully completes a task, contributes its strategy — the chain of reasoning, tool selections, and structural decisions — to this shared pool.

When a new agent is spawned to replace an underperformer, it doesn't start from scratch. It inherits the accumulated experience of all prior successful agents as established fact. This is fundamentally different from individual self-improvement, where each iteration only has its own past to learn from.

The result is a form of cultural evolution — knowledge accumulates across the population, not just within a single lineage. This enables GEA to discover and retain diverse strategies that a single agent would never explore on its own.
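The difference between learning within a single lineage and learning across a population can be shown with plain sets. This is a toy illustration: the lineage names and strategy strings are invented, and `lone_view` and `group_view` are hypothetical helpers.

```python
def lone_view(histories, lineage):
    """A lone agent only sees strategies discovered in its own lineage."""
    return set(histories[lineage])


def group_view(histories):
    """A group member sees the union of strategies from every lineage."""
    combined = set()
    for strategies in histories.values():
        combined.update(strategies)
    return combined


histories = {
    "lineage_a": ["run tests first"],
    "lineage_b": ["read the stack trace"],
    "lineage_c": ["run tests first", "minimize the diff"],
}

print(len(lone_view(histories, "lineage_a")), len(group_view(histories)))
```

A new agent drawing on `group_view` starts with strategies that its own lineage never found, which is the accumulation effect described above.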

SWE-bench

Matching human-engineered systems at zero extra cost

71%

GEA reaches about 71% on SWE-bench, matching the performance of human-engineered AI systems on this widely used benchmark of real-world software engineering tasks.

Zero additional inference cost at deployment — evolution happens offline.
