2026-03-15

Fast Mode, Compaction API & Claude Code Updates

✦ Fast Mode for Opus 4.6 — Up to 2.5× Faster Output

Launched on 7 February 2026, Fast Mode for Claude Opus 4.6 delivers up to 2.5× faster output token generation via a new speed parameter in the Messages API. Previously you had to choose between Opus-level intelligence and sub-second responsiveness; Fast Mode lets you get both — at premium pricing — for latency-sensitive agentic applications.

How to enable Fast Mode

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    speed="fast",          # ← enable Fast Mode
    messages=[{
        "role": "user",
        "content": "Summarise this report in bullet points."
    }]
)
print(response.content[0].text)

When to use it

Real-time agents — coding assistants, live chat, interactive workflows where users are watching the stream.
Iterative pipelines — multi-step chains where every node adds latency; Fast Mode compounds the saving across each hop.
High-volume burst workloads — faster throughput means you can serve more concurrent requests within your rate limits.

Tip Fast Mode is currently available via a waitlist at claude.com/fast-mode. Standard Opus 4.6 pricing applies with a premium multiplier — benchmark your actual use case to confirm the speed gain justifies the cost delta for your workload.

✦ Compaction API — Effectively Infinite Conversations

The Compaction API (launched in beta on 5 February 2026) provides server-side context summarisation for Claude Opus 4.6 conversations. When your conversation grows beyond the model's practical context limit, the API can automatically distil the history into a compact summary, then continue as if no context was lost. Long-running agents, multi-session workflows, and persistent assistants can now operate without manual context management.

How compaction works

You set a compaction policy on your request — the API monitors token usage and triggers summarisation when a threshold is approached.
The server generates a structured summary of prior turns, retaining key facts, decisions, and outstanding tasks.
The summary replaces the raw history in subsequent turns — significantly shrinking the effective prompt size without losing meaningful context.
A compaction_metadata object in the response tells you when and how many tokens were compacted.

Minimal usage

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=2048,
    compaction={
        "enabled": True,
        "threshold": 0.85   # compact when 85 % of context is used
    },
    messages=long_conversation_history
)

Key insight Previously, long-running agents had to either truncate history (losing context) or implement bespoke summarisation logic (engineering overhead). The Compaction API moves this complexity to the platform layer so your agent code stays simple.

Beta caveat Compaction is currently available only for Opus 4.6. Summarisation is non-deterministic — run evals on your use case to verify that no business-critical information is lost across compaction boundaries.

✦ Claude Code: Three New Productivity Features

Recent Claude Code releases (v2.1.74 – v2.1.76) have shipped several quality-of-life improvements aimed at reducing wasted context and giving developers finer control over effort and speed. Here are the three most impactful additions.

1. `/context` — Actionable Optimisation Suggestions

Running /context in any Claude Code session now shows not just how many tokens are in use, but actionable suggestions identifying the biggest optimisation opportunities in your current context window — for example, large tool results that could be summarised, or conversation branches that are no longer relevant. Use it before a long coding task to ensure you're not burning context on stale scaffolding.

2. `/effort` — On-the-Fly Effort Control

The new /effort slash command lets you adjust Claude's effort level mid-session without restarting. Higher effort means deeper reasoning (and more tokens); lower effort is faster and cheaper for mechanical tasks like simple edits or reformatting. This maps to the API's effort parameter (which replaced budget_tokens for Opus 4.6).

/effort high    # deep reasoning for architecture decisions
/effort low     # fast mode for search-and-replace style edits

3. `worktree.sparsePaths` — Sparse Worktree Checkouts

In large monorepos, running Claude Code in an isolated git worktree previously checked out the entire repo — slow and token-expensive. The new worktree.sparsePaths setting in ~/.claude/settings.json lets you specify which directories to include, giving Claude a lean, relevant slice of the repository.

// ~/.claude/settings.json
{
  "worktree": {
    "sparsePaths": ["src/api", "src/shared", "tests/integration"]
  }
}

Tip Combine worktree.sparsePaths with the /context command at session start. Check optimisation suggestions, trim irrelevant files from context, then use /effort high only for the reasoning-intensive parts of your task.

✦ Custom Subagents & Agent Teams in Claude Code

Claude Code now supports custom subagents — specialised Claude instances defined in your project's .claude/agents/ directory — and agent teams where multiple parallel sessions coordinate through a shared task list. These two features let you decompose complex, long-horizon work in a way that keeps each agent's context window clean and focused.

Custom Subagents

A subagent is a Markdown file in .claude/agents/ with a YAML front-matter block. Claude can delegate to it automatically (based on context) or on demand via /agent <name>. Each subagent runs in its own isolated context window, so tool results and file reads don't pollute your main session.

# .claude/agents/test-runner.md
---
name: test-runner
description: Runs the test suite and reports failures with root-cause analysis
model: claude-haiku-4-5         # fast, cheap for mechanical work
allowedTools: [Bash, Read, Glob]
---

You are a test specialist. When invoked, run `npm test`, capture failures,
read the relevant source files, and return a structured failure report.
Do not fix the code — only diagnose.

Agent Teams

For large parallel workloads, assemble a team: one lead agent breaks the problem into tasks, spawns worker agents to handle them concurrently, and synthesises the results. Workers communicate through a shared task list (created via TaskCreate / TaskUpdate) and message each other through the lead's context.

Ideal for codebase-wide refactors, bulk migrations, and multi-file analysis.
Each worker operates with the minimum tool set needed — use allowedTools to enforce least-privilege.
The lead's context stays small: it only sees task summaries, not each worker's full file read history.

Tip Store subagent definitions in a shared repo so the whole team inherits the same specialised helpers. Treat the description field as documentation — Claude reads it to decide when to delegate automatically.

✦ Fast Mode for Opus 4.6 — Up to 2.5× Faster Output

How to enable Fast Mode

When to use it

✦ Compaction API — Effectively Infinite Conversations

How compaction works

Minimal usage

✦ Claude Code: Three New Productivity Features

1. /context — Actionable Optimisation Suggestions

2. /effort — On-the-Fly Effort Control

3. worktree.sparsePaths — Sparse Worktree Checkouts

✦ Custom Subagents & Agent Teams in Claude Code

Custom Subagents

Agent Teams

1. `/context` — Actionable Optimisation Suggestions

2. `/effort` — On-the-Fly Effort Control

3. `worktree.sparsePaths` — Sparse Worktree Checkouts