✦ Fast Mode for Opus 4.6 — Up to 2.5× Faster Output
Launched on 7 February 2026, Fast Mode for Claude Opus 4.6
delivers up to 2.5× faster output token generation via a new
speed parameter in the Messages API. Previously you had to trade
Opus-level intelligence against responsiveness; Fast Mode narrows that gap, at
premium pricing, for latency-sensitive agentic applications.
How to enable Fast Mode
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    speed="fast",  # enable Fast Mode
    messages=[{
        "role": "user",
        "content": "Summarise this report in bullet points.",
    }],
)
print(response.content[0].text)
When to use it
Real-time agents: coding assistants, live chat, and interactive workflows
where users are watching the stream.
Iterative pipelines: multi-step chains where every node adds latency;
the saving accumulates across each hop.
High-volume burst workloads: each request finishes sooner, so bursts clear
faster and fewer requests are in flight at once.
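To see how per-hop savings add up in a multi-step chain, here is a rough latency model. It assumes each hop costs a fixed overhead (network, queueing, time to first token) plus output time, and that Fast Mode speeds up only the output part. The token rate, hop count, and overhead are illustrative assumptions, not measurements.

```python
def hop_latency(output_tokens: int, tokens_per_second: float, overhead_s: float) -> float:
    """Seconds for one hop: fixed overhead plus time spent generating output."""
    return overhead_s + output_tokens / tokens_per_second


def pipeline_latency(hops: int, output_tokens: int,
                     tokens_per_second: float, overhead_s: float) -> float:
    """Total seconds for a sequential chain of identical hops."""
    return hops * hop_latency(output_tokens, tokens_per_second, overhead_s)


# Hypothetical workload: 5 hops, 600 output tokens each, 1 s fixed overhead per hop.
BASE_TPS = 60.0             # assumed standard output rate (tokens/second)
FAST_TPS = BASE_TPS * 2.5   # best case quoted for Fast Mode

standard = pipeline_latency(5, 600, BASE_TPS, 1.0)  # 5 * (1 + 10) = 55 s
fast = pipeline_latency(5, 600, FAST_TPS, 1.0)      # 5 * (1 + 4)  = 25 s

print(f"standard: {standard:.0f} s, fast: {fast:.0f} s, speedup: {standard / fast:.1f}x")
```

Under these assumptions the end-to-end speedup is 2.2x rather than 2.5x, because the fixed per-hop overhead is not affected by faster output generation.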
Tip: Fast Mode is currently available via a waitlist at
claude.com/fast-mode. It is billed at Opus 4.6 rates with a premium
multiplier, so benchmark your own workload to confirm the speed gain justifies the
extra cost.
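One way to run that benchmark is sketched below. The helper names, the stand-in request, and the price multiplier are all hypothetical; the sketch only shows the shape of the comparison, not real pricing or real timings.

```python
import time
from statistics import median
from typing import Callable


def median_tokens_per_second(call: Callable[[], int], runs: int = 5) -> float:
    """Time `call` several times; it must return the number of output tokens produced."""
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        output_tokens = call()
        elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
        rates.append(output_tokens / elapsed)
    return median(rates)


def speedup_per_cost(standard_tps: float, fast_tps: float, price_multiplier: float) -> float:
    """A ratio above 1.0 means throughput rose by more than the price did."""
    return (fast_tps / standard_tps) / price_multiplier


def fake_request() -> int:
    """Stand-in for a real API call so the sketch runs offline."""
    time.sleep(0.01)
    return 100


measured = median_tokens_per_second(fake_request, runs=3)
# Hypothetical rates and multiplier; substitute your own measurements and pricing.
ratio = speedup_per_cost(standard_tps=60.0, fast_tps=150.0, price_multiplier=2.0)
print(f"measured: {measured:.0f} tokens/s, speedup per unit cost: {ratio:.2f}")
```

In a real run, the callable would send the same prompt once with and once without speed="fast" and return response.usage.output_tokens, giving one rate per mode to feed into the ratio.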
Tags: fast-mode, Opus-4.6, latency, API
✦ Compaction API — Effectively Infinite Conversations
The Compaction API (launched in beta on 5 February 2026)
provides server-side context summarisation for Claude Opus 4.6 conversations.
As a conversation approaches the model's practical context limit, the API can
automatically distil the history into a compact summary and continue from it with the key
context preserved. Long-running agents, multi-session workflows, and persistent assistants
can now operate without manual context management.
How compaction works
1. You set a compaction policy on your request; the API monitors token usage
and triggers summarisation when a threshold is approached.
2. The server generates a structured summary of prior turns, retaining key facts, decisions,
and outstanding tasks.
3. The summary replaces the raw history in subsequent turns, significantly shrinking the
effective prompt size without losing meaningful context.
4. A compaction_metadata object in the response tells you when compaction ran and how many
tokens were compacted.
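The steps above can be pictured with a small client-side analogy. This is not the server's implementation: the function names, the summary wording, and the choice to keep the last two messages are assumptions made purely for illustration.

```python
from typing import Callable


def should_compact(used_tokens: int, context_window: int, threshold: float = 0.85) -> bool:
    """True once usage reaches the given fraction of the context window."""
    return used_tokens >= threshold * context_window


def compact(messages: list, summarise: Callable[[list], str], keep_last: int = 2) -> list:
    """Replace all but the last `keep_last` messages with a single summary message."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(messages) <= keep_last:
        return list(messages)
    older, recent = messages[:-keep_last], messages[-keep_last:]
    summary = {"role": "user", "content": "Summary of earlier turns: " + summarise(older)}
    return [summary] + recent


history = [
    {"role": "user", "content": "Plan the migration."},
    {"role": "assistant", "content": "Step 1: back up the database."},
    {"role": "user", "content": "Done. Next?"},
    {"role": "assistant", "content": "Step 2: run the schema update."},
]


def toy_summary(older: list) -> str:
    """Placeholder summariser; a real one would be model-generated."""
    return f"{len(older)} earlier messages about the migration plan"


if should_compact(used_tokens=180_000, context_window=200_000):
    history = compact(history, toy_summary)

print(len(history), history[0]["content"])
```

The point of the sketch is the shape of the result: older turns collapse into one summary entry while the most recent turns stay verbatim.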
Minimal usage
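# Reuses `client` from the Fast Mode example above.
# `long_conversation_history` is your existing list of message dicts.
response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=2048,
    compaction={
        "enabled": True,
        "threshold": 0.85,  # compact when 85% of the context window is used
    },
    messages=long_conversation_history,
)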
Key insight: Previously, long-running agents had to either truncate
history (losing context) or implement bespoke summarisation logic (engineering overhead).
The Compaction API moves this complexity to the platform layer so your agent code stays
simple.
Beta caveat: Compaction is currently available only for Opus 4.6.
Summarisation is non-deterministic, so run evals on your use case to verify that no
business-critical information is lost across compaction boundaries.
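A very small starting point for such an eval is sketched below. It assumes you can obtain some text produced after a compaction boundary (for example, the model's answer to a recap question) and simply checks that required facts still appear in it. The function name and sample facts are hypothetical, and substring matching is deliberately crude.

```python
def missing_facts(text: str, required_facts: list[str]) -> list[str]:
    """Return the required facts that do not appear (case-insensitively) in `text`."""
    lowered = text.lower()
    return [fact for fact in required_facts if fact.lower() not in lowered]


# Hypothetical facts established before compaction.
required = ["order #4821", "refund approved", "ship by Friday"]

# Hypothetical recap produced after compaction.
recap = "The customer asked about order #4821. Refund approved and pending."

lost = missing_facts(recap, required)
print(lost)
```

Here the check flags "ship by Friday" as lost. A production eval would likely need fuzzier matching or a model-graded comparison, since a correct recap may paraphrase a fact rather than repeat it verbatim.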
Tags: compaction, Opus-4.6, agents, context-management, beta
✦ Claude Code: Three New Productivity Features
Recent Claude Code releases (v2.1.74 – v2.1.76) have shipped several quality-of-life
improvements aimed at reducing wasted context and giving developers finer control over
effort and speed. Here are the three most impactful additions.
1. /context — Actionable Optimisation Suggestions
Running /context in any Claude Code session now shows not just how many tokens
are in use, but actionable suggestions identifying the biggest optimisation
opportunities in your current context window — for example, large tool results that could
be summarised, or conversation branches that are no longer relevant. Use it before a long
coding task to ensure you're not burning context on stale scaffolding.
2. /effort — On-the-Fly Effort Control
The new /effort slash command lets you adjust Claude's
effort level mid-session without restarting. Higher effort means deeper
reasoning (and more tokens); lower effort is faster and cheaper for mechanical tasks like
simple edits or reformatting. This maps to the API's effort parameter
(which replaced budget_tokens for Opus 4.6).
/effort high   # deep reasoning for architecture decisions
/effort low    # quicker, cheaper responses for search-and-replace style edits
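If you script your sessions, the same rule of thumb can be written down explicitly. The sketch below is a hypothetical helper, not part of Claude Code: the task categories are taken from the examples in the text, and anything unrecognised leaves the current setting alone.

```python
from typing import Optional

# Task kinds drawn from the examples above; the labels themselves are assumptions.
DEEP_REASONING_TASKS = {"architecture"}
MECHANICAL_TASKS = {"simple-edit", "reformat", "search-replace"}


def choose_effort(task_kind: str) -> Optional[str]:
    """Map a task kind to an effort level, or None to keep the current level."""
    if task_kind in DEEP_REASONING_TASKS:
        return "high"
    if task_kind in MECHANICAL_TASKS:
        return "low"
    return None


for kind in ("architecture", "reformat", "unknown-task"):
    level = choose_effort(kind)
    print(kind, "->", f"/effort {level}" if level else "(leave unchanged)")
```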
3. worktree.sparsePaths: Sparse Checkouts for Worktrees
In large monorepos, running Claude Code in an isolated git worktree previously checked out
the entire repo, which is slow and token-expensive. The new worktree.sparsePaths
setting in ~/.claude/settings.json lets you specify which directories to
include, giving Claude a lean, relevant slice of the repository.
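As a sketch of what the resulting settings might look like, the snippet below builds the JSON in Python. It assumes the dotted name worktree.sparsePaths corresponds to a nested "worktree" object containing a "sparsePaths" list; that shape, the directory names, and the placeholder setting are assumptions, so check the settings reference before copying it.

```python
import json


def with_sparse_paths(settings: dict, paths: list[str]) -> dict:
    """Return a copy of `settings` with worktree.sparsePaths set to `paths`."""
    updated = dict(settings)
    worktree = dict(updated.get("worktree", {}))  # copy so the input is not mutated
    worktree["sparsePaths"] = list(paths)
    updated["worktree"] = worktree
    return updated


existing = {"someOtherSetting": True}          # placeholder for whatever is already there
sparse = ["services/billing", "libs/shared"]   # hypothetical monorepo directories

result = with_sparse_paths(existing, sparse)
print(json.dumps(result, indent=2))
```

Building the fragment this way keeps any existing keys intact, which matters when ~/.claude/settings.json already holds other configuration.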
Tip: Combine worktree.sparsePaths with the
/context command at session start. Check optimisation suggestions, trim
irrelevant files from context, then use /effort high only for the
reasoning-intensive parts of your task.
✦ Claude Code: Custom Subagents and Agent Teams
Claude Code now supports custom subagents (specialised Claude instances
defined in your project's .claude/agents/ directory) and
agent teams, where multiple parallel sessions coordinate through a shared
task list. These two features let you decompose complex, long-horizon work in a way that
keeps each agent's context window clean and focused.
Custom Subagents
A subagent is a Markdown file in .claude/agents/ with a YAML front-matter
block. Claude can delegate to it automatically (based on context) or on demand via
/agent <name>. Each subagent runs in its own isolated context window,
so tool results and file reads don't pollute your main session.
# .claude/agents/test-runner.md
---
name: test-runner
description: Runs the test suite and reports failures with root-cause analysis
model: claude-haiku-4-5 # fast, cheap for mechanical work
allowedTools: [Bash, Read, Glob]
---
You are a test specialist. When invoked, run `npm test`, capture failures,
read the relevant source files, and return a structured failure report.
Do not fix the code — only diagnose.
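To make the file layout concrete, here is a simplified reader that separates the front-matter block from the prompt body. It is an illustration only, not how Claude Code parses these files: it handles flat key: value pairs and bracketed lists, and a real tool would use a proper YAML parser.

```python
def split_front_matter(text: str) -> tuple[dict, str]:
    """Split a subagent file into (metadata, prompt). Flat `key: value` pairs only."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("expected the file to start with '---'")
    closing = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if closing is None:
        raise ValueError("front-matter block is never closed")

    metadata = {}
    for line in lines[1:closing]:
        line = line.split(" #", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        metadata[key.strip()] = value

    prompt = "\n".join(lines[closing + 1:]).strip()
    return metadata, prompt


SAMPLE = """---
name: test-runner
description: Runs the test suite and reports failures with root-cause analysis
model: claude-haiku-4-5 # fast, cheap for mechanical work
allowedTools: [Bash, Read, Glob]
---
You are a test specialist. Do not fix the code; only diagnose.
"""

meta, prompt = split_front_matter(SAMPLE)
print(meta["name"], meta["allowedTools"])
```

Everything between the two --- lines is configuration, and everything after the second one becomes the subagent's system prompt.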
Agent Teams
For large parallel workloads, assemble a team: one lead agent breaks
the problem into tasks, spawns worker agents to handle them concurrently, and synthesises
the results. Workers communicate through a shared task list (created via
TaskCreate / TaskUpdate) and message each other through the
lead's context.
Ideal for codebase-wide refactors, bulk migrations, and multi-file analysis.
Each worker operates with the minimum tool set needed; use
allowedTools to enforce least privilege.
The lead's context stays small: it only sees task summaries, not each worker's
full file read history.
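The lead/worker split can be pictured with an in-process analogy using threads. This is not Claude Code's TaskCreate / TaskUpdate mechanism; the task shape, file contents, and function names are hypothetical and serve only to show that the lead ends up holding short summaries while workers handle the bulky reads.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical inputs: file path -> contents a worker would read.
FILES = {
    "src/a.py": "old_api(1)\nold_api(2)\n",
    "src/b.py": "new_api(1)\n",
    "src/c.py": "x = old_api(3)\n",
}


def create_tasks(paths: list[str]) -> list[dict]:
    """Lead step: break the job into one task per file."""
    return [{"id": i, "path": p, "status": "pending", "summary": None}
            for i, p in enumerate(paths, start=1)]


def run_worker(task: dict) -> dict:
    """Worker step: read the file locally and report back only a one-line summary."""
    contents = FILES[task["path"]]
    hits = contents.count("old_api(")
    summary = f"{task['path']}: {hits} call site(s) of old_api"
    return {**task, "status": "done", "summary": summary}


def run_lead(paths: list[str], max_workers: int = 3) -> list[str]:
    """Lead step: fan tasks out concurrently, then collect the summaries in order."""
    tasks = create_tasks(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        finished = list(pool.map(run_worker, tasks))  # map preserves task order
    return [task["summary"] for task in finished]


summaries = run_lead(sorted(FILES))
for line in summaries:
    print(line)
```

Note that run_lead never touches the file contents itself, which mirrors how the lead agent's context stays limited to task summaries.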
Tip: Store subagent definitions in a shared repo so everyone on your engineering team
inherits the same specialised helpers. Treat the description field as
documentation: Claude reads it to decide when to delegate automatically.