2026-03-14

Context, Caching & Plan Mode — Three Big Upgrades


1 Million Token Context Window — Now Generally Available

As of 12 March 2026, the 1 million token context window is generally available for both Claude Opus 4.6 and Claude Sonnet 4.6 — no beta header required. This removes the single biggest practical barrier to processing entire codebases, lengthy documents, or long conversation histories in a single pass.

What fits in 1M tokens?
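A rough back-of-envelope helps build intuition. The characters-per-token, characters-per-line, and words-per-token ratios below are common rules of thumb for English text and source code, not exact figures:

```python
# Back-of-envelope capacity of a 1M-token context window.
# Assumptions: ~4 characters per token, ~60 characters per line of
# code, ~0.75 English words per token (heuristics, not exact).
CONTEXT_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4

total_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN   # ~4 MB of raw text
lines_of_code = total_chars // 60                # ~66k lines of code
words_of_prose = int(CONTEXT_TOKENS * 0.75)      # ~750k words

print(f"~{total_chars // 1_000_000} MB of text")
print(f"~{lines_of_code:,} lines of code")
print(f"~{words_of_prose:,} words of prose")
```

For precise numbers on your own data, count tokens with a real tokenizer rather than a heuristic.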

Practical tips for large contexts

Tip: Measure latency before committing to 1M-token prompts in production. Time-to-first-token grows with context size; for latency-sensitive apps, consider chunked retrieval or caching strategies first.
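The chunked-retrieval alternative can be sketched as a plain splitting helper. The token budget, overlap size, and the ~4 characters-per-token conversion below are illustrative assumptions, not API parameters:

```python
def chunk_text(text: str, max_tokens: int = 8_000,
               overlap_tokens: int = 200) -> list[str]:
    """Split text into overlapping chunks by an approximate token budget.

    Uses the rough ~4 characters-per-token heuristic; for exact counts,
    run a real tokenizer over each candidate chunk instead.
    """
    max_chars = max_tokens * 4
    overlap_chars = overlap_tokens * 4
    step = max_chars - overlap_chars          # advance, keeping an overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):    # last chunk reached the end
            break
    return chunks
```

The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks, which matters for retrieval quality.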

context-window Opus-4.6 Sonnet-4.6 GA

Automatic Prompt Caching

Anthropic has launched automatic prompt caching for the Messages API. Instead of manually placing cache_control breakpoints throughout your prompt, you now add a single cache_control field to your request and Claude automatically moves the cache point forward as the conversation grows. This makes multi-turn conversations with large system prompts or tool definitions dramatically cheaper and faster — without any extra engineering effort.

How it works

Minimal API usage

# Python — enable automatic caching
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": very_long_system_prompt,
        "cache_control": {"type": "ephemeral"}   # ← one field
    }],
    messages=conversation_history,
)

Tip: Prompt caching pays off most when you have a large, stable prefix (system prompt, tool definitions, RAG context) that is reused across many turns or many users. Even a 10,000-token system prompt cached across 100 calls saves significant cost.
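To make that tip concrete, here is a back-of-envelope cost model. The 1.25× cache-write premium and 0.1× cache-read multiplier reflect Anthropic's published pricing for ephemeral caching at the time of writing; treat them as assumptions and check the current pricing page:

```python
def caching_savings(prefix_tokens: int, calls: int,
                    write_premium: float = 1.25,
                    read_multiplier: float = 0.10) -> float:
    """Fraction of input-token cost saved on a stable, cached prefix.

    Assumes the first call writes the cache (write_premium x base price)
    and every later call reads it (read_multiplier x base price).
    """
    uncached = prefix_tokens * calls   # cost in base-price token units
    cached = prefix_tokens * (write_premium + (calls - 1) * read_multiplier)
    return 1 - cached / uncached

# The tip's example: a 10,000-token system prompt reused across 100 calls
saving = caching_savings(10_000, 100)   # roughly 0.89, i.e. ~89% cheaper
```

Note that a prefix used only once actually costs slightly more than an uncached call because of the write premium, so caching is worth enabling only for genuinely reused prefixes.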

prompt-caching API cost-optimisation latency

Claude Code: Plan Before You Code

One of the highest-leverage Claude Code best practices in 2026 is explicitly separating the planning phase from the coding phase. Asking Claude to produce a detailed implementation plan — and explicitly telling it not to write any code yet — dramatically reduces wasted edits and mid-task course corrections.

The two-phase workflow

Combining with Claude Code's built-in Plan Mode

Claude Code ships a dedicated Plan Mode (toggle with Shift+Tab or the /plan command). In Plan Mode the model is permitted to read files and think, but all write/edit tools are disabled. This gives you a safe sandbox to review the full plan before any filesystem changes are made.

Key insight: Most costly Claude Code mistakes happen when coding starts before the requirements are fully understood. A 2-minute planning step routinely saves 20+ minutes of reverting incorrect edits.

Useful phrases

claude-code plan-mode workflow best-practices