2026-03-14

Context, Caching & Plan Mode — Three Big Upgrades


1 Million Token Context Window — Now Generally Available

As of 12 March 2026, the 1 million token context window is generally available for both Claude Opus 4.6 and Claude Sonnet 4.6 — no beta header required. This removes the single biggest practical barrier to processing entire codebases, lengthy documents, or long conversation histories in a single pass.

What fits in 1M tokens?
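A rough back-of-envelope helps build intuition. The characters-per-token, characters-per-line, and words-per-token ratios below are common rules of thumb for English text and source code, not exact figures:

```python
# Back-of-envelope capacity of a 1M-token context window.
# Assumptions: ~4 characters per token, ~60 characters per line of
# code, ~0.75 English words per token (heuristics, not exact).
CONTEXT_TOKENS = 1_000_000
CHARS_PER_TOKEN = 4

total_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN   # ~4 MB of raw text
lines_of_code = total_chars // 60                # ~66k lines of code
words_of_prose = int(CONTEXT_TOKENS * 0.75)      # ~750k words

print(f"~{total_chars // 1_000_000} MB of text")
print(f"~{lines_of_code:,} lines of code")
print(f"~{words_of_prose:,} words of prose")
```

For precise numbers on your own data, count tokens with a real tokenizer rather than a heuristic.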

Practical tips for large contexts

Tip: Measure latency before committing to 1M-token prompts in production. Time-to-first-token grows with context size; for latency-sensitive apps, consider chunked retrieval or caching strategies first.
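The chunked-retrieval alternative can be sketched as a plain splitting helper. The token budget, overlap size, and the ~4 characters-per-token conversion below are illustrative assumptions, not API parameters:

```python
def chunk_text(text: str, max_tokens: int = 8_000,
               overlap_tokens: int = 200) -> list[str]:
    """Split text into overlapping chunks by an approximate token budget.

    Uses the rough ~4 characters-per-token heuristic; for exact counts,
    run a real tokenizer over each candidate chunk instead.
    """
    max_chars = max_tokens * 4
    overlap_chars = overlap_tokens * 4
    step = max_chars - overlap_chars          # advance, keeping an overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):    # last chunk reached the end
            break
    return chunks
```

The overlap keeps sentences that straddle a boundary visible in both neighbouring chunks, which matters for retrieval quality.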

context-window Opus-4.6 Sonnet-4.6 GA

Automatic Prompt Caching

Anthropic has launched automatic prompt caching for the Messages API. Instead of manually placing cache_control breakpoints throughout your prompt, you now add a single cache_control field to your request and Claude automatically moves the cache point forward as the conversation grows. This makes multi-turn conversations with large system prompts or tool definitions dramatically cheaper and faster — without any extra engineering effort.

How it works

Minimal API usage

# Python — enable automatic caching
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": very_long_system_prompt,
        "cache_control": {"type": "ephemeral"}   # ← one field
    }],
    messages=conversation_history,
)

Tip: Prompt caching pays off most when you have a large, stable prefix (system prompt, tool definitions, RAG context) that is reused across many turns or many users. Even a 10,000-token system prompt cached across 100 calls saves significant cost.
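To make that tip concrete, here is a back-of-envelope cost model. The 1.25× cache-write premium and 0.1× cache-read multiplier reflect Anthropic's published pricing for ephemeral caching at the time of writing; treat them as assumptions and check the current pricing page:

```python
def caching_savings(prefix_tokens: int, calls: int,
                    write_premium: float = 1.25,
                    read_multiplier: float = 0.10) -> float:
    """Fraction of input-token cost saved on a stable, cached prefix.

    Assumes the first call writes the cache (write_premium x base price)
    and every later call reads it (read_multiplier x base price).
    """
    uncached = prefix_tokens * calls   # cost in base-price token units
    cached = prefix_tokens * (write_premium + (calls - 1) * read_multiplier)
    return 1 - cached / uncached

# The tip's example: a 10,000-token system prompt reused across 100 calls
saving = caching_savings(10_000, 100)   # roughly 0.89, i.e. ~89% cheaper
```

Note that a prefix used only once actually costs slightly more than an uncached call because of the write premium, so caching is worth enabling only for genuinely reused prefixes.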

prompt-caching API cost-optimisation latency

Claude Code: Plan Before You Code

One of the highest-leverage Claude Code best practices in 2026 is explicitly separating the planning phase from the coding phase. Asking Claude to produce a detailed implementation plan — and explicitly telling it not to write any code yet — dramatically reduces wasted edits and mid-task course corrections.

The two-phase workflow

Combining with Claude Code's built-in Plan Mode

Claude Code ships a dedicated Plan Mode (toggle with Shift+Tab or the /plan command). In Plan Mode the model is permitted to read files and think, but all write/edit tools are disabled. This gives you a safe sandbox to review the full plan before any filesystem changes are made.

Key insight: Most costly Claude Code mistakes happen when coding starts before the requirements are fully understood. A 2-minute planning step routinely saves 20+ minutes of reverting incorrect edits.

Useful phrases

claude-code plan-mode workflow best-practices