Claude Opus 4.6 — What’s New
Released by Anthropic on 2026-02-05, focused on coding, enterprise agents, and professional workflows.
Official links
- Introducing Claude Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
- Product page: https://www.anthropic.com/claude/opus
Key updates
Agent Teams
The headline feature. You can split work into multiple agents that run in parallel and coordinate directly—no more single-agent, fully-serial workflows.
1M token context (beta)
Opus now supports a 1M-token context window (beta) and up to 128K output, aimed at long-running tasks and large codebases.
Context Compaction
When context gets close to the limit, the system can automatically compress older content into a summary to keep long conversations usable.
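Anthropic hasn't described the compaction algorithm in detail; the following is just a sketch of the idea, with a hypothetical `summarize()` stand-in (the real system would use a model-generated summary) and a crude 4-characters-per-token estimate.

```python
def estimate_tokens(messages):
    # Crude heuristic: roughly 4 characters per token.
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages):
    # Stand-in for a real model-generated summary of older turns.
    return "Summary of earlier turns: " + "; ".join(
        m["content"][:30] for m in messages)

def compact(messages, limit, keep_recent=4):
    """If the history nears the token limit, fold older turns into a
    single summary message and keep only the most recent turns verbatim."""
    if estimate_tokens(messages) < limit or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "user", "content": summarize(old)}
    return [summary] + recent
```

The trade-off this illustrates: the conversation stays under the limit, but anything only present in the summarized turns survives as lossy summary text, not verbatim context.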
Adaptive Thinking
Configurable “thinking intensity” with multiple levels (low/medium/high/max). Use lower levels for simple requests to reduce cost/latency; crank it up for hard problems.
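As a sketch of how you'd route requests by difficulty: the field name `thinking_effort` and the model id below are assumptions based on the announcement's low/medium/high/max description, not a confirmed API schema, so check the actual SDK docs before using.

```python
# Assumed level names, taken from the announcement's description.
LEVELS = ("low", "medium", "high", "max")

def build_request(prompt, effort="low"):
    """Build a request payload with a per-request thinking level.

    "thinking_effort" and the model id are hypothetical placeholders,
    not verified API fields.
    """
    if effort not in LEVELS:
        raise ValueError(f"effort must be one of {LEVELS}")
    return {
        "model": "claude-opus-4-6",  # assumed model id
        "max_tokens": 4096,
        "thinking_effort": effort,
        "messages": [{"role": "user", "content": prompt}],
    }

# Cheap request stays at low effort; a hard problem dials it up.
easy = build_request("Rename this variable.")
hard = build_request("Find the race condition in this scheduler.",
                     effort="max")
```

The pattern worth copying regardless of exact field names: default to a cheap level and escalate per request, rather than paying max-reasoning cost on every call.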
Stronger coding performance
Reported Terminal-Bench 2.0 score: 65.4% (new high). Improvements across planning, review, debugging, and working inside large repos.
Zero-day vulnerability detection
Anthropic claims out-of-the-box discovery of 500+ open-source zero-days, all human-verified.
Long-context performance
MRCR v2 score reportedly 76% (vs Sonnet 4.5 at 18.5%), addressing the “gets fuzzy as the conversation grows” problem.
Office integrations
Claude's Excel integration is better at interpreting messy spreadsheets; PowerPoint generation is in preview and tries to match a deck's existing style.
Benchmarks (as reported)
| Benchmark | Result |
|---|---|
| GDPval-AA (finance/legal real work) | 1606 Elo, +144 over GPT-5.2 |
| Finance agent benchmark | #1 |
| Terminal-Bench 2.0 | 65.4% |
Safety
Anthropic positions Opus 4.6 as having strong safety performance and says it underwent the company's most rigorous evaluation process to date.
What this means for me
Agent Teams is the most exciting part. In practice, it’s the difference between “one agent does everything” and “one agent codes, one tests, one researches docs” — a big real-world speedup.
1M context + compaction matters for long projects. It reduces the painful loop of re-explaining requirements as the session grows.
Adaptive Thinking is also practical: not every request needs maximum reasoning. Being able to dial it down helps save tokens for the truly hard parts.
On pricing: I’m on the Max 5x plan ($100/month). It’s not cheap, but for heavy users it can be cost-effective—especially if Adaptive Thinking lets you stretch the Opus budget further.