Claude Sonnet 5: A Harmonious Convergence of Agility and Intellect.

It's only been a month since Opus 4.8 was announced. In that time most teams have already put the model into production, though outdated integration notes may still be sitting somewhere in the file system. This week Anthropic released a second update: Claude Sonnet 5.
This time the difference lies somewhere else. Opus 4.8 held on to the position of “top of the frontier”; Sonnet 5, meanwhile, is a mid-tier model that has climbed well above its class and into Opus territory. In Anthropic’s own words, it’s the company’s “most agentic Sonnet yet.” What’s unusual is this: it beats Opus 4.8 on some tests while lagging clearly behind on others, and its price is still at Sonnet level.
In short, the question has changed. In the Opus 4.8 piece we asked “is this update actually worth it.” With Sonnet 5 the question is: when can the cheap model now replace the expensive one, and when can’t it?
If you’re working on SkyStudio, the practical part stays the same: you can use Sonnet 5 too, through infrastructure that operates under enterprise security standards, without taking on any extra integration burden.
Sonnet 5 in Brief
Released on June 30, 2026, Sonnet 5 is the fifth version of Anthropic’s mid-tier series. In the API it’s referred to as claude-sonnet-5. On claude.ai it has become the default model on the Free and Pro plans, and it’s also available on the Max, Team, and Enterprise plans. There’s an interesting move on pricing: an introductory price of $2 per million input tokens and $10 per million output tokens applies until August 31; after that date the price rises to $3/$15, still well below Opus 4.8’s $5/$25.
Feature | Value |
Context window | 1 million tokens |
Maximum output | 128,000 tokens (can be raised to 300,000 with a beta header) |
Adaptive thinking | Always on, default effort High |
Effort tiers | Low / Medium / High / xHigh |
Knowledge cutoff | January 2026 |
Price (introductory, through August 31) | Input $2 / Output $10 (per million tokens) |
Price (standard, after August 31) | Input $3 / Output $15 (per million tokens) |
Access | claude.ai, Claude Code, Claude API, Cursor, VS Code, GitHub Copilot |
One detail stands out: Sonnet 5 uses the updated tokenizer that shipped with Opus 4.7. The same text can come out to roughly 1.0 to 1.35 times as many tokens as it did under the old tokenizer. So even though the price per token looks lower, this multiplier needs to be factored in when comparing the total cost of the same job.
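To see how the tokenizer multiplier changes the bill, here is a minimal sketch in Python. The prices are the ones quoted above. The workload (800,000 input and 200,000 output tokens, counted with the old tokenizer) and the function name job_cost are made up for illustration, and applying the same multiplier to input and output is an assumption.

```python
# Prices in USD per million tokens (input, output), as quoted in this article.
PRICES = {
    "sonnet-5-intro": (2.00, 10.00),
    "sonnet-5-standard": (3.00, 15.00),
}


def job_cost(input_tokens, output_tokens, price_key, tokenizer_multiplier=1.0):
    """Return the USD cost of one job.

    Token counts are assumed to be measured with the old tokenizer and are
    scaled by tokenizer_multiplier (1.0 to 1.35 per the article) to estimate
    the count under the updated tokenizer.
    """
    in_price, out_price = PRICES[price_key]
    scaled_in = input_tokens * tokenizer_multiplier
    scaled_out = output_tokens * tokenizer_multiplier
    return (scaled_in * in_price + scaled_out * out_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload, counted with the old tokenizer.
    for key in PRICES:
        low = job_cost(800_000, 200_000, key, tokenizer_multiplier=1.0)
        high = job_cost(800_000, 200_000, key, tokenizer_multiplier=1.35)
        print(f"{key}: ${low:.2f} to ${high:.2f}")
```

At the top of the stated range, the same job costs about 35 percent more than the per-token price alone suggests, so it is worth measuring the multiplier on your own text before budgeting.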
Why the Sonnet series matters
Sonnet 4.6, released in February 2026, was already a standout model in its class; in head-to-head comparisons, developers preferred it over Opus 4.5 59 percent of the time. Sonnet 5 takes that a step further, and this time the jump is bigger: it beats Sonnet 4.6 in every tested category (agentic coding, computer use, terminal use, and knowledge work) and in some areas comes close to catching up with Opus 4.8.
What do the benchmarks say?
The figures below come from Anthropic’s launch notes and system card.
Coding
On SWE-bench Pro (real-world software tasks), Sonnet 5 scores 63.2 percent. Sonnet 4.6 had stayed at 58.1 percent; Opus 4.8, at 69.2 percent, is still ahead. So while Sonnet 5 marks a clear jump over the previous version, it hasn’t fully caught up to the Opus tier in coding.
Computer use and terminal
On OSWorld-Verified, Sonnet 5 scores 81.2 percent; Sonnet 4.6 had stayed at 78.5 percent, and Opus 4.8 leads at 83.4 percent. But on Terminal-Bench 2.1 the table flips: Sonnet 5’s 80.4 percent beats Opus 4.8’s 74.6 percent. It’s fair to say Sonnet 5 is currently Anthropic’s strongest option for terminal-heavy automation work.
Knowledge work
On GDPval-AA v2 (knowledge work), Sonnet 5 scores 1,618 points, narrowly edging out Opus 4.8’s 1,615. This is one of the rare cases where a mid-tier model can outdo the flagship on tasks like verification, reporting, and decision support.
Expert-level reasoning
On Humanity’s Last Exam without tool use, Sonnet 5 scores 57.4 percent, with Opus 4.8 just slightly ahead at 57.9 percent. The gap is close to statistical noise; the two models are nearly on par in this area.
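For a side-by-side view of the figures above, the short Python sketch below computes the gap between Sonnet 5 and Opus 4.8 on each benchmark. The scores are the ones reported in this article; the helper names gaps and sonnet_leads are only illustrative. Note that GDPval-AA v2 is measured in points, not percent, so its gap is not comparable to the others.

```python
# Scores as reported in this article: (Sonnet 5, Opus 4.8).
SCORES = {
    "SWE-bench Pro (%)": (63.2, 69.2),
    "OSWorld-Verified (%)": (81.2, 83.4),
    "Terminal-Bench 2.1 (%)": (80.4, 74.6),
    "GDPval-AA v2 (points)": (1618, 1615),
    "Humanity's Last Exam, no tools (%)": (57.4, 57.9),
}


def gaps(scores):
    """Map each benchmark to (Sonnet 5 score - Opus 4.8 score), rounded to 0.1."""
    return {name: round(s5 - opus, 1) for name, (s5, opus) in scores.items()}


def sonnet_leads(scores):
    """List the benchmarks where Sonnet 5 scores higher than Opus 4.8."""
    return [name for name, gap in gaps(scores).items() if gap > 0]


if __name__ == "__main__":
    for name, gap in gaps(SCORES).items():
        print(f"{name}: {gap:+}")
    print("Sonnet 5 leads on:", ", ".join(sonnet_leads(SCORES)))
```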
What difference does it make in practice?
Anthropic states that Sonnet 5 now carries multi-step tasks through to completion where previous versions used to leave them unfinished. Feedback from early-access partners points the same way: one automation company reports that a two-step CRM-update-and-announcement task that used to stall halfway through now completes end to end with Sonnet 5. A code-editor company says its agents now stay on plan and produce multi-step changes that adhere more closely to code conventions.
There’s no regression on the safety side either; if anything, it’s improved: compared to Sonnet 4.6, Sonnet 5 has lower hallucination and sycophancy rates, and greater resistance to prompt injection attacks in agentic contexts. That said, Anthropic also notes that Sonnet 5’s rate of “misaligned behavior” remains somewhat higher than that of Opus 4.8 and the Mythos family, so on the safety front too, Opus still comes out on top.
Adaptive thinking
With Sonnet 5 you don’t need to manually pick an effort level; the model decides for itself how much to “think” based on the weight of the task. It applies a light touch for a simple rewrite, and heavier reasoning for a multi-file refactor or a research question. The default effort is High in the API and in Claude Code; you can also set it manually anywhere from Low to xHigh.
Where does it stand against the competition?
Sonnet 5’s real pricing target isn’t Opus 4.8, but rival models like GPT-5.5 and Gemini 3.1 Pro. Its price per token is lower than both; only Gemini 3.5 Flash remains cheaper. According to the table Anthropic published, Sonnet 5 is ahead of these rivals in agentic coding and knowledge-work categories, but it continues to trail Opus 4.8 in coding.
So who is it useful for?
For budget-conscious teams: the gap on SWE-bench Pro still favors Opus 4.8, but Sonnet 5 has now crossed the “good enough” threshold in coding as well. On non-critical development work, Sonnet 5 delivers similar results for a fraction of Opus 4.8’s cost.
For teams building terminal and CLI-heavy automation: the lead on Terminal-Bench 2.1 is a concrete point in its favor here.
For long-running, unsupervised agent tasks: the model’s tendency to finish a task rather than leave it half-done pays off directly in day-to-day operational automation.
For work that demands high accuracy and where mistakes are costly: Opus 4.8 is still the safer choice; here Sonnet 5 isn’t an alternative, it’s a complement.
What does Sonnet 5 actually mean for SkyStudio?
When the picture above is mapped onto SkyStudio’s day-to-day workflows, a few areas stand out:
High-volume content production: For recurring work like changelog text, package descriptions, and multilingual “What’s New” updates, Sonnet 5’s lower cost translates directly into an advantage.
AI provider node testing and QA: Terminal-Bench leadership pays off directly in test scenarios that require terminal and tool use.
Long-running operations agents: The tendency not to leave tasks half-finished means less manual follow-up when building end-to-end automations.
For critical document and contract review, and other work where the margin for error needs to be low, Opus 4.8 is still the safer choice; it’s worth adding a verification step before handing this kind of work to Sonnet 5.
Considered together with effort tiers: running routine, high-volume work on Sonnet 5 at low/medium effort, and critical analysis on Opus 4.8 at high effort, is the most sensible way to allocate token budget according to the weight of the task.
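The allocation described above can be sketched as a small routing function in Python. Everything here is illustrative: the Task fields and the route function are hypothetical, "opus-4.8" is a placeholder label and not a confirmed API identifier, and the split between low and medium effort is an assumption, since the article only says "low/medium" for routine work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    critical: bool     # mistakes are costly, e.g. contract review
    high_volume: bool  # recurring, routine work, e.g. changelog text


def route(task: Task) -> tuple[str, str]:
    """Return a (model, effort) pair for a task.

    Critical work goes to Opus 4.8 at high effort; everything else goes to
    Sonnet 5, with the lowest effort reserved for high-volume routine work.
    """
    if task.critical:
        return ("opus-4.8", "high")  # placeholder label, not an API model id
    if task.high_volume:
        return ("claude-sonnet-5", "low")
    return ("claude-sonnet-5", "medium")


if __name__ == "__main__":
    for task in (
        Task("changelog text", critical=False, high_volume=True),
        Task("node QA run", critical=False, high_volume=False),
        Task("contract review", critical=True, high_volume=False),
    ):
        print(task.name, "->", route(task))
```

Checking the critical flag first is deliberate: a task that is both high-volume and critical still goes to the safer model.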
In summary
Sonnet 5 is a release that shows a mid-tier model no longer has to live in the flagship’s shadow. It beats Opus 4.8 on terminal use, narrowly leads on knowledge work, and still trails on coding and computer use, though the gap has narrowed. Pricing, meanwhile, puts the two models in almost different categories: at the introductory price valid through August 31, Sonnet 5’s output tokens cost $10 per million against Opus 4.8’s $25, or 40 percent of the flagship’s price.



