

What Opus 4.7 actually changes in your workflow

A 'direct upgrade' that asks you to re-tune your prompts is not a direct upgrade. The two changes that break your existing work, the four that don't, and the one they buried that matters most.

I have a Claude Code skill I run every morning. It is a small one — a /morning-sync command backed by a SKILL.md file that summarizes the state of my in-flight sessions before standup. It had not needed a change in a month. On Thursday, the day Opus 4.7 went live, it stopped working.

The skill had a line in it — "keep the summary short, under 150 words" — that I had written weeks ago and forgotten about. I had meant it as a soft upper bound. Opus 4.6 had always read it that way: summaries came back at whatever length the sessions warranted, usually two hundred words, sometimes three hundred. On Thursday, Opus 4.7 delivered 147 words. Every subsequent run stayed inside that window, to the point where I noticed sessions were getting truncated without anything being flagged.

The model was obeying me. The model had not been obeying me before, and I had not noticed. The gap between the model I had been writing instructions for and the model actually reading them had been hiding small bugs across a month of skills, prompts, and CLAUDE.md files — and Opus 4.7 uncovered them all at once.

This is the actual story of the Opus 4.7 launch. The top-line features — new effort level, new slash command, auto mode for Max — are real and they are fine. The story is that the model reads your prompts differently now. And Anthropic is calling this a direct upgrade. The launch tweet sells it as "you can hand off your hardest work with less supervision," which is true on average and misleading in the particular — because the work you hand off is mediated by prompts, and your prompts were written for the old model.

TL;DR. Opus 4.7 ships with stricter instruction-following and a new tokenizer that maps the same input to 1.0–1.35× as many tokens — both require prompt re-tuning. The four new features are a new xhigh effort level, a /ultrareview slash command in Claude Code, auto mode for Max users, and API task budgets in public beta. File-system memory is quietly better, which is the upgrade that matters most.

What changed from Opus 4.6 to Opus 4.7, at a glance

| Change | Type | Action required |
| --- | --- | --- |
| Stricter instruction-following | Gotcha | Re-tune loose prompts; literal interpretation is the default now |
| New tokenizer (1.0–1.35× as many tokens on same input) | Gotcha | Measure tokens-per-request before rolling over |
| xhigh effort level (between high and max) | New | Default in Claude Code; drop to high for latency-sensitive paths |
| /ultrareview slash command | New | Pro and Max get 3 free; try on a real PR before relying on it |
| Auto mode for Max users | New | Use only on throwaway branches or worktrees |
| API task budgets (public beta) | New | Set a token cap for long agent runs |
| Improved file-system memory | Quiet upgrade | Audit what your agents write to disk — 4.7 will use more of it |

The two changes that break your prompts

1. Opus 4.7 obeys

Earlier Claude models interpreted instructions loosely. They read "keep it short" the way a human colleague would — as a note about your preference, not a rule. They read "don't add commentary" as aspirational. They read "use a table if it's useful" as "you decide."

Opus 4.7 reads all three literally. If you wrote "keep it short," it will keep it short to its internal notion of short, and that notion is now more aggressive than it was. If you wrote "don't add commentary," you get no commentary, even when the commentary was what made the output useful. If you wrote "use a table if it's useful," you sometimes get no table, because the model is deferring to its own judgment of usefulness — not the judgment Opus 4.6 was quietly substituting for yours.

Anthropic frames this as an improvement: the release notes warn that "prompts written for earlier models can sometimes now produce unexpected results," and that users should "re-tune their prompts and harnesses accordingly." Strictly, it is an improvement. A model that follows instructions is better than a model that guesses. But every prompt you wrote before Thursday is a contract with the old model, and the new model is going to read that contract like a lawyer. Any clause you wrote without meaning it is going to fire.

The fix is not to lower the effort level. The fix is to read your own prompts the way the model now reads them. If you meant a word count as a suggestion, write it as a suggestion. If you wanted commentary, ask for it. If you want a table, say so. The looseness you relied on was a bug in the old model. It has been fixed, and the fix is going to cost you a week of prompt debugging.
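One way to find those clauses before the model does is a mechanical pass over your own prompt and skill files. A minimal sketch — the patterns and rewrite suggestions here are my own shortlist, not an official one; extend it with your own verbal tics:

```python
import re

# Phrases that earlier models read loosely but a strict model enforces
# literally. Illustrative, not exhaustive.
LOOSE_PATTERNS = {
    r"keep (it|the \w+) short": "state an explicit range, e.g. 'aim for 150-300 words'",
    r"under \d+ words": "say whether the count is a hard cap or a soft target",
    r"don't add commentary": "if some commentary is wanted, say which kind",
    r"if it's useful": "decide yourself and write 'use a table' or 'no table'",
}

def audit_prompt(prompt: str) -> list[str]:
    """Return a rewrite suggestion for each loose phrase found."""
    findings = []
    for pattern, suggestion in LOOSE_PATTERNS.items():
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            findings.append(suggestion)
    return findings

print(audit_prompt("Summarize the sessions. Keep it short, under 150 words."))
```

Run it over every SKILL.md and CLAUDE.md you have; anything it flags is a clause the new model will now enforce.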

2. The same input is more tokens now

Opus 4.7 uses an updated tokenizer. The same input maps to somewhere between 1.0× and 1.35× as many tokens, depending on content type. Code hits the high end. Natural English hits the low end.

Your API bill will go up. Not because per-token pricing changed — it did not; still $5/$25 per million — but because every request now costs more tokens than it did last week. If you are running tight to a cost ceiling, a rate-limit ceiling, or a context window that was already cramped, this is a silent degradation. You will notice it in the invoice before you notice it in the behavior.

Anthropic also notes that 4.7 "thinks more" at higher effort levels, particularly on later turns in long agent runs. Output tokens rise at high and xhigh. This is a quality-for-cost trade the company has defaulted you into. Their own testing says the net effect is favorable — token usage across all effort levels is improved on an internal coding evaluation. That may hold for your traffic too. It may not. Measure before you assume.

The right migration ritual is mechanical: run your three heaviest prompts on both models, count the tokens, project the cost across a week of real traffic, and then decide whether you need to ask Claude to be more concise, lower the effort, or tighten the prompt itself. There is no universal fix for this. There is only your traffic.
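The projection step of that ritual is a few lines. A back-of-envelope sketch, assuming the multiplier applies uniformly to a request (in practice, use measured token counts from both models rather than trusting the published range); the traffic numbers are hypothetical:

```python
# Prices from the post: $5 / $25 per million input / output tokens.
PRICE_IN = 5.00 / 1_000_000    # $ per input token
PRICE_OUT = 25.00 / 1_000_000  # $ per output token

def weekly_cost(requests_per_week: int, in_tokens: int, out_tokens: int,
                multiplier: float = 1.0) -> float:
    """Weekly cost if every request's token counts scale by `multiplier`."""
    per_request = (in_tokens * PRICE_IN + out_tokens * PRICE_OUT) * multiplier
    return requests_per_week * per_request

# Hypothetical traffic: 10k requests/week, 4k input + 1k output tokens each.
baseline = weekly_cost(10_000, 4_000, 1_000)        # old tokenizer
worst = weekly_cost(10_000, 4_000, 1_000, 1.35)     # code-heavy, high end
print(f"${baseline:.2f} -> ${worst:.2f} per week")  # $450.00 -> $607.50
```

A 35% swing on code-heavy traffic is the difference between inside and outside most cost ceilings, which is why the measurement comes before the rollover.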

The four things that are actually new

A new effort level, one rung down from max

The reasoning effort ladder used to be low → medium → high → max. Opus 4.7 adds xhigh between high and max — more thinking than high, less than max.

This matters mostly because of Claude Code, which now defaults to xhigh for every plan. You did not opt into this. You will feel it. Responses are a little slower and a little more careful than they used to be. For hard coding problems, this is what you want. For "what's wrong with this file," it is overkill. If you want the old speed back, drop to high explicitly.

/ultrareview in Claude Code

A slash command that runs a dedicated review pass on your changes — reads the diff, flags bugs and design issues, behaves like a careful reviewer rather than a helpful assistant. Pro and Max users get three free uses to try it; after that it meters against normal usage.

This is the first Claude Code command to treat code review as a distinct mode, separate from editing or exploring. If you have been pasting diffs into chat with a prompt like "what's wrong with this," /ultrareview is the supported version of that ask. Use it before you push something you do not want to eat the fallout on.

Auto mode for Max

Auto mode was previously gated from Max users. It is now available to them. It lets Claude decide which commands to run without asking, so a long agent run does not pause every thirty seconds for permission. Fewer interruptions, more trust, more risk that something you did not want gets executed.

The rule here is simple and non-negotiable: auto mode on a worktree or a throwaway branch, never on a branch you cannot git reset back to. Once you have run auto mode twice on a branch you trust, you will be tempted to run it on your main, and that is the moment you lose work. Do not.

Task budgets on the API

Public beta. You can now hand Claude a token budget for a task and the model will prioritize work to fit inside it — rather than either blowing through your budget unnoticed or stopping short of finishing. If you ship agents that can run for tens of thousands of tokens, this is the knob you have been waiting for. If you do not, you will not notice it exists.
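If you cannot adopt the beta yet, the same discipline can be approximated client-side. A sketch of the idea — the server-side beta enforces this for you; this is the fallback you can run today. The loop body and token numbers are placeholders:

```python
class TokenBudget:
    """Stop an agent loop once cumulative token usage crosses a cap,
    rather than trusting the run to end on its own."""

    def __init__(self, cap: int):
        self.cap = cap
        self.used = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        self.used += input_tokens + output_tokens

    @property
    def exhausted(self) -> bool:
        return self.used >= self.cap

# Usage inside an agent loop:
budget = TokenBudget(cap=50_000)
for step in range(1_000):
    # ... call the model, read real usage numbers from the response ...
    budget.charge(input_tokens=4_000, output_tokens=1_000)
    if budget.exhausted:
        break  # wrap up: ask for a summary, persist state, exit cleanly
```

The difference from the API-side budget is that the model does not know the cap exists, so it cannot prioritize work to fit inside it — it just gets cut off. That is why the server-side version is the one worth waiting for.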

The quiet change that matters most

The four new features above are what Anthropic leads with. The thing that is going to change what is buildable on Claude this year is not in the top of the launch post. It is a line halfway down: "Opus 4.7 is better at using file system-based memory. It remembers important notes across long, multi-session work."

Every serious agent I have built has the same ceiling. It is not intelligence. It is context. An agent can think clearly for one session. The moment the session ends, it forgets everything it learned, and the next session starts from scratch — with you, the human, filling in what it should have remembered. The only way around this is to have the agent write what it learns to disk and re-read the notes on the next run. Every framework is converging on this pattern. Every developer building long-running agents lives in this pattern.

Anthropic's claim is that 4.7 uses those notes more reliably than 4.6 did. I have not run 4.7 long enough to verify this independently — it is day two — but the claim, if it holds, is the one that matters most. An agent with a CLAUDE.md and a scratch directory that actually uses them on the second run is a fundamentally different tool than one that pretends to. That is the upgrade worth stress-testing first.
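The pattern itself is simple enough to sketch. A minimal version of write-then-re-read memory — the notes path and bullet format are my own convention, nothing Claude-specific:

```python
from pathlib import Path

# One file accumulates what the agent learned; the next session's
# prompt is seeded from it.
NOTES = Path("scratch/agent-notes.md")

def remember(fact: str) -> None:
    """Append one learned fact so the next session can re-read it."""
    NOTES.parent.mkdir(parents=True, exist_ok=True)
    with NOTES.open("a") as f:
        f.write(f"- {fact}\n")

def recall() -> str:
    """Everything previous sessions wrote down, for the next prompt."""
    return NOTES.read_text() if NOTES.exists() else ""

remember("tests live in tests/, not test/")
remember("CI needs FOO_API_KEY set")
print(recall())
```

The claim to stress-test is not whether the model can write notes like these — 4.6 could — but whether the second session actually acts on them without being told to.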

It is also the only one that did not make the front of the launch post.

Migration checklist, in order

  1. Audit your prompts. Anywhere you were relying on looseness, rewrite for intent.
  2. Measure tokens-per-request on real traffic before flipping production over.
  3. Lower effort from xhigh to high on latency-sensitive paths.
  4. If you use Claude Code, try /ultrareview on a real PR before you rely on it.
  5. If you use auto mode, confine it to a branch you can throw away.
  6. If you ship long-running agents, audit what you write to disk for memory — 4.7 will use more of it than 4.6 did.

Anthropic calls Opus 4.7 a direct upgrade. It is, in the sense that the model is better at almost everything. It is not, in the sense that your prompts were contracts with the old model and the new one reads them differently. A direct upgrade that asks you to re-tune your prompts is not a direct upgrade.

Treat it as a migration. You will thank yourself in a week.

Migration FAQ

Will my Opus 4.6 prompts break on Opus 4.7? Probably not outright, but loose prompts that relied on the model interpreting generously will misfire. Opus 4.7 takes instructions more literally. Re-tune anywhere you wrote "keep it short," "don't add commentary," or "use a table if it's useful."

Will my API costs go up? Per-token pricing is unchanged at $5 / $25 per million tokens. Tokens per request are higher — the new tokenizer maps the same input to 1.0–1.35× as many tokens. Output tokens also rise at high and xhigh effort. Measure on real traffic before rolling over production.

What is xhigh effort? A new reasoning effort level between high and max. Claude Code defaults to xhigh across all plans. Drop to high for latency-sensitive paths; use max when you want the model to think as hard as possible.

What does /ultrareview do in Claude Code? It runs a dedicated code-review pass on your changes and flags bugs and design issues a careful reviewer would catch. Pro and Max users get three free uses; after that it is metered.

Is auto mode safe? Auto mode lets Claude make permission decisions during long runs. It is safe on throwaway branches or worktrees. It is not safe on a repo you cannot git reset back to.

Did file-system memory get better? Yes. Opus 4.7 uses notes the agent writes to disk more reliably across multi-session runs. If you build long-running agents with a CLAUDE.md or scratch directory, that context now compounds across sessions instead of evaporating between them.

Filed under: Claude, AI agents, Tooling