
When Anthropic announced Claude Opus 4.7, they called it the best coding model they'd ever released.
But the developer response was swift and frustrated: 4.7 seemed much worse than 4.6 in real coding workflows and agentic applications.
4.7 is a smarter, more capable model. But its behavior has shifted enough that existing prompts, tools, and architectures no longer fit.
It's not technically a breaking change: the API still works the same, and the model can still call the same tools and generate the same output schemas. With AI models we face a new problem, the degrading change: a new model that performs worse inside existing scaffolding.
With model releases coming faster than ever, how should engineering teams manage degrading changes?
Software vendors have spent decades building discipline around backwards compatibility. Stripe, AWS, and Linux all have careful versioning, deprecation timelines, and contracts. A breaking change is loud. You know it's coming and what to fix.
Foundation model APIs might look like normal APIs, but they aren't. The contract used to be the API spec; now it's the behavior. New models often differ in instruction following, tool-calling behavior, verbosity, and reasoning style.
These are hard to measure directly, and they interact. When a new model lands, the regression is often subtle. Users might feel it before evals catch it.
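One way to make this concrete is to treat behavior itself as the contract under test: pin the model you ship on, and run the same eval suite against any candidate before switching. Below is a minimal sketch of that idea; `call_model` is a hypothetical wrapper around whatever SDK you actually use, and the eval cases are purely illustrative.

```python
# Sketch of a behavioral-regression check: run the same eval cases against
# the pinned model and a candidate, and compare pass rates before switching.

def call_model(model_id: str, prompt: str) -> str:
    # Hypothetical wrapper: plug in whatever provider SDK you actually use.
    raise NotImplementedError

# Illustrative cases: (prompt, predicate deciding whether the output is acceptable)
EVAL_CASES = [
    ("Return a JSON object with keys 'name' and 'age'.",
     lambda out: '"name"' in out and '"age"' in out),
    ("List each changed file on its own line, nothing else.",
     lambda out: len(out.strip().splitlines()) >= 1),
]

def pass_rate(model_id: str) -> float:
    passed = sum(1 for prompt, ok in EVAL_CASES if ok(call_model(model_id, prompt)))
    return passed / len(EVAL_CASES)

def safe_to_switch(pinned_id: str, candidate_id: str, tolerance: float = 0.02) -> bool:
    # Scoring below the pinned model (beyond tolerance) is a degrading change
    # for this system, whatever the public benchmarks say.
    return pass_rate(candidate_id) >= pass_rate(pinned_id) - tolerance
```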
Sometimes the fix is straightforward: a few tweaked lines in a prompt get the new model back on track. Often it isn't: the old system might have dozens of specialized agents, while the new model works best as a single orchestrator running the whole workflow.
Either way, it's never a simple bump to a package version. This is the defining tax on agent companies.
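One way to soften that tax is to keep everything model-specific behind a single per-model profile, so a new release becomes a new profile plus an eval run instead of edits scattered across the codebase. A minimal sketch, with hypothetical field names and placeholder model identifiers:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelProfile:
    # Everything the scaffolding varies per model lives here, not inline.
    model_id: str
    system_prompt: str
    max_parallel_tools: int  # e.g. some models degrade with many parallel calls

PROFILES = {
    "stable": ModelProfile(
        model_id="model-4.6",  # placeholder identifier
        system_prompt="You are a careful coding agent. Plan, then act.",
        max_parallel_tools=8,
    ),
    "candidate": ModelProfile(
        model_id="model-4.7",  # placeholder identifier
        system_prompt="Act as a single orchestrator for the whole workflow.",
        max_parallel_tools=1,
    ),
}

def active_profile(name: str) -> ModelProfile:
    return PROFILES[name]
```

The point of the design is isolation: the rest of the system asks for a profile and never hard-codes assumptions about a particular model's behavior.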
API users are not foundation model companies’ top priority. The labs’ main goal is to push the frontier of model capabilities, and they’ll use whatever scaffolding is needed to maximize performance. In fact, they’re incentivized to co-develop models with their own harnesses to drive differentiation for products like Claude Code and Cowork.
The labs aren't operating like traditional software companies; they're operating like Google Search.
Entire businesses were built on Google's search algorithm, and were periodically destroyed or created by announced and unannounced updates that changed its behavior in qualitative ways. SEO became a discipline of constant adaptation.
That is the world agent companies now live in. Frontier labs aren’t offering behavioral backwards compatibility, and structurally they can't: their differentiation comes from pushing the frontier of the models and their coupled systems. This is a permanent dynamic, not a temporary growing pain.
A new, smarter model comes out but makes your agent perform worse. The fix might be simple or might require a full system redesign. The new model might be dramatically better but only after you've rebuilt around it. Or maybe you spend months refactoring, and the difference is negligible. What should you do?
The common thread is adaptation over optimization. The best AI agent teams will separate from the pack because their systems adapt to and benefit from each model release; everyone else will spend the cycle just scrambling to catch up.
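In practice, adapting can start with never cutting over all at once: route a small share of traffic to the new model, compare task-level metrics, and roll back by changing one value. A sketch with made-up names:

```python
import random

# Fraction of traffic sent to the candidate model during a release cycle.
CANARY_FRACTION = 0.05

def pick_version(stable: str = "stable", candidate: str = "candidate") -> str:
    # Deterministic per-user hashing is better in practice; random.random()
    # keeps the sketch short.
    return candidate if random.random() < CANARY_FRACTION else stable

# Each request records which version served it, so task-level metrics
# (success rate, retries, latency) can be compared before full cutover.
```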
What should the labs do?
Even if it’s not their top priority, the labs can be better at supporting API customers through new model releases.
Today, guidance comes from ad-hoc tweets and developer word of mouth. Two things would help: structured behavioral release notes that document how a new model actually differs from its predecessor, and longer overlap windows that keep the previous model available while teams re-run their evals and migrate.
If you're building an agent company and thinking about these kinds of problems, I'd love to hear from you: at@theoryvc.com