Until recently, software was crafted by hand. Engineers wrote code like sculptors, painstakingly shaping and smoothing it into its final form.
LLM-powered coding agents replaced this process with something closer to 3D printing. Describe the object you want, and one pops out. The craft shifted from hand-sculpting to specification: tell the agent what to build, and it builds it.
But the future of software production is not 10 engineers standing in front of 10 3D printers. It’s not even 10 engineers managing 100 3D printers.
It’s something more radical: AI engineers will be managed and optimized by AI managers – potentially multiple levels of them. Humans won’t be in the production flow. Instead, they’ll be leaders and maintenance crews for self-optimizing agent factories.
The best companies are already starting to work this way. What does it mean for the industry?
The first impact of coding agents was turning software engineers into engineering managers/tech leads. Instead of writing code, they communicate goals in natural language, then provide feedback and review outputs as AI agents do the building.
Talented engineers have pushed this even further; they might manage 5-6 concurrent agents today. But this is naturally limited by the cognitive load of context-switching. Better agent management and collaboration platforms might increase that ratio by a factor of two. How could we increase our leverage by an order of magnitude, or more? Human managers won’t cut it.
The most recent generation of models has demonstrated it can not only do the work, but also plan and manage it.
Instead of writing prompts and reviewing outputs themselves, the best engineers are now moving even further up the stack. They're creating manager agents who plan work, orchestrate builder agents, review their outputs, suggest changes, and update prompts – all on their own.
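To make this concrete, the manager-agent loop can be sketched roughly as follows. This is an illustrative skeleton, not any particular product's API: `plan_work`, `run_builder`, and `review` are hypothetical stand-ins for real LLM calls.

```python
# Hypothetical sketch of a manager agent orchestrating builder agents.
# plan_work, run_builder, and review stand in for real LLM calls.
from dataclasses import dataclass


@dataclass
class Task:
    goal: str
    attempts: int = 0
    done: bool = False


def plan_work(objective: str) -> list[Task]:
    # A manager agent would decompose the objective with an LLM;
    # here we stub it out with a fixed three-step breakdown.
    return [Task(goal=f"{objective}: step {i}") for i in range(1, 4)]


def run_builder(task: Task) -> str:
    # Stand-in for dispatching a builder agent on one task.
    task.attempts += 1
    return f"patch for {task.goal!r}"


def review(output: str) -> bool:
    # Stand-in for an automated review pass over the builder's output.
    return "patch" in output


def manage(objective: str, max_attempts: int = 3) -> list[Task]:
    # The human defines the objective; the manager plans, dispatches,
    # reviews, and retries without human involvement.
    tasks = plan_work(objective)
    for task in tasks:
        while not task.done and task.attempts < max_attempts:
            output = run_builder(task)
            task.done = review(output)
    return tasks
```

The human's role collapses to a single call like `manage("add rate limiting to the API")`, with review and retry handled inside the loop.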
The shift is striking. These engineers are typically not looking at code. They're often not even looking at prompts used to generate the code. They're operating at a higher layer of abstraction: defining goals, setting constraints, and evaluating outcomes. The actual work can then be done by dozens or even hundreds of agents at once.
How far up the management ladder will AI climb?
We don't see fully lights-off AI software factories in the near future. But the roles that remain for people will look very different from today's org chart, falling into two key categories.
Senior leadership: Even as AI can analyze data and research best practices, strategic decision-making often sits in difficult gray areas. People will play a role here for some time, both because they can make these decisions effectively and because we’ll want human accountability for them.
A lot of this work will be product-oriented: people can sit face-to-face with customers to hear the nuance in their requests, understand where analytics data might be biased, or apply product vision to inform decisions when there is no data at all.
There are also strategic decisions in core engineering: for example, whether to architect a system for simplicity and development speed, or to build it for large scale from the start.
Maintenance and support staff: To operate at factory scale, agents will need new infrastructure for closed-loop development, iteration, and deployment.
Last week I wrote about sandbox environments, one component of the agent-native DevOps stack. There’s a lot more that agent factories will need: git rethought for agent-scale throughput; sophisticated experimentation platforms; and underlying data pipelines, orchestration, and inference infrastructure.
There will be a number of roles for humans to design the plumbing and wiring for their agentic workforce (though agents will undoubtedly help build it).
We are entering unprecedented times. We will no longer build things directly; we will set up and support mostly-autonomous, non-deterministic systems, guiding them to our desired outcomes.
The impacts will be dramatic. Startups will move much more quickly: we see portfolio companies deploying small, highly autonomous teams of 2-3 people, building multiple products in parallel, each of which would have required a 10-person team just a year ago. And as Tomasz wrote about last week, incumbents will restructure too: Block led the way with a nearly 50% reduction of their team.
As companies evolve to this new state, there will be a huge productivity gap between average and best-in-class teams. The best ones have three things.
On the last point, the next generation of AI infrastructure is clear to us: it’s the tooling required for agent factories to run effectively at massive scale. We will share more of our theories here in the coming months!
If you're thinking about or building for this new world, I’d love to hear from you: at@theoryvc.com.
It’s becoming clear that AI coding agents will do much more than draft PRs and ship webapps to localhost. They will be able to refactor your authentication stack, migrate a database schema, or deploy a microservice.
Models are quickly getting there: even smaller models are excellent at multi-step planning and actions across code, infrastructure, and live systems.
But they don’t get it right every time. Agents iterate, explore dead ends, and sometimes make mistakes. That’s fine in a sandbox. It’s terrifying in production.
For code, we already have guardrails: version-controlled git repos, CI/CD pipelines, unit tests. But what happens when agents need to interact with real data? With a deployed product your customers depend on? With sprawling cloud infrastructure?
Git and offline evals won’t cut it. We need entire isolated worlds for agents to operate in: sophisticated sandbox environments that mirror complex production systems, where mistakes can be made freely.
We think this represents one of the most exciting infrastructure opportunities as agentic development matures.
Human developers have used sandboxes for decades. These are environments designed to be:
But agents work differently from humans, so their sandboxes will need to be different too. We envision that agent sandboxes will also need to be:
1. Stateful: Existing DevOps tools like Terraform or Kubernetes are great at spinning up the blueprint of an application: new servers, empty or mocked databases, etc. But for an agent tasked with fixing a multi-step checkout bug or migrating a legacy database, a blueprint of an application is useless. They need to interact with a real, stateful environment: manipulating real data, messages in a Kafka queue, etc. Current pipelines cannot generate massive, interconnected, stateful environments; we need new infrastructure that makes state as branchable and deployable as code.
2. Scalable: Human developers will work for hours, then take a 15-minute coffee break while their preview environment builds. Agent developers may want a new environment every minute as they iterate on a solution, or even to spin up multiple environments at once to test a number of different solutions. Agent sandboxes need to be instantly available, lightweight, and highly concurrent.
3. Child-proof: Traditional staging environments are not fully isolated; they might rely on human developers to know not to run a script that sends 10,000 test emails to real customers or hits the Stripe API 10,000 times. We cannot rely on agents to have this intuition. A “child-proof” agent sandbox will need to intercept outbound API calls and synthesize realistic responses, containing the agent’s blast radius so it truly cannot impact the real world.
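One way to think about child-proofing is a default-deny egress gate: known external services are intercepted and mocked, and everything else is refused outright. The sketch below is illustrative; the host list and response shape are assumptions, and a real sandbox would return schema-accurate mock payloads at the network layer.

```python
# Sketch of a "child-proof" egress layer: every outbound call passes
# through a gate that blocks real-world side effects.
# BLOCKED_HOSTS and the response format are illustrative assumptions.
BLOCKED_HOSTS = {"api.stripe.com", "api.sendgrid.com"}


def synthesize_response(host: str, path: str) -> dict:
    # A real sandbox would return a mock payload matching the
    # real service's response schema.
    return {"status": 200, "host": host, "path": path, "mocked": True}


def egress(host: str, path: str) -> dict:
    if host in BLOCKED_HOSTS:
        # Known external service: intercept and mock.
        return synthesize_response(host, path)
    # Default-deny: anything unrecognized is refused, not forwarded.
    raise PermissionError(f"outbound call to {host} not allowed from sandbox")
```

With this in place, the agent can “charge a card” or “send an email” ten thousand times and the blast radius stays at zero.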
4. Machine-readable: When a human deploys code, they might click around a Vercel preview UI, look at line charts in Datadog, and read through logs. Agents will want much more structured telemetry in their environment: in addition to rendering the UI, they might receive a structured event stream, descriptive state diffs, and detailed network payloads.
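A minimal sketch of what such telemetry might look like: structured JSON events plus explicit state diffs, rather than rendered dashboards. The event schema here is hypothetical.

```python
# Sketch of machine-readable telemetry: the sandbox emits structured
# events and state diffs an agent can parse directly, instead of a UI.
# The event fields are a hypothetical schema, not a real product's format.
import json
import time


def emit_event(kind: str, payload: dict) -> str:
    # One JSON line per event, suitable for an agent to stream and filter.
    event = {"ts": time.time(), "kind": kind, **payload}
    return json.dumps(event, sort_keys=True)


def state_diff(before: dict, after: dict) -> dict:
    # Descriptive diff: only the keys that changed, with both values.
    keys = set(before) | set(after)
    return {k: {"before": before.get(k), "after": after.get(k)}
            for k in keys if before.get(k) != after.get(k)}
```

Instead of screenshotting a chart, the agent reads `state_diff(db_before, db_after)` and sees exactly what its change touched.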
AI won’t just make software engineers faster. It will collapse the boundaries between engineering, product, and design. We’re seeing early signals of this already: with Claude Code, some designers can inspect a front end and make simple changes to the codebase themselves.
But rendering a UI in a local environment (or looking at a hosted Vercel preview) isn’t enough. In the future, we expect agents and humans will build together on deployed full-stack products: collaborating on code, interfaces, and data in a coherent, shared development environment.
This will not just democratize and accelerate product development, but also enable new types of learning: agents can take signals from real usage and associate them directly with product components, observing the impact of changes and using those to guide further iterations (potentially even simulating user behavior). This will shift product evolution from a slow, build-wait-measure cycle to an ongoing loop of learning and refinement.
Working with data is fundamentally harder than working with code. Code is text: it’s easy to branch it, diff it, and roll it back if needed. With data, scale alone makes duplication impractical or impossible. Reprocessing is slow and expensive. And mistakes can be devastating: if an agent drops a critical table, you can’t just revert it.
Even experienced data engineers struggle with the operational complexity: understanding what’s current, running backfills, managing migrations. It’s no surprise that production data is one of the last things any enterprise would let an AI agent touch.
Agents will need infrastructure that lets them work with real data in an environment that feels like production but doesn’t impact live systems, and can be tested, version-controlled, and reversed like code.
Enterprises operate with sprawling webs of infrastructure and external tools/services: systems of record, ticketing systems, identity providers, cloud consoles, communication platforms, each with its own APIs, permissions, and data models.
For agents to explore, learn, and improve in these environments, they need sandboxes that replicate the full complexity. This means spinning up real infrastructure, connecting to real or mocked services, and even potentially rendering tool UIs.
Two of our AI security portfolio companies, Dropzone and Maze, have already invested heavily in this capability: their engineering teams can programmatically spin up hundreds of realistic enterprise environments (with first- and third-party platforms) on demand to train, test, and iterate. This isn’t a nice-to-have; it’s core infrastructure for building and testing reliable agents.
We see analogous opportunities across enterprise domains. Whether it’s operational/supply chain platforms, IT/SRE automation, or sales/GTM teams, any domain where agents need to interact with complex, multi-tool environments will need this kind of simulation infrastructure.
—
The best agents won’t just be smarter. They’ll have better worlds to practice in. The companies building these simulation layers will become foundational infrastructure for the agentic era, just as CI/CD and preview deploys became foundational for human-driven development.
If you’re building agent environments, simulation infrastructure, or sandboxing tools, I’d love to hear from you: at@theoryvc.com.
Thanks to Cris Dobbins for feedback on this piece.