I think about LLMs as creating a near-infinite supply of near-free interns. While they don’t have years of domain-specific expertise, they can be phenomenal at getting tasks done. Ask an LLM to summarize a memo or draft an email, and you could easily confuse its responses with a human’s.
How will businesses evolve when this is the case?
It’s clear that many workflows will change dramatically. But LLMs aren’t well-suited for every application. A sales team can gain real superpowers using LLMs to automate outreach. But a loan officer would find it unwise (not to mention illegal) to use an LLM to adjudicate home loans.
In our last blog, we explored how data moats will change with LLMs. What other criteria are important to identify the best business use cases for an LLM? As a founder or product leader, deciding which problem to tackle often comes from intuition and subject matter expertise. But we find that probing the six questions below in depth can help test and pitch an LLM application:
Note: There are many other questions that are important to answer for a B2B software business – market size, willingness to pay, founder experience, etc. This post focuses on questions specific to LLM systems. But the other ones are important too!
LLMs can make many jobs easier. If email automation saves me 15 minutes a day, I’d be happy. But solving a minor inconvenience does not make a big business.
What workflows have a burning need for LLM-based automation? Here are four key archetypes we’ve seen:
In the longer term, LLMs will enable totally new workflows that aren’t possible today. A sales automation platform might respond immediately to inbound requests with a customized, interactive pitch. A security automation platform might detect a vulnerability and immediately patch it or update configurations.
After identifying a workflow, the first question to test is simple – can an LLM do the job?
First, we look at each of the specific tasks in a workflow. Is it composed of tasks LLMs do well? Or does it require steps that we know LLMs can’t do reliably yet?
Tasks LLMs can do well include summarizing documents, drafting and editing text, extracting structured information, classifying content, and answering questions over provided context.
Today, LLMs are generally not great at precise arithmetic, multi-step planning without guidance, or reliably recalling niche or up-to-date facts.
LLM behaviors are unpredictable. Because LLM output tokens are sampled probabilistically, the same input can generate a different response each time. And if the input data varies slightly – query to query or over time – a model may fail at a task it once performed well.
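As a rough illustration of why sampled decoding behaves this way, here is a toy sketch (toy numbers, not a real model; real models sample from distributions over tens of thousands of tokens):

```python
import math
import random

# Hypothetical next-token scores; a real model produces these from its forward pass.
logits = {"yes": 2.0, "no": 1.6, "maybe": 0.5}

def sample_token(logits: dict[str, float], temperature: float = 1.0) -> str:
    # Softmax over temperature-scaled logits, then draw one token at random.
    weights = {t: math.exp(s / temperature) for t, s in logits.items()}
    tokens = list(weights)
    return random.choices(tokens, weights=[weights[t] for t in tokens])[0]

# The same input can produce a different output on every run.
print([sample_token(logits) for _ in range(5)])
```

Lowering the temperature concentrates the distribution and makes outputs more repeatable, but slight input changes can still shift behavior.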
While LLMs can do many business tasks reliably, they’re never going to work 100% of the time. For any LLM application, it’s critical to understand how a workflow will respond to a failure.
Consider these two potential LLM applications in marketing:
What workflow characteristics make it more likely an LLM can do a job reliably?
1. For complex applications, the best LLM systems will have expert-defined workflows and orchestration. These systems use LLMs for what they’re best at, with logic, prompting, information retrieval, and external models handling the rest.
An LLM tasked to simply “research this company” will produce something, but of variable form and quantity.
Instead, a well-designed LLM workflow will have an orchestrating system with dozens or hundreds of points of guidance. “Extract data from this financial statement.” “If margins decreased by more than 5%, search call transcripts for mentions of costs.” “Look at competitors’ annual reports for mention of the target company.” And so on.
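As a hedged sketch of what one of those guidance points might look like in code (every helper name here is hypothetical, standing in for real model and retrieval calls):

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("call your model provider here")

def search_transcripts(company: str, query: str) -> list[str]:
    raise NotImplementedError("call your retrieval system here")

def analyze_margins(company: str, current_margin: float, prior_margin: float) -> str:
    summary = llm(f"Extract the key drivers of margin change for {company}.")
    # Expert-defined rule from the example above: if margins decreased by
    # more than 5 points, search call transcripts for mentions of costs.
    if prior_margin - current_margin > 0.05:
        mentions = search_transcripts(company, "cost increases")
        summary += "\n" + llm(f"Summarize these mentions of costs: {mentions}")
    return summary
```

A production system would chain dozens or hundreds of steps like this, each encoding a piece of domain expertise.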
2. Workflows are more reliable when an LLM’s output can be validated programmatically. Code can be evaluated for syntactic correctness or run to test for compilation issues. Workflows can build in heuristics-based guidelines, e.g., “the extracted value should be a dollar amount between $1,000 and $100,000” or “the report should be between 150-200 words.” Last, another LLM with specific prompting can judge if the model output passes muster.
External validation makes sure that LLM outputs won’t break other parts of the system. The same feedback can also be given to the LLM to self-correct its output.
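Here is a minimal sketch of that validate-and-retry loop, using the word-count heuristic from above (`llm` is again a hypothetical stand-in):

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("call your model provider here")

def validate(report: str) -> str | None:
    # Heuristic guideline: the report should be between 150-200 words.
    words = len(report.split())
    if not 150 <= words <= 200:
        return f"Report is {words} words; it must be between 150 and 200."
    return None  # passes all checks

def generate_report(task: str, max_retries: int = 3) -> str:
    prompt = task
    for _ in range(max_retries):
        report = llm(prompt)
        error = validate(report)
        if error is None:
            return report
        # Feed the failure back so the model can self-correct its output.
        prompt = f"{task}\n\nYour previous draft failed a check: {error}"
    raise ValueError("output failed validation after retries")
```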
3. Many business applications will benefit from fine-tuning or custom model development. This can help provide domain-specific knowledge and vocabulary and ensure outputs are properly formatted. Reinforcement learning from human feedback, another type of fine-tuning, can help a model understand that a workflow benefits from detailed responses versus vague ones.
What makes a workflow more fault-tolerant to an LLM failure?
1. A workflow with natural opportunities for human input while maintaining the benefits of automation. These workflows typically:
2. A workflow with a relatively low business cost of making a mistake. Any use case that increases liability (e.g., loan or medical decisions) or has a high direct cost (e.g., payment automation) is not well-suited for LLMs.
Workflows with a low cost of a mistake typically have a way of remedying an error. A user talking to a customer success bot can always call for a human agent if the chatbot isn’t helpful.
As foundation model providers expand capabilities, they will serve more business workflows that are simple and globally applicable. Use cases like simple document summarization will become commoditized.
We look for two characteristics that indicate a workflow will best be served by an application-specific team:
1. The application is complex and domain-specific. To do it well requires a web of workflow-specific inference and information retrieval logic, as described above.
2. The application requires use case-specific context data, e.g., historical user activity or integration with third-party systems.
On the other side of the competitive spectrum, some LLM features will be best served by incumbent software providers. New entrants face an uneven playing field, as incumbents often have distribution, relationship, and data advantages.
What makes an opportunity one where new entrants should win? We again look for two primary drivers:
1. An LLM-powered workflow is so different from what exists today, it will require an entire re-architecting of the product to support it. The incumbent’s existing product, infrastructure, and data aren’t relevant in the new paradigm.
Example: Accounting is dominated by decades-old software where users click through legacy ERP systems. LLM-enabled accounting software will require new types of user interfaces and workflows that don’t fit into legacy software. Legacy ERPs have lots of historical data, but not on the new user behaviors that would emerge in an assistant/automated paradigm.
2. The incumbents don’t have the talent or the incentive to build an effective LLM-based system. LLM-based automation might cannibalize existing revenue streams or decrease competitive moats. Mature companies may not have the talent to redesign the product for a new paradigm.
Example: A product that automatically fixes security vulnerabilities in application code for enterprises should be platform-agnostic, to work with the variety of vulnerability scanners and development platforms. Existing platforms might build in some automation features, but are unlikely to make a neutral application that could commoditize their current product.
As discussed in our last post, LLMs change the shape of data moats.
Companies can no longer assume their model will remain a defensible asset.
Instead, the best LLM applications will provide the opportunity to build data moats in the surrounding LLM stack:
LLM applications will also benefit from traditional product moats, such as:
LLMs provide a powerful new building block for business applications. It’s tempting to try to apply them everywhere.
But a specific set of characteristics makes a B2B workflow well-suited to a new LLM application. The workflow must be one that LLMs can do well, and a good fit for LLMs’ non-deterministic behavior. It must be novel and complex enough that it won’t be served by a general LLM platform or an incumbent software provider. And it must allow for system and product differentiation to grow over time.
In upcoming blog posts, we’ll start to dive into specific application areas we’re excited about.
If you’re building an LLM-powered B2B application, we’d love to hear from you at info@theory.ventures.
Introduction
LLMs will all but eliminate the moat of a proprietary ML model.
ML companies have typically built a business with their model as a moat. Create a product, collect data, improve the model, repeat. With this flywheel, competitors can never catch up.
Companies building LLM applications will not be able to create truly defensible models. Instead, they should focus on data moats in the surrounding LLM system and infrastructure.
The power of modern foundation models like LLMs is that a single massive pre-training run (sponsored by a big tech company or well-funded startup) allows them to do a wide variety of things well out of the box. With a small number of additional examples, these models can be refined to do even more specialized jobs.
When you’re training a model from scratch, the more data you have, the better – the model has a lot to learn.
With a foundation model, you’re starting much further up the curve. There’s less room to improve – and less distance for others to make up.

Fine-tuning is and will remain important to refine how a model responds to a query – whether it should be concise, use certain language, or be in a specific structured format. But it will not be a durable advantage that grows over time.
Many business workflows are composed of simple tasks, like data entry or document summarization. LLMs can do these jobs well enough off the shelf or with limited fine-tuning, and there will be diminishing returns from additional data.
Use cases that challenge today’s LLMs will benefit from larger amounts of fine-tuning data. But with each new generation of model, LLMs handle more tasks with ease. These moats could be destroyed at any time as models progress.
The new data moats in LLM systems
As explored in the LLM infrastructure stack, the model is just one small piece of a broader system that will be necessary for complex LLM applications.
While maintaining model superiority is no longer a sure bet, there are new opportunities to build data moats in other parts of the LLM stack.
Context and retrieval systems
For the foreseeable future, LLMs will rely on external systems to serve them with relevant information at inference. A support chatbot will be provided with a customer’s order history. A financial research analysis platform will pull in relevant company filings.
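A minimal sketch of the support-chatbot case (both helpers are hypothetical stand-ins for a real model call and a real order database):

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("call your model provider here")

def fetch_order_history(customer_id: str) -> list[str]:
    raise NotImplementedError("query your order system here")

def answer_support_question(customer_id: str, question: str) -> str:
    # Retrieve customer-specific context the base model cannot know on its own.
    orders = fetch_order_history(customer_id)
    context = "\n".join(orders[-10:])  # most recent orders, to fit the context window
    return llm(
        f"You are a support agent. The customer's recent orders:\n{context}\n\n"
        f"Customer question: {question}"
    )
```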
As LLM systems mature, context and retrieval systems will be just as important as, if not more important than, the models themselves. Data moats will provide huge advantages in this space:
Orchestration and workflow design
LLMs are great when instructed to do a simple knowledge task. They’re not as good at being their own boss and deciding what to do next.
LLM applications will rely on traditional software, logic, and other learned models to coordinate their broader workflows.
Imagine an LLM data analyst. You could instruct it to “show me a chart of our active users.” It might generate a reasonable chart, but it would be hard to tell if the result is accurate.
A well-designed workflow might instruct the LLM to complete a series of discrete tasks. First, evaluate the request and identify which table it’s asking about. Next, select the relevant metric from a predefined list. Third, generate a SQL query (via an LLM or other SQL generation tool) and return the results. Last, plot and format them.
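A hedged sketch of that pipeline follows. For simplicity it routes to vetted query templates rather than free-form SQL generation, and every helper name is hypothetical:

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("call your model provider here")

def run_query(sql: str) -> list[tuple]:
    raise NotImplementedError("call your data warehouse here")

def plot(rows: list[tuple]) -> None:
    raise NotImplementedError("call your charting library here")

# Predefined, vetted metrics: the LLM chooses among them rather than
# writing arbitrary SQL against arbitrary tables.
KNOWN_METRICS = {
    "active_users": "SELECT day, COUNT(DISTINCT user_id) FROM events GROUP BY day",
}

def answer(request: str):
    # Step 1: identify which predefined metric the request is about.
    metric = llm(
        f"Which one of {sorted(KNOWN_METRICS)} does this request ask about: "
        f"{request!r}? Reply with the name only."
    ).strip()
    if metric not in KNOWN_METRICS:
        raise ValueError(f"unrecognized metric: {metric}")
    # Step 2: run the corresponding vetted query.
    rows = run_query(KNOWN_METRICS[metric])
    # Step 3: plot and format the results.
    return plot(rows)
```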
Proprietary data will allow companies to improve these systems over time. Usage data might inform new rules-based logic or learned helper models to help guide an LLM workflow. Historical queries can be used to optimize prompts, caching, or model choice.
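For instance, one simple use of historical queries is a response cache for prompts that recur often (a minimal sketch; normalization and invalidation would be more involved in practice):

```python
import hashlib

def llm(prompt: str) -> str:
    raise NotImplementedError("call your model provider here")

_cache: dict[str, str] = {}

def cached_llm(prompt: str) -> str:
    # Exact-match caching on a normalized prompt. Usage data tells you which
    # prompts recur often enough for caching to pay off.
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = llm(prompt)
    return _cache[key]
```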
Handling edge cases as a competitive advantage
LLM systems in production will be complex, almost like living organisms.
Today, engineering organizations are designed to handle deterministic software. Bugs and new features get ticketed, and then an engineer creates a PR to fix them. Companies that have ML teams have a separate workflow where they compile data, update the model, and backtest it.
Companies building LLM systems will need new workflows to handle non-deterministic edge cases and shape an LLM’s behavior over time. They’ll trace each interaction throughout the infra stack to understand what went into each request and response. Resolving an issue might require fixing a retrieval system, re-engineering a prompt, fine-tuning the model, or improving the orchestration system. These might all be done by different roles in an organization.
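Here is a minimal sketch of what per-request tracing might capture (field names are illustrative, not a standard schema):

```python
import time
import uuid
from dataclasses import dataclass, field

@dataclass
class Trace:
    request_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    events: list[dict] = field(default_factory=list)

    def record(self, stage: str, **payload) -> None:
        self.events.append({"stage": stage, "ts": time.time(), **payload})

trace = Trace()
trace.record("retrieval", docs=3, source="filings_index")
trace.record("prompt", template="analyst_v2", tokens=812)
trace.record("model", name="example-model", latency_ms=1430)
trace.record("validation", passed=False, reason="word count out of range")
# Persisting traces like this lets each role inspect the stage it owns:
# retrieval, prompt engineering, fine-tuning, or orchestration.
```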
Building data-informed systems to deal with edge cases will be a key differentiator over time. This workflow is most similar to historical ML moats. But instead of putting data towards model re-training, it will drive a more holistic process to improve the broader LLM system.
Conclusion
Data moats in foundation model applications will look very different from previous generations of applied ML systems.
We believe the strongest and most durable ones will not be fixated on maintaining the best-performing model.
Instead, founders should take a high-performing model as a given, and try to build data moats in everything surrounding the model – the systems that tell the model what to do, provide it with data, and monitor its outputs. Operational data accelerates the capabilities of each of these systems, increasing defensibility over time.
If you’re building an LLM application and are thinking about data moats, we’d love to hear from you at info@theory.ventures.