The most successful products in the internet age have all been built around personalization. TikTok’s feed, Netflix’s recommendations, and Spotify’s discovery playlists are so powerful because they provide truly unique experiences for billions of different people.
For most businesses, the most common interaction they have with their customers is outreach – emails, texts, push notifications. But the vast majority of these lack any real personalization. Marketing and customer lifecycle teams blast out effectively the same message to thousands or millions of users, knowing that only a tiny fraction will respond.
This is why we are thrilled to lead the Series A for Aampe, which has built the first agentic infrastructure for personalization. For each customer, millions of agents explore different personalization strategies to determine the most effective one for each user. As new preferences and behaviors emerge, Aampe agents evolve to reflect them. This has a dramatic impact on the business metrics teams care about, allowing them to focus on personalization strategy and content creation instead of execution.
Last week, we wrote about jobs that AIs do better than humans. Personalization is a prime example of this, as AI can make millions of data-driven decisions in ways people simply can't.
Marketing and customer lifecycle teams are experts at understanding customer needs and figuring out how best to connect with them. But even the largest and most skilled team in the world couldn’t possibly hand-write the millions of messages they send each day. This forces them to build rules-based journeys that decide what to send to whom. This is the welcome sequence we’ll share with every new user. Here is a set of reminders we’ll send to every person who leaves an item in their cart.
The problem with journeys is that they force you to design for the typical user. But of course, every user is different. Even if you correctly identify and speak to the most “regular” groups, you won’t connect with all your other users with different preferences.
Existing platforms integrate AI into the rules-based journey. An email on Black Friday sales might highlight a product category a customer frequently peruses. That is a step in the right direction, but only a small one. Real personalization is multivariate: Do they want to buy the same product they bought earlier, or something complementary? Do they care more about value, convenience, or quality? Do they prefer email or text? Messages that are short and to the point, funny, or full of pictures? A couple reminders, or to be left alone after one?
The action space is enormous, and it is further complicated by several other factors.
It’s clear why, even with the best tooling and team, it is impossible to do broad-scale personalization with rules-based automation.
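To make the combinatorics concrete, here is a back-of-the-envelope calculation with hypothetical per-dimension counts (none of these numbers come from Aampe):

```python
# Illustrative only: even modest per-dimension choices multiply quickly.
# The counts below are hypothetical, not Aampe's.
offers, tones, channels, send_times, cadences = 10, 5, 3, 24, 4
combinations = offers * tones * channels * send_times * cadences
print(combinations)  # 14400 distinct strategies per user
```

A rules-based journey builder that branches on even a handful of these dimensions becomes unmanageable long before it covers the full space.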
Aampe builds AI agents that make 1:1 personalization decisions for each individual user. They connect into existing data sources (e.g. customer, content, and inventory management systems) to understand a company’s products and customers. Next, marketing teams craft content. Instead of perfecting a single message to blast out to everyone, they can experiment with dozens of different offers, messages, and tones, creating tens of thousands of combinations that could resonate with different audiences.
Finally, Aampe agents explore the actions they can take – choosing what to say, how to say it, when, and via what channel. They experiment over time via reinforcement learning, deciphering which strategies are most effective for a given user at a given point in time. While traditional customer engagement platforms typically track only click-through rates, Aampe tracks hundreds of product and business outcomes. For example, the agents might notice that a message generates a lot of clicks but no purchases. Another might not prompt an immediate response but substantially boost retention over time. Across large-scale consumer apps, Aampe has demonstrated double-digit improvements in business metrics like retention and transaction volume compared to previous engagement strategies.
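Aampe hasn’t published its algorithms, but the explore-and-learn loop described above resembles a bandit-style reinforcement-learning problem. Below is a minimal sketch using Thompson sampling with Beta posteriors; the arm dimensions (tone, channel, send hour) and the `UserAgent` class are illustrative, not Aampe’s actual design:

```python
import random

# Each (tone, channel, send_hour) combination is an "arm". We keep a
# Beta(successes + 1, failures + 1) posterior per arm and pick arms via
# Thompson sampling, which balances exploration and exploitation.
ARMS = [
    (tone, channel, hour)
    for tone in ("playful", "value", "urgent")
    for channel in ("email", "push", "sms")
    for hour in (9, 13, 20)
]

class UserAgent:
    def __init__(self):
        # [successes, failures] per arm, starting from a uniform prior.
        self.stats = {arm: [0, 0] for arm in ARMS}

    def choose(self):
        # Sample a plausible conversion rate from each arm's posterior
        # and play the arm with the highest sampled value.
        def sample(arm):
            s, f = self.stats[arm]
            return random.betavariate(s + 1, f + 1)
        return max(ARMS, key=sample)

    def record(self, arm, converted):
        self.stats[arm][0 if converted else 1] += 1

agent = UserAgent()
arm = agent.choose()
agent.record(arm, converted=True)
```

In this toy version the "outcome" is a single conversion flag; the post’s point is that a production system would track hundreds of outcome signals per user rather than one.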
Building these types of systems is extremely hard, and it has traditionally been done only at the Amazons and Netflixes of the world. Aampe founders Paul Meinshausen, Schaun Wheeler, and Sami Abboud are a rare group, combining expertise in both large-scale data and the quantitative social sciences with experience building similar infrastructure across industries – from defense to consumer applications.
Deciding how to engage with customers via email, text, and push notifications is an easy place for AI agents to add value over rules-based journeys. But Aampe’s agentic infrastructure is much broader than that.
Already, Aampe is starting to personalize in-app experiences, another area where one-size-fits-all approaches leave customers dissatisfied.
They are also the first personalization company to generate valuable new data on customers. Unlike existing customer engagement platforms, Aampe (1) knows what kind of content is in each message, and (2) runs continuous experiments matching messages to users. This experimentation not only improves personalization, but also creates a rich new dataset of user preferences and behaviors that is valuable for other parts of the organization — product, data science/ML, merchandising, etc.
Aampe identifies which customers prefer certain products, respond to discounts, or resonate with particular value propositions. For example, they have discovered brand new user segments that were previously unknown to customer teams (e.g. late night snackers).
Aampe is leading the charge on agentic infrastructure as a core technology underpinning the future of consumer applications. We are so excited to partner with them and lead the Series A, with participation from Z47 (Matrix Partners India).
AI systems are typically evaluated with humans as the gold standard. How many college-level math problems can they solve? How many medical questions can they answer? How accurately can they extract information from a contract or purchase order?
Even as LLMs become superhuman test-takers, it’s clear they still can’t reliably do many actual jobs. Most of the thought and work going into AI applications is oriented towards building value with a system that’s only ~80% as good as a human. How do you limit AI to tasks that it can do more reliably? How do you incorporate humans to check AI activities? This is critical, and drives our research in many areas of AI software.
But there are also jobs where AI systems are structurally better than humans. For these, AI won’t be struggling to keep up with the average employee; it will be many multiples better than any person could be. And while automation in general can reduce costs, integrating AI into jobs it does better than humans can create new company capabilities, improve customer experience, and drive revenue.
We think these categories are particularly ripe for new AI software startups.
No matter how smart humans are, they are fundamentally limited in how much they can do. You can only read so many pages, or draft so many emails.
For jobs that require high throughput, humans must build systems to make decisions in aggregate. They might just search for specific keywords across those pages, or use rules-based decision trees to select from pre-drafted email responses. Perhaps a fraction of the tasks are escalated to human review.
Regardless of how carefully you draft your email campaigns or how many rules are in your decision tree, we know that these systems will always fail. Every page or email is slightly different. While humans can handle that variability, the rules-based systems they implement cannot.
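The rules-based systems described above can be sketched as a keyword triage – illustrative only, with hypothetical rules and template paths:

```python
# Toy sketch of a rules-based triage: keyword filters select a
# pre-drafted reply template, and anything unmatched is escalated.
RULES = [
    ("refund", "templates/refund_reply.txt"),
    ("shipping", "templates/shipping_reply.txt"),
]

def triage(email_body: str) -> str:
    body = email_body.lower()
    for keyword, template in RULES:
        if keyword in body:
            return template
    # No rule fired: a human has to look at it (or nobody does).
    return "escalate_to_human"

print(triage("Where is my shipping confirmation?"))  # templates/shipping_reply.txt
```

The brittleness is visible even in this tiny example: a message that says "my package never arrived" mentions neither keyword, so it falls through the rules entirely.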
In an earlier post, we described LLMs as an infinite supply of near-free interns.
Like a new intern, they should be given tasks that are straightforward.
When tasks are simple, imagine how much work an infinite number of these AI interns can do. They can process 10,000 pages just as easily as they process 100. They can operate 24/7, without getting tired or bored. While there can be issues with hallucinations, when provided with contextual information they are less likely than a human to forget a name or make a typo.
Perhaps most important of all, the AI interns look at each task and apply human-like reasoning independently for each one. For high-throughput jobs, the alternative to an AI doing a task is not actually a human doing it: it’s a legacy rules-based system, or nobody doing it at all.
What will happen to these jobs that AI does best? As discussed in a previous post, they won’t go away entirely, but could dramatically change in scope. Generally, they will uplevel – from execution to orchestration, or first-pass to escalation/review. Without entry-level work, it may be challenging to onboard and train talent.
Security operations
Analysts in security operations centers (SOCs) get overwhelmed by a deluge of alerts generated by their detection tools. Often there are dozens of near-identical ones; each could take the better part of an hour to investigate fully. To stay afloat, they compile rules-based filters and playbooks, even though they know these let things slip through the cracks.
Dropzone AI, a Theory portfolio company, has shown that agentic systems can replicate manual investigations. Their AI systems have expert skillsets – for example, they have deep knowledge of dozens of tool-specific querying languages. But they wouldn’t need to be the best analyst in the world to have a massive impact. The fact that AI agents can review each individual alert, in minutes, at any time of day or night, and with perfect memory, is a dramatic step change from the small fraction of alerts that get reviewed (often far too late) today.
Customer engagement
Customer engagement platforms like Salesforce and Braze send billions of texts, notifications, and emails per week. Of course, it would be impossible for a human to write each one of them. Instead, marketing teams must draw out rules-based journeys. This is the sequence we’ll send to new users. Here’s what we’ll do when a customer leaves something in their cart. But every user is different. Even if you could identify and define micro cohorts/segments, it’s just not possible to manage many thousands of different messaging campaigns at once.
AI agents don’t have this constraint. They can use millions of different strategies for millions of users, experimenting with content, channel, and timing. It doesn’t matter how perfectly crafted a message is if it’s directed at the wrong person – agents that personalize messages for each individual are much more likely to find the specific attributes that drive business outcomes. We’ll share more on this space next week!
Investment research
In investment research, ideas mean nothing unless they can be translated into actions. Say you want to invest in businesses that will benefit from increased AI usage. Of course, Microsoft and NVIDIA will be on the list, but there are scores more companies along the value chain – datacenter component manufacturers, system integrators, REITs, etc. Researching this thesis would require analysts to comb through thousands of pages of documents and create massively complex financial models.
Human analysts can cover only a small number of companies, racing to update models when new earnings reports drop. AI analysts can easily screen hundreds of companies on an ongoing basis, just as quickly and accurately as they could screen one. We expect this will dramatically change how investment firms operate, allowing them to systematize qualitative strategies the way they run quant strategies today.
Supply chain operations
Supply chain organizations manage hundreds of vendors supplying thousands of goods and services (if not more). The challenging cognitive work is dealing with inevitable problems that arrive daily, and figuring out how to optimize procurement/logistics over time. But most of the day-to-day work is data collection and relationship management. Each day, professionals spend hours tracking status updates, copying numbers, reviewing RFP responses, and matching invoices.
AI systems can easily maintain thousands of email conversations at once. They can instantly read through lengthy PDFs and spreadsheets and extract just the relevant information. In addition to freeing up time for humans to focus on more important work, they will enable new strategic capabilities – like dramatically expanding the frequency and scope of RFPs, or proactively monitoring and alerting for supplier issues.
—
If you’re building automation for jobs where AI has a structural advantage, we’d love to hear from you! Reach out to at@theory.ventures.
Recent online discussions have raised alarms about “the end of scaling laws” and a plateau in LLM performance.
The main catalyst for these musings was a Reuters article with anecdotes of disappointing performance from frontier models in development, along with quotes from experts in the field like Ilya Sutskever. In parallel, some great research has demonstrated the limits of scaling effective compute via quantization/low-precision training.
It is certainly possible that we are starting to see diminishing returns directly scaling foundation model pre-training, though there isn’t enough evidence to say for sure.
But even if true, we have high confidence that AI application capabilities will continue to expand dramatically in the coming years. Foundation model progress slowing would not impact the prospects for teams building AI products. In fact, it might benefit them.
Foundation models like GPT, Gemini, and Claude are pre-trained with massive amounts of compute. Over the past 2 years, scaling these training runs with more compute (via more data and/or more model parameters) has resulted in improvements in capabilities across the board. When people refer to the end of scaling, they mean that we will see diminishing returns as training runs get larger and larger.
But when you think about what AI applications can actually do, the pre-trained base model is only part of the picture. Capabilities will depend just as much on:
Surrounding systems & infrastructure: We have long believed that LLMs are only a small part of AI applications. LLM inference endpoints are surrounded by systems to (1) find and retrieve relevant information, (2) orchestrate and execute actions, and (3) integrate responses into the broader application/interface. In each of these areas, we see rapid development and emerging best practices.
Today’s LLMs are capable enough to power years of new AI applications and workflow automations, as surrounding infrastructure improves. For example, a workplace assistant will answer questions more accurately as its internal search/retrieval system improves. It will be more functional when deeply integrated with every enterprise tool, and easier to use as interfaces for human-AI collaboration mature.
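As a toy sketch of the retrieval component mentioned above: even something as crude as word-overlap scoring illustrates how the surrounding system, not the model, determines which context the model sees. Everything here is invented for illustration:

```python
# Toy retrieval: score documents by word overlap with the query and
# return the best match to be placed in the model's context window.
def retrieve(query: str, documents: list[str]) -> str:
    q = set(query.lower().split())
    return max(documents, key=lambda d: len(q & set(d.lower().split())))

docs = [
    "Expense reports are due on the 5th of each month.",
    "The cafeteria menu rotates weekly.",
]
print(retrieve("when are expense reports due", docs))
```

A real workplace assistant would use embeddings, permissions, and dozens of connectors, but the principle is the same: better retrieval makes the same base model answer more accurately.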
Domain-specific reasoning data: As discussed in more depth in our previous blog post, OpenAI’s o1 model showed the massive impact of adding reasoning data. For an LLM, human reasoning is just another data distribution, no different from basketball stats or European history. The problem is that we typically don’t write down the inner monologue of our reasoning and all the assumptions behind it. OpenAI generated a lot of math, physics, and general reasoning data; but consider the wealth of domain-specific data yet to be harnessed. How does a security analyst work through solving a problem? What about a software engineer, accountant, or lawyer?
Building a dataset of collected and synthetic reasoning data dramatically increases today’s LLMs’ performance in that application. Out of the box, a foundation model might get stuck on a particularly difficult security investigation. But when augmented (via fine-tuning or in-context learning) with thousands of analyses done by security experts, it will be able to reason through more and more complex ones on its own.
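As a sketch of the in-context learning path, here is how collected expert reasoning traces might be assembled into a few-shot prompt. The trace content and the `build_prompt` helper are hypothetical:

```python
# Sketch: augmenting a model with expert reasoning traces via
# in-context learning. The trace below is invented for illustration.
EXPERT_TRACES = [
    {
        "alert": "Impossible-travel login for the same account",
        "reasoning": (
            "Checked VPN egress IPs first; both logins resolved to "
            "corporate VPN exits, so the alert is benign."
        ),
    },
]

def build_prompt(new_alert: str) -> str:
    # Prepend worked examples so the model can imitate the expert's
    # reasoning pattern when it tackles the new alert.
    examples = "\n\n".join(
        f"Alert: {t['alert']}\nAnalyst reasoning: {t['reasoning']}"
        for t in EXPERT_TRACES
    )
    return f"{examples}\n\nAlert: {new_alert}\nAnalyst reasoning:"
```

Fine-tuning accomplishes the same thing by baking the traces into the weights instead of the prompt; either way, the scarce input is the recorded expert reasoning, not the base model.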
Inference-time scaling laws: OpenAI’s o1 announcement also formalized inference-time scaling laws. Giving a model more time to try different options, evaluate paths, and iterate on its responses substantially improved its ability to find the right answer. This is particularly true in applications with complex multi-step reasoning or tool calls.
Many use cases will involve multi-step reasoning – say you want an AI assistant to analyze some data, or find and book a restaurant. Inference-time scaling improvements will make these work more reliably, so long as it’s possible for the model to do the task in the first place.
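One simple form of inference-time scaling is best-of-n sampling: draw several candidate answers and keep the one a scorer prefers. In this sketch, `generate` and `score` are toy stand-ins for a sampled model call and a verifier or reward model:

```python
import random

def generate(prompt: str) -> str:
    # Toy stand-in for a sampled LLM completion.
    return f"candidate-{random.randint(0, 9)}"

def score(prompt: str, answer: str) -> float:
    # Toy stand-in for a verifier or reward model.
    return float(answer.split("-")[1])

def best_of_n(prompt: str, n: int) -> str:
    # Spend more inference-time compute by sampling n candidates
    # and returning the highest-scoring one.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda a: score(prompt, a))
```

Raising `n` trades compute for reliability, which is exactly the lever the inference-time scaling results formalize.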
Model cost & speed: Applications today are often limited by model cost and latency. As the cost and latency of model inference continues to drop precipitously, developers can step up to larger model sizes, iterate more during inference, or create new product experiences that wouldn’t be feasible today.
A huge number of AI applications will be offered for free because model costs are so low. Other applications will be able to process more data, or take more tries to get to an answer, because they can iterate hundreds of times cheaply and quickly.
No matter the job, many of the fundamental tasks we do are simple enough that today’s LLMs can do them. With levers to pull in application infra, domain-specific data, and inference-time compute, there is massive headroom to continue to expand the capabilities of AI applications – even if foundation model progress were to halt today.
We expect to see no slowdown in the number of new AI applications or new capabilities in those domains. For the vast majority of companies building AI apps, incremental improvement on today’s models will be sufficient to build nearly anything they want to.
(Note: There are a small number of extremely difficult frontier tasks, like general software engineering. Companies working on these may face more risk if foundation model progress slows.)
If anything, a plateau in foundation model performance is a net positive for app developers: it decreases the risk that competitive advantages are obviated by the next generation of models. Investments in complex engineered systems and domain-specific data will be more durable moats. Companies can focus on these areas and benefit from rapidly dropping inference costs.
We remain as excited as ever about the next generation of AI applications, and would love to hear from anyone working on them — please reach out to at@theory.ventures.