Last month, I started a three-part series on building a mental model for AI agents. Part 1 covered the capabilities and limitations of today’s reasoning models and why they are going to be tireless but unreliable geniuses. This means that verification (whether programmatic or human) will be essential in AI-driven work.
Part 2 below is an exploration of agentic UX and design patterns that are working effectively in the market. It took me weeks of using and building multiple new agents myself to write it, and along the way I realized the following:
The quality gap between average agent products and the top agent products is widening rapidly. The fact that everyone has access to the same models means nothing.
First, a quick recap.
Starting in 2023, AI-native startups largely launched with a chat-only UX for copilots. ChatGPT’s astounding progress makes it clear that a chat-only UX is quite flexible and can go a long way. Many of the startups that found PMF have since been able to upgrade these copilots into agents simply by removing the outdated components they built in 2023-24. Perplexity, Glean, Harvey, etc., are all going to be success stories built on chat-first UX.
The primary shortcomings of the chat-only approach are flow and extensibility. It’s separate from most of your existing workflows. AI-native startups need to recreate some/most of an incumbent’s workflows beyond chat to take significant market share from them. Successful coding assistant startups like Cursor and Windsurf are great examples of this. I don’t think we will still be using IDEs like we do now in a few years, but for them to get the kind of rapid adoption they have in 2025, they had to build on the VSCode standard that their end users already knew. GitHub nicely primed the market for them by elegantly integrating AI features into a familiar interface for developers all the way back in 2021.
The SaaS products that added AI chat (thoughtfully) to an existing complex workflow only started to work well in the latter half of 2024, as context windows got bigger. They work by packaging the current state of the workflow into the context window and rendering the output in a reasonably application-specific GUI. These are going to be the next wave of AI SaaS success stories and have started taking market share away from incumbents.
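Here’s a rough sketch of that pattern (illustrative only; I’m assuming the OpenAI chat API, a placeholder model name, and a made-up project-planning app): serialize the workflow’s current state into the prompt, ask for structured output, and render the result back into the app’s own components.

```python
import json
from openai import OpenAI

client = OpenAI()

def assist(workflow_state: dict, user_request: str) -> dict:
    """Pack the current workflow state into context and return a structured
    suggestion the application can render in its own GUI components."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "You are embedded in a project-planning app. Given the current "
                "state as JSON, reply with a JSON object of suggested field changes."
            )},
            {"role": "user", "content": json.dumps(workflow_state) + "\n\n" + user_request},
        ],
    )
    return json.loads(resp.choices[0].message.content)
```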
Lean vs Super?
Many incumbent SaaS companies are taking a different approach to agent UX. They are currently shipping what I would describe as “add-ons”. These are typically stand-alone deterministic workflows with LLM-driven tool use that best resemble OpenAI’s custom GPTs and scheduled tasks.
As a result, we have wildly different examples of solutions being launched as agents in the market today. For instance, let’s compare Zapier and a new Manus alternative called Genspark (>2M MAUs), which allegedly crossed $35M ARR just 45 days after launch.
Zapier offers its customers hundreds of agents, each as granularly defined as “meeting prep agent”. Yes, this agent is an LLM using tools to interact with its environment, but it isn’t dynamically making any decisions about how to achieve its goals. The leading AI labs currently refer to these as AI workflows, not agents. I like Replit CEO Amjad Masad’s definition of autonomy here:
A fundamental feature of agents is that the agent needs to decide when to halt. If you have a pre-set definition of that, it’s not an agent.
On the other hand, we can see that Genspark is taking the polar opposite approach from Zapier of a single multi-agent system that serves as an all-in-one assistant that can “do anything”. Depending on the task, Genspark’s agent could autonomously kick off a simple workflow (like the Zapier meeting prep example) or a complex multi-agent system - the decision is being made by the model, not by the user or the interface.
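To make the distinction concrete, here’s a minimal sketch of routing done by the model rather than the interface (illustrative names and stub functions, not Genspark’s or Zapier’s actual code; I’m assuming the OpenAI chat API and a placeholder model name):

```python
from openai import OpenAI

client = OpenAI()

def run_meeting_prep_workflow(task: str) -> str:
    # Hypothetical fixed pipeline: calendar lookup -> attendee research -> brief
    return f"[workflow] prepared brief for: {task}"

def run_multi_agent_system(task: str) -> str:
    # Hypothetical planner that spawns sub-agents; details omitted
    return f"[multi-agent] planned and executed: {task}"

def route(task: str) -> str:
    """Let the model decide how much machinery the task needs."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": (
                "Reply with exactly one word: 'workflow' if a fixed sequence of "
                "tool calls is enough, or 'multi_agent' if the task needs "
                "open-ended planning across many steps."
            )},
            {"role": "user", "content": task},
        ],
    )
    choice = resp.choices[0].message.content.strip().lower()
    return run_multi_agent_system(task) if "multi" in choice else run_meeting_prep_workflow(task)
```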
While both of these UX approaches have their trade-offs, the latter is far more extensible and scalable than the former. What happens to customer experience when people start customizing your OOTB templated agents on their own? How do they work when there are a thousand “agents” to choose from? What about the 73 “meeting prep agents” that are slightly different versions of the same workflow and have been enthusiastically shared by their creators with the rest of their org? And how many times during the day will they need to choose yet another agent?
You can see where I am going with this.
As this industry matures, I predict two things will happen:
The explosion of narrowly defined, task-specific agents, AI teammates, and agent marketplaces will fizzle out. Single-purpose agents will become automated routines and MCP/tool calls that simply anticipate user needs.
Successful AI products will pick a persona and embrace multiple agentic UX patterns within the same GUI to achieve great results for that persona.
This is already happening. Three clear UX patterns are working well: collaborative, embedded, and asynchronous.
Successful AI products are already using all three of these UX modes within a single product to accomplish hundreds of tasks instead of offering 100 different agents to customers. Implementing the right one, for the right use case, with the right agentic design pattern, is where all the magic is! For example, Cursor has 3 primary features, each with a different agent UX pattern:
Chat/Cmd+K: collaborative, inline edit to describe what code you want
Tab complete: embedded, automatic code recommendations
Cmd+I: asynchronous, parallelized agents that run in the background
Below is a closer look at each pattern, where it’s working, and when to use it.
Type 1: Collaborative - the original chatty mode
2-way chat is ideal for situations where we (the users) don’t really know exactly what we want (else we could have just scheduled an async task), and neither can the LLM guess from the current state of the workflow what we might want.
Brainstorming, searching, planning, creating, editing, etc, are all parts of the workflow where this applies. When building this part of the agent experience, you need to optimize for low latency while still using the largest/best possible model you can so that it can understand the user’s intent and generalize well across corner cases.
The wrong way to do this is to *stay* in chat-with-text mode, with no ability to tweak the output directly. While this chat mode seems very prominent today, I think it will fade to <20% of the UI over time as people realize it’s not the right UX for all AI features.
Even for active research, you will notice that Perplexity doesn’t let its users stay in chat-with-text mode once they ask their first question. The output gets richer and full of multimedia, with recommended follow-up questions directing the research via embedded AI UX patterns (see next).
This is obviously where we also see voice emerge as an alternative to text, though for it to go mainstream, we need more mobile use cases of creative collaboration.
Since chat is collaborative, the agentic design patterns used here need to have low latency. ReAct/CodeAct with traditional RAG and tool use are the most common patterns utilized with a chat UX. Self-reflection, agentic RAG, and multi-agent systems can create too much latency for a good user experience.
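For reference, here’s what a bare-bones ReAct-style chat turn can look like (a sketch under assumptions: the OpenAI chat API, a placeholder model name, and a stubbed retrieval tool). The loop ends when the model stops asking for tools, i.e. the agent decides when to halt, and the small step cap keeps latency tolerable for chat.

```python
import json
from openai import OpenAI

client = OpenAI()

def search_docs(query: str) -> str:
    """Stub retrieval step standing in for traditional RAG."""
    return f"Top passages for: {query}"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Retrieve relevant passages from the knowledge base.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def react_chat(user_message: str, max_steps: int = 4) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):  # small cap keeps chat latency acceptable
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=messages,
            tools=TOOLS,
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:       # no more actions requested: the model halts
            return msg.content
        messages.append(msg)
        for call in msg.tool_calls:  # act, then feed the observation back in
            args = json.loads(call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": search_docs(**args),
            })
    return "Ran out of steps before the model chose to stop."
```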
Type 2: Embedded - the invisible magician mode
Over time, I believe > 50% of AI will be invisibly embedded within our surviving workflows. There won’t be any prominent labels like “AI Mode” or “AI Teammate” floating around because these are low taste. (Yes, they are.)
All good software will have AI embedded in it.
Apart from the Tab Completions invented by GitHub Copilot, my favorite examples of embedded AI are Perplexity’s follow-up questions and Notion’s Database Autofill.
While Notion has launched many AI features, the most prominent being their “Ask AI”/Clippy cousin and the recent AI meeting notetaker, the launch that got the most resounding community applause was Database Autofill. This is a Notion feature where you can use an LLM to automatically generate page fields in a database every time you create a new page entry in it. The how-to videos created by end users racked up millions of views after this launch, and I’m sure this feature is contributing heavily to their wild 50%+ AI attach rate for paid subscriptions.
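Under the hood, an autofill-style feature can be surprisingly simple. Here’s a sketch of the shape of it (my guess, not Notion’s actual implementation; the field names and the OpenAI call are assumptions): when a new database entry is created, a fast model fills the configured fields from the page content, without the user ever asking.

```python
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical AI-generated fields configured on the database
AI_FIELDS = {
    "summary": "One-sentence summary of the page",
    "priority": "low, medium, or high",
    "next_step": "Single concrete next action",
}

def autofill(page_content: str) -> dict:
    """Fill the configured fields for a newly created database entry."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # small/fast placeholder: embedded features should feel instant
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "Fill these fields and reply as a JSON object: " + json.dumps(AI_FIELDS)},
            {"role": "user", "content": page_content},
        ],
    )
    return json.loads(resp.choices[0].message.content)

# Triggered by the app's own "row created" event -- the user never has to ask.
fields = autofill("Call with ACME about the Q3 renewal. They want SSO and a pilot.")
```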
A lot of agent<>MCP use is going to be via embedded agents, just pulling in/pushing out data from/to other systems as needed in our workflows, without us having to ask for it. I don’t think we will need an independent “automations” company like Zapier in the future because tools could become more interoperable out of the box.
Type 3: Asynchronous - the overnight workhorse
The third, most autonomous UX pattern for agents is for asynchronous, background tasks. Currently popular for deep research, scientific assistance, and some types of coding work, this has only become possible in 2025 after models became more capable of long-horizon reasoning.
I believe this is where many SaaS companies will need to come up with novel workflows and UI that simply haven’t existed before. If, instead of creating one image, my primary workflow is to choose one of 20 images that were generated by a background process, the GUI we need is novel.
Without innovation here, reviewing AI work will become a massive bottleneck in enterprise workflows and limit our productivity gains rather severely, especially in mid-sized and larger companies with higher risk aversion.
Self-reflection and multi-agent systems are the two novel techniques utilized here, with some recent, fun controversy over whether the latter is good for coding tasks.
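Self-reflection, in particular, is cheap to sketch: draft, critique, revise, repeat. Because this runs in the background, latency doesn’t matter, so spending extra model calls on review is a non-issue (a minimal sketch, assuming the OpenAI chat API and a placeholder model name):

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def reflect_and_revise(task: str, rounds: int = 2) -> str:
    """Draft once, then alternate critique and revision for a few rounds."""
    draft = ask(f"Complete this task:\n{task}")
    for _ in range(rounds):
        critique = ask(f"Task: {task}\n\nDraft:\n{draft}\n\nList concrete flaws and omissions.")
        draft = ask(f"Task: {task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\nWrite an improved version.")
    return draft
```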
Picking the right UX for each feature
As I said before, the quality gap between the average AI team and the top AI teams is widening rapidly. You might think this is odd because, after all, they are all using the same models!
The user experience of an AI product is extremely sensitive to seemingly small design choices made by a product team. For instance:
How is the user engaging with the feature (UX pattern)?
How much autonomy does this UX pattern allow the agent to have? Can the agent ask clarifying questions if it needs more context to give better answers?
Can the user easily edit/review/validate the output(s) generated by the agent?
How does the agent handle memory for the next interaction?
… the list goes on
The average AI team, especially in big companies, is caught up in giving the model as much context as possible because they believe this is their competitive advantage. But they are iterating too slowly (or not at all) on the design choices in their agent product development. So, despite using the same SOTA models, their output feels somewhat dumb.
Right now, 2025-era agentic software development with reasoning models is quickly disrupting 2024-era LLM software development. Unless you started with a very simple chat interface, you probably need to delete/rewrite most of the software you built in 2024 and before.
Andrej Karpathy said this well in his YC talk yesterday about how Software is Changing Again.
There are 3 different software paradigms that we are developing products in - in parallel - that you need to be fluent in. Are you training a neural net? Are you prompting an LLM? Are you writing explicit code?
While he wasn’t talking about agent UX patterns, the analogy works.
You need a product team that actually understands the capabilities of the underlying model and is constantly experimenting to see what works, so it can decide which programming technique, which UX pattern, and which agent design technique is the best way to deliver each single feature to the customer.
There is a night and day difference between the teams who pore over their product’s failure modes and get 1% better every day vs those still debating whether “prompt engineer” should be a new job title in the organization.
There is a night and day difference between the teams that invest in quantifying every possible aspect of their customer’s work and their product’s quality and those who still haven’t invested in anything more than basic evals for their AI apps.
Turns out, the fact that everyone has access to the same models might mean a big fat nothing in applied AI.