System 1 vs. System 2 automation

The biggest bottleneck to the adoption of LLM-based assistants is not model performance but product design.

AI assistants today are trying to solve two classes of automation:

  • System 1 automation: the end state is easily verified by human intuition
  • System 2 automation: the end state is hard to confirm with human intuition

Let me illustrate with two popular examples.

The first is scheduling, e.g., find a time to meet with Joe next week on a weekday morning. This is an example of System 1 automation. You can quickly tell the automation is "correct" when an invite appears on your calendar.

On the other hand, suppose you want to automate planning your next vacation. You give an AI assistant some dates and a budget and ask it to plan the trip for you. This task is an example of System 2 automation. It involves a chain of decisions, each of which requires a verification step. If you skip to the end, there is no easy way to verify the output.

Complex automations need interactive experiences

When Kahneman popularized System 1 and System 2 thinking, he was, of course, referring to how our brains work — intuitively or deliberately. This lens is helpful for thinking about how we design AI assistants. Well-designed AI assistants should blend seamlessly into our lives and mental functions. To do this, we must account for cognitive load.

System 1 automation, like automated scheduling, has a lower cognitive load. Its execution is transactional: a prompt leads to an output, and the automation is done. Scrutinizing the result takes little mental effort. End-to-end automation via a text interface is a fine UX for this.

On the other hand, we need a more interactive interface for complex automation. Let's go back to our vacation planning example. We can think of a few sub-automations that deserve a standalone experience. One is discovery: given the constraints of my vacation dates and budget, an AI assistant should be able to help me discover some top options. Another is execution: calling the right APIs to book flights and hotels once the user has settled on an option. Decoupling discovery from execution is one way to make complex automation less mentally taxing on users. It's also a great way to build unique and engaging user experiences. For example, what would a vacation planning assistant that automated discovery via a visual feed look like?
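Here is a minimal sketch of what decoupling discovery from execution could look like in code. Everything in it is hypothetical: the names `TripConstraints`, `TripOption`, `discover`, and `execute`, the hard-coded catalog, and the dates and prices are illustrative assumptions, not a real travel API. The point is only that discovery returns options without side effects, and nothing is booked until the user has picked one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TripConstraints:
    start: date
    end: date
    budget: float


@dataclass(frozen=True)
class TripOption:
    destination: str
    estimated_cost: float


# Hypothetical catalog; a real assistant would query an LLM or travel APIs here.
CATALOG = [
    TripOption("Lisbon", 1800.0),
    TripOption("Kyoto", 3200.0),
    TripOption("Mexico City", 1400.0),
]


def discover(constraints: TripConstraints, limit: int = 3) -> list[TripOption]:
    """Discovery step: propose options within budget, cheapest first.

    This step has no side effects, so the user can browse and reject freely.
    """
    affordable = [o for o in CATALOG if o.estimated_cost <= constraints.budget]
    return sorted(affordable, key=lambda o: o.estimated_cost)[:limit]


def execute(option: TripOption) -> str:
    """Execution step: placeholder for the booking API calls."""
    return f"Booked trip to {option.destination} for ${option.estimated_cost:,.0f}"


constraints = TripConstraints(date(2025, 7, 1), date(2025, 7, 8), budget=2000.0)
options = discover(constraints)   # shown to the user, e.g. as a visual feed
chosen = options[0]               # stand-in for the user's own choice
confirmation = execute(chosen)    # runs only after the user has decided
print(confirmation)
```

Because `discover` and `execute` are separate functions, the interface between them is where the user verifies the result: the options can be rendered however suits the product, and the transactional booking step stays small and easy to check.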

I'm still thinking through other examples. However, I suspect the most successful and engaging AI assistants will go beyond the current chatbot-and-plugins paradigm and embrace more contextual, interactive experiences.

LLMs are good enough. Our design affordances are not.

The promise of AutoGPT was that you could chain together a series of LLM calls to accomplish some goal. On the surface, this is the perfect implementation for complex automations.

However, AutoGPT has yet to live up to the hype. An uninterrupted chain of unsupervised LLM calls might not be the best way to design complex automations. The error rates at each step compound, and the final output is often unusable. These error rates will improve as LLMs get better and we build more guardrails. Nevertheless, various benchmarks of LLM performance suggest that LLMs are already good enough for most economically useful tasks. It is our design affordances that need to catch up.
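To make the compounding concrete, here is a back-of-the-envelope sketch. It assumes, as a simplification, that every step in the chain succeeds independently with the same probability. The 95% figure is an illustrative assumption, not a measurement of AutoGPT or any other system, and `chain_success_rate` is a hypothetical helper name.

```python
def chain_success_rate(step_success_rate: float, num_steps: int) -> float:
    """Probability that every step in an unsupervised chain succeeds.

    Assumes each step succeeds independently with the same probability.
    """
    return step_success_rate ** num_steps


# Illustrative numbers only, not measurements of any real system.
for steps in (1, 5, 10, 20):
    rate = chain_success_rate(0.95, steps)
    print(f"{steps:>2} steps at 95% per step: {rate:.0%} end-to-end")
```

Under these assumptions, a ten-step chain succeeds end to end only about 60% of the time, and a twenty-step chain about 36%. Letting the user verify intermediate steps is one way to catch an error before it propagates through the rest of the chain.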