The Augmented Web
The first-ever website was a simple text page with hyperlinks. The early web was a collection of static text-and-image websites. Let's call this the content web era.
Later, we found many ways to manipulate HTML and CSS and execute JavaScript. Frontend frameworks like jQuery and React, and server-side languages and frameworks like PHP, Django, and Ruby, let us build dynamic full-stack web apps that embed business logic. These advances birthed the application web — what we have today.
The web is evolving again. This time, we're embedding generative experiences, personalization, and reasoning. We might call what comes out of this the augmented web.
I'm interested in how this changes the way we create, consume, and transact on the web. Here are some specific directions I'm excited about.
Create once, publish anyhow
The web currently has a rigid format problem.
Everything you create on the web is a fossil in its original format. If you write a text blog post, it remains a text blog post for its lifetime. If you create a video, it remains a video for its lifetime. Web content cannot be consumed in multiple formats unless the creator manually translates it.
Video transcription tools and text-to-speech readers have tried to fill this gap at a surface level. AI will take this further. We'll be able to create web content once and turn it into whatever format is best for distribution and consumption. For example, a browser extension will turn this blog post into a high-production video or a podcast discussing the topic with one click. PocketPod is one product heading in this direction; it lets you turn any piece of content into a podcast. This simple idea has two interesting side effects:
- Content that requires your full attention becomes ambient. If you're too busy to read an interesting post, you can listen to it as a narrated podcast while driving (a minimal text-to-speech sketch follows this list).
- Source content can be enriched with more detail in an engaging way. Imagine listening to this optimistic post on AI-powered browsing in conversation with another writer’s critique.
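The surface-level version of the first point is already possible with the browser's built-in speech synthesis. The sketch below reads a post aloud with the standard Web Speech API; the AI-generated podcast described above would go well beyond this, but the snippet shows the ambient-consumption idea in its simplest form.

```ts
// Minimal sketch of the surface-level version: read a post aloud in the
// browser using the standard Web Speech API. An AI-powered version would
// instead rewrite the post as a narrated, conversational podcast.

function readAloud(article: HTMLElement): void {
  const utterance = new SpeechSynthesisUtterance(article.innerText);
  utterance.rate = 1.1;            // slightly faster than default narration
  window.speechSynthesis.cancel(); // stop anything already playing
  window.speechSynthesis.speak(utterance);
}

// Usage, e.g. from a hypothetical browser-extension button:
// readAloud(document.querySelector("article")!);
```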
Rigid content formats reward media-channel expertise over creativity. If you're good at picking the right YouTube thumbnails and opening sound effects, you'll have more reach regardless of the quality of the content. If you can create something once and share it in any format, you don't have to be an expert in a particular medium to reach a bigger audience.
Generative personalization
Personalization today is limited to ads or media content rendered in a fixed app layout. To understand this, consider the prerequisites for building a fully personalized website today.
First, you must segment your users into cohorts. Then, you must imagine every variant of your website you want to show each cohort. Next, you build separate components for each variant. If you're using a frontend component framework like React, you abuse conditional rendering to display different components per cohort. You're left with if statements scattered across your codebase that scale with the number of cohorts and user journeys you want to personalize.
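As a rough illustration, this is what that pattern tends to look like in React; the cohort names and hero components below are hypothetical stand-ins:

```tsx
import React from "react";

// Hypothetical cohorts and components, purely for illustration.
type Cohort = "enterprise" | "startup" | "hobbyist";

const EnterpriseHero = () => <h1>Roll out securely across your org</h1>;
const StartupHero = () => <h1>Ship your MVP this week</h1>;
const HobbyistHero = () => <h1>Free forever for side projects</h1>;

// Every personalized element needs a branch per cohort, and these
// branches multiply across the codebase as cohorts and journeys grow.
export function LandingHero({ cohort }: { cohort: Cohort }) {
  if (cohort === "enterprise") return <EnterpriseHero />;
  if (cohort === "startup") return <StartupHero />;
  return <HobbyistHero />;
}
```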
As a result, you create cohorts that reflect technical feasibility rather than ones that reflect user needs. You make two or three variations of landing page text. Two or three variants of CTA buttons. Two or three variants of page layouts. You hope all your users fit into two or three cohorts. It's just not feasible to do more.
Until now.
Language models allow us to personalize deeper aspects like content, layout, or functionality for dynamically generated cohorts. Two things make this possible: 1) Language models are very good at writing code, and 2) they are good at understanding user intent.
These two capabilities combined mean that we can selectively render bespoke user journeys based on what users are doing on our website. We can test several variations of our website and get a more granular view of what works. Coframe is a good example of a tool that lets you do this for text. I'm interested in tools that let us do this for every element of a web page.
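As a sketch of what that could look like: the endpoint, request shape, and returned fields below are all assumptions for illustration, and real products like Coframe may work very differently.

```ts
// Hypothetical sketch: ask a language-model-backed endpoint for a bespoke
// page variant instead of branching over hand-built cohorts. The endpoint
// and types here are assumptions, not a real API.

interface PageVariant {
  headline: string;
  ctaLabel: string;
  layout: "single-column" | "split" | "gallery";
}

interface VisitorSignals {
  referrer: string;
  recentPages: string[];
  searchQuery?: string;
}

async function fetchVariant(signals: VisitorSignals): Promise<PageVariant> {
  // Server-side, a language model reads these signals and writes (or picks)
  // the copy and layout for this specific visitor.
  const res = await fetch("/api/personalize", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(signals),
  });
  return res.json();
}
```

The interesting shift is that the cohort is no longer enumerated ahead of time; the model effectively invents one per visitor.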
AI-native web browsers
New web experiences follow new web browser capabilities.
AJAX paved the way for interactive and real-time web apps, making services like Gmail and Google Maps possible.
WebGL opened the door to immersive 3D web experiences, leading to virtual reality (VR) apps and interactive visualizations.
WebRTC led to browser-based video conferencing tools like Google Meet.
WebAssembly lets web apps run at near-native speed, making performance-sensitive apps like Figma possible.
Following this history, WebGPU will enable native AI features in the browser, possibly running local AI models (a minimal feature-detection sketch follows the list below). This will have two effects:
- Browsers will ship with a set of baseline AI features: these are features that don't need user context or application-specific data. For example, writing copilots will be inserted into every text box in the browser. They'll always be available; developers won't have to implement them in their apps. This is similar to how browsers natively ship with autocomplete or built-in translation.
- Browsers will go from simply displaying information to acting as agents that execute browsing actions for you. Arc Search's recent launch is a good example of this.
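As a rough sketch of the browser-side prerequisite for local models, the snippet below feature-detects WebGPU before attempting in-browser inference. It assumes WebGPU type definitions (e.g. @webgpu/types) are in scope, and the fallback logic is left abstract.

```ts
// Sketch: feature-detect WebGPU before attempting local, in-browser inference.
// Assumes WebGPU type definitions (e.g. @webgpu/types) are available.

async function canRunLocalModels(): Promise<boolean> {
  if (!("gpu" in navigator)) return false;  // browser has no WebGPU at all
  const adapter = await navigator.gpu.requestAdapter();
  return adapter !== null;                  // null means no usable GPU
}

async function initBaselineAIFeatures(): Promise<void> {
  if (await canRunLocalModels()) {
    // Run a small local model for features like text-box copilots.
  } else {
    // Fall back to a hosted model, or disable the feature gracefully.
  }
}
```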
Better creators, better consumers
We switch back and forth between creator mode and consumer mode on the web. Both modes are more enjoyable when we create and consume content in ways that we find more natural. That is the promise of AI-powered web experiences.
The examples in this post assume that AI improves discrete layers of the stack: content, websites, and web browsers. In reality, progress at one layer interacts and compounds with progress at the other layers. Ultimately, many of the emergent benefits will be hard to predict today.