Post-training architecture is differentiated IP

Why do similar AI products feel qualitatively different despite using the same models? Consider the Zoom transcription apps: Granola feels noticeably better than its competitors. Or the code-gen tools: Cursor and Windsurf each have their own distinct quirks. It's not a UX difference: the apps look similar; they just perform differently.

One explanation is that post-training architecture is differentiated intellectual property (IP). Teams make many tiny decisions about RAG, function calling, routing, chain-of-thought, evals, etc., which congeal into a unique user experience.

What model you’re using matters less than how you’re using it. This is a shift from just over a year ago, when certain startups pitched early access to GPT-4 as their competitive advantage and there was a frenzy to get access to the latest, biggest model. That now feels quaint. A legitimate differentiation pitch today would likely have some story about how your post-training architecture lets you build a 10x better experience.

Some thoughts that follow from post-training as a differentiator:

1. Product taste manifests in both UX and architecture

Great product decisions happen at two levels: what users see and what powers the experience. Take Granola's intuitive approach to meeting transcripts: it leverages template headings and keywords to transform messy conversations into clean summaries. This isn't just a UX choice. The clean interface is likely enabled by some clever chunking of raw notes and multi-stage prompting.
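To make that concrete, here's a minimal sketch of what a chunk-then-merge pipeline could look like. The function names, prompts, and template format are hypothetical illustrations, not Granola's actual implementation; `llm` stands in for whatever completion call you use.

```python
# Hypothetical sketch: chunk a raw transcript, summarize each chunk, then
# merge the notes under the user's template headings. Prompts and function
# names are illustrative, not Granola's actual design.
from typing import Callable

def chunk_transcript(transcript: str, max_chars: int = 4000) -> list[str]:
    """Split a raw transcript into roughly fixed-size chunks."""
    return [transcript[i:i + max_chars] for i in range(0, len(transcript), max_chars)]

def summarize_meeting(transcript: str, headings: list[str], llm: Callable[[str], str]) -> str:
    # Stage 1: compress each chunk into terse bullet-point notes.
    chunk_notes = [
        llm(f"Summarize this meeting excerpt as terse bullet points:\n\n{chunk}")
        for chunk in chunk_transcript(transcript)
    ]
    # Stage 2: reorganize the notes under the template headings.
    template = "\n".join(f"## {h}" for h in headings)
    return llm(
        "Organize these notes under the template headings; drop anything irrelevant.\n\n"
        f"Template:\n{template}\n\nNotes:\n" + "\n".join(chunk_notes)
    )
```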

Just as great product teams make sophisticated UX decisions ("this button should be blue, not red"), they now make equally nuanced architectural choices ("this RAG pipeline should use hybrid search, not pure semantic"). These architectural decisions compound into meaningful product differences that competitors can't easily copy just by looking at the UX.
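As a rough illustration of the hybrid-search choice, here's a sketch that blends a keyword-overlap score with a vector-similarity score. The scoring functions are naive stand-ins for a real BM25 index and embedding model, and the 50/50 weighting is just a starting point to tune.

```python
# Minimal sketch of hybrid retrieval: blend a keyword score with a vector
# similarity score, then rank documents by the combined score.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_search(query: str, query_vec: list[float], docs: list[dict], alpha: float = 0.5) -> list[dict]:
    """Each doc is {'text': str, 'vec': list[float]}. Returns docs best-first."""
    def keyword_score(doc_text: str) -> float:
        # Naive term-overlap stand-in for BM25.
        q_terms = set(query.lower().split())
        d_terms = set(doc_text.lower().split())
        return len(q_terms & d_terms) / (len(q_terms) or 1)

    scored = [
        (alpha * keyword_score(d["text"]) + (1 - alpha) * cosine(query_vec, d["vec"]), d)
        for d in docs
    ]
    return [d for _, d in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```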

2. Differentiation through post-training architecture drives model commoditization

When your secret sauce lives in your post-training architecture, the underlying model becomes interchangeable. Smart teams are building architectures that can plug into GPT-4 today, Claude tomorrow, and Llama 3 next week. The real moat isn't which model you use but how elegantly you orchestrate it.
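One common way to keep the model interchangeable is a thin provider interface that the rest of the pipeline targets. The sketch below assumes that pattern; the class names are illustrative and the client calls are left as stubs rather than real SDK usage.

```python
# Sketch of a provider abstraction so orchestration code never names a
# specific model. Adapters for OpenAI, Anthropic, or a local Llama server
# would all implement the same interface.
from typing import Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAIChat:
    """Illustrative adapter; wrap the real OpenAI client inside complete()."""
    def __init__(self, model: str = "gpt-4"):
        self.model = model
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("call the OpenAI API here")

class AnthropicChat:
    """Illustrative adapter; wrap the real Anthropic client inside complete()."""
    def __init__(self, model: str = "claude-3-sonnet"):
        self.model = model
    def complete(self, prompt: str) -> str:
        raise NotImplementedError("call the Anthropic API here")

def answer(question: str, context: str, llm: ChatModel) -> str:
    # The orchestration (prompting, retrieval, routing) is the differentiated part;
    # the model behind `llm` can be swapped without touching this code.
    return llm.complete(f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}")
```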

3. Infra toolchains get more important

As teams focus on architectural innovation, they hit the limitations of existing ML tools. Take evals as an example: in traditional ML, you mostly need accuracy metrics. But with complex prompt chains and RAG systems, you need tools that can (see the sketch after this list):

  • Run the same input through multiple variations of your architecture
  • Compare outputs across different model / architecture combinations
  • Track how architectural changes impact output quality over time
  • Debug which part of a complex chain is causing issues
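Here's a minimal sketch of what such an eval harness might look like: each case runs through several pipeline variants, gets scored, and the per-variant averages are appended to a history file so quality can be tracked across architectural changes. The variant and scorer signatures and the JSONL storage are assumptions for illustration, not a reference to any particular tool.

```python
# Rough sketch of an eval harness: run the same inputs through multiple
# architecture variants, score each output, and log a timestamped summary.
import json
import time
from typing import Callable

Variant = Callable[[str], str]        # an end-to-end pipeline: input -> output
Scorer = Callable[[str, str], float]  # (output, expected) -> quality score

def run_evals(cases: list[dict], variants: dict[str, Variant], score: Scorer) -> dict:
    """cases: [{'input': ..., 'expected': ...}]; returns mean score per variant."""
    results = {name: [] for name in variants}
    for case in cases:
        for name, pipeline in variants.items():
            output = pipeline(case["input"])
            results[name].append({
                "input": case["input"],
                "output": output,
                "score": score(output, case["expected"]),
            })
    summary = {
        name: sum(r["score"] for r in rows) / max(len(rows), 1)
        for name, rows in results.items()
    }
    # Append a timestamped record so quality can be compared across changes.
    with open("eval_history.jsonl", "a") as f:
        f.write(json.dumps({"ts": time.time(), "summary": summary}) + "\n")
    return summary
```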