Voice AI: hook, not hammer

The simplest application of voice agents is information retrieval. Think about calling your insurance provider to check if a prescription is covered or calling your airline to check the status of your missing luggage. Voice is the perfect hammer for these Q&A tasks because it mirrors how we naturally seek information — through conversation.

Another way to think about voice agents is as a hook that initiates and orchestrates complex business processes. Consider an easy example: you call a restaurant not just to ask about available reservations but to actually book one. Or a more complex scenario: you call customer support about a broken device, and a voice AI agent sets multiple processes in motion - confirming device registration and warranty, updating case status in an internal CRM, assigning priority levels, and dispatching field technicians. Voice becomes the starting point for automating tasks that usually need multiple people and system interactions.

Voice is rich with context, which makes it ideal as a starting point for automation tasks. Each customer interaction creates a natural map of your organization's processes - every call reveals both the customer's intent and the business process they need to access.

The tech stack for making use of all this rich context is now in place:

  1. Automatic speech recognition (ASR, also called speech-to-text or STT), text-to-speech (TTS), and the detection of user emotion and intent have all improved dramatically. Companies such as Retell, Vocode, Vapi, and Hume bundle these capabilities into clean, easy-to-use APIs.
  2. APIs for popular systems of record have been maturing for a decade. Today, tools like Ampersand and Merge let you integrate in bulk with many CRMs, HRIS platforms, accounting systems, and ERPs.
  3. For long-tail products with no published APIs, a combination of browser automation and API reverse engineering can approximate a stable integration.

Together, these pieces let startups go after a greenfield opportunity: complex coordination problems that start with a voice call.

Complex coordination problems are notoriously hard to automate and have high average handling times. Consider a typical insurance claim: today, a support agent might need to toggle between their claims management system, customer database, policy documentation system, payment processing portal, and scheduling tool. Each context switch introduces friction and the potential for error. A single claim might take 20-30 minutes to process as human agents manually coordinate across these systems.

A well-integrated voice agent, by contrast, can orchestrate this entire workflow. Imagine calling about a fender bender: the voice agent can simultaneously verify your policy, create a claim in the claims management system, schedule an appointment with a local body shop based on its real-time availability, arrange a rental car for the interim period, and send all required documentation for signatures - all while maintaining a natural conversation with the customer. What previously required 20-30 minutes of human coordination can be compressed into a 5-minute automated call.
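
To make the "simultaneously" part concrete, here is a minimal Python sketch of how an agent might fan out the fender-bender steps concurrently once it has captured the caller's details. Everything in it is an assumption for illustration: `ClaimContext`, the five step functions, and the placeholder return strings are hypothetical stand-ins for real integrations with a policy database, claims system, scheduling tool, and so on, and the short sleeps only simulate network latency.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class ClaimContext:
    """Details the voice agent has extracted from the call (hypothetical shape)."""
    policy_id: str
    caller_zip: str


# Each coroutine stands in for a call to a separate backend system.
# The sleeps simulate latency; real integrations would replace these bodies.
async def verify_policy(ctx: ClaimContext) -> str:
    await asyncio.sleep(0.01)
    return f"policy {ctx.policy_id} active"


async def create_claim(ctx: ClaimContext) -> str:
    await asyncio.sleep(0.01)
    return f"claim opened for policy {ctx.policy_id}"


async def book_body_shop(ctx: ClaimContext) -> str:
    await asyncio.sleep(0.01)
    return f"body shop booked near {ctx.caller_zip}"


async def reserve_rental(ctx: ClaimContext) -> str:
    await asyncio.sleep(0.01)
    return f"rental car reserved near {ctx.caller_zip}"


async def send_documents(ctx: ClaimContext) -> str:
    await asyncio.sleep(0.01)
    return "signature request sent"


async def handle_fender_bender(ctx: ClaimContext) -> dict:
    """Run all back-office steps concurrently and collect one outcome per step."""
    steps = {
        "policy": verify_policy(ctx),
        "claim": create_claim(ctx),
        "body_shop": book_body_shop(ctx),
        "rental": reserve_rental(ctx),
        "documents": send_documents(ctx),
    }
    # return_exceptions=True keeps one failing system from cancelling the rest;
    # a failed step shows up as an exception object in its slot.
    outcomes = await asyncio.gather(*steps.values(), return_exceptions=True)
    return dict(zip(steps.keys(), outcomes))


if __name__ == "__main__":
    summary = asyncio.run(handle_fender_bender(ClaimContext("P-123", "94107")))
    for step, outcome in summary.items():
        print(f"{step}: {outcome}")
```

In a real deployment some steps would depend on others (documents usually cannot go out before the claim exists), so a production workflow would likely chain those steps rather than run everything in parallel. The sketch only illustrates why the agent's wall-clock time can be far shorter than a human toggling between systems one at a time.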

The biggest opportunity for voice AI startups isn't just IVR 2.0 or call center automation – it's complex coordination as a service.