10 Dec 2024

Investing in Gentrace

Intuitive LLM evals for product developers

The way we write software is changing. Language models now deliver large chunks of user-facing experiences. Developers no longer explicitly write instructions for these experiences. Instead, they describe the intended behavior, and language models deliver the output.

Language models create new opportunities but also new problems. These models are non-deterministic and prone to errors and hallucinations. For example, a chatbot might respond inappropriately to a customer service query, or a dynamically generated UI could include unwanted elements.

How can we leverage the benefits of language models without the downsides?

Welcoming Gentrace

Gentrace is an evaluation platform for product developers testing generative AI apps. It’s the missing part of the stack that helps teams realize the benefits of language models and adopt the new way of writing software. Three core product principles resonated deeply as I spent time with the team:

Comprehensive testing: While existing evaluation solutions focus on prompts, Gentrace tests all parts of your LLM system, including prompts, data, RAG pipelines, function calls, and model outputs.
Flexible evaluations: Developers can use different testing methods, such as LLM-as-judge, human-in-the-loop, or custom rules.
Collaborative approach: Gentrace is the first collaborative LLM product testing environment. Its UI connects to application code, allowing product managers to contribute to evaluation and last-mile tuning.

Today, marquee customers including Webflow, Quizlet, Jasper, and a Fortune 100 retailer rely on Gentrace to ship reliable AI systems.

Gentrace team

Founders Doug Safreno (CEO), Vivek Nair (CTO), and Daniel Liem (COO) have spent much of their careers focused on observability and monitoring. Doug and Vivek previously launched and exited StacksWare, a VMware observability company, while Daniel led teams at Uber and Dropbox, scaling test, CI/CD, and release infrastructure.

I've known this team for nearly a decade, and I'm thrilled to lead their $8M Series A round and join the board.