Why most AI products fail: Lessons from 50+ AI deployments at OpenAI, Google & Amazon (1h 26m)
- Release date: 2026-01-11
- Listen on Spotify
- Episode description:
Aishwarya Naresh Reganti and Kiriti Badam have helped build and launch more than 50 enterprise AI products across companies like OpenAI, Google, Amazon, and Databricks. Based on these experiences, they’ve developed a small set of best practices for building and scaling successful AI products. The goal of this conversation is to save you and your team a lot of pain and suffering.

We discuss:
- Two key ways AI products differ from traditional software, and why that fundamentally changes how they should be built
- Common patterns and anti-patterns in companies that build strong AI products versus those that struggle
- A framework they developed from real-world experience to iteratively build AI products that create a flywheel of improvement
- Why obsessing about customer trust and reliability is an underrated driver of successful AI products
- Why evals aren’t a cure-all, and the most common misconceptions people have about them
- The skills that matter most for builders in the AI era

Brought to you by:
- Merge: The fastest way to ship 220+ integrations
- Strella: The AI-powered customer research platform
- Brex: The banking solution for startups

Transcript: https://www.lennysnewsletter.com/p/what-openai-and-google-engineers-learned

My biggest takeaways (for paid newsletter subscribers): https://www.lennysnewsletter.com/i/183007822/referenced

Get 15% off Aishwarya and Kiriti’s Maven course, Building Agentic AI Applications with a Problem-First Approach, using this link: https://bit.ly/3V5XJFp

Where to find Aishwarya Naresh Reganti:
- LinkedIn: https://www.linkedin.com/in/areganti
- GitHub: https://github.com/aishwaryanr/awesome-generative-ai-guide
- X: https://x.com/aish_reganti

Where to find Kiriti Badam:
- LinkedIn: https://www.linkedin.com/in/sai-kiriti-badam
- X: https://x.com/kiritibadam

Where to find Lenny:
- Newsletter: https://www.lennysnewsletter.com
- X: https://twitter.com/lennysan
- LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

In this episode, we cover:
- (00:00) Introduction to Aishwarya and Kiriti
- (05:03) Challenges in AI product development
- (07:36) Key differences between AI and traditional software
- (13:19) Building AI products: start small and scale
- (15:23) The importance of human control in AI systems
- (22:38) Avoiding prompt injection and jailbreaking
- (25:18) Patterns for successful AI product development
- (33:20) The debate on evals and production monitoring
- (41:27) Codex team’s approach to evals and customer feedback
- (45:41) Continuous calibration, continuous development (CC/CD) framework
- (58:07) Emerging patterns and calibration
- (01:01:24) Overhyped and under-hyped AI concepts
- (01:05:17) The future of AI
- (01:08:41) Skills and best practices for building AI products
- (01:14:04) Lightning round and final thoughts

Referenced:
- LevelUp Labs: https://levelup-labs.ai/
- Why your AI product needs a different development lifecycle: https://www.lennysnewsletter.com/p/why-your-ai-product-needs-a-different
- Booking.com: https://www.booking.com
- Research paper on agents in production (by Matei Zaharia’s lab): https://arxiv.org/pdf/2512.04123
- Matei Zaharia’s research on Google Scholar: https://scholar.google.com/citations?user=I1EvjZsAAAAJ&hl=en
- The coming AI security crisis (and what to do about it) | Sander Schulhoff: https://www.lennysnewsletter.com/p/the-coming-ai-security-crisis
- References continued at: https://www.lennysnewsletter.com/p/what-openai-and-google-engineers-learned

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.

Lenny may be an investor in the companies discussed.
Summary
- 🔄 Embrace Non-Determinism: AI products differ from traditional software in their unpredictable inputs and outputs, requiring teams to anticipate and calibrate behavior to deliver reliable experiences.
- ⚖️ Agency-Control Balance: Start with low AI agency and high human control, progressing iteratively to build trust and avoid risks like hallucinations in complex workflows.
- 📈 Problem-First Iteration: Adopt CCCD framework for continuous development and calibration, obsessing over workflows and flywheels rather than one-click agents.
- 👥 Leadership & Culture: Hands-on leaders, empowering cultures, and cross-team collaboration form the success triangle, rebuilding intuitions for AI transformations.
- 🔮 Future Moats: Persistence through ‘pain’ creates moats; expect proactive multimodal agents and underrated coding tools to drive 2026 value.
Insights
Why is ‘pain the new moat’ for AI builders and companies?
Time: 1:16 – 73:58
Category: AI-Driven Innovation Economy
Answer: Persistence through trial and error in messy data and workflows builds unique knowledge no textbook provides, creating defensible advantages. Successful firms endure this learning pain, building flywheels and seeing ROI in 4-6 months. It separates hype from real value in a young field. (Start at 1:16)
How does non-determinism in user inputs and LLM outputs revolutionize AI product building?
Time: 8:19 – 9:49
Category: AI-Driven Innovation Economy
Answer: AI products face unpredictable user behaviors via natural language interfaces and probabilistic LLM responses, unlike deterministic traditional software, requiring teams to anticipate varied interactions and outputs. This shift demands new strategies for behavior calibration to ensure reliable experiences. It matters because ignoring it leads to unreliable products and lost trust. (Start at 8:19)
Why must teams navigate the agency-control trade-off when deploying AI agents?
Time: 9:56 – 12:29
Category: AI in Workforce Disruption
Answer: Granting AI more decision-making agency reduces human control, so start with low agency and high oversight to build trust gradually. Examples like customer support show progression from suggestions to autonomous actions. This prevents risks like hallucinations or errors while enabling safe iteration. (Start at 9:56)
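The suggestions-to-autonomy progression described above can be sketched roughly as follows. This is a minimal illustration only; the level names, approval metric, and threshold are hypothetical, not from the episode:

```python
# Hypothetical sketch of the low-agency-to-high-agency progression:
# the AI starts by only suggesting, and earns more autonomy one step
# at a time as its human-approval rate clears a (made-up) threshold.

AGENCY_LEVELS = ["suggest_only", "act_with_approval", "act_autonomously"]

def next_agency_level(current: str, approval_rate: float, threshold: float = 0.95) -> str:
    """Promote one level at a time, only when humans approve enough outputs."""
    idx = AGENCY_LEVELS.index(current)
    if approval_rate >= threshold and idx < len(AGENCY_LEVELS) - 1:
        return AGENCY_LEVELS[idx + 1]
    return current
```

In a customer-support setting, this would mean the assistant drafts replies first, then sends routine replies with human sign-off, and only later acts on its own, with each step gated by observed reliability.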
What if starting AI products ‘problem-first’ with minimal agency avoids common pitfalls?
Time: 20:35 – 21:24
Category: AI-Driven Innovation Economy
Answer: Focusing on core problems before complex solutions forces clarity and prevents slippery slopes into over-engineering. Build step-by-step from high-control versions to calibrate behavior safely. This resonates widely, reducing overwhelm and enabling flywheels for improvement. (Start at 20:35)
How can hands-on leaders rebuild intuitions to drive AI success?
Time: 25:43 – 27:51
Category: AI in Workforce Disruption
Answer: Leaders like Rackspace’s CEO dedicate daily time to catching up on AI, embracing vulnerability and learning from their teams. This top-down approach aligns expectations and fosters buy-in, distinguishing successful companies. It counters outdated intuitions in a young field without playbooks. (Start at 25:43)
Is the ‘success triangle’ of leaders, culture, and tech the key to thriving AI teams?
Time: 25:43 – 31:42
Category: AI-Driven Innovation Economy, AI in Workforce Disruption
Answer: Great leaders stay hands-on, empowering cultures augment experts instead of fearing replacement, and technical teams obsess over workflows with quick iterations. This holistic view addresses people problems first in AI transformations. It builds flywheels over one-click hype. (Start at 25:43)
Do evals alone suffice for AI reliability, or must they pair with production monitoring?
Time: 33:43 – 40:44
Category: AI Bias & Fairness
Answer: Evals test known issues via curated datasets, while monitoring captures implicit user signals, like regenerations, that reveal emerging patterns. Neither extreme works alone; use both in a feedback loop tailored to the problem. Much of the hype conflates evals with benchmarks or vibe checks. (Start at 33:43)
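One loose way to picture the eval-plus-monitoring loop described above, under stated assumptions: the function names, data shapes, and the "regenerations as a negative signal" threshold are all hypothetical illustrations, not from the episode:

```python
# Hypothetical sketch: pairing a curated eval set with production signals.
# Evals catch known failure modes; monitoring surfaces emerging ones,
# which are folded back into the eval set, closing the feedback loop.

def run_evals(model, eval_cases):
    """Score the model against curated input/expected pairs (known issues)."""
    failures = []
    for case in eval_cases:
        output = model(case["input"])
        if case["expected"] not in output:
            failures.append(case)
    return failures

def mine_production_signals(logs, regen_threshold=2):
    """Treat repeated regenerations as an implicit negative user signal."""
    return [entry for entry in logs if entry["regenerations"] >= regen_threshold]

def feedback_loop(model, eval_cases, logs):
    failures = run_evals(model, eval_cases)
    # Promote real-world failure patterns into the curated eval set,
    # so the next eval run tests for them explicitly.
    for suspect in mine_production_signals(logs):
        eval_cases.append({"input": suspect["input"],
                           "expected": suspect["desired"]})
    return failures, eval_cases
```

The design point is the loop itself: neither the eval set nor the monitoring stream is static; each feeds the other as usage reveals new failure modes.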
How does the Continuous Calibration Continuous Development (CCCD) framework minimize AI risks?
Time: 46:05 – 57:17
Category: AI-Driven Innovation Economy
Answer: CCCD iterates from scoping data and low-agency versions to deploying, analyzing traces, and calibrating behaviors in a flywheel. It constrains autonomy progressively, logging human actions for improvement without eroding trust. It was inspired by real failures, like hotfix overloads. (Start at 46:05)