Durable Execution and the Infrastructure Powering AI Agents (1h 4m)
- Release date: 2026-02-19
- Episode description:
Raghu Raghuram, Managing Partner at a16z, and Sarah Wang, General Partner at a16z, speak with Samar Abbas, CEO of Temporal, about how durable execution became the infrastructure layer behind some of the world’s most widely used AI agents. They cover why long-running agents require state management and recoverability, how Temporal powers OpenAI’s Codex and Snap’s Story processing, and why the shift from interactive to background agents is creating distributed systems challenges at a scale that didn’t exist two years ago. Resources: Follow Samar Abbas: https://x.com/SamarAtTemporal Follow Sarah Wang: https://x.com/sarahdingwang Follow Raghu Raghuram: https://x.com/RaghuRaghuram Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts. Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.
Summary
- 🔧 Durable Execution Core: Temporal abstracts away state management so workflows stay reliable amid failures and traffic spikes. It grew out of Uber’s microservices work, where it powered features like tipping and loyalty with exactly-once guarantees.
- 💻 AI Agent Lifeline: Prevents costly restarts in long-running agents, such as Codex and deep research, by resuming from the point of failure, and handles spikes of up to 150K actions/sec.
- 📈 Reliability Revolution: Offers a five-nines (99.999%) SLA, multi-region failover, and bug recovery via event replay; the same event histories provide observability traces for non-deterministic agents at no extra cost.
- 🤖 Agentic Future: Agents are evolving into swarms of specialized agents that need durable RPC; background coding agents are said to boost productivity 15x, and real-time context retrieval feeds their prompts.
- 🚀 Business Boom: SaaS keeps thriving through APIs and orchestration; an explosion of cheaply built apps drives Temporal’s consumption-based model rather than replacing SaaS.
Insights
How does Temporal transform chaotic microservices into reliable workflows, much like a busy restaurant kitchen?
Time: 0:10 – 0:40
Category: AI in Workforce Disruption
Answer: In distributed systems subject to failures and traffic spikes, developers focus on business logic while Temporal handles state management and guarantees exactly-once execution. Like a busy kitchen that still gets every order out correctly, the abstraction hides the chaos, and it is now vital for AI agent orchestration.
Why is durable execution critical for preventing massive token waste in long-running AI agents?
Time: 0:55 – 1:07
Category: AI-Driven Innovation Economy
Answer: AI agents performing deep research or complex tasks can burn thousands of tokens over hours; a failure midway means restarting from scratch, losing real money and time. Temporal ensures seamless recovery by remembering state and resuming exactly where the run failed, making it essential as agents become more autonomous.
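To make the resume-from-failure idea concrete, here is a minimal sketch in plain Python. It is not Temporal’s API: `run_durably`, the JSON journal file, and the two toy steps are all hypothetical names chosen for illustration. The only assumption is that each step’s result is persisted before the next step starts, so a rerun can skip work that already completed.

```python
import json
import os
import tempfile


def run_durably(steps, journal_path):
    """Run named steps in order, skipping any whose result is already journaled."""
    # Load results recorded by a previous (possibly crashed) run.
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            journal = json.load(f)
    else:
        journal = {}
    for name, fn in steps:
        if name in journal:
            continue  # completed earlier, so do not pay for it again
        journal[name] = fn(journal)
        # Persist after every step so a crash loses only the step in flight.
        with open(journal_path, "w") as f:
            json.dump(journal, f)
    return journal


calls = []                   # records which steps actually executed
fail_once = {"armed": True}  # makes the second step crash on its first attempt


def search(_results):
    calls.append("search")
    return ["doc-a", "doc-b"]


def summarize(results):
    calls.append("summarize")
    if fail_once["armed"]:
        fail_once["armed"] = False
        raise RuntimeError("simulated crash mid-run")
    return f"summary of {len(results['search'])} documents"


journal_path = os.path.join(tempfile.mkdtemp(), "journal.json")
steps = [("search", search), ("summarize", summarize)]

try:
    run_durably(steps, journal_path)       # first run crashes in summarize
except RuntimeError:
    pass
result = run_durably(steps, journal_path)  # second run resumes after search
```

In this toy run the expensive `search` step executes once, even though the overall job is started twice.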
Can platforms like Temporal rewind buggy workflows to recover customer data instantly?
Time: 8:59 – 10:21
Category: AI-Driven Innovation Economy
Answer: During an Uber loyalty bug that reset points, Temporal’s event sourcing allowed resetting affected workflows to a prior state and replaying events with fixed code, avoiding manual fixes. This versioning capability prevents churn and economic loss in mission-critical apps.
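The replay idea can be sketched in a few lines of Python. This is an illustration only: the event names, the point values, and the specific bug (a tier upgrade wiping the balance) are invented, and the episode summary does not describe the actual Uber bug in this detail. What the sketch shows is that when state is derived from a stored event log, the log can be replayed through corrected code to recover the right state.

```python
def apply_buggy(points, event):
    kind, amount = event
    if kind == "earn":
        return points + amount
    if kind == "tier_upgrade":
        return 0  # hypothetical bug: upgrading a tier wipes the balance
    return points


def apply_fixed(points, event):
    kind, amount = event
    if kind == "earn":
        return points + amount
    return points  # tier upgrades no longer touch the balance


def replay(events, apply, start=0):
    """Rebuild the balance by folding every recorded event through `apply`."""
    points = start
    for event in events:
        points = apply(points, event)
    return points


# The event log is the source of truth; balances are derived from it.
history = [("earn", 100), ("earn", 50), ("tier_upgrade", 0), ("earn", 25)]

buggy_balance = replay(history, apply_buggy)  # what customers saw
fixed_balance = replay(history, apply_fixed)  # recovered by replaying with fixed code
```

Here the buggy code leaves a balance of 25, while replaying the same history through the fixed code yields 175, with no manual correction of individual accounts.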
What makes 5-nines reliability a game-changer for AI amid cloud outages?
Time: 11:02 – 12:36
Category: AI Governance & Laws
Answer: Temporal Cloud delivers a 99.999% SLA with multi-region failover, limiting disruption during major cloud outages to seconds, versus weeks for others. This business continuity comes free for developers, which is crucial as AI agents power high-stakes operations.
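For a sense of scale, the downtime budget implied by an availability target is simple arithmetic. The short sketch below assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def downtime_minutes_per_year(availability):
    """Maximum downtime per year permitted by an availability target."""
    return (1 - availability) * MINUTES_PER_YEAR


five_nines = downtime_minutes_per_year(0.99999)  # about 5.26 minutes
three_nines = downtime_minutes_per_year(0.999)   # about 525.6 minutes (~8.8 hours)
```

Each additional nine cuts the allowed downtime by a factor of ten.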
What infrastructure gap will swarms of specialized agents expose next?
Time: 19:32 – 21:10
Category: AI-Driven Innovation Economy
Answer: As agents break out of their sandboxes to collaborate on complex tasks such as vacation booking, what is missing is durable RPC that can handle long-running asynchronous calls across dozens of agents. Temporal is pushing standards such as Project Nexus to stitch agents together industry-wide.
How are coding agents evolving to boost developer productivity 15x via background orchestration?
Time: 22:51 – 26:20
Category: AI in Workforce Disruption
Answer: Coding agents have moved from tab completion to handling complex tasks asynchronously. Agents like Claude and OpenAI Codex use Temporal to spin off work, test it, and deliver results without interrupting engineers. This enables large productivity gains as agents run longer and manage multiple streams of work.
Why is observability a ‘free’ superpower for non-deterministic AI agents on Temporal?
Time: 32:29 – 35:04
Category: AI-Driven Innovation Economy
Answer: Event-sourced execution histories provide full traces of agent actions, enabling runtime debugging, training improvements, and business analytics like tool failure rates. This visibility unites dev and business teams, turning agent data into a goldmine.
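As a rough illustration of the analytics point, a metric like tool failure rate falls out of a recorded history with a few lines of Python. The record shape used here (`tool` and `status` fields) and the tool names are assumptions for the example, not Temporal’s actual event history format.

```python
from collections import Counter

# Hypothetical slice of an agent's recorded execution history.
history = [
    {"tool": "web_search", "status": "ok"},
    {"tool": "web_search", "status": "failed"},
    {"tool": "code_exec", "status": "ok"},
    {"tool": "web_search", "status": "ok"},
    {"tool": "code_exec", "status": "ok"},
]


def tool_failure_rates(events):
    """Return the fraction of failed calls per tool."""
    totals, failures = Counter(), Counter()
    for event in events:
        totals[event["tool"]] += 1
        if event["status"] == "failed":
            failures[event["tool"]] += 1
    return {tool: failures[tool] / totals[tool] for tool in totals}


rates = tool_failure_rates(history)
```

With this sample history, `web_search` fails in one of three calls and `code_exec` never fails.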
Is SaaS doomed in the agent era, or will APIs and orchestrators explode?
Time: 35:45 – 38:05
Category: AI-Driven Innovation Economy
Answer: Cheaper app building via agents won’t kill SaaS; value shifts to specialized APIs (e.g., Stripe, airlines) and to durable orchestration for agent swarms. Temporal positions itself as the execution authority, fueling faster automation of the world’s work.
How does real-time context retrieval from diverse sources power effective agent prompts?
Time: 46:34 – 49:00
Category: AI in Everyday Life
Answer: Agents pull live data from sources of varying reliability, such as APIs, Slack, and Docs; Temporal orchestrates high-throughput retrievals for RAG, bypassing rigid data warehouses. The scale of this ‘retrieval’ use case in agent apps has been a surprise.
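A minimal sketch of retrieval from unreliable sources, using only Python’s standard `asyncio`. It is not Temporal code: the source names, the `make_flaky` helper that simulates outages, and the retry count are all hypothetical. It shows the general pattern of fetching sources concurrently, retrying each one independently, and building the prompt context from whatever succeeded.

```python
import asyncio


async def fetch_with_retry(name, fetch, attempts=3):
    """Try one source up to `attempts` times; return (name, None) if it never answers."""
    for attempt in range(1, attempts + 1):
        try:
            return name, await fetch()
        except ConnectionError:
            if attempt == attempts:
                return name, None  # give up; the prompt is built without this source
            await asyncio.sleep(0)  # placeholder for a real backoff delay


def make_flaky(value, failures):
    """Build a fake source that fails `failures` times before returning `value`."""
    state = {"left": failures}

    async def fetch():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("source unavailable")
        return value

    return fetch


async def gather_context(sources):
    """Fetch all sources concurrently and keep only the ones that answered."""
    pairs = await asyncio.gather(
        *(fetch_with_retry(name, fetch) for name, fetch in sources.items())
    )
    return {name: value for name, value in pairs if value is not None}


sources = {
    "api": make_flaky("latest order status", failures=0),
    "slack": make_flaky("thread about the outage", failures=2),
    "docs": make_flaky("runbook", failures=5),  # never recovers within 3 attempts
}
context = asyncio.run(gather_context(sources))
```

In this run the `api` and `slack` sources end up in the context, and `docs` is dropped after exhausting its retries.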