Why Ensemble Architectures Win Against Real-Time Voice Risk - with Mike Pappas of Modulate (29 min)
Tags: ai-driven-innovation-economy, ai-governance-laws, ai-in-cybersecurity, ai-in-workforce-disruption, ai-investment-trends
- Release date: 2026-03-20
- Listen on Spotify: Open episode
- Episode description:
The contact center is shifting from a service channel to a primary point of fraud, where text‑based systems fail to catch the signals that surface only in live voice. In this episode, Mike Pappas, CEO & Co‑Founder at Modulate, joins Emerj's Nick Gertsch to examine how audio‑native, multi‑model analysis exposes social engineering and deepfake‑driven threats that traditional tools routinely miss. The discussion highlights how leaders can strengthen real‑time detection, limit downstream financial and regulatory impact, and evaluate voice‑AI systems based on accuracy, speed, and adaptability to evolving fraud tactics. This episode is sponsored by Modulate. Learn how brands work with Emerj and other Emerj Media options at go.emerj.com/partner. Want to share your AI adoption story with executive peers? Click go.emerj.com/expert for more information and to be a potential future guest on the 'AI in Business' podcast!
Summary
- 🚨 Contact Centers as Fraud Battlegrounds: Sophisticated social engineering in live calls drives financial losses, agent attrition, trust erosion, and regulatory risks if not caught in real time.
- 📝 Text AI’s Voice Blind Spots: LLMs miss critical audio cues like fake sounds, emotions, and pauses that reveal fraud, as they’re built for text and sycophantic support, not adversarial scrutiny.
- 🔍 Ensemble Models Unlock Voice Intelligence: Over 100 specialized models analyze nuances for superior accuracy, transparency via explainable scores, and cost savings, tailored for production-scale fraud detection.
- 💸 Layered Costs of Fraud and Prevention: Direct losses compound with attrition, regulations, and friction from poor tools; effective voice AI minimizes all by enabling proactive, low-latency interventions.
- 📈 Smart Voice AI Evaluation Framework: Focus on cost, accuracy, speed, adaptability to threats, and reasoning transparency to ensure resilient deployments beyond demos.
Insights
Why have contact centers become the new frontline for sophisticated fraudsters?
Time: 2:43 – 5:19
Category: AI in Cybersecurity
Answer: Fraudsters leverage social engineering in live voice calls, exploiting agent training for pleasant experiences, leading to immediate financial losses, high agent attrition from exhaustion, regulatory penalties, and erosion of customer trust if undetected in real time. (Start at 2:43)
What hidden costs lurk beyond direct financial fraud losses in call centers?
Time: 6:33 – 8:34
Category: AI in Workforce Disruption
Answer: Fraud causes agent burnout and attrition, regulatory fines for inadequate prevention, and customer trust erosion; ineffective countermeasures add user friction, longer handle times, and staffing costs. (Start at 6:33)
Why do general-purpose LLMs fail at real-time voice fraud detection?
Time: 11:15 – 13:49
Category: AI in Cybersecurity
Answer: LLMs are text-based, missing voice nuances like emotion, pauses, timbre, and audio artifacts (e.g., fake baby cries); designed to be supportive rather than adversarial, they lack scrutiny for subtle fraud signals. (Start at 11:15)
How do Ensemble Listening Models outperform monolithic AI in voice analysis?
Time: 16:02 – 20:48
Category: AI-Driven Innovation Economy, AI Governance & Laws
Answer: ELMs combine 100+ specialized models for emotion, deepfakes, pauses, and more, enabling accurate native audio understanding, transparent explainability (e.g., specific risk scores), and lower costs via smaller, focused models. (Start at 16:02)
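To make the ensemble idea concrete, here is a minimal sketch of how scores from several specialized detectors might be combined into one explainable risk score. All model names, weights, and the simple weighted-average aggregator are illustrative assumptions, not Modulate's actual architecture; the point is that every flag stays traceable to a named sub-model signal.

```python
from dataclasses import dataclass

@dataclass
class SignalScore:
    """Output of one specialized model (names and weights are hypothetical)."""
    name: str
    score: float   # 0.0 (benign) to 1.0 (high risk)
    weight: float  # relative importance in the ensemble

def ensemble_risk(signals: list[SignalScore]) -> tuple[float, list[str]]:
    """Combine per-signal scores into a single risk score plus reasons.

    A weighted average stands in for whatever learned aggregator a real
    system would use; the explainability property is the same either way:
    the overall flag decomposes into named, inspectable sub-scores.
    """
    total_weight = sum(s.weight for s in signals)
    risk = sum(s.score * s.weight for s in signals) / total_weight
    # Surface only the signals that individually look risky, as the
    # human-readable "reasons" an agent or regulator could review.
    reasons = [f"{s.name}: {s.score:.2f}" for s in signals if s.score >= 0.5]
    return risk, reasons

# Hypothetical sub-model outputs for one call segment
signals = [
    SignalScore("deepfake_probability", 0.82, weight=3.0),
    SignalScore("urgency_emotion", 0.64, weight=1.0),
    SignalScore("background_artifact", 0.10, weight=1.0),
]
risk, reasons = ensemble_risk(signals)
```

In this toy run the overall risk works out to 0.64, with the deepfake and urgency detectors cited as the reasons, which is the kind of decomposed output the episode contrasts with a monolithic model's opaque verdict.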
Why is transparency essential for trusting voice AI in high-stakes fraud detection?
Time: 18:31 – 20:23
Category: AI Governance & Laws
Answer: Ensemble models provide clear reasoning behind flags (e.g., deepfake probability, urgency scores), aiding agents, building platform confidence, and satisfying regulators like the AI Act, unlike opaque black-box LLMs. (Start at 18:31)
What key criteria should leaders use to evaluate voice AI investments?
Time: 24:28 – 26:30
Category: AI Investment Trends
Answer: Prioritize low cost, high accuracy, minimal latency, adaptability to evolving threats, and explainability; avoid narrow solutions that can’t handle new tactics like social engineering beyond deepfakes. (Start at 24:28)