Explore every episode of the podcast AI Evals and Analytics Podcast
Title
Pub. Date
Duration
Build AI Evals from Scratch: When and How?
07 Feb 2026
00:17:48
What is evaluation-driven development? When should you start building evals for your product? How do you build them from scratch?
Using a real-world example of a customer chatbot for a medical insurance company, we walk through the process of setting up evals from scratch: translating product requirements into quantifiable metrics, curating quality test datasets (hint: you need fewer examples than you think), and making go/no-go decisions based on eval scores.
You'll learn why accuracy and safety require different approaches, how to avoid the trap of AI-generated test data, and why 94% vs 95% accuracy matters less than you'd expect—but safety guardrails are non-negotiable. This is the practical blueprint for anyone building AI products who wants to catch problems before users do.
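The go/no-go idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the hosts' method: the `EvalResult` fields, the `release_decision` function, and the 0.90 accuracy threshold are all hypothetical. It only shows the asymmetry the episode describes, where accuracy is judged against a tolerance while any safety failure blocks the release.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Summary of one eval run (hypothetical structure, for illustration)."""
    accuracy: float        # fraction of test cases answered correctly, 0.0 to 1.0
    safety_failures: int   # count of guardrail violations, e.g. PII leakage


def release_decision(result: EvalResult,
                     min_accuracy: float = 0.90,       # assumed threshold
                     max_safety_failures: int = 0) -> str:
    """Return "go" or "no-go" for a release candidate."""
    # Safety is checked first and is not traded off against accuracy.
    if result.safety_failures > max_safety_failures:
        return "no-go"
    # Accuracy only has to clear the bar; 94% and 95% are treated alike.
    if result.accuracy < min_accuracy:
        return "no-go"
    return "go"


if __name__ == "__main__":
    print(release_decision(EvalResult(accuracy=0.94, safety_failures=0)))
    print(release_decision(EvalResult(accuracy=0.95, safety_failures=0)))
    print(release_decision(EvalResult(accuracy=0.99, safety_failures=1)))
```

Under these assumed thresholds, the 94% and 95% runs get the same decision, while a single safety failure blocks a release even at 99% accuracy.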
00:00 – Introduction: Why We Need to Talk About Evals Now
00:39 – When to Start AI Evals?
03:20 – Example Setup: Medical Insurance Customer Chatbot
04:30 – Defining Evals in Product Requirements
07:19 – What Is Evaluation-Driven Development?
08:27 – Breaking Down "Accuracy": What Does It Really Mean?
09:42 – Dataset Curation: Quality Over Quantity
11:24 – How Big Should Your Test Set Be?
12:25 – Safety Guardrails: Knowledge Boundary and PII Leakage
15:29 – Making Release Decisions with Eval Metrics
17:33 – Start with What's Critical to Your Use Case
We (Stella & Amy) created the AI Evaluation & Analytics Playbook, a practical framework that helps teams ship production-ready, trustworthy AI systems.
AI Evals Skills: Why Data Scientists Have a Natural Advantage
26 Jan 2026
00:22:10
What skills do AI evals require? Why do data scientists have a natural advantage in AI evals?
Evaluating AI isn’t just about "vibe coding" with an AI assistant. It requires a solid foundation in statistics, for picking sample sizes, and in coding, for building your own testing frameworks. Data scientists have a huge head start here because they are already pros at designing metrics and communicating risks.
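As one example of the statistics involved in picking a sample size, the sketch below uses the standard normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / e². This is a textbook formula rather than anything taken from the episode, and the function name and defaults are hypothetical; the hosts may recommend a different approach.

```python
import math


def sample_size_for_proportion(margin_of_error: float,
                               expected_rate: float = 0.5,
                               z: float = 1.96) -> int:
    """Estimate how many test cases are needed to measure a pass rate.

    margin_of_error: half-width of the desired confidence interval (0.05 = ±5 points)
    expected_rate:   guess at the true pass rate; 0.5 is the most conservative choice
    z:               normal quantile for the confidence level (1.96 for about 95%)
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    if not 0 <= expected_rate <= 1:
        raise ValueError("expected_rate must be between 0 and 1")
    # Normal-approximation sample size for a binomial proportion.
    n = (z ** 2) * expected_rate * (1 - expected_rate) / (margin_of_error ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    print(sample_size_for_proportion(0.05))  # 385
    print(sample_size_for_proportion(0.10))  # 97
```

Halving the margin of error roughly quadruples the required sample, which is one reason a tolerance of a few percentage points on accuracy is often accepted in practice.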
In the inaugural episode, we also explain why Evals (pre-launch testing) and Analytics (post-launch user feedback) are two sides of the same coin: one makes sure the AI works, and the other makes sure people actually love using it.
00:00 – Introduction to AI Evals & Analytics
01:31 – Why Data Scientists Have a Natural Advantage
01:59 – Technical Pillar: Statistics
02:48 – Technical Pillar: Coding & Prompt Engineering
05:03 – Technical Pillar: Dataset Generation
08:35 – Soft Skills & Stakeholder Collaboration
11:17 – Domain Expertise in Regulated Industries
15:50 – New Skills for the GenAI Era
19:25 – Why Evals and Analytics Must Come Together