AI Evals and Analytics Podcast – Details, Episodes, and Analysis
Podcast details
Technical and general information from the podcast's RSS feed.
AI Evals and Analytics Podcast
Stella and Amy
Frequency: 1 episode every 13 days. Total episodes: 2

Build trustworthy AI products through evaluation-driven development.
Each episode covers practical evaluation strategies, industry trends, and best practices for building safe, reliable AI systems. From dataset generation and eval metric design to cross-functional collaboration and post-launch analytics, we talk about how to build trustworthy, lasting AI products on a solid AI evals and analytics framework.
Subscribe for practical techniques, industry insights, and guest interviews on AI evaluation and analytics.
More about AI Evals and Analytics -- https://ai-evals.org/
We (Stella & Amy) created the AI Evaluation & Analytics Playbook, a practical framework that helps teams ship production-ready, trustworthy AI systems.
Powered by Firstory Hosting
Recent rankings
Latest positions in the Apple Podcasts and Spotify charts.
Apple Podcasts
Chart: 🇺🇸 United States - techNews (dates in DD/MM/YYYY)
14/02/2026  #85
13/02/2026  #74
12/02/2026  #70
11/02/2026  #58
10/02/2026  #84
09/02/2026  #74
08/02/2026  #92
07/02/2026  #99
06/02/2026  #72
05/02/2026  #52
Spotify
No recent rankings available
Links shared between episodes and podcasts
Links found in episode descriptions, with the number of times they are shared across podcasts.
- https://firstory.me/zh (175752 shares)
- https://ai-evals.org/ (5 shares)
- https://www.linkedin.com/in/wenxingl/ (6 shares)
- https://www.linkedin.com/in/amy17519/ (4 shares)
RSS feed quality and score
Technical assessment of the quality and structure of the RSS feed.
Overall score: 38%
Publication history
Monthly distribution of episode publications over the years.
Build AI Evals from Scratch: When and How?
Saturday, February 7, 2026 • Duration 17:48
What is evaluation-driven development? When should you start building evals for your product? How do you build them from scratch?
Using a real-world example of a customer chatbot for a medical insurance company, we walk through the process of setting up evals from scratch: translating product requirements into quantifiable metrics, curating quality test datasets (hint: you need fewer examples than you think), and making go/no-go decisions based on eval scores.
You'll learn why accuracy and safety require different approaches, how to avoid the trap of AI-generated test data, and why 94% vs 95% accuracy matters less than you'd expect—but safety guardrails are non-negotiable. This is the practical blueprint for anyone building AI products who wants to catch problems before users do.
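The point about 94% vs 95% accuracy and non-negotiable safety guardrails can be made concrete with a small sketch. It is not taken from the episode: the test-set size of 100, the `min_accuracy` threshold of 0.90, and the function names are all assumptions chosen for illustration. It computes a Wilson score interval for each accuracy figure to show how much the two overlap on a small test set, and applies a hypothetical release rule in which any safety failure blocks the launch.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def release_decision(correct: int, total: int, safety_failures: int,
                     min_accuracy: float = 0.90) -> str:
    """Hypothetical go/no-go rule (an assumption, not the hosts' rule).

    Any safety failure blocks the release outright; otherwise the
    observed accuracy is compared with a threshold.
    """
    if safety_failures > 0:
        return "no-go"
    return "go" if correct / total >= min_accuracy else "no-go"


# Assumed test set of 100 examples: 94 correct vs 95 correct.
low_94, high_94 = wilson_interval(94, 100)
low_95, high_95 = wilson_interval(95, 100)
intervals_overlap = low_95 <= high_94 and low_94 <= high_95

print(f"94/100: [{low_94:.3f}, {high_94:.3f}]")
print(f"95/100: [{low_95:.3f}, {high_95:.3f}]")
print("intervals overlap:", intervals_overlap)
print("95% accurate, one PII leak:", release_decision(95, 100, safety_failures=1))
print("94% accurate, no safety failures:", release_decision(94, 100, safety_failures=0))
```

On a test set this small the two intervals overlap almost entirely, so a one-point accuracy difference is not strong evidence either way, while a single safety failure changes the decision regardless of accuracy.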
00:00 – Introduction: Why We Need to Talk About Evals Now
00:39 – When to Start AI Evals?
03:20 – Example Setup: Medical Insurance Customer Chatbot
04:30 – Defining Evals in Product Requirements
07:19 – What Is Evaluation-Driven Development?
08:27 – Breaking Down "Accuracy": What Does It Really Mean?
09:42 – Dataset Curation: Quality Over Quantity
11:24 – How Big Should Your Test Set Be?
12:25 – Safety Guardrails: Knowledge Boundary and PII Leakage
15:29 – Making Release Decisions with Eval Metrics
17:33 – Start with What's Critical to Your Use Case
Stella Liu: https://www.linkedin.com/in/wenxingl/
Amy Chen: https://www.linkedin.com/in/amy17519/
AI Evals Skills: Why Data Scientists Have a Natural Advantage
Monday, January 26, 2026 • Duration 22:10
What skills are required for AI evals? Why do data scientists have a natural advantage in AI evals?
Evaluating AI isn’t just about "vibe coding" with an AI assistant. It requires a solid foundation in statistics, for picking sample sizes, and in coding, for building your own testing frameworks. Data scientists have a huge head start here because they are already pros at designing metrics and communicating risks.
In the inaugural episode, we also explain why Evals (pre-launch testing) and Analytics (post-launch user feedback) are two sides of the same coin: one makes sure the AI works, and the other makes sure people actually love using it.
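As an illustration of the statistics mentioned here, the sketch below uses the textbook normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / e², to pick a sample size. This is a standard approximation rather than a method stated in the episode, and the function name and default values are assumptions.

```python
import math


def sample_size_for_proportion(margin: float, expected_rate: float = 0.5,
                               z: float = 1.96) -> int:
    """Examples needed to estimate a pass rate within +/- margin.

    Uses the normal approximation n = z^2 * p * (1 - p) / margin^2.
    expected_rate=0.5 is the most conservative guess (largest n);
    z=1.96 corresponds to roughly 95% confidence.
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    if not 0 <= expected_rate <= 1:
        raise ValueError("expected_rate must be between 0 and 1")
    n = z * z * expected_rate * (1 - expected_rate) / (margin * margin)
    return math.ceil(n)


# Worst case (no prior guess about the pass rate), +/- 5 points.
print(sample_size_for_proportion(0.05))
# If the pass rate is expected to be around 90%, fewer examples are needed.
print(sample_size_for_proportion(0.05, expected_rate=0.9))
```

The required size shrinks as the expected pass rate moves away from 50%, and it grows quickly as the margin tightens.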
00:00 – Introduction to AI Evals & Analytics
01:31 – Why Data Scientists Have a Natural Advantage
01:59 – Technical Pillar: Statistics
02:48 – Technical Pillar: Coding & Prompt Engineering
05:03 – Technical Pillar: Dataset Generation
08:35 – Soft Skills & Stakeholder Collaboration
11:17 – Domain Expertise in Regulated Industries
15:50 – New Skills for the GenAI Era
19:25 – Why Evals and Analytics Must Come Together
Stella Liu: https://www.linkedin.com/in/wenxingl/
Amy Chen: https://www.linkedin.com/in/amy17519/









