Explore every episode of the podcast The Data Stack Show

Dive into the complete episode list for The Data Stack Show. Each episode is cataloged with detailed descriptions, making it easy to find and explore specific topics. Keep track of all episodes from your favorite podcast and never miss a moment of insightful content.

Title | Pub. Date | Duration
225: The Stone Cold Truth About Data: False Hopes and Hard Truths with The Cynical Data Guy | 22 Jan 2025 | 00:33:30

Highlights from this week’s conversation include:

  • False Hope in Data Roles (1:17)
  • Naivety of Junior Data Analysts (4:27)
  • The Challenge of Defining Data (6:41)
  • Struggles with Enterprise BI Tools (9:43)
  • Career Advice for Data Professionals (12:36)
  • Generational Shifts in Data Roles (16:51)
  • Self-Service Data Requests (18:17)
  • The Importance of Analysis Skills (19:46)
  • The Broader Context of Analysis (21:44)
  • Boring Challenges in AI Deployment (23:29)
  • Technology Development vs. Human Absorption (26:14)
  • VC Resolutions for 2025 (27:00)
  • Value Addition in Leadership (32:08)
  • Final Thoughts and Wrap-Up (33:06)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

The PRQL: The End of Data Silos (and Other Myths) with The Cynical Data Guy | 20 Jan 2025 | 00:02:33

220: From Box Office to Big Data: Bridging Marketing and Technology Through Data-Driven Leadership with Brian Schwartz of SIZE | 18 Dec 2024 | 01:01:40

Highlights from this week’s conversation include:

  • Brian’s Background and Journey in Data and Marketing (0:56)
  • AI and Data Strategy (2:12)
  • Experience at DreamWorks Animation (3:15)
  • Marketing Timeline for Movies (5:18)
  • Data-Driven Decisions at Expedia (9:04)
  • Advising High-Growth Companies (14:59)
  • LinkedIn Connections and Networking (17:57)
  • Tension Between Marketing and Data Teams (19:59)
  • Technology Spending in Marketing (22:07)
  • Advice for Tech Leaders Facing Brand Marketers (25:50)
  • Frequency of Replatforming (30:11)
  • Understanding Data Accessibility (33:58)
  • Data as a Product (37:58)
  • Overhyped AI Applications (39:00)
  • Underutilized AI Opportunities (41:51)
  • AI's ROI Challenges (47:01)
  • Effective AI Support Systems (52:33)
  • Potential Ventures to Pursue Outside of Data (56:04)
  • Final Thoughts and Takeaways (1:00:11)

The PRQL: Exploring the Evolution of AI and ML with Rishabh Bhargava of refuel | 12 Feb 2024 | 00:04:04
In this bonus episode, Eric and Kostas preview their upcoming conversation with Rishabh Bhargava of refuel.
176: The Fundamentals of Event-Driven Orchestration and How Generative AI Is Shaping Its Future with Viren Baraiya of orkes.io | 07 Feb 2024 | 00:53:09

Highlights from this week’s conversation include:

  • Viren’s background in data (0:39)
  • Evolution of Orchestration (1:52)
  • AI Orchestration (3:00)
  • Understanding Conductor and orkes (6:26)
  • Event-Driven Orchestration (8:10)
  • Viren’s Transition to Founder (12:27)
  • Non-Technical Aspects of Being a Founder (15:50)
  • Democratizing AI for Developers (18:16)
  • The evolution of microservices orchestration (21:56)
  • Challenges in appealing to the 99% developer group (24:32)
  • Value of orchestration for developers (30:31)
  • Role of orchestrators in managing faults (37:37)
  • The intersection of AI and orchestration (40:27)
  • Evolution of AI (44:04)
  • Thriving in AI Environment (47:58)
  • Final thoughts and takeaways (51:25)

The PRQL: The Evolution of Application Orchestration Featuring Viren Baraiya of orkes.io | 05 Feb 2024 | 00:04:01
In this bonus episode, Eric and Kostas preview their upcoming conversation with Viren Baraiya of orkes.io.
175: The Parts, Pieces, and Future of Composable Data Systems, Featuring Wes McKinney, Pedro Pedreira, Chris Riccomini, and Ryan Blue | 31 Jan 2024 | 01:18:30

Highlights from this week’s conversation include:

  • Introduction of the panel (0:05)
  • Defining composable data stack (5:22)
  • Components of a composable data stack (7:49)
  • Challenges and incentives for composable components (10:37)
  • Specialization and modularity in data workloads (13:05)
  • Organic evolution of composable systems (17:50)
  • Efficiency and common layers in data management systems (22:09)
  • The IR and Data Computation (23:00)
  • Components of the Storage Layer (26:16)
  • Decoupling Language and Execution (29:42)
  • Apache Calcite and Modular Frontend (36:46)
  • Data Types and Coercion (39:27)
  • Describing Data Sets and Schema (42:00)
  • Open Standards and Frontiers (46:22)
  • Challenges of standardizing APIs (48:15)
  • Trade-offs in building composable systems (54:04)
  • Evolution of data system composability (56:32)
  • Exciting new projects in data systems (1:01:57)
  • Final thoughts and takeaways (1:17:25)

The PRQL: Exploring the Evolution, Challenges, and Benefits of Composable Data Stacks Featuring Wes McKinney, Pedro Pedreira, Chris Riccomini, and Ryan Blue | 29 Jan 2024 | 00:04:59
In this bonus episode, Eric and Kostas preview their upcoming discussion with a panel of experts: Wes McKinney (Co-Founder, Voltron), Pedro Pedreira (Software Engineer, Meta), Chris Riccomini (Seed Investor, various startups), and Ryan Blue (Co-Founder and CEO, Tabular).
174: Does Your Data Stack Need a Semantic Layer? Featuring Artyom Keydunov of Cube Dev | 24 Jan 2024 | 00:58:14

Highlights from this week’s conversation include:

  • Artyom’s background in the data space (0:32)
  • The growth and changes at Cube (5:58)
  • Pain points of managing metrics definitions across different tools (9:39)
  • Trade-offs between coupled and decoupled semantic layers (12:12)
  • Making a case for implementing a semantic layer (14:17)
  • The evolution of semantic layers (23:28)
  • Challenges in designing a decoupled semantic layer (24:16)
  • Different approaches to solving the interface problem (26:58)
  • Implementing a SQL engine in Cube (35:58)
  • Overhead and debugging in semantic layers (39:08)
  • The semantic layer and its importance (46:26)
  • The need for semantics in data products (47:34)
  • What’s the future of semantic layers and user experience? (51:49)
  • Final thoughts and takeaways (57:34)

The PRQL: Why is a Semantic Layer Important in the Modern Data Stack? Featuring Artyom Keydunov of Cube Dev | 22 Jan 2024 | 00:03:08
In this bonus episode, Eric and Kostas preview their upcoming conversation with Artyom Keydunov of Cube Dev.
173: Data Analytics Is a Team Sport, Featuring Jay Henderson of Alteryx | 17 Jan 2024 | 00:46:31

Highlights from this week’s conversation include:

  • No Code Analytics (1:22)
  • Analytics as a Team Sport (2:31)
  • The workflow of someone without Alteryx (11:27)
  • Alteryx's ability to handle diverse data sources (14:32)
  • The balance between ease of use and complexity (23:06)
  • Enabling casual end users with a no code interface (24:19)
  • Taking analytics to the data (31:47)
  • The boundaries between data engineers and end users (33:44)
  • The importance of collaboration in analytics (34:12)
  • The potential of every employee being a data worker (35:28)
  • The human nature of the product and users in large enterprises (45:38)
  • Final thoughts and takeaways (46:21)

The PRQL: Bridging the Gap Between Messy Data and Sophisticated Analytics with Jay Henderson of Alteryx | 15 Jan 2024 | 00:03:31
In this bonus episode, Eric and Kostas preview their upcoming conversation with Jay Henderson of Alteryx.
172: How WebAssembly is Enabling the Third Wave of Cloud Compute with Matt Butcher of Fermyon Technologies | 10 Jan 2024 | 00:56:03

Highlights from this week’s conversation include:

  • Matt’s background and journey with Fermyon (2:32)
  • WebAssembly and enhanced security models (3:43)
  • The IOT Startup and Google Acquisition (10:49)
  • Google's Early Containers (11:50)
  • Scaling and anticipating requests (20:22)
  • Introduction to WebAssembly and its importance (23:32)
  • The Benefits of WebAssembly (30:57)
  • Comparison of Virtual Machines, Containers, and Micro VMs (33:12)
  • The Importance of Fast Startup Times in WebAssembly (37:39)
  • Metaphysics and software development (42:12)
  • The importance of effective communication in code development (43:18)
  • The challenges and progress of WebAssembly (47:40)
  • Requirements of different teams and different jobs (52:17)
  • Final thoughts and takeaways (53:14)

The PRQL: Bridging Marketing and Technology Through Data-Driven Leadership with Brian Schwartz of SIZE | 16 Dec 2024 | 00:02:55

The PRQL: WebAssembly: The Future of Cloud Workloads Made Simple with Matt Butcher of Fermyon Technologies | 08 Jan 2024 | 00:04:40
In this bonus episode, Eric and Kostas preview their upcoming conversation with Matt Butcher of Fermyon Technologies.
171: Machine Learning Pipelines Are Still Data Pipelines with Sandy Ryza of Dagster | 03 Jan 2024 | 00:55:50

Highlights from this week’s conversation include:

  • The role of an orchestrator in the lifecycle of data (1:34)
  • Relevance of orchestration in data pipelines (2:45)
  • Changes around data ops and MLOps (3:37)
  • Data Cleaning (11:42)
  • Overview of Dagster (13:50)
  • Assets vs Tasks in Data Pipeline (19:15)
  • Building a Data Pipeline with Dagster (25:40)
  • Difference between Data Asset and Materialized Dataset (28:28)
  • Defining Lineage and Data Assets in Dagster (29:32)
  • The boundaries of software and organizational structures (37:25)
  • The benefits of a unified orchestration framework (39:56)
  • Orchestration in the development phase (45:29)
  • The emergence of analytics engineer role (51:53)
  • Fluidity in data pipeline and infrastructure roles (52:40)

The PRQL: Does Machine Learning Need Its Own Orchestrator? Featuring Sandy Ryza of Dagster | 02 Jan 2024 | 00:03:48
In this bonus episode, Eric and Kostas preview their upcoming conversation with Sandy Ryza of Dagster.
170: Discussing Data Roles and Solving Data Problems with Katie Bauer of GlossGenius | 27 Dec 2023 | 00:53:43

Highlights from this week’s conversation include:

  • The evolution of the data scientist role (1:03)
  • Common problems in different companies (2:05)
  • Measuring and curating content on Reddit (4:29)
  • The challenges of working with unstructured content at Reddit and Twitter (11:03)
  • Lessons learned from Reddit and applying them at Twitter (13:17)
  • Data challenges and customer behavior analysis at GlossGenius (20:16)
  • How the data scientist's role has changed over time (25:10)
  • The essence of the data scientist/engineer role (29:00)
  • Dynamics and overlaps between different data roles (32:09)
  • The perfect data team for Twitter (34:19)
  • Building a data team at a startup like GlossGenius (36:36)
  • The right time to bring in a dedicated data person in a startup (38:52)
  • The analytics engineer role (46:25)
  • Challenges in implementing telemetry (50:31)
  • Final thoughts and takeaways (52:24)

The PRQL: What is a Data Scientist? Featuring Katie Bauer of GlossGenius | 26 Dec 2023 | 00:02:36
In this bonus episode, Eric and Kostas preview their upcoming conversation with Katie Bauer of GlossGenius.
169: Data Models: From Warehouse to Business Impact with Tasso Argyros of ActionIQ | 20 Dec 2023 | 01:05:54

Highlights from this week’s conversation include:

  • The Evolution of Databases and Data Systems (2:33)
  • Abstracting Data for Business Users (4:31)
  • Building a Database for Google-like Search (7:58)
  • The Big Data Explosion (11:10)
  • Selling Myspace as First Customer (13:14)
  • Starting ActionIQ (16:57)
  • The customer-centric organization (22:46)
  • Transitioning to customer data focus (23:53)
  • Understanding business users' needs (28:30)
  • Supporting Arbitrary Queries and Data Models (34:42)
  • Unique Technical Perspective of Clickstream Data (37:01)
  • The value per terabyte of data (46:45)
  • Building a product for multiple personas (50:45)
  • Composability and Benefits (58:05)
  • Evolution of Storage and Compute (1:00:09)
  • Composability and Treasure Data (1:02:10)

The PRQL: From Databases to Customer Data Platforms with Tasso Argyros of ActionIQ | 18 Dec 2023 | 00:06:28
In this bonus episode, Eric and Kostas preview their upcoming conversation with Tasso Argyros of ActionIQ.
168: Decoding Data Mesh: Principles, Practices, and Real-World Applications Featuring Paolo Platter, Zhamak Dehghani, and Melissa Logan | 13 Dec 2023 | 00:56:40

Highlights from this week’s conversation include:

  • Defining data mesh (6:37)
  • Addressing the scale of organizational complexity and usage (9:04)
  • The shift from monolithic to microservices (12:24)
  • The sociological structure in data mesh (13:59)
  • Data product generation and sharing in data mesh (17:27)
  • Data Mesh: Simplifying Data Work (24:09)
  • Getting Started with Data Mesh (29:14)
  • Building products for Data Mesh (36:42)
  • Building a customizable and extensible platform to shape data practice (39:28)
  • The characteristics of a data product (48:40)
  • Defining what a data product is not (50:45)
  • The origin of the term "mesh" in data mesh (53:32)

The PRQL: A Data Mesh Deep Dive with Paolo Platter, Zhamak Dehghani, and Melissa Logan | 11 Dec 2023 | 00:03:01
In this bonus episode, Eric and Kostas preview their upcoming conversation regarding Data Mesh with Paolo Platter, Zhamak Dehghani, and Melissa Logan.
167: Data-Driven Investing and Company Building with Ben Miller of Fundrise | 06 Dec 2023 | 00:57:04

Highlights from this week’s conversation include:

  • Ben’s background in real estate (3:27)
  • Why Fundrise was Started (4:37)
  • Democratizing Investment Opportunities (6:35)
  • Investment Thesis for Venture (11:55)
  • Challenges with Data and Technology (12:34)
  • Importance of Data Model Abstraction (20:03)
  • Data Infrastructure and Investments (23:22)
  • Evolution of Data Engineering (25:12)
  • Closing the Tooling Gap (34:23)
  • The user base segmentation (36:28)
  • The emotional reality of investment decisions (40:50)
  • Data inputs for real estate investment (47:07)
  • The work of data infrastructure (48:28)
  • The limitations of underwriting analysis (49:36)
  • Improving accuracy with data infrastructure (52:43)

219: The First 90 Days of Data Leadership: What the LinkedIn Posts Don't Tell You with Matt Kelliher-Gibson, The Cynical Data Guy | 11 Dec 2024 | 00:33:21

Highlights from this week’s conversation include:

  • Lightning Round Setup (1:15)
  • Scenarios for New Data Leaders (2:33)
  • Optimism vs. Reality (3:14)
  • Cynical Perspective on Data Roles (5:32)
  • Monitoring Systems Discussion (9:31)
  • Executive Alignment Challenges (12:54)
  • Understanding Team Dynamics (17:32)
  • Head of Data vs. Head of Product (20:13)
  • Product Development Steps (22:14)
  • Consequences of Product Decisions (24:14)
  • Challenges in Data Team Dynamics (26:03)
  • Attribution Reporting Complexity (28:24)
  • Long-Term Vision for Data Teams (29:22)
  • AI Summaries Discussion (30:19)
  • Closing Thoughts on AI Nuance (32:02)

The PRQL: Fundrise's Data-Driven Approach to Investment in Real Estate and Tech with Ben Miller | 04 Dec 2023 | 00:03:24
In this bonus episode, Eric and Kostas preview their upcoming conversation with Ben Miller of Fundrise.
166: Data Processing Fundamentals and Building a Unified Execution Engine Featuring Pedro Pedreira of Meta | 29 Nov 2023 | 01:12:16

Highlights from this week’s conversation include:

  • The concept of composable at a lower level of data infrastructure (1:28)
  • New architectures and components that allow developers to build databases (3:44)
  • Pedro's background and experience in data infrastructure (6:18)
  • The Spectrum of Latency and Analytics (12:59)
  • Different Query Engines for Different Use Cases (16:32)
  • Vectorized vs Code Gen Data Processing (19:33)
  • Vectorization and Code Generation (21:21)
  • Examples of Vectorized Engines (24:33)
  • Rewriting Execution Engine in C++ (27:22)
  • Different Organization of Presto and Spark (33:17)
  • Arrow and its Extensions (37:15)
  • The similarities between analytics and ML (44:33)
  • Offline feature engineering and data preprocessing for training (48:00)
  • Dialect and semantic differences in using Velox for different engines (50:01)
  • The convergence of dialects (52:23)
  • Challenges of substrate and semantics (53:18)
  • Future plans for Velox (58:09)
  • The discussion on evolving Parquet (1:03:38)
  • The integration of the relational model and the tensor model (1:07:29)

The PRQL: How Does Composability in Data Infrastructure Differ at Different Levels of Abstraction? Featuring Pedro Pedreira of Meta | 27 Nov 2023 | 00:06:03
In this bonus episode, Eric and Kostas preview their upcoming conversation with Pedro Pedreira of Meta.
165: SQL Queries, Data Modeling, and Data Visualization with Colin Zima of Omni | 22 Nov 2023 | 00:54:23

Highlights from this week’s conversation include:

  • Colin's Background and Starting Omni (1:48)
  • Defining “good” at Google search early in his career (4:42)
  • Looker's Unique Approach to Analytics (9:48)
  • The paradigm shift in analytics (10:52)
  • The architecture of Looker and its influence (12:04)
  • Combatting the challenge of unbundling in the data stack (14:26)
  • The evolution of analytics engineering (21:50)
  • Enhancing user flexibility in Omni (23:44)
  • The evolution of BI tools (32:53)
  • What does the future look like for BI tools? (35:14)
  • The role of Python and notebooks in BI (39:48)
  • The product experience of Omni and its vision (45:27)
  • Expectations for the future of Omni (47:52)
  • The relationship between algorithms and business logic (50:51)

The PRQL: Building a Data Product for Data People: Looker's Vision and Omni's Future with Colin Zima | 20 Nov 2023 | 00:01:57
In this bonus episode, Eric and Kostas preview their upcoming conversation with Colin Zima of Omni.
164: How The GTM and Data Teams at Snowflake Work Together with Travis Henry and Hillary Carpio | 15 Nov 2023 | 00:56:56

Highlights from this week’s conversation include:

  • The Unique Perspective of Practitioners (2:10)
  • Account-based Marketing (6:30)
  • Sales Development Representatives (SDR) (8:05)
  • Descriptive, People, and Engagement Data (11:38)
  • Data Overload and Actionable Data (14:20)
  • Working with Data Teams and Internal Data (17:52)
  • The relationship between business and data teams (22:27)
  • The importance of collaboration between marketing and data teams (24:17)
  • Travis and Hillary writing a book (25:33)
  • The taxonomy of personas (34:23)
  • Bucketing and grouping people in data systems (35:37)
  • Account-based marketing and sales alignment (39:00)
  • The data-driven approach and reliance on technology (44:25)
  • Managing complexity in data and account-based marketing (45:35)
  • Adapting to change and evolving data artifacts (51:58)
  • The importance of understanding the business (54:58)
  • Collaboration between data and go-to-market teams (55:56)

The PRQL: Navigating the World of Data Overload with Travis Henry and Hillary Carpio of Snowflake | 13 Nov 2023 | 00:04:39
In this bonus episode, Eric and Kostas preview their upcoming conversation with Travis Henry and Hillary Carpio of Snowflake.
163: Simplifying Real-Time Streaming with David Yaffe and Johnny Graettinger of Estuary | 08 Nov 2023 | 01:03:57

Highlights from this week’s conversation include:

  • Johnny and David’s background in working together (1:56)
  • The background story of Estuary (4:15)
  • The challenges of ad tech and the need for low latency (5:44)
  • Use cases for moving data at scale (10:35)
  • Real-time data replication methods (11:54)
  • Challenges with Kafka and the birth of Gazette (13:54)
  • Comparing Kafka and Gazette (20:22)
  • The importance of existing streaming tools (22:28)
  • Challenges of managing Kafka and the need for a different approach (23:40)
  • The role of compaction in streaming applications (26:54)
  • The challenge of relaxing state management (34:01)
  • Replication and the problem of data synchronization (36:48)
  • Incremental Back Fills and Risk-Free Production Database (46:03)
  • Estuary as a Platform and Connectors (47:45)
  • The challenges of real-time streaming (57:56)
  • Orchestration in real-time streaming (1:00:51)

The PRQL: The Shortcomings of Apache Kafka with David Yaffe and Johnny Graettinger of Estuary | 06 Nov 2023 | 00:03:51
In this bonus episode, Eric and Kostas preview their upcoming conversation with David Yaffe and Johnny Graettinger of Estuary.
162: Accelerating Enterprise AI Transformation With Open Source LLMs Featuring Mark Huang of Gradient | 01 Nov 2023 | 00:57:27

Highlights from this week’s conversation include:

  • The potential of AI-driven applications (1:34)
  • The need for hardware infrastructure in AI experimentation (2:40)
  • Oligopoly on the closed side (11:50)
  • Advantages of private side vs. open source (13:18)
  • Leveraging valuable data within enterprises (16:00)
  • The urgency of adopting LLMs in the enterprise (24:02)
  • Expansion of LLMs into new business verticals (25:06)
  • The challenges of operationalizing LLMs (29:32)
  • Seamless experience with OpenAI (37:29)
  • Operationalizing with Gradient (38:36)
  • The early genesis of Gradient (48:53)
  • The democratization of AI through endpoints (51:44)
  • What is the future of language models? (54:07)

The PRQL: The Unspoken Truths of Data Leadership with Matt Kelliher-Gibson, The Cynical Data Guy | 09 Dec 2024 | 00:02:53

The PRQL: How LLMs are Transforming Enterprise Workflows with Mark Huang of Gradient | 30 Oct 2023 | 00:03:36
In this bonus episode, Eric and Kostas preview their upcoming conversation with Mark Huang of Gradient.
161: The Intersection of Generative AI and Data Infrastructure with Chang She of LanceDB25 Oct 202301:21:05

Highlights from this week’s conversation include:

  • Chang’s background and journey with Pandas (6:26)
  • The persisting challenges in data collection and preparation (10:37)
  • The resistance to change in using Python for data workflows (13:05)
  • AI hype and its impact (14:09)
  • The success and evolution of Pandas as a data framework (20:04)
  • The vision for a next-generation data infrastructure (26:48)
  • LanceDB's file and table format (34:35)
  • Trade-Offs in Lance Format (42:45)
  • Introducing the Vector Database (46:30)
  • The split between production and serving databases (51:14)
  • The importance of unstructured data and multimodal use cases (57:01)
  • The potential of generative AI and the balance between value and hype (1:01:34)
  • Changing expectations of interacting with information systems (1:13:53)
  • Final thoughts and takeaways (1:15:32)

The PRQL: How Did Pandas Become a Data Science Powerhouse? Featuring Chang She of Eto Labs | 23 Oct 2023 | 00:04:43
In this bonus episode, Eric and Kostas preview their upcoming conversation with Chang She of Eto Labs.
160: Closing the Gap Between Dev Teams and Data Teams with Santona Tuli of Upsolver | 18 Oct 2023 | 01:05:42

Highlights from this week’s conversation include:

  • Santona’s journey from nuclear physics to data science (4:59)
  • The appeal of startups and wearing multiple hats (8:12)
  • The challenge of pseudoscience in the news (10:24)
  • Approaching data with creativity and rigor (13:22)
  • Challenges and differences in data workflows (14:39)
  • Schema Evolution and Quality Problems (27:01)
  • Real-time Data Monitoring and Anomaly Detection (30:34)
  • The importance of data as a business differentiator (35:48)
  • The SQL job creation process (46:25)
  • Different options for creating Upsolver jobs (47:20)
  • Adding column-level expectations (50:17)
  • Discussing the differences of working with data as a scientist and in a startup (1:00:18)
  • Final thoughts and takeaways (1:04:01)

The PRQL: The Intersection of Physics, Data Science, and Product Development with Santona Tuli of Upsolver | 16 Oct 2023 | 00:05:47
In this bonus episode, Eric and Kostas preview their upcoming conversation with Santona Tuli of Upsolver.
159: What Is a Vector Database? Featuring Bob van Luijt of Weaviate | 11 Oct 2023 | 01:08:48

Highlights from this week’s conversation include:

  • How music impacted Bob’s data journey (3:16)
  • Music’s relationship with creativity and innovation (11:38)
  • The genesis of Weaviate and the idea of vector databases (14:09)
  • The joy of creation (19:02)
  • OLAP Databases (22:21)
  • The progression of complexity in databases (24:31)
  • Vector database (29:23)
  • Scaling suboptimal algorithms (34:34)
  • The future of vector space representation (35:51)
  • Databases' role in different industries (39:14)
  • The brute force approach to discovery (45:57)
  • Retrieval augmented generation (51:26)
  • How a generative model interacts with the database (57:55)
  • Final thoughts and takeaways (1:03:20)

The PRQL: Enhancing Search and Recommendation Systems with Vector Databases with Bob van Luijt of Weaviate | 09 Oct 2023 | 00:05:10
In this bonus conversation, Eric and Kostas preview their upcoming conversation with Bob van Luijt of Weaviate.
158: The Orchestration Layer as the Data Platform Control Plane With Nick Schrock of Dagster Labs | 04 Oct 2023 | 01:02:18

Highlights from this week’s conversation include:

  • Nick’s background and journey in data (2:28)
  • Founding Dagster Labs (7:50)
  • The evolution of data engineering (12:32)
  • Fragmentation in data infrastructure (15:04)
  • The role of orchestration in data platforms (19:53)
  • The importance of operational tools for data pipelines (25:01)
  • Lessons learned from working with GraphQL (26:19)
  • The role of the orchestrator in data engineering (34:51)
  • The boundaries between data infrastructure and product engineering (37:33)
  • Different orchestrators in the data infrastructure landscape (42:03)
  • The role of MLOps in data engineering (46:04)
  • Data Quality and Orchestration (51:04)
  • Future of Data Teams and Orchestration (54:27)
  • Final thoughts and takeaways (58:01)

The PRQL: The Power of Data Orchestration: A Game-Changer for Data Infrastructure, Featuring Nick Schrock of Dagster Labs | 02 Oct 2023 | 00:03:29
In this bonus episode, Eric and Kostas preview their upcoming conversation with Nick Schrock of Dagster Labs.
157: From Search Engine to Answer Engine Using Grounded Generative AI, Featuring Amr Awadallah of Vectara | 27 Sep 2023 | 01:03:57

Highlights from this week’s conversation include:

  • Amr’s extensive background in data (3:23)
  • The evolution of neural networks (9:21)
  • The role of supervised learning in AI (11:17)
  • Explaining Vectara (13:07)
  • Papers that laid the foundation for AI (15:02)
  • Contextualized translation and personalization (20:07)
  • Ease of use and answer-based search (25:01)
  • AI and potential liabilities (35:54)
  • Minimizing difficulties in large language models (36:43)
  • The process of extracting documents in multidimensional space (44:47)
  • Summarization process (46:33)
  • The danger of humans misusing technology (54:59)
  • Final thoughts and takeaways (57:12)

218: Breaking the Language Barrier Between Data and Business with Joyce Myers of Modern Technology Solutions | 04 Dec 2024 | 00:43:30

Highlights from this week’s conversation include:

  • Joyce's Background and Journey in Data (0:39)
  • Technological Growth in Logistics (3:51)
  • Leadership and Communication in Logistics (6:54)
  • Impact of Data Quality (9:13)
  • Significance of Data Entry Accuracy (12:05)
  • Data's Role in Decision Making (16:01)
  • The Cost of Adding Data Points (21:26)
  • Real-Time Data in Logistics (24:28)
  • Understanding Master Data (31:15)
  • Data vs. Information Distinction (33:21)
  • Navigating Change in Data Management (37:35)
  • Career Advice for Data Practitioners and Parting Thoughts (41:10)

The PRQL: How Can Large Language Models Revolutionize Decision-Making? Featuring Amr Awadallah of Vectara | 25 Sep 2023 | 00:05:00
In this bonus episode, Eric and Kostas preview their upcoming conversation with Amr Awadallah of Vectara.
156: Simple, Performant, Cost-effective Data Streaming with Alex Gallego of Redpanda Data | 20 Sep 2023 | 00:54:45

Highlights from this week’s conversation include:

  • Alex’s background in the data space and the creation of Redpanda (4:23)
  • The cost and complexity of streaming (11:07)
  • The evolution of storage with Kafka (12:04)
  • The distinction between streaming technologies (15:10)
  • Simplicity as a Core Design Principle (27:03)
  • Cost Efficiency in a Cloud Native Era (30:44)
  • Removing complexity with Redpanda (34:21)
  • Migrations and compatibility with Redpanda (40:35)
  • The Future of Redpanda (43:44)
  • The Story Behind Redpanda (46:45)
  • Final thoughts and takeaways (50:25)

The PRQL: Redpanda: Revolutionizing Streaming Systems and Challenging the Kafka Status Quo with Alex Gallego | 18 Sep 2023 | 00:03:46
In this bonus episode, Eric and Kostas preview their upcoming conversation with Alex Gallego of Redpanda.