The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI – Details, episodes & analysis

Podcast details

Technical and general information from the podcast's RSS feed.

The Data Flowcast: Mastering Apache Airflow ® for Data Engineering and AI

Astronomer

Technology

Frequency: 1 episode/48d. Total Eps: 57

CoHost
Welcome to The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI, the podcast where we keep you up to date with insights and ideas propelling the Airflow community forward. Join us each week as we explore the current state, future and potential of Airflow with leading thinkers in the community, and discover how best to leverage this workflow management system to meet the ever-evolving needs of data engineering and AI ecosystems. Podcast webpage: https://www.astronomer.io/podcast/

Recent rankings

Latest chart positions across Apple Podcasts and Spotify rankings.

Apple Podcasts

  • 🇫🇷 France - technology: #95 on 23/05/2025
  • 🇫🇷 France - technology: #53 on 22/05/2025

Spotify

    No recent rankings available



RSS feed quality and score

Technical evaluation of the podcast's RSS feed quality and structure.

RSS feed quality
To improve

Overall score: 48%


Latest published episodes

Recent episodes with titles, durations, and descriptions.

Building an End-to-End Data Observability System at Netflix with Joseph Machado

Season 1 · Episode 40

Thursday 15 May 2025 · Duration 38:54

Building reliable data pipelines starts with maintaining strong data quality standards and creating efficient systems for auditing, publishing and monitoring. In this episode, we explore the real-world patterns and best practices for ensuring data pipelines stay accurate, scalable and trustworthy.


Joseph Machado, Senior Data Engineer at Netflix, joins us to share practical insights gleaned from supporting Netflix’s Ads business and from over a decade of experience in data engineering. He discusses implementing audit-publish patterns, building observability dashboards, defining in-band and separate data quality checks, and optimizing data validation across large-scale systems.


Key Takeaways:

(03:14) Supporting data privacy and engineering efficiency within data systems.

(10:41) Validating outputs with reconciliation checks to catch transformation issues.

(16:06) Applying standardized patterns for auditing, validating and publishing data.

(19:28) Capturing historical check results to monitor system health and improvements.

(21:29) Treating data quality and availability as separate monitoring concerns.

(26:26) Using containerization strategies to streamline pipeline executions.

(29:47) Leveraging orchestration platforms for better visibility and retry capability.

(31:59) Managing business pressure without sacrificing data quality practices.

(35:46) Starting simple with quality checks and evolving toward more complex frameworks.
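For readers unfamiliar with the audit-publish pattern and the reconciliation checks named in these takeaways, here is a minimal sketch in plain Python. It is not taken from the episode and is not Netflix's implementation: the function names, the in-memory lists standing in for staging and published tables, and the two example checks are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

Row = dict  # one record, e.g. {"order_id": 1, "amount": 9.99} (hypothetical schema)


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def row_count_reconciliation(source: list[Row], staged: list[Row]) -> CheckResult:
    """Reconciliation check: the transformation should neither drop nor duplicate rows."""
    ok = len(source) == len(staged)
    return CheckResult("row_count", ok, f"source={len(source)} staged={len(staged)}")


def no_null_amounts(source: list[Row], staged: list[Row]) -> CheckResult:
    """Quality check on the staged output only."""
    bad = sum(1 for row in staged if row.get("amount") is None)
    return CheckResult("no_null_amounts", bad == 0, f"null_amounts={bad}")


Check = Callable[[list[Row], list[Row]], CheckResult]


def audit_and_publish(
    source: list[Row],
    staged: list[Row],
    checks: list[Check],
    published: list[Row],
    history: list[CheckResult],
) -> bool:
    """Audit the staged data; publish it only if every check passes."""
    results = [check(source, staged) for check in checks]
    history.extend(results)  # keep every result so health can be tracked over time
    if all(result.passed for result in results):
        published.clear()
        published.extend(staged)  # stand-in for swapping staging into production
        return True
    return False  # published data is left untouched


# Example run with two hypothetical records.
source = [{"order_id": 1, "amount": 10.0}, {"order_id": 2, "amount": 5.5}]
staged = [dict(row) for row in source]
published: list[Row] = []
history: list[CheckResult] = []
ok = audit_and_publish(
    source, staged, [row_count_reconciliation, no_null_amounts], published, history
)
```

In an orchestrator such as Airflow, the staging write, the audit and the publish step would typically be separate tasks so that a failed audit is visible and retryable on its own; the single function here only keeps the sketch short.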


Resources Mentioned:


Joseph Machado

https://www.linkedin.com/in/josephmachado1991/


Netflix | LinkedIn

https://www.linkedin.com/company/netflix/


Netflix | Website

https://www.netflix.com/browse


Start Data Engineering

https://www.startdataengineering.com/


Apache Airflow

https://airflow.apache.org/


dbt Labs

https://www.getdbt.com/


Great Expectations

https://greatexpectations.io/


https://www.astronomer.io/events/roadshow/london/


https://www.astronomer.io/events/roadshow/new-york/

 

https://www.astronomer.io/events/roadshow/sydney/


https://www.astronomer.io/events/roadshow/san-francisco/


https://www.astronomer.io/events/roadshow/chicago/





Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.




#AI #Automation #Airflow #MachineLearning

Why Developer Experience Shapes Data Pipeline Standards at Next Insurance with Snir Israeli

Season 1 · Episode 39

Thursday 8 May 2025 · Duration 30:28

Creating consistency across data pipelines is critical for scaling engineering teams and ensuring long-term maintainability.


In this episode, Snir Israeli, Senior Data Engineer at Next Insurance, shares how enforcing coding standards and investing in developer experience transformed their approach to data engineering. He explains how implementing automated code checks, clear documentation practices and a scoring system helped drive alignment across teams, improve collaboration and reduce technical debt in a fast-growing data environment.


Key Takeaways:


(02:59) Inconsistencies in code style create challenges for collaboration and maintenance.

(04:22) Programmatically enforcing rules helps teams scale their best practices.

(08:55) Performance improvements in data pipelines lead to infrastructure cost savings.

(13:22) Developer experience is essential for driving adoption of internal tools.

(19:44) Dashboards can operationalize standards enforcement and track progress over time.

(22:49) Standardization accelerates onboarding and reduces friction in code reviews.

(25:39) Linting rules require ongoing maintenance as tools and platforms evolve.

(27:47) Starting small and involving the team leads to better adoption and long-term success.


Resources Mentioned:


Snir Israeli

https://www.linkedin.com/in/snir-israeli/


Next Insurance | LinkedIn

https://www.linkedin.com/company/nextinsurance/


Next Insurance | Website

https://www.nextinsurance.com/


Apache Airflow

https://airflow.apache.org/


https://www.astronomer.io/events/roadshow/london/

   

https://www.astronomer.io/events/roadshow/new-york/   

 

https://www.astronomer.io/events/roadshow/sydney/ 

  

https://www.astronomer.io/events/roadshow/san-francisco/ 

  

https://www.astronomer.io/events/roadshow/chicago/ 






Customizing Airflow for Complex Data Environments at Stripe with Nick Bilozerov and Sharadh Krishnamurthy

Season 1 · Episode 30

Thursday 6 March 2025 · Duration 27:40

Keeping data pipelines reliable at scale requires more than just the right tools — it demands constant innovation. In this episode, Nick Bilozerov, Senior Data Engineer at Stripe, and Sharadh Krishnamurthy, Engineering Manager at Stripe, discuss how Stripe customizes Airflow for its needs, the evolution of its data orchestration framework and the transition to Airflow 2. They also share insights on scaling data workflows while maintaining performance, reliability and developer experience. 


Key Takeaways:



(02:04) Stripe’s mission is to grow the GDP of the internet by supporting businesses with payments and data.

(05:08) 80% of Stripe engineers use data orchestration, making scalability critical.

(06:06) Airflow powers business reports, regulatory needs and ML workflows.

(08:02) Custom task frameworks improve dependencies and validation.

(08:50) "User scope mode" enables local testing without production impact.

(10:39) Migrating to Airflow 2 improves isolation, safety and scalability.

(16:40) Monolithic DAGs caused database issues, prompting a service-based shift.

(19:24) Frequent Airflow upgrades ensure stability and access to new features.

(21:38) DAG versioning and backfill improvements enhance developer experience.

(23:38) Greater UI customization would offer more flexibility.



Resources Mentioned:


Nick Bilozerov -

https://www.linkedin.com/in/nick-bilozerov/


Sharadh Krishnamurthy -

https://www.linkedin.com/in/sharadhk/


Apache Airflow -

https://airflow.apache.org/


Stripe | LinkedIn -

https://www.linkedin.com/company/stripe/


Stripe | Website -

https://stripe.com/





Harnessing Airflow for Data-Driven Policy Research at CSET with Jennifer Melot

Season 1 · Episode 29

Thursday 27 February 2025 · Duration 17:54

Turning complex datasets into meaningful analysis requires robust data infrastructure and seamless orchestration. In this episode, we’re joined by Jennifer Melot, Technical Lead at the Center for Security and Emerging Technology (CSET) at Georgetown University, to explore how Airflow powers data-driven insights in technology policy research. Jennifer shares how her team automates workflows to support analysts in navigating complex datasets. 


Key Takeaways:



(02:04) CSET provides data-driven analysis to inform government decision-makers.

(03:54) ETL pipelines merge multiple data sources for more comprehensive insights.

(04:20) Airflow is central to automating and streamlining large-scale data ingestion.

(05:11) Larger-scale databases create challenges that require scalable solutions.

(07:20) Dynamic DAG generation simplifies Airflow adoption for non-engineers.

(12:13) DAG Factory and dynamic task mapping can improve workflow efficiency.

(15:46) Tracking data lineage helps teams understand dependencies across DAGs.

(16:14) New Airflow features enhance visibility and debugging for complex pipelines.


Resources Mentioned:


Jennifer Melot -

https://www.linkedin.com/in/jennifer-melot-aa710144/


Center for Security and Emerging Technology (CSET) -

https://www.linkedin.com/company/georgetown-cset/


Apache Airflow -

https://airflow.apache.org/


Zenodo -

https://zenodo.org/


OpenLineage -

https://openlineage.io/


Cloud Dataplex -

https://cloud.google.com/dataplex





Leveraging Airflow To Build Scalable and Reliable Data Platforms at 99acres.com with Samyak Jain

Season 1 · Episode 28

Thursday 20 February 2025 · Duration 25:08

Data orchestration is evolving rapidly, with dynamic workflows becoming the cornerstone of modern data engineering. In this episode, we are joined by Samyak Jain, Senior Software Engineer - Big Data at 99acres.com. Samyak shares insights from his journey with Apache Airflow, exploring how his team built a self-service platform that enables non-technical teams to launch data pipelines and marketing campaigns seamlessly.


Key Takeaways:


(02:02) Starting a career in data engineering by troubleshooting Airflow pipelines.

(04:27) Building self-service portals with Airflow as the backend engine.

(05:34) Utilizing API endpoints to trigger dynamic DAGs with parameterized templates.

(09:31) Managing a dynamic environment with over 1,400 active DAGs.

(11:14) Implementing fault tolerance by segmenting data workflows into distinct layers.

(14:15) Tracking and optimizing query costs in AWS Athena to save $7K monthly.

(16:22) Automating cost monitoring with real-time alerts for high-cost queries.

(17:15) Streamlining Airflow metadata cleanup to prevent performance bottlenecks.

(21:30) Efficiently handling one-time and recurring marketing campaigns using Airflow.

(24:18) Advocating for Airflow features that improve resource management and ownership tracking.


Resources Mentioned:


Samyak Jain -

https://www.linkedin.com/in/samyak-jain-ab5830169/


99acres.com -

https://www.linkedin.com/company/99acres/


Apache Airflow -

https://airflow.apache.org/


AWS Athena -

https://aws.amazon.com/athena/


Kafka -

https://kafka.apache.org/





Hybrid Testing Solutions for Autonomous Driving at Bosch with Jens Scheffler and Christian Schilling

Season 1 · Episode 27

Thursday 13 February 2025 · Duration 33:45

Testing autonomous vehicles demands precision, scalability and powerful orchestration tools — enter Apache Airflow, a key component of Bosch’s cutting-edge testing framework. In this episode, we sit down with Jens Scheffler, Test Execution Cluster Technical Architect, and Christian Schilling, Product Owner Open Loop Testing Automated Driving, both at Bosch, to explore how Bosch harnesses Airflow to streamline complex testing scenarios. They share insights on scaling workflows, integrating hybrid infrastructures and ensuring vehicle safety through rigorous automated testing.


Key Takeaways:


(01:35) Airflow orchestrates millions of test hours for autonomous systems.

(03:15) Jens scales distributed systems with Kubernetes for job orchestration.

(06:02) Airflow runs hundreds of tests simultaneously.

(06:44) Virtual testing reduces costs and on-road trials.

(12:19) Unified APIs and GUIs streamline operations.

(15:05) Self-service setups empower Bosch teams.

(18:00) Physical hardware integration ensures real-world timing.

(20:30) Dynamic task mapping scales workflows efficiently.

(25:22) Open-source contributions improve stability.

(31:06) Edge and Celery executors power Bosch's hybrid scheduling.



Resources Mentioned:


Jens Scheffler -

https://www.linkedin.com/in/jens-scheffler/


Christian Schilling -

https://www.linkedin.com/in/christian-schilling-a5078831a/


Bosch -

https://www.linkedin.com/company/bosch/


Apache Airflow -

https://airflow.apache.org/


Kubernetes -

https://kubernetes.io


GitHub -

https://github.com


Edge Executor -

https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html





Overcoming Airflow Scaling Challenges at Monzo Bank with Jonathan Rainer

Season 1 · Episode 26

Friday 7 February 2025 · Duration 43:39

Scaling a data orchestration platform to manage thousands of tasks daily demands innovative solutions and strategic problem-solving. In this episode, we explore the complexities of scaling Airflow and the challenges of orchestrating thousands of tasks in dynamic data environments. Jonathan Rainer, Former Platform Engineer at Monzo Bank, joins us to share his journey optimizing data pipelines, overcoming UI limitations and ensuring DAG consistency in high-stakes scenarios. 


Key Takeaways:

(03:11) Using Airflow to schedule computation in BigQuery.

(07:02) How DAGs with 8,000+ tasks were managed nightly.

(08:18) Ensuring accuracy in regulatory reporting for banking.

(11:35) Handling task inconsistency and DAG failures with automation.

(16:09) Building a service to resolve DAG consistency issues in Airflow.

(25:05) Challenges with scaling the Airflow UI for thousands of tasks.

(27:03) The role of upstream and downstream task management in Airflow.

(37:33) The importance of operational metrics for monitoring Airflow health.

(39:19) Balancing new tools with root cause analysis to address scaling issues.

(41:35) Why scaling solutions require both technical and leadership buy-in.



Resources Mentioned:


Jonathan Rainer -

https://www.linkedin.com/in/jonathan-rainer/


Monzo Bank -

https://www.linkedin.com/company/monzo-bank/


Apache Airflow -

https://airflow.apache.org/


BigQuery -

https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/bigquery.html


Kubernetes -

https://kubernetes.io/





Orchestrating Analytics and AI Workflows at Telia with Arjun Anandkumar

Season 1 · Episode 25

Thursday 30 January 2025 · Duration 26:00

The future of data engineering lies in seamless orchestration and automation. In this episode, Arjun Anandkumar, Data Engineer at Telia, shares how his team uses Airflow to drive analytics and AI workflows. He highlights the challenges of scaling data platforms and how adopting best practices can simplify complex processes for teams across the organization. Arjun also discusses the transformative role of tools like Cosmos and Terraform in enhancing efficiency and collaboration. 


Key Takeaways:


(02:16) Telia operates across the Nordics and Baltics, focusing on telecom and energy services.

(03:45) Airflow runs dbt models seamlessly with Cosmos on AWS MWAA.

(05:47) Cosmos improves visibility and orchestration in Airflow.

(07:00) Medallion Architecture organizes data into bronze, silver and gold layers.

(08:34) Task group challenges highlight the need for adaptable workflows.

(15:04) Scaling managed services requires trial, error and tailored tweaks.

(19:46) Terraform scales infrastructure, while YAML templates manage DAGs efficiently.

(20:00) Templated DAGs and robust testing enhance platform management.

(24:15) Open-source resources drive innovation in Airflow practices.


Resources Mentioned:


Arjun Anandkumar -

https://www.linkedin.com/in/arjunanand1/?originalSubdomain=dk


Telia -

https://www.linkedin.com/company/teliacompany/


Apache Airflow -

https://airflow.apache.org/


Cosmos by Astronomer -

https://www.astronomer.io/cosmos/


Terraform -

https://www.terraform.io/


Medallion Architecture by Databricks -

https://www.databricks.com/glossary/medallion-architecture






The Role of Airflow in Finance Transformation at Etraveli Group with Mihir Samant

Season 1 · Episode 24

Thursday 23 January 2025 · Duration 21:19

Transforming bottlenecked finance processes into streamlined, automated systems requires the right tools and a forward-thinking approach. In this episode, Mihir Samant, Senior Data Analyst at Etraveli Group, joins us to share how his team leverages Airflow to revolutionize finance automation. With extensive experience in data workflows and a passion for open-source tools, Mihir provides valuable insights into building efficient, scalable systems. We explore the transformative power of Airflow in automating workflows and enhancing data orchestration within the finance domain. 


Key Takeaways:


(02:14) Etraveli Group specializes in selling affordable flight tickets and ancillary services.

(03:56) Mihir’s finance automation team uses Airflow to tackle month-end bottlenecks.

(06:00) Airflow's flexibility enables end-to-end automation for finance workflows.

(07:00) Open-source Airflow tools offer cost-effective solutions for new teams.

(08:46) Sensors and dynamic DAGs are pivotal features for optimizing tasks.

(13:30) GitSync simplifies development by syncing environments seamlessly.

(16:27) Plans include integrating Databricks for more advanced data handling.

(17:58) Airflow and Databricks offer multiple flexible methods to trigger workflows and execute SQL queries seamlessly.



Resources Mentioned:


Mihir Samant -

https://www.linkedin.com/in/misamant/?originalSubdomain=ca


Etraveli Group -

https://www.linkedin.com/company/etraveli-group/


Apache Airflow -

https://airflow.apache.org/


Docker -

https://www.docker.com/


Databricks -

https://www.databricks.com/






Inside Ford’s Data Transformation: Advanced Orchestration Strategies with Vasantha Kosuri-Marshall

Season 1 · Episode 23

Thursday 16 January 2025 · Duration 38:54

Data engineering is entering a new era, where orchestration and automation are redefining how large-scale projects operate. This episode features Vasantha Kosuri-Marshall, Data and ML Ops Engineer at Ford Motor Company. Vasantha shares her expertise in managing complex data pipelines. She takes us through Ford's transition to cloud platforms, the adoption of Airflow and the intricate challenges of orchestrating data in a diverse environment.



Key Takeaways:


(03:10) Vasantha’s transition to the Advanced Driving Assist Systems team at Ford.

(05:42) Early adoption of Airflow to orchestrate complex data pipelines.

(09:29) Ford's move from on-premise data solutions to Google Cloud Platform.

(12:03) The importance of Airflow's scheduling capabilities for efficient data management.

(16:12) Using Kubernetes to scale Airflow for large-scale data processing.

(19:59) Vasantha’s experience in overcoming challenges with legacy orchestration tools.

(22:22) Integration of data engineering and data science pipelines at Ford.

(28:03) How deferrable operators in Airflow improve performance and save costs.

(32:12) Vasantha’s insights into tuning Airflow properties for thousands of DAGs.

(36:09) The significance of monitoring and observability in managing Airflow instances.



Resources Mentioned:


Vasantha Kosuri-Marshall -

https://www.linkedin.com/in/vasantha-kosuri-marshall-0b0aab188/


Apache Airflow -

https://airflow.apache.org/


Google Cloud Platform (GCP) -

https://cloud.google.com/


Ford Motor Company | LinkedIn -

https://www.linkedin.com/company/ford-motor-company/


Ford Motor Company | Website -

https://www.ford.com/


Astronomer -

https://www.astronomer.io/






© My Podcast Data