The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI – Details, episodes & analysis
Podcast details
Technical and general information from the podcast's RSS feed.

The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI
Astronomer
Frequency: 1 episode every 48 days. Total episodes: 57

Recent rankings
Latest chart positions across Apple Podcasts and Spotify rankings.
Apple Podcasts
🇫🇷 France - technology
23/05/2025: #95
22/05/2025: #53
Spotify
No recent rankings available
Shared links between episodes and podcasts
Links found in episode descriptions and other podcasts that share them.
- https://stripe.com/ (333 shares)
- https://www.datadoghq.com/ (237 shares)
- https://cloud.google.com/ (218 shares)
RSS feed quality and score
Technical evaluation of the podcast's RSS feed quality and structure.
Overall score: 48%
Building an End-to-End Data Observability System at Netflix with Joseph Machado
Season 1 · Episode 40
Thursday, 15 May 2025 • Duration 38:54
Building reliable data pipelines starts with maintaining strong data quality standards and creating efficient systems for auditing, publishing and monitoring. In this episode, we explore the real-world patterns and best practices for ensuring data pipelines stay accurate, scalable and trustworthy.
Joseph Machado, Senior Data Engineer at Netflix, joins us to share practical insights gleaned from supporting Netflix’s Ads business as well as over a decade of experience in the data engineering space. He discusses implementing audit-publish patterns, building observability dashboards, defining in-band and separate data quality checks, and optimizing data validation across large-scale systems.
Key Takeaways:
(03:14) Supporting data privacy and engineering efficiency within data systems.
(10:41) Validating outputs with reconciliation checks to catch transformation issues.
(16:06) Applying standardized patterns for auditing, validating and publishing data.
(19:28) Capturing historical check results to monitor system health and improvements.
(21:29) Treating data quality and availability as separate monitoring concerns.
(26:26) Using containerization strategies to streamline pipeline executions.
(29:47) Leveraging orchestration platforms for better visibility and retry capability.
(31:59) Managing business pressure without sacrificing data quality practices.
(35:46) Starting simple with quality checks and evolving toward more complex frameworks.
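The audit-publish pattern from this episode writes new data to a staging location, audits it, and only then exposes it to consumers. The following is a minimal sketch of that idea, not Netflix's implementation: it assumes an SQLite connection opened with `sqlite3.connect(":memory:", isolation_level=None)` and uses hypothetical table names (`orders_source`, `orders_staging`, `orders`). It pairs a reconciliation check (row count and total must match the source) with an in-band null check.

```python
import sqlite3


def audit_publish(conn: sqlite3.Connection) -> bool:
    """Write to a staging table, audit it, and publish only if every check passes.

    Expects a connection opened with isolation_level=None (autocommit), so the
    explicit BEGIN/COMMIT around the publish step controls the transaction.
    Table names are hypothetical placeholders.
    """
    # 1. Write: build the new version in staging, never in the live table.
    conn.execute("DROP TABLE IF EXISTS orders_staging")
    conn.execute(
        "CREATE TABLE orders_staging AS "
        "SELECT id, TRIM(customer) AS customer, amount FROM orders_source"
    )

    # 2a. Audit (reconciliation): row count and amount total must match the source.
    src = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders_source"
    ).fetchone()
    stg = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders_staging"
    ).fetchone()

    # 2b. Audit (in-band): no customer value may be missing.
    missing = conn.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE customer IS NULL"
    ).fetchone()[0]

    if src != stg or missing:
        return False  # published table untouched; staging kept for debugging

    # 3. Publish: swap staging into place inside one transaction.
    conn.execute("BEGIN")
    conn.execute("DROP TABLE IF EXISTS orders")
    conn.execute("ALTER TABLE orders_staging RENAME TO orders")
    conn.execute("COMMIT")
    return True
```

Because a failed audit returns before the swap, consumers keep reading the last good version of `orders`, which is the core guarantee the pattern is meant to provide.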
Resources Mentioned:
https://www.linkedin.com/in/josephmachado1991/
Netflix | LinkedIn
https://www.linkedin.com/company/netflix/
Netflix | Website
https://www.netflix.com/browse
https://www.startdataengineering.com/
https://airflow.apache.org/
https://www.getdbt.com/
https://greatexpectations.io/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
Why Developer Experience Shapes Data Pipeline Standards at Next Insurance with Snir Israeli
Season 1 · Episode 39
Thursday, 8 May 2025 • Duration 30:28
Creating consistency across data pipelines is critical for scaling engineering teams and ensuring long-term maintainability.
In this episode, Snir Israeli, Senior Data Engineer at Next Insurance, shares how enforcing coding standards and investing in developer experience transformed their approach to data engineering. He explains how implementing automated code checks, clear documentation practices and a scoring system helped drive alignment across teams, improve collaboration and reduce technical debt in a fast-growing data environment.
Key Takeaways:
(02:59) Inconsistencies in code style create challenges for collaboration and maintenance.
(04:22) Programmatically enforcing rules helps teams scale their best practices.
(08:55) Performance improvements in data pipelines lead to infrastructure cost savings.
(13:22) Developer experience is essential for driving adoption of internal tools.
(19:44) Dashboards can operationalize standards enforcement and track progress over time.
(22:49) Standardization accelerates onboarding and reduces friction in code reviews.
(25:39) Linting rules require ongoing maintenance as tools and platforms evolve.
(27:47) Starting small and involving the team leads to better adoption and long-term success.
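Programmatic enforcement plus a score, as described in this episode, can be prototyped with Python's standard `ast` module. The sketch below is illustrative only: the three rules (no wildcard imports, docstrings on functions, no `print` calls) are hypothetical examples, not Next Insurance's actual rule set, and the score is simply the fraction of rules a file passes.

```python
import ast
from typing import Callable

# Each hypothetical rule inspects a parsed module and returns violation messages.
Rule = Callable[[ast.Module], list[str]]


def no_wildcard_imports(tree: ast.Module) -> list[str]:
    return [
        f"line {node.lineno}: wildcard import from {node.module}"
        for node in ast.walk(tree)
        if isinstance(node, ast.ImportFrom)
        and any(alias.name == "*" for alias in node.names)
    ]


def functions_have_docstrings(tree: ast.Module) -> list[str]:
    return [
        f"line {node.lineno}: function {node.name} has no docstring"
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and ast.get_docstring(node) is None
    ]


def no_print_calls(tree: ast.Module) -> list[str]:
    return [
        f"line {node.lineno}: use logging instead of print()"
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "print"
    ]


RULES: list[Rule] = [no_wildcard_imports, functions_have_docstrings, no_print_calls]


def score_source(source: str) -> tuple[float, list[str]]:
    """Return the share of rules passed (0.0 to 1.0) plus all violation messages."""
    tree = ast.parse(source)
    violations: list[str] = []
    passed = 0
    for rule in RULES:
        found = rule(tree)
        violations.extend(found)
        if not found:
            passed += 1
    return passed / len(RULES), violations
```

Running `score_source` over every DAG file in CI and charting the results is one way to get the kind of dashboard mentioned at 19:44; keeping each rule a small function also makes the ongoing maintenance noted at 25:39 easier.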
Resources Mentioned:
https://www.linkedin.com/in/snir-israeli/
Next Insurance | LinkedIn
https://www.linkedin.com/company/nextinsurance/
Next Insurance | Website
https://www.nextinsurance.com/
https://airflow.apache.org/
https://www.astronomer.io/events/roadshow/london/
https://www.astronomer.io/events/roadshow/new-york/
https://www.astronomer.io/events/roadshow/sydney/
https://www.astronomer.io/events/roadshow/san-francisco/
https://www.astronomer.io/events/roadshow/chicago/
Thanks for listening to “The Data Flowcast: Mastering Apache Airflow® for Data Engineering and AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
Customizing Airflow for Complex Data Environments at Stripe with Nick Bilozerov and Sharadh Krishnamurthy
Season 1 · Episode 30
Thursday, 6 March 2025 • Duration 27:40
Keeping data pipelines reliable at scale requires more than just the right tools — it demands constant innovation. In this episode, Nick Bilozerov, Senior Data Engineer at Stripe, and Sharadh Krishnamurthy, Engineering Manager at Stripe, discuss how Stripe customizes Airflow for its needs, the evolution of its data orchestration framework and the transition to Airflow 2. They also share insights on scaling data workflows while maintaining performance, reliability and developer experience.
Key Takeaways:
(02:04) Stripe’s mission is to grow the GDP of the internet by supporting businesses with payments and data.
(05:08) 80% of Stripe engineers use data orchestration, making scalability critical.
(06:06) Airflow powers business reports, regulatory needs and ML workflows.
(08:02) Custom task frameworks improve dependencies and validation.
(08:50) "User scope mode" enables local testing without production impact.
(10:39) Migrating to Airflow 2 improves isolation, safety and scalability.
(16:40) Monolithic DAGs caused database issues, prompting a service-based shift.
(19:24) Frequent Airflow upgrades ensure stability and access to new features.
(21:38) DAG versioning and backfill improvements enhance developer experience.
(23:38) Greater UI customization would offer more flexibility.
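The custom task frameworks mentioned at 08:02 validate dependencies before anything runs. Stripe's framework is not public in this episode, so the following is only a generic sketch of such a check using the standard library's `graphlib`: it rejects references to undeclared tasks and dependency cycles, and otherwise returns a valid execution order. The task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def validate_dependencies(deps: dict[str, set[str]]) -> list[str]:
    """Check a task -> upstream-tasks mapping and return a valid execution order.

    Raises ValueError for references to undefined tasks or for cycles.
    """
    # Every upstream task must itself be declared as a task.
    unknown = {up for ups in deps.values() for up in ups} - set(deps)
    if unknown:
        raise ValueError(f"undefined upstream tasks: {sorted(unknown)}")
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
```

Catching these errors at definition time, rather than when the scheduler parses the DAG, gives authors faster feedback.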
Resources Mentioned:
https://www.linkedin.com/in/nick-bilozerov/
https://www.linkedin.com/in/sharadhk/
https://airflow.apache.org/
Stripe | LinkedIn -
https://www.linkedin.com/company/stripe/
Stripe | Website -
https://stripe.com/
Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
Harnessing Airflow for Data-Driven Policy Research at CSET with Jennifer Melot
Season 1 · Episode 29
Thursday, 27 February 2025 • Duration 17:54
Turning complex datasets into meaningful analysis requires robust data infrastructure and seamless orchestration. In this episode, we’re joined by Jennifer Melot, Technical Lead at the Center for Security and Emerging Technology (CSET) at Georgetown University, to explore how Airflow powers data-driven insights in technology policy research. Jennifer shares how her team automates workflows to support analysts in navigating complex datasets.
Key Takeaways:
(02:04) CSET provides data-driven analysis to inform government decision-makers.
(03:54) ETL pipelines merge multiple data sources for more comprehensive insights.
(04:20) Airflow is central to automating and streamlining large-scale data ingestion.
(05:11) Larger-scale databases create challenges that require scalable solutions.
(07:20) Dynamic DAG generation simplifies Airflow adoption for non-engineers.
(12:13) DAG Factory and dynamic task mapping can improve workflow efficiency.
(15:46) Tracking data lineage helps teams understand dependencies across DAGs.
(16:14) New Airflow features enhance visibility and debugging for complex pipelines.
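Tracking lineage (15:46) boils down to knowing which DAG produces each dataset and which DAGs consume it. Tools such as OpenLineage collect this from real runs; the sketch below is a hand-written simplification that assumes each DAG's inputs and outputs are already declared, with hypothetical DAG and dataset names, and derives the cross-DAG dependencies from them.

```python
from collections import defaultdict


def cross_dag_edges(
    lineage: dict[str, dict[str, set[str]]],
) -> set[tuple[str, str, str]]:
    """Return (producer_dag, dataset, consumer_dag) triples.

    `lineage` maps a DAG id to {"inputs": {...}, "outputs": {...}} dataset names.
    """
    # Index which DAGs write each dataset.
    producers: dict[str, set[str]] = defaultdict(set)
    for dag_id, io in lineage.items():
        for dataset in io.get("outputs", set()):
            producers[dataset].add(dag_id)

    # Link every consumer to the producers of the datasets it reads.
    edges = set()
    for consumer, io in lineage.items():
        for dataset in io.get("inputs", set()):
            for producer in producers.get(dataset, set()):
                if producer != consumer:
                    edges.add((producer, dataset, consumer))
    return edges
```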
Resources Mentioned:
https://www.linkedin.com/in/jennifer-melot-aa710144/
Center for Security and Emerging Technology (CSET) -
https://www.linkedin.com/company/georgetown-cset/
https://airflow.apache.org/
Zenodo -
https://zenodo.org/
https://openlineage.io/
https://cloud.google.com/dataplex
Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
Leveraging Airflow To Build Scalable and Reliable Data Platforms at 99acres.com with Samyak Jain
Season 1 · Episode 28
Thursday, 20 February 2025 • Duration 25:08
Data orchestration is evolving rapidly, with dynamic workflows becoming the cornerstone of modern data engineering. In this episode, we are joined by Samyak Jain, Senior Software Engineer - Big Data at 99acres.com. Samyak shares insights from his journey with Apache Airflow, exploring how his team built a self-service platform that enables non-technical teams to launch data pipelines and marketing campaigns seamlessly.
Key Takeaways:
(02:02) Starting a career in data engineering by troubleshooting Airflow pipelines.
(04:27) Building self-service portals with Airflow as the backend engine.
(05:34) Utilizing API endpoints to trigger dynamic DAGs with parameterized templates.
(09:31) Managing a dynamic environment with over 1,400 active DAGs.
(11:14) Implementing fault tolerance by segmenting data workflows into distinct layers.
(14:15) Tracking and optimizing query costs in AWS Athena to save $7K monthly.
(16:22) Automating cost monitoring with real-time alerts for high-cost queries.
(17:15) Streamlining Airflow metadata cleanup to prevent performance bottlenecks.
(21:30) Efficiently handling one-time and recurring marketing campaigns using Airflow.
(24:18) Advocating for Airflow features that improve resource management and ownership tracking.
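Triggering parameterized DAGs from a portal (05:34) typically goes through Airflow's REST API. The sketch below only builds the request; it assumes the Airflow 2 stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns` with a `conf` object in the JSON body) and the basic-auth backend. The DAG id, host and credentials are hypothetical, and newer Airflow versions expose a different API version and authentication scheme, so check your deployment's API reference.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def build_trigger_request(
    base_url: str, dag_id: str, conf: dict, username: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a request that triggers one DAG run with `conf`.

    Assumes the Airflow 2 stable REST API with the basic-auth backend enabled.
    """
    url = f"{base_url.rstrip('/')}/api/v1/dags/{quote(dag_id, safe='')}/dagRuns"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"conf": conf}).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it; inside the templated DAG, the values are then available as `dag_run.conf`.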
Resources Mentioned:
https://www.linkedin.com/in/samyak-jain-ab5830169/
https://www.linkedin.com/company/99acres/
https://airflow.apache.org/
https://aws.amazon.com/athena/
Kafka -
https://kafka.apache.org/
Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
Hybrid Testing Solutions for Autonomous Driving at Bosch with Jens Scheffler and Christian Schilling
Season 1 · Episode 27
Thursday, 13 February 2025 • Duration 33:45
Testing autonomous vehicles demands precision, scalability and powerful orchestration tools — enter Apache Airflow, a key component of Bosch’s cutting-edge testing framework. In this episode, we sit down with Jens Scheffler, Test Execution Cluster Technical Architect, and Christian Schilling, Product Owner Open Loop Testing Automated Driving, both at Bosch, to explore how Bosch harnesses Airflow to streamline complex testing scenarios. They share insights on scaling workflows, integrating hybrid infrastructures and ensuring vehicle safety through rigorous automated testing.
Key Takeaways:
(01:35) Airflow orchestrates millions of test hours for autonomous systems.
(03:15) Jens scales distributed systems with Kubernetes for job orchestration.
(06:02) Airflow runs hundreds of tests simultaneously.
(06:44) Virtual testing reduces costs and on-road trials.
(12:19) Unified APIs and GUIs streamline operations.
(15:05) Self-service setups empower Bosch teams.
(18:00) Physical hardware integration ensures real-world timing.
(20:30) Dynamic task mapping scales workflows efficiently.
(25:22) Open-source contributions improve stability.
(31:06) Edge and Celery executors power Bosch's hybrid scheduling.
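With dynamic task mapping (20:30), an upstream task can return a list and Airflow creates one mapped task instance per element at runtime, for example via `.expand_kwargs()` when each element is a dict of arguments. The pure-Python sketch below, which is not Bosch's code, shows one way to build such a list: it batches hypothetical test recordings so the number of mapped instances stays bounded.

```python
def build_mapped_inputs(recordings: list[str], batch_size: int) -> list[dict]:
    """Split test recordings into batches, one kwargs dict per mapped task instance."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        {"batch_id": i // batch_size, "recordings": recordings[i : i + batch_size]}
        for i in range(0, len(recordings), batch_size)
    ]
```

Choosing `batch_size` trades per-task overhead against parallelism: larger batches mean fewer task instances but longer individual runs.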
Resources Mentioned:
https://www.linkedin.com/in/jens-scheffler/
https://www.linkedin.com/in/christian-schilling-a5078831a/
Bosch -
https://www.linkedin.com/company/bosch/
https://airflow.apache.org/
https://kubernetes.io
GitHub -
https://github.com
https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/executor/index.html
Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
Overcoming Airflow Scaling Challenges at Monzo Bank with Jonathan Rainer
Season 1 · Episode 26
Friday, 7 February 2025 • Duration 43:39
Scaling a data orchestration platform to manage thousands of tasks daily demands innovative solutions and strategic problem-solving. In this episode, we explore the complexities of scaling Airflow and the challenges of orchestrating thousands of tasks in dynamic data environments. Jonathan Rainer, former Platform Engineer at Monzo Bank, joins us to share his journey optimizing data pipelines, overcoming UI limitations and ensuring DAG consistency in high-stakes scenarios.
Key Takeaways:
(03:11) Using Airflow to schedule computation in BigQuery.
(07:02) How DAGs with 8,000+ tasks were managed nightly.
(08:18) Ensuring accuracy in regulatory reporting for banking.
(11:35) Handling task inconsistency and DAG failures with automation.
(16:09) Building a service to resolve DAG consistency issues in Airflow.
(25:05) Challenges with scaling the Airflow UI for thousands of tasks.
(27:03) The role of upstream and downstream task management in Airflow.
(37:33) The importance of operational metrics for monitoring Airflow health.
(39:19) Balancing new tools with root cause analysis to address scaling issues.
(41:35) Why scaling solutions require both technical and leadership buy-in.
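Managing upstream and downstream tasks (27:03) in a DAG with thousands of tasks often means answering "what must re-run if this task is fixed?", similar in spirit to clearing a task with its downstream tasks in the Airflow UI. The sketch below is a generic breadth-first traversal over a hypothetical task graph, not Monzo's consistency service.

```python
from collections import deque


def downstream_closure(edges: dict[str, list[str]], start: str) -> list[str]:
    """Return every task reachable downstream of `start`, in breadth-first order.

    `edges` maps each task id to its direct downstream task ids.
    """
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        task = queue.popleft()
        for child in edges.get(task, []):
            if child not in seen:  # diamonds are visited only once
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```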
Resources Mentioned:
https://www.linkedin.com/in/jonathan-rainer/
https://www.linkedin.com/company/monzo-bank/
https://airflow.apache.org/
BigQuery -
https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/bigquery.html
https://kubernetes.io/
Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
Orchestrating Analytics and AI Workflows at Telia with Arjun Anandkumar
Season 1 · Episode 25
Thursday, 30 January 2025 • Duration 26:00
The future of data engineering lies in seamless orchestration and automation. In this episode, Arjun Anandkumar, Data Engineer at Telia, shares how his team uses Airflow to drive analytics and AI workflows. He highlights the challenges of scaling data platforms and how adopting best practices can simplify complex processes for teams across the organization. Arjun also discusses the transformative role of tools like Cosmos and Terraform in enhancing efficiency and collaboration.
Key Takeaways:
(02:16) Telia operates across the Nordics and Baltics, focusing on telecom and energy services.
(03:45) Airflow runs dbt models seamlessly with Cosmos on AWS MWAA.
(05:47) Cosmos improves visibility and orchestration in Airflow.
(07:00) Medallion Architecture organizes data into bronze, silver and gold layers.
(08:34) Task group challenges highlight the need for adaptable workflows.
(15:04) Scaling managed services requires trial, error and tailored tweaks.
(19:46) Terraform scales infrastructure, while YAML templates manage DAGs efficiently.
(20:00) Templated DAGs and robust testing enhance platform management.
(24:15) Open-source resources drive innovation in Airflow practices.
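The Medallion Architecture (07:00) moves data from raw bronze tables to cleaned silver tables to aggregated gold tables. In Telia's setup those layers are dbt models run through Cosmos; the plain-Python sketch below only illustrates what each hop typically does, using hypothetical telecom usage records and field names.

```python
from collections import defaultdict


def to_silver(bronze: list[dict]) -> list[dict]:
    """Clean raw (bronze) rows: drop malformed rows, cast types, deduplicate by event_id."""
    silver: dict[str, dict] = {}
    for row in bronze:
        try:
            record = {
                "event_id": str(row["event_id"]),
                "customer": str(row["customer"]).strip().lower(),
                "mb_used": float(row["mb_used"]),
            }
        except (KeyError, TypeError, ValueError):
            continue  # malformed rows stay in bronze only
        silver[record["event_id"]] = record  # last write wins for duplicates
    return list(silver.values())


def to_gold(silver: list[dict]) -> dict[str, float]:
    """Aggregate cleaned (silver) rows into a business-level total per customer."""
    totals: dict[str, float] = defaultdict(float)
    for record in silver:
        totals[record["customer"]] += record["mb_used"]
    return dict(totals)
```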
Resources Mentioned:
https://www.linkedin.com/in/arjunanand1/?originalSubdomain=dk
Telia -
https://www.linkedin.com/company/teliacompany/
https://airflow.apache.org/
https://www.astronomer.io/cosmos/
https://www.terraform.io/
Medallion Architecture by Databricks -
https://www.databricks.com/glossary/medallion-architecture
Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
The Role of Airflow in Finance Transformation at Etraveli Group with Mihir Samant
Season 1 · Episode 24
Thursday, 23 January 2025 • Duration 21:19
Transforming bottlenecked finance processes into streamlined, automated systems requires the right tools and a forward-thinking approach. In this episode, Mihir Samant, Senior Data Analyst at Etraveli Group, joins us to share how his team leverages Airflow to revolutionize finance automation. With extensive experience in data workflows and a passion for open-source tools, Mihir provides valuable insights into building efficient, scalable systems. We explore the transformative power of Airflow in automating workflows and enhancing data orchestration within the finance domain.
Key Takeaways:
(02:14) Etraveli Group specializes in selling affordable flight tickets and ancillary services.
(03:56) Mihir’s finance automation team uses Airflow to tackle month-end bottlenecks.
(06:00) Airflow's flexibility enables end-to-end automation for finance workflows.
(07:00) Open-source Airflow tools offer cost-effective solutions for new teams.
(08:46) Sensors and dynamic DAGs are pivotal features for optimizing tasks.
(13:30) GitSync simplifies development by syncing environments seamlessly.
(16:27) Plans include integrating Databricks for more advanced data handling.
(17:58) Airflow and Databricks offer multiple flexible methods to trigger workflows and execute SQL queries seamlessly.
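Sensors (08:46) repeatedly check a condition, such as "has the month-end file arrived?", until it holds or a timeout expires. Airflow sensors expose this through parameters such as `poke_interval`, `timeout` and `mode`. The standalone function below borrows those parameter names but is only a sketch of poke-style waiting, not Airflow's implementation; the injectable `sleep` and `clock` exist to make it testable.

```python
import time
from typing import Callable


def wait_for(
    condition: Callable[[], bool],
    poke_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll `condition` until it returns True, or raise TimeoutError.

    The check happens before each sleep, so the total wait can overshoot
    `timeout` by up to one `poke_interval`.
    """
    deadline = clock() + timeout
    while not condition():
        if clock() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(poke_interval)
```

A blocking loop like this holds a worker slot for the whole wait; Airflow's `reschedule` mode and deferrable operators exist to avoid exactly that.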
Resources Mentioned:
https://www.linkedin.com/in/misamant/?originalSubdomain=ca
https://www.linkedin.com/company/etraveli-group/
https://airflow.apache.org/
Docker -
https://www.docker.com/
https://www.databricks.com/
Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning
Inside Ford’s Data Transformation: Advanced Orchestration Strategies with Vasantha Kosuri-Marshall
Season 1 · Episode 23
Thursday, 16 January 2025 • Duration 38:54
Data engineering is entering a new era, where orchestration and automation are redefining how large-scale projects operate. This episode features Vasantha Kosuri-Marshall, Data and ML Ops Engineer at Ford Motor Company. Vasantha shares her expertise in managing complex data pipelines. She takes us through Ford's transition to cloud platforms, the adoption of Airflow and the intricate challenges of orchestrating data in a diverse environment.
Key Takeaways:
(03:10) Vasantha’s transition to the Advanced Driving Assist Systems team at Ford.
(05:42) Early adoption of Airflow to orchestrate complex data pipelines.
(09:29) Ford's move from on-premise data solutions to Google Cloud Platform.
(12:03) The importance of Airflow's scheduling capabilities for efficient data management.
(16:12) Using Kubernetes to scale Airflow for large-scale data processing.
(19:59) Vasantha’s experience in overcoming challenges with legacy orchestration tools.
(22:22) Integration of data engineering and data science pipelines at Ford.
(28:03) How deferrable operators in Airflow improve performance and save costs.
(32:12) Vasantha’s insights into tuning Airflow properties for thousands of DAGs.
(36:09) The significance of monitoring and observability in managing Airflow instances.
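Deferrable operators (28:03) save resources because a task that is only waiting hands that wait to the triggerer, which runs many asynchronous triggers in one event loop instead of occupying one worker slot each. The sketch below uses no Airflow APIs; it only shows the underlying asyncio idea, with `wait_for_job` as a hypothetical stand-in for a trigger that polls an external job.

```python
import asyncio
import time


async def wait_for_job(job_id: str, delay: float) -> str:
    """Stand-in for an async trigger that waits on an external job (hypothetical)."""
    await asyncio.sleep(delay)  # yields to the event loop instead of blocking a thread
    return f"{job_id}: done"


async def run_triggers(delays: dict[str, float]) -> list[str]:
    """Wait on many jobs concurrently in a single event loop, like a triggerer process."""
    return await asyncio.gather(
        *(wait_for_job(job_id, delay) for job_id, delay in delays.items())
    )


if __name__ == "__main__":
    start = time.monotonic()
    results = asyncio.run(run_triggers({f"job{i}": 0.2 for i in range(50)}))
    # The 50 waits overlap, so this takes roughly 0.2 s rather than 10 s.
    print(len(results), f"{time.monotonic() - start:.2f}s")
```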
Resources Mentioned:
https://www.linkedin.com/in/vasantha-kosuri-marshall-0b0aab188/
https://airflow.apache.org/
https://cloud.google.com/
Ford Motor Company | LinkedIn -
https://www.linkedin.com/company/ford-motor-company/
Ford Motor Company | Website -
https://www.ford.com/
https://www.astronomer.io/
Thanks for listening to “The Data Flowcast: Mastering Airflow for Data Engineering & AI.” If you enjoyed this episode, please leave a 5-star review to help get the word out about the show. And be sure to subscribe so you never miss any of the insightful conversations.
#AI #Automation #Airflow #MachineLearning