Humans of Reliability: Details, Episodes, and Analysis

Podcast details

Technical and general information from the podcast's RSS feed.

Humans of Reliability

Rootly

Technology

Frequency: 1 episode every 13 days. Total episodes: 19

Behind every reliable software system, there are people working hard to keep it online.

Humans of Reliability is a series that spotlights the engineers, leaders, and innovators at the heart of incident management and system reliability. Through candid conversations, we explore the challenges, lessons, and personal journeys of those navigating complex technical landscapes to ensure the systems we rely on run smoothly.

From unforgettable incident stories to favorite tools, workflows, and hobbies, Humans of Reliability uncovers the human side of technology—offering insights and inspiration for anyone passionate about building and maintaining resilient systems.

https://rootly.com/humans-of-reliability

Recent rankings

Latest positions in the Apple Podcasts and Spotify charts.

Apple Podcasts

🇨🇦 Canada - technology · 14 September 2025 · #89
🇨🇦 Canada - technology · 13 September 2025 · #61

Spotify

No recent rankings available

Links shared across episodes and podcasts

Links found in episode descriptions, and other podcasts that also use them.

https://www.linkedin.com/in/robzuber/
13 shares
https://www.linkedin.com/in/stevemcghee/
7 shares
https://www.linkedin.com/in/justinreock/
7 shares

https://www.youtube.com/watch?v=oL5U5-JFyxo
1 share
https://www.youtube.com/watch?v=LlU64gn8FIM
1 share

RSS feed quality and score

Technical assessment of the quality and structure of the RSS feed.

RSS feed quality

Needs improvement

Overall score: 53%

Publication history

Monthly breakdown of episode publications over the years.

Latest published episodes

List of recent episodes, with titles, durations, and descriptions.

The End of “Good Code”? AI, Throughput, and Reliability with CircleCI CTO Rob Zuber

Season 1 · Episode 19

Wednesday, 10 September 2025 • Duration 37:38

Is “good code” still the right measure of engineering success in an AI-driven world? In this episode of Humans of Reliability, Rob Zuber, CircleCI CTO, joins Sylvain to explore how coding assistants are reshaping developer workflows and changing what teams value.

Rob shares what he’s seeing across CircleCI’s customer base: a clear boost in throughput, new bottlenecks shifting from code creation to code review, and the rise of “vibe coding,” where engineers trust AI-generated code they may not fully understand.

He challenges long-held assumptions about readability and maintainability, arguing that software engineering is on the edge of a paradigm shift. For SREs and developers alike, this conversation is a candid look at how to stay relevant, embrace simplicity, and rethink reliability in the age of AI.

Frontline Reliability: Protecting User Journeys with SLOs with Shery Brauner (Razor, ex-Zalando)

Season 1 · Episode 18

Wednesday, 20 August 2025 • Duration 31:03

What does it really take to move from firefighting incidents to building reliability at scale? In this episode of Humans of Reliability, Shery Brauner (Razor, ex-Zalando) shares her unique journey from frontend and backend engineering to leading site reliability practices. She explains why protecting the user journey is the key to effective incident management, how SLOs cut through noisy alerts, and why observability must come first.

Shery also talks about practical steps teams can take to adopt an SLO-driven strategy, the pitfalls of over-instrumentation, and how AI is shaping the future of incident response. Whether you’re an engineer, manager, or reliability leader, you’ll walk away with concrete ideas on how to protect what matters most: your customers’ experience.

Are AI and Platforms Making SRE Obsolete? With Kaspar von Grünberg, Humanitec’s CEO

Monday, 24 March 2025 • Duration 25:44

Last year, over 89% of companies claimed to have adopted platform engineering. And, in the past month, LLMs have been disrupting how we think about software development. In this context, Kaspar asks whether the role of Site Reliability Engineer as we know it is becoming obsolete. He argues that while SREs aren’t going anywhere, their responsibilities are evolving fast.

We talk about:

The need for the SRE role to be transformed
How to build reliability into every golden path
The role of AI and LLMs in Developer Experience
The limits of LLMs for reliability and infrastructure

Scientific Incident Management with Dan Slimmon

Season 1 · Episode 7

Friday, 14 March 2025 • Duration 37:35

Dan Slimmon is an incident management veteran who has worked at Etsy and HashiCorp and now leads consulting and training on pragmatic, non-bureaucratic incident response.

In this episode, Dan shares his philosophy on "scientific incident response," the importance of hypothesis-driven troubleshooting, and why incidents should be seen as normal in complex systems.

We also explore:

Why asking the right questions is more important than knowing all the answers.
How to use nerd sniping to unlock insights from engineers.
Common failure patterns he sees across organizations.

EPISODE LINKS:

Video and key takeaways
D2E Incident Leadership Course

How AI broke serverless and what to do about it with Vercel’s Mariano Fernández Cocirio

Thursday, 6 March 2025 • Duration 13:52

Mariano, Staff Product Manager at Vercel, explains why serverless architectures are hitting unexpected limits—they’re too fast.

The industry has spent millions optimizing serverless for speed, but AI workloads are changing the game. In the AI realm, slower execution often leads to better results. The challenge? Paying for all that idle compute time while waiting for AI responses.

Mariano explains how Vercel Fluid is introducing a new execution model that blends the best of serverless and traditional servers—scaling efficiently while reducing costs. Mariano breaks down Fluid’s architecture, its built-in reliability features, and how it redefines cloud computing for LLM-powered applications.

Tune in to learn how Fluid could reshape the industry and what it means for developers.

I Want My Shoes Fast! Observability, SRE Burnout, and OTel with Dynatrace’s Adriana Villela

Season 1 · Episode 6

Thursday, 27 February 2025 • Duration 34:23

In this episode, we sit down with Adriana Villela, Principal DevRel at Dynatrace and OpenTelemetry contributor, to break down how observability impacts reliability.

We dive into what contributes to SRE burnout and how managers can create psychologically safer spaces for responders.

Adriana also shares her perspective on AI as an observability buddy that helps responders navigate incidents.

SHOW LINKS:

Video and takeaways
Adriana’s podcast: Geeking Out with Adriana
Podcast with Hazel Weakly mentioned by Adriana

AI in Production with GitHub’s Sean Goedecke

Season 1 · Episode 5

Tuesday, 18 February 2025 • Duration 17:33

In this episode, we sit down with Sean Goedecke, Staff Software Engineer at GitHub, to discuss where LLMs fit into real-world development.

Sean shares how he’s using LLMs and where he’s drawing the line for AI assistance in the codebases he manages, though, as he says, this might all change by next summer.

Sean also weighs in on how LLMs could assist SREs during outages, especially when you’re only half-awake at 3 a.m. after a rather inconvenient page.

Tune in for a nuanced take on the future of AI in software engineering, “vibe coding,” and the evolution of rubber ducks.

LINKS:

Video version and show notes
Sean’s blog (recommended)

The Reliability Diagnosis: Google’s Steve McGhee on Debugging and Incident Response

Season 1 · Episode 3

Monday, 10 February 2025 • Duration 15:32

In this episode of Humans of Reliability, we sit down with Steve McGhee, Reliability Advocate at Google, to discuss his journey from early SRE work to advocating for reliability best practices.

Steve shares fascinating stories from his time at Google, the challenges of implementing SRE in enterprises, and what people often misunderstand about the discipline.

He also offers valuable insights on incident response, distributed systems, and the underrated skill every reliability engineer should master. Whether you're new to SRE or a seasoned professional, this conversation is packed with wisdom and practical takeaways.

This episode is also available as a video interview on YouTube.

No CS Degree, No Problem: Building a Career in Tech Leadership

Season 1 · Episode 3

Wednesday, 5 February 2025 • Duration 11:09

What does it take to lead service delivery at a company experiencing massive growth?

Hannah Hammonds, Service Delivery Lead at Prolific, shares her journey from an IT networking apprentice to a tech leader shaping reliability and incident response.

We discuss the evolving role of service delivery, the power of mentorship, and how confidence transforms careers.

Plus, we debate hot dogs, spoilers, and The Office.

Tune in for career insights, leadership lessons, and a few laughs! 🎙️🚀

This podcast episode is also available on YouTube if you want to see a video version of this interview.

Beyond SLOs: How an ex-Google SRE scaled reliability at the largest e-commerce in the nordics

Season 1 · Episode 2

Monday, 3 February 2025 • Duration 07:34

What happens when a Google-trained SRE joins a fast-moving e-commerce company?

Gastón Rial Saibene, SRE Lead at Boozt.com, joins Humans of Reliability to talk about adapting reliability practices for different company sizes, the limits of SLOs, and the importance of automation.

We also dive into decision-making, his favorite books, and—just for fun—whether he’d survive a zombie apocalypse. Tune in for insights, laughs, and a fresh perspective on the world of reliability engineering!

Similar Podcasts Based on Content

Discover podcasts related to Humans of Reliability. Explore podcasts with similar themes, topics, and formats. These similarities are computed from tangible data, not extrapolation!