Data Skeptic – Details, episodes & analysis

Podcast details

Technical and general information from the podcast's RSS feed.

Data Skeptic


Kyle Polich

Technology
Science

Frequency: 1 episode every 7 days. Total episodes: 599

Libsyn
The Data Skeptic Podcast features interviews and discussion of topics related to data science, statistics, machine learning, artificial intelligence and the like, all from the perspective of applying critical thinking and the scientific method to evaluate the veracity of claims and efficacy of approaches.

Recent rankings

Latest chart positions across Apple Podcasts and Spotify rankings.

Apple Podcasts

    No recent rankings available

Spotify

    No recent rankings available



RSS feed quality and score

Technical evaluation of the podcast's RSS feed quality and structure.

RSS feed quality: needs improvement

Overall score: 43%


Publication history

Monthly episode publishing history over the past years.


Latest published episodes

Recent episodes with titles, durations, and descriptions.


Shilling Attacks on Recommender Systems

Wednesday, November 5, 2025 · Duration: 34:48

In this episode of Data Skeptic's Recommender Systems series, Kyle sits down with Aditya Chichani, a senior machine learning engineer at Walmart, to explore the darker side of recommendation algorithms. The conversation centers on shilling attacks—a form of manipulation where malicious actors create multiple fake profiles to game recommender systems, either to promote specific items or sabotage competitors. Aditya, who researched these attacks during his undergraduate studies at SPIT before completing his master's in computer science with a data science specialization at UC Berkeley, explains how these vulnerabilities emerge particularly in collaborative filtering systems. From promoting a friend's ska band on Spotify to inflating product ratings on e-commerce platforms, shilling attacks represent a significant threat in an industry where approximately 4% of reviews are fake, translating to $800 billion in annual sales in the US alone.

The discussion delves deep into collaborative filtering, explaining both user-user and item-item approaches that create similarity matrices to predict user preferences. However, these systems face various shilling attacks of increasing sophistication: random attacks use minimal information with average ratings, while segmented attacks strategically target popular items (like Taylor Swift albums) to build credibility before promoting target items. Bandwagon attacks focus on highly popular items to connect with genuine users, and average attacks leverage item rating knowledge to appear authentic. User-user collaborative filtering proves particularly vulnerable, requiring as few as 500 fake profiles to impact recommendations, while item-item filtering demands significantly more resources.

Aditya addresses detection through machine learning techniques that analyze behavioral patterns using methods like PCA to identify profiles with unusually high correlation and suspicious rating consistency. However, this remains an evolving challenge as attackers adapt strategies, now using large language models to generate more authentic-seeming fake reviews. His research with the MovieLens dataset tested detection algorithms against synthetic attacks, highlighting how these concerns extend to modern e-commerce systems. While companies rarely share attack and detection data publicly to avoid giving attackers advantages, academic research continues advancing both offensive and defensive strategies in recommender systems security.
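To make these mechanics concrete, here is a minimal sketch on a synthetic random rating matrix (not MovieLens). It shows user-user collaborative filtering with cosine similarity, a naive average attack that pushes one target item, and a simple correlation check standing in for the PCA-style detectors discussed in the episode. All names (`average_attack`, `flag_suspicious`, `TARGET`), sizes, and the 0.95 threshold are hypothetical. This is not the method from Aditya's research.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 50 genuine users x 20 items, ratings 1-5, 0 = unrated.
N_USERS, N_ITEMS, TARGET = 50, 20, 7  # TARGET: item the attacker wants to push
ratings = rng.integers(1, 6, size=(N_USERS, N_ITEMS)).astype(float)
ratings[rng.random((N_USERS, N_ITEMS)) < 0.6] = 0.0  # ~60% of entries unrated


def user_similarity(r):
    """Cosine similarity between user rows (unrated entries count as 0)."""
    norms = np.linalg.norm(r, axis=1, keepdims=True)
    unit = r / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def predict_user_user(r, user, item):
    """User-user CF: similarity-weighted mean of other users' ratings of `item`."""
    sim = user_similarity(r)[user]
    raters = (r[:, item] > 0) & (np.arange(len(r)) != user)
    w = sim[raters]
    return float(w @ r[raters, item] / w.sum()) if w.sum() > 0 else 0.0


def average_attack(r, target, n_fake):
    """Average attack: filler items get their (rounded) mean rating, target gets 5."""
    counts = (r > 0).sum(axis=0)
    means = r.sum(axis=0) / np.maximum(counts, 1)
    profile = np.clip(np.round(means), 1, 5)
    profile[target] = 5.0
    return np.tile(profile, (n_fake, 1))  # naive attacker: identical fake profiles


def flag_suspicious(r, threshold=0.95):
    """Flag users whose profile is almost perfectly correlated with another user's."""
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.nan_to_num(np.corrcoef(r))
    np.fill_diagonal(corr, 0.0)
    return np.flatnonzero(corr.max(axis=1) > threshold)


# Pick a genuine user who has not rated the target item yet.
victim = next(u for u in range(N_USERS) if ratings[u, TARGET] == 0 and ratings[u].any())
before = predict_user_user(ratings, victim, TARGET)

attacked = np.vstack([ratings, average_attack(ratings, TARGET, n_fake=10)])
after = predict_user_user(attacked, victim, TARGET)
fake_ids = np.arange(N_USERS, len(attacked))
flagged = flag_suspicious(attacked)

print(f"predicted rating before attack: {before:.2f}, after: {after:.2f}")
print("flagged user ids:", flagged.tolist())
```

The fake profiles copy item averages, so they look similar to many genuine users and receive high weights in the prediction. Because this naive attacker reuses one identical profile, the fakes are trivially correlated with each other. Real attackers add noise, which is why production detectors rely on richer behavioral features.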

Music Playlist Recommendations

Wednesday, October 29, 2025 · Duration: 52:29

In this episode, Rebecca Salganik, a PhD student at the University of Rochester with a background in vocal performance and composition, discusses her research on fairness in music recommendation systems. She explores three key types of fairness—group, individual, and counterfactual—and examines how algorithms create challenges like popularity bias (favoring mainstream content) and multi-interest bias (underserving users with diverse tastes). Rebecca introduces LARP, her multi-stage multimodal framework for playlist continuation that uses contrastive learning to align text and audio representations, learn song relationships, and create playlist-level embeddings to address the cold start problem.

A significant contribution of Rebecca's work is the Music Semantics dataset, created by scraping Reddit discussions to capture how people naturally describe music using atmospheric qualities, contextual comparisons, and situational associations rather than just technical features. This dataset, available on Hugging Face, enables more nuanced recommendation systems that better understand user preferences and support niche tastes. Her research utilizes industry datasets including Last.fm and Spotify's Million Playlist Dataset, and points toward exciting future applications in music generation and multimodal systems that combine audio, text, and video.
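LARP's exact training objectives are not spelled out here. As a hedged illustration of what "contrastive learning to align text and audio representations" typically involves, the sketch below computes a symmetric CLIP-style InfoNCE loss over a batch of paired text and audio embeddings. The function names, the 0.07 temperature, and the random embeddings are assumptions for illustration, not LARP's code.

```python
import numpy as np


def log_softmax(x, axis):
    """Numerically stable log-softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_info_nce(text_emb, audio_emb, temperature=0.07):
    """CLIP-style contrastive loss; row i of both matrices describes the same song."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = t @ a.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(len(logits))    # matching text/audio pairs sit on the diagonal
    text_to_audio = -log_softmax(logits, axis=1)[idx, idx].mean()
    audio_to_text = -log_softmax(logits, axis=0)[idx, idx].mean()
    return (text_to_audio + audio_to_text) / 2


rng = np.random.default_rng(0)
audio = rng.normal(size=(8, 32))
aligned_text = audio + 0.1 * rng.normal(size=(8, 32))  # each text lies close to its own song
random_text = rng.normal(size=(8, 32))                  # texts unrelated to the audio
aligned_loss = symmetric_info_nce(aligned_text, audio)
random_loss = symmetric_info_nce(random_text, audio)
print(f"aligned: {aligned_loss:.3f}  unrelated: {random_loss:.3f}")
```

Minimizing this loss pulls each song's text and audio embeddings together and pushes apart mismatched pairs in the same batch. This is what lets a shared embedding space describe songs that have no interaction history yet.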


Complex Dynamic in Networks

Saturday, June 28, 2025 · Duration: 56:00

In this episode, we learn why analyzing the structure of a network alone is not enough, and how the dynamics (the actual mechanisms by which components interact) can drastically change how information or influence spreads. Our guest, Professor Baruch Barzel of Bar-Ilan University, is a leading researcher in network dynamics and complex systems ranging from biology to infrastructure and beyond.

BarzelLab

BarzelLab on YouTube

Paper in focus: Universality in network dynamics, 2013
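As a rough illustration of "same structure, different dynamics," the sketch below assumes a general form of the kind studied in the paper in focus: dx_i/dt = M0(x_i) + Σ_j A_ij M1(x_i) M2(x_j). It runs an SIS-like epidemic model and a Michaelis-Menten-like regulatory model on the same random graph. The specific functions, parameters, and graph are assumptions chosen for illustration.

```python
import numpy as np


def simulate(A, m0, m1, m2, x0, dt=0.01, steps=20_000):
    """Forward-Euler integration of dx_i/dt = M0(x_i) + sum_j A_ij * M1(x_i) * M2(x_j)."""
    x = x0.copy()
    for _ in range(steps):
        x = x + dt * (m0(x) + m1(x) * (A @ m2(x)))
    return x


# Hypothetical random undirected network: 30 nodes, each pair linked with prob. 0.1.
rng = np.random.default_rng(1)
n = 30
A = np.triu((rng.random((n, n)) < 0.1).astype(float), k=1)
A = A + A.T  # symmetric adjacency matrix, no self-loops
x0 = np.full(n, 0.5)  # identical starting activity at every node

# Epidemic (SIS-like): recovery -x_i, infection (1 - x_i) * x_j
sis = simulate(A, lambda x: -x, lambda x: 1 - x, lambda x: x, x0)

# Regulatory (Michaelis-Menten-like): degradation -x_i, activation x_j^2 / (1 + x_j^2)
reg = simulate(A, lambda x: -x, np.ones_like, lambda x: x**2 / (1 + x**2), x0)

print("most active node, SIS:", int(sis.argmax()), " regulatory:", int(reg.argmax()))
print("mean activity, SIS: %.3f  regulatory: %.3f" % (sis.mean(), reg.mean()))
```

The adjacency matrix `A` is identical in both runs. Only the interaction functions change, yet the resulting activity patterns differ. This is the kind of effect a purely structural analysis would miss.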

AGI Can Be Safe

Monday, June 26, 2023 · Duration: 45:57

We are joined by Koen Holtman, an independent AI researcher focusing on AI safety. Koen is the Founder of Holtman Systems Research, a research company based in the Netherlands.

Koen opened the conversation with his take on the possibility of an AI apocalypse in the coming years. He discussed the obedience problem in AI models and what a safe form of obedience looks like.

Koen explained the concept of a Markov decision process (MDP) and how it is used to build machine learning models.

Koen spoke about the problem of AGIs that do not allow their utility function to be changed after deployment, and shared an alternative approach to solving it. He described how to engineer AGI systems safely, both now and in the future, and how to implement safety layers in AI models.

Koen discussed the ultimate goal of a safe AI system and how to verify that an AI system is indeed safe. He also discussed the intersection between large language models (LLMs) and MDPs, and shared the key ingredients for scaling current AI implementations.
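For readers new to MDPs: an MDP is defined by states, actions, transition probabilities P(s' | s, a), rewards R(s, a), and a discount factor γ. A reward-maximizing policy can be found with value iteration, which repeatedly applies the Bellman backup V(s) = max_a [R(s, a) + γ Σ_s' P(s' | s, a) V(s')]. The two-state MDP below is hypothetical and only shows the mechanics. It is not drawn from Koen's work.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve an MDP by value iteration.

    P[a, s, t]: probability of moving from state s to t under action a.
    R[s, a]: expected immediate reward for taking action a in state s.
    Returns the optimal state values and a greedy (optimal) policy.
    """
    V = np.zeros(R.shape[0])
    while True:
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new


# Hypothetical 2-state, 2-action MDP. Action 0 = "stay", action 1 = "switch".
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # stay: remain in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # switch: move to the other state
])
R = np.array([
    [0.0, -1.0],  # state 0: staying earns 0, switching costs 1
    [1.0, 0.0],   # state 1: staying earns 1, switching earns 0
])
V, policy = value_iteration(P, R)
print("values:", V.round(3), "policy:", policy)
```

Here the optimal policy pays the one-time cost to switch out of state 0, then stays in state 1 forever. The values converge to about 8 and 10, with policy [1, 0].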

AI Fails on Theory of Mind Tasks

Monday, June 19, 2023 · Duration: 52:21

Tomer Ullman, an assistant professor of psychology at Harvard University, joins us. Tomer discussed theory of mind and whether machines can pass theory-of-mind tests. Using variations of the Sally-Anne test and the Smarties tube test, he explained how LLMs can fail these tests.

AI for Mathematics Education

Monday, June 12, 2023 · Duration: 35:36

The application of LLMs cuts across various industries. Today, we are joined by Steven Van Vaerenbergh, who discussed the application of AI in mathematics education. He discussed how AI tools have changed the landscape of solving mathematical problems. He also shared LLMs' current strengths and weaknesses in solving math problems.

Evaluating Jokes with LLMs

Tuesday, June 6, 2023 · Duration: 43:11

Fabricio Goes, a Lecturer in Creative Computing at the University of Leicester, joins us today. Fabricio discussed what creativity entails and how to evaluate jokes with LLMs. He specifically shared the process of evaluating jokes with GPT-3 and GPT-4. He concluded with his thoughts on the future of LLMs for creative tasks.

Why Machines Will Never Rule the World

Monday, May 29, 2023 · Duration: 55:15

Barry Smith and Jobst Landgrebe, authors of the book "Why Machines Will Never Rule the World," join us today. They discussed the limitations of today's AI systems and laid out in detail why AI will struggle to reach the level of human intelligence.

A Psychopathological Approach to Safety in AGI

Tuesday, May 23, 2023 · Duration: 49:00

While the emergence of AGI opens up great possibilities, it also raises safety concerns. On the show, Vahid Behzadan, an Assistant Professor of Computer Science and Data Science, joins us to discuss the complexities of modeling AGIs so that they accurately achieve their objective functions. He touched on related issues such as abstractions during training, the problem of unpredictability, and communication among agents.

The NLP Community Metasurvey

Monday, May 15, 2023 · Duration: 49:45

Julian Michael, a postdoc at the Center for Data Science at New York University, joins us today. Julian's conversation with Kyle centered on the NLP community metasurvey, a survey aimed at understanding expert opinions on controversial issues in NLP. He shared how the survey was prepared, as well as some surprising results.


Related Shows Based on Content Similarities

Discover shows related to Data Skeptic, based on actual content similarities. Explore podcasts with similar topics, themes, and formats, backed by real data.
The Informed Life
UI Breakfast: UI/UX Design and Product Strategy
The Diary Of A CEO with Steven Bartlett
Le Panier
TheBoldWay
The Startup Ideas Podcast
The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch
Inside the Strategy Room
REWORK
Serialously with Annie Elise
© My Podcast Data