Explore every episode of the podcast Super Data Science: ML & AI Podcast with Jon Krohn

Dive into the complete episode list for Super Data Science: ML & AI Podcast with Jon Krohn. Each episode is cataloged with detailed descriptions, making it easy to find and explore specific topics. Keep track of all episodes from your favorite podcast and never miss a moment of insightful content.

Title | Pub. Date | Duration
814: Summer Reflections | 30 Aug 2024 | 00:04:06
As summer winds down, this episode shifts focus from the usual tech discussions to something more personal: reflecting on the importance of balancing work with life’s simple pleasures. While the world of data science and AI continues to evolve rapidly, it's essential to remember that true success isn't just about professional milestones. It’s also about cherishing the moments that make life meaningful. Tune in for a brief but impactful reflection on how to redefine success to include not just achievements, but also the everyday joys that often go unnoticed. Additional materials: www.superdatascience.com/814 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
813: Solving Business Problems Optimally with Data, with Jerry Yurchisin | 27 Aug 2024 | 01:43:30
Jerry Yurchisin from Gurobi joins Jon Krohn to break down mathematical optimization, showing why it often outshines machine learning for real-world challenges. Find out how innovations like NVIDIA’s latest CPUs are speeding up solutions to problems like the Traveling Salesman in seconds. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: • The Burrito Optimization Game and mathematical optimization use cases [03:36] • Key differences between machine learning and mathematical optimization [05:45] • How mathematical optimization is ideal for real-world constraints [13:50] • Gurobi’s APIs and the ease of integrating them [21:33] • How LLMs like GPT-4 can help with optimization problems [39:39] • Why integer variables are so complex to model [01:02:37] • NP-hard problems [01:11:01] • The history of optimization and its early applications [01:26:23] Additional materials: www.superdatascience.com/813
804: AI x Solar Power = Abundant Energy | 26 Jul 2024 | 00:13:57
Solar power now provides 6% of the world's electricity, thanks to rapid growth. Host Jon Krohn discusses the factors driving this rise, the challenges ahead, and how AI and data science are optimizing solar technologies. Tune in for insights on the future of solar power, and don't forget to like, share, and subscribe! Additional materials: www.superdatascience.com/804 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
714: Using A.I. to Overcome Blindness and Thrive as a Data Scientist | 15 Sep 2023 | 00:36:49
In this Friday episode, guest Tim Albiges explores with host Jon Krohn how people with blindness can have a lucrative and fulfilling career in data science, how Tim’s PhD thesis applied machine learning to help diagnose chronic respiratory diseases, and the communication tools that blind people can use to live a full and independent life. Additional materials: www.superdatascience.com/714 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
713: Llama 2, Toolformer and BLOOM: Open-Source LLMs with Meta's Dr. Thomas Scialom | 12 Sep 2023 | 01:25:35
Artificial General Intelligence, RLHF’s application in AI, and how entrepreneurs can enter the AI industry: Meta’s AI Research Scientist Thomas Scialom gives us behind-the-scenes insights into developing Llama 2 and what’s in the works for Llama 3. With host Jon Krohn, he discusses the future of Artificial General Intelligence, why the Galactica science-focused LLM was taken down, and what he learned from it. This episode is brought to you by AWS Inferentia, by Grafbase, the unified data layer, and by Modelbit, for deploying models in seconds. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Llama 2: Behind the Scenes of Today’s Top Open-Source LLM [05:04] • Responsible use of Llama 2 [15:26] • Toolformer: LLM That Learns How to Use External Tools [24:57] • Galactica: The Science-Specific LLM and Why It Was Brought Down [36:57] • Is AGI Around the Corner? [57:03] • Advice for AI entrepreneurs [1:05:46] • How Thomas develops and manages large-scale AI projects [1:14:42] Additional materials: www.superdatascience.com/713
712: Code Llama | 08 Sep 2023 | 00:06:48
Code Llama might just be starting the revolution for how data scientists code. In this Five-Minute Friday, host Jon Krohn investigates the suite of models under the free-to-use Code Llama and how to find the best fit for your project’s needs. Additional materials: www.superdatascience.com/712 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
711: Image, Video and 3D-Model Generation from Natural Language, with Dr. Ajay Jain | 05 Sep 2023 | 01:26:03
In this episode, host Jon Krohn explores with his guest Ajay Jain, Co-Founder of Genmo.ai, how creative general intelligence could take the video industry by storm. They also discuss the models that got Genmo to this point, the applications of NeRF, and how understanding human psychology is so essential to developing models that output high-fidelity video. This episode is brought to you by the Zerve data science dev environment, by Grafbase, the unified data layer, and by Modelbit, for deploying models in seconds. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • About Genmo.ai and the term “creative general intelligence” [03:47] • Why Ajay started Genmo.ai [09:26] • The increased performance of multimodal models [21:12] • All about Denoising Diffusion Probabilistic Models (DDPMs) [31:03] • The application of Neural Radiance Fields (NeRF) [55:26] • Predicting pedestrian behavior at Uber [1:01:50] • How to save money in the process of training models [1:12:42] Additional materials: www.superdatascience.com/711
710: LangChain: Create LLM Applications Easily in Python | 01 Sep 2023 | 01:03:13
Discover the power of Large Language Models with Kris Ograbek as he unravels the intricacies of LangChain and showcases a chatbot in action, all while putting our host Jon Krohn in the hot seat! Additional materials: www.superdatascience.com/710 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
709: Big A.I. R&D Risks Reap Big Societal Rewards, with Meta's Dr. Laurens van der Maaten | 29 Aug 2023 | 01:20:39
Meta's Senior Research Director, Dr. Laurens van der Maaten, takes center stage to unravel the captivating realm of AI innovation. Learn about his groundbreaking contributions, including pioneering the t-SNE dimensionality reduction technique and harnessing AI for novel protein synthesis, climate change mitigation, and wearable materials simulation. Join us to explore the transformative power of AI across diverse domains and gain a glimpse into its future societal implications. This episode is brought to you by AWS Inferentia, by Modelbit, for deploying models in seconds, and by Grafbase, the unified data layer. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Large-scale learning of image recognition models on web data [05:05] • Evolutionary Scale Modeling protein models [16:45] • Fighting climate change by building an A.I. model [29:49] • The CrypTen privacy-preserving ML framework [38:36] • Concerns about adversarial examples [53:25] • Laurens’ t-SNE algorithm [58:56] • How to make a big impact [1:07:25] Additional materials: www.superdatascience.com/709
708: ChatGPT Code Interpreter: 5 Hacks for Data Scientists | 25 Aug 2023 | 00:22:45
On this week’s Five-Minute Friday, host Jon Krohn gives five reasons why he is so excited about ChatGPT’s Code Interpreter and walks listeners through its capabilities with a practical example. Additional materials: www.superdatascience.com/708 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
707: Vicuña, Gorilla, Chatbot Arena and Socially Beneficial LLMs, with Prof. Joey Gonzalez | 22 Aug 2023 | 01:47:15
LLM Vicuña, Chatbot Arena, and the race to increase LLM context windows: This episode’s guest Joey Gonzalez talks to Jon Krohn about developing models and platforms that leverage and improve LLMs, as well as the future of AI development and access. This episode is brought to you by the AWS Insiders Podcast, by Modelbit, for deploying models in seconds, and by Grafbase, the unified data layer. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Vicuña: How the revolutionary LLM came to be [03:35] • Chatbot Arena: The leading LLM leaderboard [09:47] • Trusting LLM results [17:54] • Gorilla: The open-source ChatGPT plugin alternative [32:13] • About LMSYS and long context windows [47:48] • Open- vs closed-source LLMs: Which is better? [1:01:39] • Aqueduct [1:16:49] • Founding GraphLab [1:27:02] • How AI will positively impact society in the coming decades [1:32:31] Additional materials: www.superdatascience.com/707
706: Large Language Model Leaderboards and Benchmarks | 18 Aug 2023 | 00:33:27
In this episode, Caterina Constantinescu dives deep into Large Language Models (LLMs), spotlighting top leaderboards, evaluation benchmarks, and real-world user perceptions. Plus, discover the challenges of dataset contamination and the intricacies of platforms like HELM and Chatbot Arena. Additional materials: www.superdatascience.com/706 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
705: Feeding the World with ML-Powered Precision Agriculture | 15 Aug 2023 | 01:29:11
Join Jon Krohn as he chats with Syngenta Group's Feroz Sheikh, Jeremy Groeteke, and Thomas Jung about the digital revolution in agriculture. Learn how data science is evolving farming, from precision techniques to global food solutions. A compelling blend of tech meets nature. This episode is brought to you by AWS Inferentia and by Modelbit, for deploying models in seconds. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What is precision agriculture? [09:43] • What is computational agronomy? [12:30] • How Syngenta helps growers optimize yields [21:37] • How to bridge the gap between R&D and out in the real world [33:58] • What is generative chemistry? [37:52] • How generative chemistry accelerates the discovery of new compounds [41:55] • How you could make a big social impact in agriculture with data science [56:22] • How to go about designing ML models for agriculture [1:00:27] Additional materials: www.superdatascience.com/705
803: How to Thrive in Your (Data Science) Career, with Daliana Liu | 23 Jul 2024 | 01:54:43
Daliana Liu is a big name in data science teaching, and she has always been generous in sharing everything she knows about getting a job in data science. In this episode, she continues to extend her generosity, helping listeners define their approach to achieving a fulfilling career in data science and tech. This episode is brought to you by AWS Inferentia and AWS Trainium, by Babbel, the science-backed language-learning platform, and by Gurobi, the Decision Intelligence Leader. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: • Common career challenges for data scientists [34:57] • Advice for people who don’t know where to go in their career [48:05] • How to build resilience and protect against Imposter Syndrome [1:06:23] • Skills that data scientists should develop today [1:39:17] • The future of the data science and AI job market [1:46:55] Additional materials: www.superdatascience.com/803
704: Jon’s “Generative A.I. with LLMs” Hands-on Training | 11 Aug 2023 | 00:04:54
Take on the world of GPT and learn to develop your own, commercially successful Large Language Models (LLMs) with Jon Krohn’s comprehensive, guided training video for generative AI. Get to grips with the technology, learn which tools to use, and find out how to get an eye for business-viable models with Jon’s (ad-)free educational video. Additional materials: www.superdatascience.com/704 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
703: How Data Happened: A History, with Columbia Prof. Chris Wiggins | 08 Aug 2023 | 01:09:20
Statistics history, interdisciplinarity, and data and society. Chris Wiggins talks with Jon Krohn about the power dynamics of data, the transformation of the field of biology through data-driven approaches to genetic sequencing, and the New York Times’ data science team’s cutting-edge approach to accommodating its tech stack. This episode is brought to you by the AWS Insiders Podcast and by Modelbit, for deploying models in seconds. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • The importance of the humanities in data science [09:18] • How data science “rearranges” power [17:19] • An overview of How Data Happened [20:36] • The controversial nature of Bayes theorem [29:16] • Why we need to consider data ethics [34:00] • How biology came to adopt data science into its field [45:44] • The data science tech stack at the New York Times [49:18] Additional materials: www.superdatascience.com/703
702: Llama 2 — It's Time to Upgrade your Open-Source LLM | 04 Aug 2023 | 00:10:56
This week, Jon Krohn is examining Meta's newly released open-source large language model, Llama 2, highlighting its commercial prospects, immense capacity, model variety, and unique 'time awareness' feature. He also discusses its innovative two-stage RLHF approach that enhances its performance. Additional materials: www.superdatascience.com/702 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
701: Generative A.I. without the Privacy Risks (with Prof. Raluca Ada Popa) | 01 Aug 2023 | 01:21:27
Dr. Raluca Ada Popa, renowned computer scientist, entrepreneur, and President of Opaque Systems, joins Jon Krohn to share her insights on securely interacting with AI APIs like OpenAI's GPT-4, the pros and cons of open vs. closed-source AI development, and the seamless operation of compute pipelines across multiple clouds. This episode is brought to you by AWS Inferentia and by Modelbit, for deploying models in seconds. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What is a confidential computing platform? [04:31] • How to get started with confidential computing [12:10] • The challenges of confidential computing and LLMs [21:11] • How to safeguard your data while using commercial LLMs like GPT-4 [38:00] • Open-source vs closed-source [52:28] • Raluca's PreVail cybersecurity company [1:01:50] • Combining entrepreneurship and academic career [1:04:03] • DARE Program [1:10:39] Additional materials: www.superdatascience.com/701
700: "The Dream of Life" by Alan Watts | 28 Jul 2023 | 00:04:31
Yoga and Hindu mythology: This special episode continues the thread of our centenary episodes, SDS 500: Yoga Nidra with Jes Allen and SDS 600: Yoga Nidra Practice with Steve Fazzari, which talked through guided meditation techniques to help improve posture, sleep, and expand consciousness. Inspired by these sessions, host Jon Krohn explores Hindu mythology via Alan Watts’ “The Dream of Life”. Additional materials: www.superdatascience.com/700 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
699: The Modern Data Stack, with Harry Glaser | 25 Jul 2023 | 00:50:46
Model deployment, data warehouse options for running models, and how to best leverage BI tools: Harry Glaser and Jon Krohn discuss Modelbit’s capabilities to automate ML models from notebooks into production-ready models, reducing the time and effort in ‘translating’ information from one mode to another. Harry’s conversation with host Jon Krohn expanded on the importance of automating this task, and how developments in ML modeling have widened access to entire teams to analyze data, whatever their level of expertise. This episode is brought to you by the AWS Insiders Podcast. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What the modern data stack is [03:28] • Version control for data scientists [13:30] • CI/CD, load balancing and logging [20:38] • Snowflake vs. Redshift [30:10] • How tools like Looker and Tableau help monitor models [35:26] Additional materials: www.superdatascience.com/699
698: How Firms Can Actually Adopt A.I., with Rehgan Avon | 21 Jul 2023 | 00:27:42
Company-wide AI adoption can take a lot of persuasion. Rehgan Avon talks to host Jon Krohn about why AI has become necessary for forward-thinking businesses and the steps to implement AI in an institution so that everyone benefits. Additional materials: www.superdatascience.com/698 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
697: The (Short) Path to Artificial General Intelligence, with Dr. Ben Goertzel | 18 Jul 2023 | 01:27:12
AI visionary and CEO of SingularityNET Dr. Ben Goertzel provides a deep dive into the possible realization of Artificial General Intelligence (AGI) within 3-7 years. Explore the intriguing connections between self-awareness, consciousness, and the future of Artificial Super Intelligence (ASI) and discover the transformative societal changes that could arise. This episode is brought to you by AWS Inferentia, by the AWS Insiders Podcast, and by Modelbit, for deploying models in seconds. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Decentralized and benevolent AGI [03:13] • The SingularityNET ecosystem [13:10] • Dr. Goertzel's vision for realizing AGI - combining DL with neuro-symbolic systems, genetic algorithms and knowledge graphs [25:50] • How reaching AGI will trigger Artificial Super Intelligence [38:51] • Dr. Goertzel's approach to AGI using OpenCog Hyperon [42:34] • Why Dr. Goertzel believes AGI will be positive for humankind [53:07] • How to ensure the AGI is benevolent [1:06:43] • How AGI or ASI may act ethically [1:13:50] Additional materials: www.superdatascience.com/697
696: Brain-Computer Interfaces and Neural Decoding, with Prof. Bob Knight | 14 Jul 2023 | 01:02:45
Jon Krohn welcomes Professor Dr. Bob Knight to explore human intelligence, the prefrontal cortex, and the transformative potential of brain implants for data collection. Discover the pivotal role of machine learning in treating Parkinson's and delve into exciting future advancements. Additional materials: www.superdatascience.com/696 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
695: NLP with Transformers, feat. Hugging Face's Lewis Tunstall | 11 Jul 2023 | 01:38:04
What are transformers in AI, and how do they help developers to run LLMs efficiently and accurately? This is a key question in this week’s episode, where Hugging Face’s ML Engineer Lewis Tunstall sits down with host Jon Krohn to discuss encoders and decoders, and the importance of continuing to foster democratic environments like GitHub for creating open-source models. This episode is brought to you by the AWS Insiders Podcast, by WithFeeling.ai, the company bringing humanity into AI, and by Modelbit, for deploying models in seconds. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What a transformer is, and why it is so important for NLP [04:34] • Different types of transformers and how they vary [11:39] • Why it’s necessary to know how a transformer works [31:52] • Hugging Face’s role in the application of transformers [57:10] • Lewis Tunstall’s experience of working at Hugging Face [1:02:08] • How and where to start with Hugging Face libraries [1:18:27] • The necessity to democratize ML models in the future [1:25:25] Additional materials: www.superdatascience.com/695
802: In Case You Missed It in June 2024 | 19 Jul 2024 | 00:23:55
How to grab investor interest with your AI startup idea, revisiting algorithms, and helping practitioners ensure AI safety with regulatory frameworks and beyond: This month, you missed a whole bunch of great interviews. But don’t worry, Jon Krohn is here to recap all the best bits for you! Additional materials: www.superdatascience.com/802 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
694: CatBoost: Powerful, efficient ML for large tabular datasets | 07 Jul 2023 | 00:07:59
Modeling tabular data and spreadsheets doesn’t have to be tedious with CatBoost’s open-source tree-boosting algorithm. CatBoost does what it says on the tin, blending categories with boosting that allows you to train your models faster and handle large datasets for ML tasks across multiple GPUs. In this week’s Five-Minute Friday, host Jon Krohn gets to grips with the technical components of CatBoost that give it the speed and accuracy so acclaimed by its users. Additional materials: www.superdatascience.com/694 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
693: YOLO-NAS: The State of the Art in Machine Vision, with Harpreet Sahota | 04 Jul 2023 | 01:20:15
Harpreet Sahota, a data science expert and deep learning developer at Deci AI, joins Jon Krohn to explore the fascinating realm of object detection and the revolutionary YOLO-NAS model architecture. Discover how machine vision models have evolved and the techniques driving compute-efficient edge device applications. This episode is brought to you by AWS Inferentia, by WithFeeling.ai, the company bringing humanity into AI, and by Modelbit, for deploying models in seconds. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What is machine vision? [07:02] • Object detection and YOLO architectures [13:00] • Deci's YOLO-NAS: Optimal object detection model architecture [23:39] • Developer Relations [1:00:16] • Harpreet's 'top-down' approach to learning Deep Learning [1:06:50] Additional materials: www.superdatascience.com/693
692: Lossless LLM Weight Compression: Run Huge Models on a Single GPU | 30 Jun 2023 | 00:07:39
Join Jon as he navigates listeners through the innovative SpQR approach—a cutting-edge, lossless LLM weight compression technique that harnesses the power of quantization. Tune in as Jon delves into the four steps behind this groundbreaking method in this week's episode. Additional materials: www.superdatascience.com/692 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
691: A.I. Accelerators: Hardware Specialized for Deep Learning | 27 Jun 2023 | 01:34:34
GPUs vs CPUs, chip design and the importance of chips in AI research: This highly technical episode is for anyone who wants to learn what goes into chip development and how to get into the competitive industry of accelerator design. With advice from expert guest Ron Diamant, Senior Principal Engineer at AWS, you’ll get a breakdown of the need-to-know technical terms, what chip engineers need to think about during the design phase and what the future holds for processing hardware. This episode is brought to you by Posit, the open-source data science company, by the AWS Insiders Podcast, and by WithFeeling.ai, the company bringing humanity into AI. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What CPUs and GPUs are [05:29] • The differences between accelerators used for deep learning [14:31] • Trainium and Inferentia: AWS's A.I. Accelerators [22:10] • If model optimizations will lead to lower demand for hardware to process them [43:14] • How a chip designer goes about production [48:34] • Breaking down the technical terminology for chips (accelerator interconnect, dynamic execution, collective communications) [55:29] • The importance of AWS Neuron, a software development kit [1:15:42] • How Ron got his foot in the door with chip design [1:26:40] Additional materials: www.superdatascience.com/691
690: How to Catch and Fix Harmful Generative A.I. Outputs | 23 Jun 2023 | 00:26:14
Krishna Gade, the founder and CEO of Fiddler.AI, discusses the challenges faced by Large Language Models (LLMs) in Generative AI, including inaccuracies, biases, and privacy risks. He emphasizes the importance of monitoring to build trust in AI and highlights Fiddler's explainability algorithms and pre-built bias detection tools as vital solutions. Additional materials: www.superdatascience.com/690 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
689: Observing LLMs in Production to Automatically Catch Issues | 20 Jun 2023 | 01:18:01
Arize's Amber Roberts and Xander Song join Jon Krohn this week, sharing invaluable insights into ML Observability, drift detection, retraining strategies, and the crucial task of ensuring fairness and ethical considerations in AI development. This episode is brought to you by Posit, the open-source data science company, by AWS Inferentia, and by Anaconda, the world's most popular Python distribution. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What is ML Observability [05:07] • What is Drift [08:18] • The different kinds of model drift [15:31] • How frequently should production models be retrained? [25:15] • Arize's open-source product, Phoenix [30:49] • How ML Observability relates to discovering model biases [50:30] • Arize case studies [57:13] • What is a developer advocate [1:04:51] Additional materials: www.superdatascience.com/689
688: Six Reasons Why Building LLM Products Is Tricky | 16 Jun 2023 | 00:14:10
Prompt injection, prompt engineering, context windows, and more: In this week’s Five-Minute Friday, Jon explains why anyone looking to build their own product leveraging LLMs should stop to consider these and three more issues before jumping in. Phillip Carter first outlined these six issues in his article “All the Hard Stuff Nobody Talks About when Building Products with LLMs”. Additional materials: www.superdatascience.com/688 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
687: Generative Deep Learning, with David Foster | 13 Jun 2023 | 01:46:33
Autoencoders, transformers, latent space: Learn the elements of generative AI and hear what data scientist David Foster has to say about the potential for generative AI in music, as well as the role that world models play in blending generative AI with reinforcement learning. This episode is brought to you by Posit, the open-source data science company, by Anaconda, the world's most popular Python distribution, and by WithFeeling.ai, the company bringing humanity into AI. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Generative modeling vs discriminative modeling [04:21] • Generative AI for Music [13:12] • On the threats of AI [23:15] • Autoencoders Explained [38:36] • Noise in Generative AI [48:11] • What CLIP models are (Contrastive Language-Image Pre-training) [54:07] • What World Models are [1:00:40] • What a Transformer is [1:11:14] • How to use transformers for music generation [1:19:50] Additional materials: www.superdatascience.com/687
686: Open-Source "Responsible A.I." Tools, with Ruth Yakubu | 09 Jun 2023 | 00:29:58
Microsoft’s Ruth Yakubu joins Jon Krohn to discuss Responsible AI principles and the open-source Responsible AI Toolbox, allowing users to assess their models for fairness, inclusiveness, privacy, explainability, accountability, and reliability before deployment. Additional materials: www.superdatascience.com/686 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
685: Tools for Building Real-Time Machine Learning Applications, with Richmond Alake | 06 Jun 2023 | 01:06:19
Richmond Alake, a Machine Learning Architect at Slalom Build, sits down with Jon to share real-time ML insights, tools and career experiences for a high-energy, high-impact episode. From his work at Slalom Build to his two AI startups, discover the software choices, ML tools, and front-end development techniques used by a leader in the field. This episode is brought to you by Posit, the open-source data science company, by AWS Inferentia, and by WithFeeling.ai, the company bringing humanity into AI. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What is a Machine Learning Architect? [03:09] • Richmond's startups [12:07] • Why Richmond started a podcast [29:51] • Richmond's new course on feature stores [38:05] • Why Richmond produces data science content [43:25] • Why All Data Scientists Should Write [51:30] Additional materials: www.superdatascience.com/685
801: Merged LLMs Are Smaller And More Capable, with Arcee AI's Mark McQuade and Charles Goddard | 16 Jul 2024 | 01:17:05
Merged LLMs are the future, and we’re exploring how with Mark McQuade and Charles Goddard from Arcee AI on this episode with Jon Krohn. Learn how to combine multiple LLMs without adding bulk, train more efficiently, and dive into different expert approaches. Discover how smaller models can outperform larger ones and leverage open-source projects for big enterprise wins. This episode is packed with must-know insights for data scientists and ML engineers. Don’t miss out! Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: • Explanation of Charles' job title: Chief of Frontier Research [03:31] • Model Merging Technology combining multiple LLMs without increasing size [04:43] • Using MergeKit for model merging [14:49] • Evolutionary Model Merging using evolutionary algorithms [22:55] • Commercial applications and success stories [28:10] • Comparison of Mixture of Experts (MoE) vs. Mixture of Agents [37:57] • Spectrum Project for efficient training by targeting specific modules [54:28] • Future of Small Language Models (SLMs) and their advantages [01:01:22] Additional materials: www.superdatascience.com/801
684: Get More Language Context out of your LLM | 02 Jun 2023 | 00:05:49
Open-source LLMs, FlashAttention and generative AI terminology: Host Jon Krohn gives us the lift we need to explore the next big steps in generative AI. Listen to the specific way in which Stanford University’s “exact attention” algorithm, FlashAttention, could become a competitor for GPT-4’s capabilities. Additional materials: www.superdatascience.com/684 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
683: Contextual A.I. for Adapting to Adversaries, with Dr. Matar Haller 30 May 2023 01:20:35
Monitoring malicious, user-generated content; contextual AI; adapting to novel evasion attempts: Matar Haller speaks to Jon Krohn about the challenges of identifying, analyzing and flagging malicious information online. In this episode, Matar explains how contextual AI and a “database of evil” can help resolve the multiple challenges of blocking dangerous content across a range of media, even those that are live-streamed. This episode is brought to you by Posit, the open-source data science company, by Anaconda, the world's most popular Python distribution, and by WithFeeling.ai, the company bringing humanity into AI. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • How ActiveFence helps its customers to moderate platform content [05:36] • How ActiveFence finds extreme social media users trying to evade detection [16:32] • How to monitor live-streaming content and analyze it for dangerous material [29:13] • The technologies ActiveFence uses to run its platform [35:54] • Matar’s experience of the Insight Fellows Program (Data Science Fellowship) [40:28] • Leadership opportunities for women in STEM [1:00:41] • Israel’s R&D edge for AI [1:13:19] Additional materials: www.superdatascience.com/683
682: Business Intelligence Tools, with Mico Yuk 26 May 2023 00:27:36
In this week's episode, Mico Yuk, host of 'Analytics on Fire', joins Jon Krohn to share her effective business intelligence and analytics framework, BIDS, for persuading key decision makers. She crowns one "power" tool as the analytics king and discusses emerging tools that could challenge its dominance. Tune in for unapologetic insights on future and current BI trends and happenings from the world of BI and analytics. Additional materials: www.superdatascience.com/682 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
681: XGBoost: The Ultimate Classifier, with Matt Harrison 23 May 2023 01:12:01
Unlock the power of XGBoost by learning how to fine-tune its hyperparameters and discover its optimal modeling situations. This and more, when best-selling author and leading Python consultant Matt Harrison teams up with Jon Krohn for yet another jam-packed technical episode! Are you ready to upgrade your data science toolkit in just one hour? Tune in now! This episode is brought to you by Pathway, the reactive data processing framework, by Posit, the open-source data science company, and by Anaconda, the world's most popular Python distribution. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Matt's book ‘Effective XGBoost’ [07:05] • What is XGBoost [09:09] • XGBoost's key model hyperparameters [19:01] • XGBoost's secret sauce [29:57] • When to use XGBoost [34:45] • When not to use XGBoost [41:42] • Matt’s recommended Python libraries [47:36] • Matt's production tips [57:57] Additional materials: www.superdatascience.com/681
680: Automating Industrial Machines with Data Science and the Internet of Things (IoT) 19 May 2023 00:30:25
Industrial machinery’s dependence on data science, tech stacks to build IoT platforms, and transitioning from data science to product: This week’s Friday episode with Allegra Alessi explores the minutiae of product ownership for the Internet of Things at packaging company Bobst. Join host Jon Krohn and his guest as they unpack how the IoT is leading factory production. Additional materials: www.superdatascience.com/680 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
679: The A.I. and Machine Learning Landscape, with investor George Mathew 16 May 2023 01:34:14
Generative AI, MLOps, and making smart investments in AI: This week’s episode is critical listening for AI investors and generative AI creators. AI investor George Mathew talks with host Jon Krohn about the emerging generative AI stack, the critical elements of MLOps to ensure a scalable model, and the tools developers can use for a saleable product. This episode is brought to you by Posit, the open-source data science company, by AWS Inferentia, and by Anaconda, the world's most popular Python distribution. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Venture capital’s role in the technology startup ecosystem [05:59] • How RLHF helps UI become more intuitive [12:53] • The four layers of the generative AI stack [34:16] • The risks for generative AI business founders and investors [46:50] • How MLOps drive best practices and help implementation [56:33] • The importance of PLG (Product-Led Growth) [1:04:15] • How generative AI tools will impact the labor market [1:17:34] Additional materials: www.superdatascience.com/679
678: StableLM: Open-source "ChatGPT"-like LLMs you can fit on one GPU 12 May 2023 00:11:39
StableLM, the new family of open-source language models from the brilliant minds behind Stable Diffusion, is out! Small but mighty, these models have been trained on an unprecedented amount of data for single-GPU LLMs. This week, Jon breaks down the mechanics of this model. See you there! Additional materials: www.superdatascience.com/678 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
677: Digital Analytics with Avinash Kaushik 09 May 2023 01:27:54
How does one use marketing analytics to drive business success? Avinash Kaushik, Chief Strategy Officer at Croud and former Sr. Director of Global Strategic Analytics at Google, joins Jon Krohn live for an exciting episode that covers the transformative power of AI, his 'four clusters of intent' framework and the value of hands-on data tools. This episode is brought to you by Pathway, the reactive data processing framework, by Posit, the open-source data science company, and by Anaconda, the world's most popular Python distribution. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What is a chief strategy officer? [3:55] • Brand vs performance analytics [7:23] • Incrementality-centric marketing [32:53] • Avinash's time at Google [37:54] • How to maintain a human touch with AI [48:58] • Four clusters of intent framework [1:11:28] • Avinash's most significant career challenges [1:17:18] Additional materials: www.superdatascience.com/677
676: The Chinchilla Scaling Laws 05 May 2023 00:13:27
Chinchilla AI, and fine-tuning proprietary tasks with large language models: On this week’s Five-Minute Friday, host Jon Krohn outlines the principles of the Chinchilla Scaling Laws, the incredible power of models such as Cerebras-GPT based on these laws, and the impact of scaling on the number of viable applications and commercial use cases. Additional materials: www.superdatascience.com/676 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
675: Pandas for Data Analysis and Visualization 02 May 2023 01:08:40
Wrangling data in Pandas, when to use Pandas, Matplotlib or Seaborn, and why you should learn to create Python packages: Jon Krohn speaks with guest Stefanie Molin, author of Hands-On Data Analysis with Pandas. This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • The advantages of using pandas over other libraries [07:55] • Why data wrangling in pandas is so helpful [12:05] • Stefanie’s Data Morph library [24:27] • When to use pandas, matplotlib, or seaborn [33:45] • Understanding the ticker module in matplotlib [36:48] • Where data analysts should start their learning journey [40:08] • What it’s like being a software engineer at Bloomberg [51:19] Additional materials: www.superdatascience.com/675
800: A Transformative Century of Technological Progress, with Annie P. 12 Jul 2024 00:43:37
The SuperDataScience Podcast is celebrating its 800th episode! Host Jon Krohn speaks to his grandmother, Annie, about growing up at a time when so many technologies we take for granted today were yet to be developed. Listen in to hear Annie’s experience of the changes in technology across 94 years and how she and her family fared in 1940s Ukraine with no electricity or running water. Additional materials: www.superdatascience.com/800
674: Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation) 28 Apr 2023 00:05:27
Models like Alpaca, Vicuña, GPT4All-J and Dolly 2.0 have relatively small model architectures, but they're prohibitively expensive to train even on a small amount of your own data. The standard model-training protocol can also lead to catastrophic forgetting. In this week's episode, Jon explores a solution to these problems, introducing listeners to Parameter-Efficient Fine-Tuning (PEFT) and the leading approach: Low-Rank Adaptation (LoRA). Additional materials: www.superdatascience.com/674 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
673: Taipy, the open-source Python application builder 25 Apr 2023 01:12:01
Vincent Gosselin, CEO and co-founder of Taipy, an open-source Python library, joins Jon Krohn to discuss how to accelerate productivity in Python and build scalable, reusable, and maintainable data pipelines. Gosselin shares his breadth of wisdom honed over his decades-long AI career. This episode is brought to you by Pathway, the reactive data processing framework, and by Posit, the open-source data science company. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • The Taipy library functionality [2:59] • The future of data pipelines [21:40] • Common trends of companies that are successful at adopting data pipelines [28:31] • How no-code and low-code trends impact the data science lifecycle [33:00] • How Vincent chose the programming languages that underpin Taipy [41:40] • Common trends on how companies manage their data to learn from it [45:06] • Vincent's perspective on AI winters [51:03] Additional materials: www.superdatascience.com/673
672: Open-source "ChatGPT": Alpaca, Vicuña, GPT4All-J, and Dolly 2.0 21 Apr 2023 00:16:50
Get started with language models: Learn about the commercial-use options available for your business in this week’s Five-Minute Friday, where host Jon Krohn discusses four models that have many of the capabilities of ChatGPT and can run at a fraction of the cost. Additional materials: www.superdatascience.com/672 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.