
Explore every episode of the podcast GPT Reviews

Dive into the complete episode list for GPT Reviews. Each episode is cataloged with detailed descriptions, making it easy to find and explore specific topics. Keep track of all episodes from your favorite podcast and never miss a moment of insightful content.


Title | Pub. Date | Duration
OpenAI's Strawberry Revolution 🍓 // Nvidia's Lucrative Paychecks 💸 // Google Pipe SQL Simplification 📊 | 29 Aug 2024 | 00:14:01

This episode dives into OpenAI's promising new model, Strawberry, which could revolutionize interactions in ChatGPT. We explore the financial envy Nvidia employees inspire in their Google and Meta counterparts due to lucrative stock options. Google’s new Pipe SQL syntax aims to simplify data querying, while concerns about research accessibility are raised. Finally, we discuss the BaichuanSEED and Dolphin models, which highlight advances in extensive data collection and energy-efficient processing, paving the way for enhanced AI capabilities.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:40 OpenAI Races to Launch Strawberry

03:07 Google, Meta workers envy Nvidia staffers’ fat paychecks: ‘Bought a 100K car … all cash’

05:01 Google's New Pipe SQL Syntax

06:12 Fake sponsor

07:47 BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline

09:20 Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models

11:09 Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders

12:50 Outro

OpenAI's 'Strawberry' AI 🚀 // World's Fastest AI Inference ⚡ // Photo-realistic 3D Avatars 🎨 | 28 Aug 2024 | 00:14:14

OpenAI's 'Strawberry' AI tackles complex math and programming with enhanced reasoning, while Cerebras claims to have launched the fastest AI inference, enabling real-time applications at competitive prices. The GenCA model revolutionizes avatar creation with photo-realistic, controllable 3D avatars, and the "Build-A-Scene" paper introduces interactive 3D layout control for text-to-image generation, enhancing creative fields with dynamic object manipulation.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

02:02 OpenAI Shows ‘Strawberry’ AI to the Feds and Uses It to Develop ‘Orion’

03:23 Cerebras Launches the World’s Fastest AI Inference

05:07 Diffusion Models Are Real-Time Game Engines

06:15 Fake sponsor

08:06 The Mamba in the Llama: Distilling and Accelerating Hybrid Models

09:42 GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars

11:16 Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

13:04 Outro

Nvidia's Stock Struggles 📉 // Meta's AI Hallucinations 🤖 // Superconducting Microprocessors ⚡ | 02 Aug 2024 | 00:14:41

This episode dives into Nvidia's stock struggles amid rising competition, while also unpacking Meta's AI blunders and the implications of "hallucinations" in tech. We explore cutting-edge superconducting microprocessors that promise unprecedented energy efficiency and highlight groundbreaking AI research, including eavesdropping techniques and advancements in reinforcement learning.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:50 Nvidia Sank Again Today -- Time to Buy the Artificial Intelligence (AI) Growth Stock Hand Over Fist?

03:09 Meta blames hallucinations after its AI said Trump rally shooting didn’t happen

04:52 Superconducting Microprocessors? Turns Out They're Ultra-Efficient

06:07 Fake sponsor

07:48 Deep-TEMPEST: Using Deep Learning to Eavesdrop on HDMI from its Unintended Electromagnetic Emanations

09:22 SAPG: Split and Aggregate Policy Gradients

10:45 MindSearch: Mimicking Human Minds Elicits Deep AI Searcher

12:44 Outro

AI Secret Trading in China 💼 // Training Models at Scale 🚀 // Improving User Queries with Backtracing 🔍 | 08 Mar 2024 | 00:14:56

A Google engineer has been indicted for allegedly stealing over 500 confidential files containing AI trade secrets while working for China-based companies seeking an edge in the AI technology race.

A tutorial series explores parallelism strategies for training large deep learning models, making the topic accessible to everyone, regardless of the hardware they have available.

Value functions are a crucial component in deep reinforcement learning, and a new approach using categorical cross-entropy instead of regression can significantly improve performance and scalability in a variety of domains.

Backtracing is the task of retrieving the text segment that most likely caused a user query, and it can help improve content delivery and communication by identifying linguistic triggers that influence user queries.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:33 Google engineer indicted over allegedly stealing AI trade secrets for China

03:57 Training Models at Scale Tutorial

05:24 Autogenerating a Book Series From Three Years of iMessages

06:22 Fake sponsor

08:16 Design2Code: How Far Are We From Automating Front-End Engineering?

10:09 Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

11:43 Backtracing: Retrieving the Cause of the Query

13:27 Outro

Perplexity vs. Google 🔍 // Microsoft vs. NYT ⚖️ // General Computer Control 💻 | 07 Mar 2024 | 00:14:00

Perplexity AI is a search startup looking to take on Google by addressing the shortcomings of web search. It is nearing unicorn status with a valuation of around $1 billion.

The New York Times is suing Microsoft and OpenAI for copyright infringement, alleging that they misused the newspaper’s intellectual property to train LLMs. Microsoft dismisses the Times' claims as "unsubstantiated" and compares the lawsuit to Hollywood's resistance to the VCR in the 1970s.

A new paper introduces the concept of General Computer Control (GCC), which is the idea of building agents that can master any computer task by taking only screen images and producing keyboard and mouse operations as output. The authors propose a framework called Cradle that has strong reasoning abilities to ensure generalizability and self-improvement across various tasks.

A paper evaluates different tokenizer inference methods and their impact on the performance of downstream NLP tasks. The authors found that for the most commonly used tokenizers, greedy inference performs surprisingly well, and a recently-introduced contextually-informed tokenizer outperforms all others on morphological alignment.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:23 Perplexity Poised To Become Latest AI Startup To Hit Unicorn Status — Report

02:53 Microsoft compares The New York Times’ claims against OpenAI to Hollywood’s early fight against VCR

04:41 Training great LLMs entirely from ground zero in the wilderness as a startup

05:49 Fake sponsor

07:39 Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study

09:15 Design2Code: How Far Are We From Automating Front-End Engineering?

11:01 Greed is All You Need: An Evaluation of Tokenizer Inference Methods

12:49 Outro

OpenAI vs Elon Musk 💻 // Automated Text Embeddings 📊 // Unified Time Series Model 📈 | 06 Mar 2024 | 00:14:34

Groq, an AI chip startup, forms a new business unit and acquires Definitive Intelligence to expand its customer and developer ecosystem.

OpenAI responds to Elon Musk's lawsuit, revealing that Musk himself wanted "absolute control" over the company by merging it with Tesla.

A new Postgres extension called pg_vectorize automates the transformation and orchestration of text to embeddings, providing workflows for vector search and RAG.

UniTS, a unified time series model, achieves superior performance compared to task-specific models and repurposed natural language-based LLMs, demonstrating remarkable zero-shot, few-shot, and prompt learning capabilities.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

02:05 AI chip startup Groq forms new business unit, acquires Definitive Intelligence

03:49 OpenAI says Elon Musk wanted ‘absolute control’ of the company

05:29 pg_vectorize: a VectorDB for Postgres

06:29 Fake sponsor

08:26 Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models

09:54 UniTS: Building a Unified Time Series Model

11:39 DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving

13:14 Outro

Anthropic's Claude 3 🤖 // Elon Musk sues OpenAI 💥 // Unified Time Series Model 🎧 | 05 Mar 2024 | 00:14:58

Anthropic's new and improved Claude 3 model family sets new industry benchmarks across a wide range of cognitive tasks, exhibiting near-human levels of comprehension and fluency on complex tasks.

Elon Musk is suing OpenAI and CEO Sam Altman for allegedly abandoning their original mission to benefit humanity and instead focusing on profits with Microsoft.

Opus 1.5, the latest release of the Opus audio codec, brings quality improvements, including machine learning-based upgrades that use deep learning techniques to process or generate the audio signals themselves, while remaining fully compatible with RFC 6716.

The Multimodal ArXiv dataset represents an important step forward for LVLMs when it comes to interpreting and understanding complex scientific figures, achieving a 10.4% absolute accuracy gain on a multimodal mathematical reasoning benchmark.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:27 Introducing the next generation Claude: Claude 3

03:17 Elon Musk sues Sam Altman and OpenAI

04:59 Opus Gets a Serious Machine Learning Upgrade

06:29 Fake sponsor

08:28 UniTS: Building a Unified Time Series Model

10:10 Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models

12:06 Learning and Leveraging World Models in Visual Representation Learning

13:48 Outro

Adobe's GenAI for Audio 🎧 // User Data for AI Backlash 👀 // MOSAIC's Modular Cooking 🍲 | 04 Mar 2024 | 00:14:51

Adobe's new generative AI tools for custom audio creation and editing.

Tumblr and WordPress selling user data to train AI tools, sparking backlash.

MOSAIC, a modular system for assistive and interactive cooking using natural language and multiple robots.

A new approach to real-world humanoid control using a causal transformer model trained through autoregressive prediction of sensorimotor trajectories.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:20 Adobe previews new cutting-edge generative AI tools for crafting and editing custom audio

02:39 Tumblr and WordPress to Sell Users’ Data to Train AI Tools

04:14 “AI will cure cancer” misunderstands both AI and medicine

05:56 Fake sponsor

08:12 MOSAIC: A Modular System for Assistive and Interactive Cooking

09:54 Humanoid Locomotion as Next Token Prediction

11:50 In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

13:22 Outro

Meta's Llama 3 🦙 // Apple's GenAI 🍎 // Unsupervised RL via Reward Encoding 🤖 | 01 Mar 2024 | 00:14:42

Meta Platforms is set to launch its new AI language model, Llama 3, which promises to tackle taboo questions with more grace and respect than its predecessor.

Apple is ramping up its investment in GenAI, with plans to upgrade Siri and iOS’ built-in search tool, Spotlight, with GenAI models to handle more complex queries and multi-turn conversations.

The University of California, Berkeley, has published a paper exploring unsupervised zero-shot reinforcement learning via functional reward encodings, which could enable an agent to be pre-trained once and then adapt to any new downstream task in a zero-shot manner.

TrustMol, an inverse molecular design (IMD) method built to be trustworthy, has been proposed by the Max Planck Institute for Informatics and could make the IMD process more explainable and reliable.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:52 Meta plans launch of new AI language model Llama 3 in July, The Information reports

02:56 Tim Cook says Apple will ‘break new ground’ in GenAI this year

04:35 Things You Should Never Do, Part I

05:46 Fake sponsor

07:28 Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings

09:04 TrustMol: Trustworthy Inverse Molecular Design via Alignment with Molecular Dynamics

10:57 Stochastic Gradient Succeeds for Bandits

13:23 Outro

Pichai on Google Controversy 🤡 // C3.ai's Revenue Surprises AI Market 📈 // 1-bit LLMs for Efficient Language Modeling 💾 | 29 Feb 2024 | 00:13:14

Google's Gemini image generation tool has been producing offensive and embarrassing results, prompting the company to make structural changes and update its product guidelines to avoid bias in AI tools.

C3.ai, a software maker that helps companies build AI applications, reported a narrower-than-expected loss and revenue that topped estimates, sending its stock up more than 14% in extended trading.

A new paper introduces a cost-effective class of Large Language Models called 1-bit LLMs, whose ternary weights ({-1, 0, 1}) need about 1.58 bits each. These models match the performance of full-precision Transformer LLMs while being significantly more efficient in terms of latency, memory, throughput, and energy consumption.

Another paper proposes a hybrid approach that combines a frozen LLM with a small language model to improve the efficiency of autoregressive decoding for Large Language Models, resulting in substantial speedups of up to 4 times with minor performance penalties. Additionally, a new framework called EMO utilizes a direct audio-to-video synthesis approach to produce highly expressive and lifelike talking head videos.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:38 Google CEO calls AI tool’s controversial responses ‘completely unacceptable’

03:11 Artificial Intelligence Play C3.ai Climbs On Earnings Report, Outlook

04:41 Jason Wei On Sora

06:19 Fake sponsor

08:35 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

09:19 Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding

10:38 EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

12:04 Outro

DeepMind's Genie 🧞 // More Hollywood AI Concerns 🤖 // MobileLLM for Efficient Language Models 📱 | 28 Feb 2024 | 00:15:22

DeepMind's Genie, a model that creates playable, video-game-like worlds from just a prompt or an image, could be a game-changer for the industry.

Tyler Perry's $800M studio expansion is on hold after seeing OpenAI's Sora, highlighting the potential for AI to replace human workers in the entertainment industry.

MobileLLM, which optimizes sub-billion-parameter language models for on-device use cases, is a promising development for those looking to deploy efficient language models on mobile devices.

A comprehensive survey of the literature on data selection methods for language models provides a taxonomy of existing approaches and proposes promising avenues for future research.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:44 DeepMind's Genie: creating videogames with prompts

02:48 Tyler Perry Puts $800M Studio Expansion on Hold After Seeing OpenAI’s Sora: “Jobs Are Going to Be Lost”

04:41 Speakz AI

06:15 Fake sponsor

07:56 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

09:57 A Survey on Data Selection for Language Models

11:26 Do Large Language Models Latently Perform Multi-Hop Reasoning?

13:44 Outro

Mistral New Models 🗣️ // Mistral-Microsoft Partnership 💻 // Input Length Impact on LLMs 🤔 | 27 Feb 2024 | 00:13:21

Mistral AI has launched a new conversational assistant, Le Chat Mistral, which serves as an entry point to interact with their various models. They're also launching Le Chat Enterprise, which could be useful for businesses looking to boost productivity and efficiency.

Microsoft has partnered with Mistral, a French company focused on language models; it will take a minority stake in the company and offer Mistral's language models on its Azure AI platform. Mistral is also releasing a new model called Mistral Large, which is designed to compete with OpenAI's GPT-4.

"Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models" by Levy et al. investigates how the performance of Large Language Models (LLMs) changes when the input length is extended. The authors found that there is a notable degradation in LLMs' reasoning performance at much shorter input lengths than their technical maximum.

"Executable Code Actions Elicit Better LLM Agents" proposes using executable Python code to consolidate LLM agents' actions into a unified action space called CodeAct. CodeAct achieves up to a 20% higher success rate than widely used alternatives and could have many practical applications.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:39 Le Chat announced by Mistral AI

02:53 Microsoft partners with Mistral in second AI deal beyond OpenAI

04:29 Introducing Phind 70Billion

05:27 Fake sponsor

07:05 Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

08:47 Executable Code Actions Elicit Better LLM Agents

10:36 Cleaner Pretraining Corpus Curation with Neural Web Scraping

12:20 Outro

Google's Gemma 🌟 // Generalized Instruction Tuning 📚 // Multi-object Diffusion 🖼️ | 22 Feb 2024 | 00:14:56

Gemma, a new family of lightweight, state-of-the-art open models built for responsible AI development, is introduced by Google.

"Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models" presents a new method for instruction tuning of Large Language Models (LLMs) called Generalized Instruction Tuning (GLAN).

"MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion" addresses the challenge of generating images of multiple objects with spatial relationships and attribute bindings.

"Instruction-tuned Language Models are Better Knowledge Learners" explores how to update factual knowledge in large language models. 

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:21 Google DeepMind Releases Gemma

03:28 Andrej Karpathy on Gemma's Tokenizer

04:16 Groq Inference Tokenomics: Speed, But At What Cost?

05:51 Fake sponsor

07:44 Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

09:38 MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion

11:06 Instruction-tuned Language Models are Better Knowledge Learners

12:58 Outro

Google's Gemma 2 vs. GPT-3.5 ⚔️ // Black Forest Labs' Flux Model 🌲 // Ethical Concerns in AI 🚨 | 02 Aug 2024 | 00:14:44

This episode dives into Google’s Gemma 2, which Google says outperforms GPT-3.5, and its focus on responsible AI practices. We explore Black Forest Labs' Flux model, featuring 12 billion parameters and tailored versions for various users. Olivia sheds light on the ethical concerns surrounding the resurgence of pseudoscience in machine learning, particularly physiognomy. Lastly, Belinda reviews critical research on AI safety that advocates for clearer metrics to prevent misleading claims about safety advancements.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:37 Google’s tiny AI model bests GPT-3.5

02:48 Announcing Flux by Black Forest Labs: The Next Leap in Text-to-Image Models

04:28 The reanimation of pseudoscience in machine learning and its ethical repercussions

06:06 Fake sponsor

08:04 MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts

09:55 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

11:41 Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

13:33 Outro

Groq's AI Hardware 💻 // Japan's $67B Chip Bet 🎲 // Video Understanding 📹 | 21 Feb 2024 | 00:14:02

Groq's AI hardware breakthroughs with LPU architecture achieving speeds of 500 tokens per second.

Japan's $67 billion investment to become a global chip powerhouse and insulate its economy from growing US-China tensions.

Neural Network Diffusion paper demonstrating that diffusion models can generate high-performing neural network parameters.

VideoPrism paper from Google Research achieving state-of-the-art performance on 30 out of 33 video understanding benchmarks.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:47 Groq Goes Viral with Crazy Fast AI Inference

03:01 Japan Bets $67 Billion to Become a Global Chip Powerhouse Once Again

04:54 My benchmark for large language models

06:01 Fake sponsor

07:54 Neural Network Diffusion

09:19 Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

11:16 VideoPrism: A Foundational Visual Encoder for Video Understanding

12:42 Outro

OpenAI's Challenge 🤝 // NVIDIA's Graphics Card 💻 // Advancements in AI Research 🔬 | 20 Feb 2024 | 00:13:56

OpenAI's trademark claim for 'GPT' was rejected by the US Patent and Trademark Office, which could impact other AI companies using the term.

A recent tender offer led by venture firm Thrive Capital values Microsoft-backed OpenAI at $80 billion, solidifying its position in the AI industry.

The NVIDIA A800 40GB Active Graphics Card is a powerful tool for AI and HPC workflows, with industry-leading performance and production-ready AI development software included.

The episode also discusses research papers on processing long documents using generative transformer models, building a strong vision-language adapter from a progressively aligned language model, and a tool for synthetic data generation and reproducible LLM workflows, highlighting advances and challenges in AI research.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:26 The U.S. Patent and Trademark Office has Rejected OpenAI's Generic 'GPT' Trademark

02:37 OpenAI valued at $80 billion after deal

04:08 NVIDIA A800 40GB Active Graphics Card

05:24 Fake sponsor

07:39 In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

09:12 PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter

10:40 DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows

12:46 Outro

Karpathy Leaves OpenAI 💥 // Slack's New AI Features 🤖 // Preventing Election Misinformation 🗳️ | 19 Feb 2024 | 00:14:20

Renowned AI researcher Andrej Karpathy departs OpenAI to work on personal projects, prompting speculation about the company's internal issues.

Slack introduces AI features for enterprise plans, including extractive summarization and a digest feature. 

Anthropic tests Prompt Shield, an AI tool that redirects users to authoritative sources of voting information to prevent election misinformation. 

Google Brain's "Generating Wikipedia by Summarizing Long Sequences" and Google DeepMind's "A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts" showcase the potential of AI in natural language generation and long-document reading comprehension. 

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:37 Andrej Karpathy departs OpenAI

02:49 Slack AI is here, letting you catch up on lengthy threads and unread messages

04:30 Anthropic takes steps to prevent election misinformation

06:10 Fake sponsor

08:11 Generating Wikipedia by Summarizing Long Sequences

09:37 A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

11:13 ChatGPT vs LLaMA: Impact, Reliability, and Challenges in Stack Overflow Discussions

12:51 Outro

OpenAI's Sora: Text-to-Video 📹 // Google's Gemini 1.5 🚀 // Data-efficient LLMs 💾 | 16 Feb 2024 | 00:14:42

OpenAI's announcement of Sora, a text-to-video model that can generate realistic and imaginative scenes from text instructions.

Google's new Gemini 1.5, which delivers dramatically enhanced performance and achieves the longest context window of any large-scale foundation model yet.

"How to Train Data-Efficient LLMs" paper from Google DeepMind, UC San Diego, and Texas A&M University, which explores two data-efficient approaches to optimize the training of large language models.

"OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" paper from NVIDIA, which presents a new math instruction tuning dataset called OpenMathInstruct-1, constructed using an open-source language model.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:38 OpenAI Announces Sora: a Text to Video Model

03:11 Google Introduces Gemini 1.5

05:27 Magika: AI powered fast and efficient file type identification

06:37 Fake sponsor

08:21 How to Train Data-Efficient LLMs

09:58 Generative Representational Instruction Tuning

11:28 OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

13:22 Outro

Cohere's Aya Language Model 🌍 // Personalized ChatGPT 🤖 // AI Romance 😍 | 15 Feb 2024 | 00:15:03

Cohere's new language model Aya is making waves in the industry, providing a foundation for underserved languages in natural language understanding, summarization, and translation tasks.

OpenAI's new experiment for ChatGPT aims to provide more helpful and personalized responses in future conversations by allowing the chatbot to remember key details from prior chats.

People are seeking romantic connections with AI programs, raising concerns about data privacy, security vulnerabilities, and potentially displacing human relationships.

BASE TTS, currently the largest text-to-speech model, is trained on 100K hours of public domain speech data. It achieves state-of-the-art speech naturalness through a novel speech tokenization technique and shows emergent abilities when trained on large amounts of data.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:52 Cohere's New Language Model Aya

03:24 Memory and new controls for ChatGPT

04:58 Artificial intelligence, real emotion. People are seeking a romantic connection with the perfect bot

06:48 Fake sponsor

09:03 Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

10:36 BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

12:13 Transformers Can Achieve Length Generalization But Not Robustly

13:53 Outro

ChatGPT's Memory Feature 🤔 // Nvidia Founder Dismisses AI Investment Proposal 💸 // M2-BERT for Long-Context Retrieval 📈 | 14 Feb 2024 | 00:15:09

This episode ranges from ChatGPT's memory feature and its potential impact on privacy and efficiency to Nvidia founder Jensen Huang's dismissal of the $7 trillion AI investment figure floated by OpenAI's Sam Altman. It also delves into V-STaR's use of trained verifiers to strengthen self-improvement in large language models, and M2-BERT's ability to handle long-context retrieval and outperform competitive baselines.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:41 Memory and new controls for ChatGPT

03:20 Nvidia Founder Jensen Huang Dismisses $7 Trillion AI Investment Figure Floated by OpenAI's Sam Altman

04:57 Stable Cascade

06:00 Fake sponsor

07:53 V-STaR: Training Verifiers for Self-Taught Reasoners

09:20 Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

11:35 ODIN: Disentangled Reward Mitigates Hacking in RLHF

13:50 Outro

Super Bowl AI Commercials 🏈 // Reka Flash Language Model 🤖 // AMD's Open-Source CUDA 🎮 | 13 Feb 2024 | 00:14:50

Companies are using AI in their Super Bowl commercials to showcase their products and services.

Reka Flash is a state-of-the-art language model that rivals the performance of larger models and is multilingual and multimodal.

AMD has funded an open-source CUDA implementation built on ROCm that allows CUDA-enabled software to run on AMD hardware without developer intervention.

Keyframer is a design tool that uses Large Language Models to animate static images using natural language, showing the potential impact of LLMs in creative domains.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:41 Companies Hope Super Bowl AI Commercials Score With Viewers

03:05 Reka Flash: An Efficient and Capable Multimodal Language Model

04:36 AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source

06:14 Fake sponsor

08:24 Large Language Models: A Survey

10:11 DistiLLM: Towards Streamlined Distillation for Large Language Models

11:48 Keyframer: Empowering Animation Design using Large Language Models

13:49 Outro

ChatGPT API Price Cut 💰 // Nvidia CEO's Sovereign AI Call 🌐 // Animated Stickers 🎉 | 12 Feb 2024 | 00:14:25

The ChatGPT API has reduced its prices, making it more accessible for developers to use. Nvidia CEO Huang is calling for governments to build sovereign AI infrastructure, while also addressing concerns about the dangers of AI. The Aya Dataset is a valuable resource for researchers looking to develop multilingual NLP models. Finally, the "Animated Stickers" paper introduces a model that generates high-quality animated stickers with interesting and relevant motion.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:51 ChatGPT API Reduced Prices

03:19 Nvidia CEO Huang says countries must build sovereign AI infrastructure

05:03 Andrej Karpathy on Learning

06:21 Fake sponsor

08:03 Animated Stickers: Bringing Stickers to Life with Video Diffusion

09:34 Feedback Loops With Language Models Drive In-Context Reward Hacking

11:29 Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning

13:15 Outro

Watermarks for DALL-E 3 🌊 // TSMC's New Chip Factory in Japan 🇯🇵 // DeepMind's Self-Discover 🧩 | 09 Feb 2024 | 00:14:11

OpenAI implements watermarks on images generated by DALL-E 3 to enhance the trustworthiness of digital information.

TSMC's plans to build a second chip factory in Japan could boost Japan's chip-making sector and strengthen TSMC's position in the global chip-making industry.

"Fractal Patterns May Unravel the Intelligence in Next-Token Prediction" and "Self-Discover: Large Language Models Self-Compose Reasoning Structures" introduce new frameworks that could lead to more robust and comprehensive language models.

"Diffusion World Model" introduces a new model called DWM that can make long-horizon predictions in a single forward pass, making it a robust and efficient model for long-horizon prediction tasks.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:33 OpenAI’s ChatGPT Will Now Watermark Images Generated By DALL-E 3

02:50 TSMC to build second Japan chip factory, raising investment to $20 billion

04:54 Nvidia’s “Grace” Arm CPU Holds Its Own Against x86 for HPC

06:02 Fake sponsor

08:14 Fractal Patterns May Unravel the Intelligence in Next-Token Prediction

09:41 Self-Discover: Large Language Models Self-Compose Reasoning Structures

11:13 Diffusion World Model

13:01 Outro

Gemini Takes Over 🚀 // FCC Ban on AI Voices 🚫 // Zero-Shot Generalization 🎯 | 09 Feb 2024 | 00:12:19

Google's new Gemini release on Bard and the FCC's ban on AI-generated voices in robocalls are discussed, along with the papers "Learning to Route Among Specialized Experts for Zero-Shot Generalization" and "Ten Hard Problems in Artificial Intelligence We Must Get Right".

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:44 Google Announces Gemini release on Bard

03:34 FCC Makes AI-Generated Voices in Robocalls Illegal

05:19 Hybrid Bonding Process Flow - Advanced Packaging Part 5

06:38 Fake sponsor

08:15 Learning to Route Among Specialized Experts for Zero-Shot Generalization

08:18 Ten Hard Problems in Artificial Intelligence We Must Get Right

09:33 Large Language Model for Table Processing: A Survey

11:09 Outro

Apple's AI Feature Delay 📅 // SAM 2 Object Segmentation 🖼️ // Google's TPU Chips Shift ⚡ | 30 Jul 2024 | 00:14:25

Apple’s delay in releasing AI features until October could affect iPhone 16 sales and customer excitement. The tech giant’s choice to use Google’s TPU chips instead of Nvidia marks a significant shift in AI hardware competition. Meta’s SAM 2 introduces groundbreaking real-time object segmentation with zero-shot generalization, revolutionizing visual content interaction. Additionally, Sony AI’s research presents a cost-effective approach to training diffusion models, democratizing access to advanced AI technology.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:54 Apple Intelligence Won't Be Released Until October

03:09 Apple used Google's chips to train two AI models, research paper shows

04:44 A Visual Guide to Quantization

05:38 Introducing SAM 2: The next generation of Meta Segment Anything Model for videos and images

06:41 Fake sponsor

08:46 Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

10:28 Theia: Distilling Diverse Vision Foundation Models for Robot Learning

12:27 Outro

OpenAI's Democratic Inputs to AI💡 // MiniCPM-2B Language Model 🤯 // V-IRL Platform for Real-World AI 🌍06 Feb 202400:15:05

OpenAI's plan to involve the public in AI governance through "Democratic Inputs to AI" is a promising idea. MiniCPM-2B, an edge-side large language model, outperforms many other models on comprehensive benchmarks. DeepSeekMath 7B achieved a score of 51.7% on the MATH benchmark for mathematical reasoning. V-IRL is a platform that enables AI agents to interact with the real world, opening up possibilities for practical applications such as disaster response and environmental monitoring.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:28 Inside OpenAI’s Plan to Make AI More ‘Democratic’

03:46 MiniCPM: Unveiling the Potential of End-side Large Language Models

05:06 Beyond Token Prediction: the post-Pretraining journey of modern LLMs

06:56 Fake sponsor

08:36 DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

10:19 Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

12:13 V-IRL: Grounding Virtual Intelligence in Real Life

13:54 Outro

Gemini Ultra Is Close 🤖 // Open Source Assistant Creator 🌟 // AI and AR with ChatGPT 📷06 Feb 202400:12:58

Google is revamping its Bard chatbot under a new name, Gemini, with the launch of the highly anticipated Gemini Ultra model.

Hugging Face has launched an open source assistant creator, allowing users to create customizable AI assistants with just two clicks.

OpenAI has released a VisionOS ChatGPT app for the Apple Vision Pro, bringing AI and AR together.

Three interesting research papers are discussed: Nomic Embed, Boximator, and StepCoder, all making strides in the field of AI.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:38 Leaked doc reveals Bard rebrand and Gemini Ultra launch

02:56 Hugging Face launches open source assistant creator

04:09 AI meets AR as ChatGPT is now available on the Apple Vision Pro

05:36 Fake sponsor

07:15 Nomic Embed: Training a Reproducible Long Context Text Embedder

08:34 Boximator: Generating Rich and Controllable Motions for Video Synthesis

10:11 StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

11:48 Outro

AI With Body-Cameras 🕵️ // AI Guide Feb 2024 📖 // Planning Capabilities Benchmark ✈️05 Feb 202400:14:38

This episode covers police departments using AI to sift through body-cam footage, a comprehensive guide to AI in February 2024, the impact of utterance lengths on conversation models, and a new benchmark for testing the planning capabilities of language agents.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:29 Police Departments Are Turning to AI to Sift Through Millions of Hours of Unreviewed Body-Cam Footage

03:29 Your guide to AI: February 2024

04:45 Running Open-Source AI Models Locally With Ruby

06:08 Fake sponsor

08:10 Making a Long Story Short in Conversation Modeling

09:59 Large Language Models for Mathematical Reasoning: Progresses and Challenges

11:28 TravelPlanner: A Benchmark for Real-World Planning with Language Agents

13:28 Outro

AI Companies Market Drop 💸 // Meta's Custom Chip Artemis 🚀 // Amazon's Shopping AI Rufus 🛍️02 Feb 202400:14:17

Major AI players like Microsoft, Google, AMD, and Nvidia took a hit after Alphabet's and Microsoft's latest earnings reports failed to impress investors.

Meta is rolling out a new custom AI chip called Artemis in their data centers this year, which will complement the Nvidia H100 chips they've acquired.

Amazon has announced an AI-based shopping assistant called Rufus, which uses generative artificial intelligence to help users search for products.

The papers discussed in this episode cover topics such as open language models, enhancing the Winograd Schema Challenge, and mitigating reward overfitting and overoptimization in RLHF.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:37 AI companies lose $190 billion in market cap after Alphabet and Microsoft report

03:17 Exclusive: Meta to deploy in-house custom chips this year to power AI drive - memo

04:49 Amazon announces AI shopping assistant called Rufus

06:05 Fake sponsor

08:01 OLMo: Accelerating the Science of Language Models

09:22 WSC+: Enhancing The Winograd Schema Challenge Using Tree-of-Experts

11:15 Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF

12:58 Outro

Neuralink's BrainChip 🧠 // OpenAI's Chip Collaboration 💻 // YOLO-World Object Detection 🌎01 Feb 202400:14:25

This episode covers Elon Musk's Neuralink implanting a chip in its first human brain, OpenAI exploring an AI chip collaboration with Samsung and SK Group, and papers proposing new algorithms to improve the robustness of language models and enhance open-vocabulary object detection.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:48 Elon Musk says his Neuralink startup has implanted a chip in its first human brain

03:14 OpenAI CEO Sam Altman explores AI chip collaboration with Samsung and SK Group

04:59 Building an early warning system for LLM-aided biological threat creation

06:17 Fake sponsor

08:22 H2O-Danube-1.8B Technical Report

09:45 Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks

11:29 YOLO-World: Real-Time Open-Vocabulary Object Detection

13:05 Outro

Hallucinations Leaderboard 🤯 // Rebellions vs Nvidia 💪 // Microsoft's Future of Work 🏢31 Jan 202400:14:46

This episode covers the Hallucinations Leaderboard, AI chip startup Rebellions' challenge to Nvidia, and Microsoft's "Future of Work" report. Our experts also discuss papers that focus on improving the efficiency and effectiveness of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) through new training strategies and architectures.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:25 The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models

03:22 AI Chip Startup Rebellions Snags Funding to Challenge Nvidia

05:06 Microsoft New Future of Work Report 2023

06:21 Fake sponsor

08:10 Scaling Sparse Fine-Tuning to Large Language Models

10:02 Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling

11:25 MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

13:26 Outro

Google's Bard Beats GPT-4 🆚 // SliceGPT's Sparsification 🪜 // ICE Self-Evolution 🔍30 Jan 202400:15:02

Google's Bard AI outperforms GPT-4 in HuggingFace's Chatbot Arena Leaderboard, while OpenAI partners with Common Sense Media to curate "family-friendly" chatbots in the GPT Store. The episode also delves into two fascinating research papers: SliceGPT introduces a new post-training sparsification scheme that reduces model parameters and maintains performance, while Investigate-Consolidate-Exploit (ICE) promotes the transfer of knowledge between tasks for genuine self-evolution, leading to more robust and autonomous AI agents.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:36 Bard from Google Outperforms GPT-4 in LMSYS Arena Benchmark

02:56 OpenAI partners with Common Sense Media to collaborate on AI guidelines

04:42 What I Talk About When I Talk About Query Optimizer (Part 1): IR Design

05:45 Fake sponsor

07:45 SliceGPT: Compress Large Language Models by Deleting Rows and Columns

09:48 Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution

11:40 Tweets to Citations: Unveiling the Impact of Social Media Influencers on AI Research Visibility

13:24 Outro

MambaByte 🔥 // Lumiere Video Generation AI 🎥 // Nvidia's RTX GPUs HDR Upgrade 🖥️26 Jan 202400:14:10

This episode covers Google's Lumiere AI model for video generation, Nvidia's RTX GPUs using AI to upgrade SDR content to HDR, China's newest dark matter lab, CJPL-II, and MaLA-500, a large language model designed to cover 534 languages. These advancements showcase the rapid progress in AI research and could reshape video generation, language modeling, and dark matter research.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:34 Google Releases Lumiere, a New Video Generation AI

03:12 Nvidia’s RTX GPUs can now upgrade SDR content to HDR using AI

05:03 China's new dark matter lab is biggest and deepest yet

06:42 Fake sponsor

08:25 MambaByte: Token-free Selective State Space Model

09:49 VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

11:34 MaLA-500: Massive Language Adaptation of Large Language Models

13:09 Outro

AI-enhanced Chrome 🌐 // OpenAI transparency concerns 🔍 // Multi-layered 3D assets 🎥25 Jan 202400:14:36

Google is using AI to enhance user experience and productivity in Chrome, while OpenAI's lack of transparency could damage their reputation in the industry. We also discuss GALA, a framework that can decompose a single-layer clothed 3D human mesh into complete multi-layered 3D assets, and AutoRT, which leverages vision-language models to enable autonomous robot learning.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:33 Google is using AI to organize and customize your Chrome browser

02:44 Google cancels contract with an AI data firm that’s helped train Bard

04:05 OpenAI Quietly Scrapped a Promise to Disclose Key Documents to the Public

05:41 Fake sponsor

07:26 GALA: Generating Animatable Layered Assets from a Single Scan

09:40 AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents

11:22 Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study

13:16 Outro

Microsoft hit by Russian hackers again 🛡️ // ElevenLabs' $80M unicorn status 🦄 // Realistic AI voices at TextReader.ai 🗣️24 Jan 202400:15:24

Microsoft was hit by another nation-state attack, this time by the same Russian group behind the SolarWinds attack.

ElevenLabs, a startup that just landed $80 million in funding and achieved unicorn status, is making it easier than ever to replace human voice actors with AI-generated voices.

TextReader.ai is a free text-to-speech generator with some of the most realistic AI voices.

"Is the Emergence of Life an Expected Phase Transition in the Evolving Universe?" challenges our current ideas about the emergence of life and opens up new avenues for research.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:25 Microsoft ‘senior leadership’ emails accessed by Russian SolarWinds hackers

03:18 Voice cloning startup ElevenLabs lands $80M, achieves unicorn status

05:29 Text Reader - Free text to speech generator with realistic AI voices

06:45 Fake sponsor

08:33 Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

10:10 Is the Emergence of Life an Expected Phase Transition in the Evolving Universe?

11:57 EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models

13:45 Outro

OpenAI's SearchGPT 🧐 // AI in Math Olympiad 🏅 // Unreliable AI Existential Risk 🔍29 Jul 202400:15:50

OpenAI's new prototype, SearchGPT, promises to combine AI smarts with real-time web information to make search easier.

AI has achieved silver-medal standards at the International Mathematical Olympiad, raising questions about the future of mathematics and the role of AI in solving complex problems.

The reliability of AI existential risk probabilities is called into question in a thought-provoking article, challenging the authority we often assign to these forecasts and calling for more scrutiny.

Three fascinating papers explore advancements in theorem proving (Caltech and NVIDIA), balancing fast and slow planning (UNC Chapel Hill), and aligning large language models with Best-of-N distillation (Google DeepMind). These papers could transform the way we approach complex problems with language models and streamline the development of LLMs.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:54 OpenAI Announces SearchGPT

03:15 AI achieves silver-medal standard solving International Mathematical Olympiad problems

04:55 AI existential risk probabilities are too unreliable to inform policy

06:25 Fake sponsor

08:21 LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

10:10 System-1.x: Learning to Balance Fast and Slow Planning with Language Models

12:01 BOND: Aligning LLMs with Best-of-N Distillation

13:43 Outro

OpenAI's Chip Factories 💻 // Perplexity AI's rabbit r1 🐇 // Code Prompting Improves LLMs 🔍23 Jan 202400:14:25

OpenAI plans to set up chip factories worth $100 billion to reduce reliance on existing chipmakers and tackle potential supply shortages.

The rabbit r1, which integrates Perplexity AI's technology to respond to user inquiries, has garnered substantial pre-order sales and offers a complimentary year of Perplexity Pro to early adopters.

"Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs" explores how code prompts can improve the performance of large language models on conditional reasoning tasks.

"R-Judge: Benchmarking Safety Risk Awareness for LLM Agents" introduces R-Judge, a benchmark that evaluates the proficiency of LLMs in judging safety risks given agent interaction records, revealing the importance of salient safety risk feedback.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:20 OpenAI plans to set up chip factories worth $100 billion: Report

02:47 The rabbit r1 will use Perplexity AI’s tech to answer your queries

04:24 LoRA From Scratch – Implement Low-Rank Adaptation for LLMs in PyTorch

05:12 Fake sponsor

07:11 Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs

08:32 RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture

10:45 R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

12:56 Outro

AGI and Skin Cancer Detection 🧠 // Self-Rewarding Language Models 🏆 // Neurosymbolic Reasoners in Text-Based Games 🎮22 Jan 202400:15:42

Mark Zuckerberg's new goal of creating artificial general intelligence with Meta's AI research group and the FDA clearance granted for the first AI-powered medical device to detect all three common skin cancers are just some of the highlights. We also explore Self-Rewarding Language Models and Automatic Program Repair using Round-Trip Translation with Large Language Models, as well as Large Language Models as neurosymbolic reasoners for text-based games involving symbolic tasks.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:41 Mark Zuckerberg’s new goal is creating artificial general intelligence

03:27 FDA Clearance Granted for First AI-Powered Medical Device to Detect All Three Common Skin Cancers

05:37 The rise of AI as Magic

06:36 Fake sponsor

09:11 Self-Rewarding Language Models

10:46 A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models

12:27 Large Language Models Are Neurosymbolic Reasoners

14:23 Outro

Samsung's Galaxy AI 📱 // Meta's billions on Nvidia chips 💰 // Beware GPU vs CPU benchmarks ❌19 Jan 202400:15:26

Samsung introduces Galaxy AI platform with five key features, including Live Translate and Note Assist.

Meta is reportedly spending billions of dollars on Nvidia AI chips for AGI research.

Beware of misleading GPU vs CPU benchmarks, as pointed out in a blog post.

Three new research papers explore Reinforced Fine-Tuning for reasoning, Asynchronous Local-SGD Training for Language Modeling, and Vision Mamba for efficient visual representation learning with bidirectional state space models.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

02:08 Galaxy AI at the Samsung Galaxy S24

03:39 Mark Zuckerberg indicates Meta is spending billions of dollars on Nvidia AI chips

05:36 Beware of misleading GPU vs CPU benchmarks

06:56 Fake sponsor

08:34 ReFT: Reasoning with Reinforced Fine-Tuning

10:24 Asynchronous Local-SGD Training for Language Modeling

12:18 Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model

14:15 Outro

Global AI Regulation 🌍 // Bill Gates' Predictions 💭 // Text-to-Video Metrics 🎥18 Jan 202400:14:35

This episode covers AI at Davos 2024, Bill Gates's predictions on how AI will transform our lives, and new research in AI-generated audio, text-to-video generation, and quantum denoising diffusion models. Additionally, the proposed evaluation metric for text-to-video models, T2VScore, combines Text-Video Alignment and Video Quality criteria to better reflect human perception.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:47 AI - artificial intelligence - at Davos 2024: Rolling coverage and what to know

03:26 Bill Gates explains how AI will change our lives in 5 years

05:12 RAG Using Unstructured Data & Role of Knowledge Graphs

06:08 Fake sponsor

07:49 Masked Audio Generation using a Single Non-Autoregressive Transformer

09:32 Towards A Better Metric for Text-to-Video Generation

11:11 Quantum Denoising Diffusion Models

12:57 Outro

Combatting Misinformation with Digital Marks 🛡️ // Stable Code 3B 💻 // TinyML Potential 🌱16 Jan 202400:14:24

OpenAI's plan to combat election misinformation, Stable Code 3B's promise to revolutionize coding, and the potential uses of Tiny Machine Learning are all discussed. Additionally, the paper on the Unreasonable Effectiveness of Easy Training Data for Hard Tasks challenges previous assumptions about language models. Overall, this episode provides valuable insights into the latest developments in AI and technology.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:34 Here’s OpenAI’s big plan to combat election misinformation

02:46 Stable Code 3B: Coding on the Edge

04:29 What TinyML is

06:00 Fake sponsor

07:57 Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements

09:38 The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

11:09 How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs

12:55 Outro

Nous Research's RLHF LLM 🤖 // Microsoft's AI-powered Office 💻 // Fine-tuning GPT-3.5 for "Connections" 🕹️16 Jan 202400:14:21

Nous Research has released its new flagship LLM, Nous-Hermes 2, which is its first model trained with RLHF and the first model to beat Mixtral Instruct in popular benchmarks.

Microsoft's Copilot Pro brings AI-powered Office features to consumers for $20 a month, including the ability to generate entire PowerPoint slide decks from a chatbot-like prompt and rephrase paragraphs in Word.

A blog post explores fine-tuning gpt-3.5-turbo to learn how to play "Connections", demonstrating the potential of fine-tuning language models for specific tasks.

Three research papers are discussed, covering the effects of pretraining data curation on language models and a new benchmark for evaluating multimodal large language models on image-based wordplay puzzles, which reveals major shortcomings in these models.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:37 Nous Research Releases new flagship LLM

02:50 Microsoft’s new Copilot Pro brings AI-powered Office features to the rest of us

05:03 Fine-tuning gpt-3.5-turbo to learn to play "Connections"

06:08 Fake sponsor

08:02 AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

09:36 AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters

11:19 REBUS: A Robust Evaluation Benchmark of Understanding Symbols

13:11 Outro

Apple's AI Plans 🍎 // $100M for Humanoid Robots 🤖 // Trustworthiness of Large Language Models 🔍15 Jan 202400:13:57

From Apple's request that its Siri team relocate to the $100 million investment in 1X Technologies, listeners will learn about the latest developments in the AI industry. The TrustLLM study evaluates the trustworthiness of LLMs across six dimensions, while Intel Corporation proposes an efficient LLM inference solution.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:35 Apple asks its San Diego Siri quality control team to relocate to Texas

02:51 OpenAI-Backed Humanoid Maker Gets $100 Million in EQT-Led Round

04:28 Why autonomous trucking is harder than autonomous rideshare

05:39 Fake sponsor

07:26 TrustLLM: Trustworthiness in Large Language Models

09:34 Efficient LLM inference solution on Intel GPU

11:25 Transformers are Multi-State RNNs

12:56 Outro

Microsoft's Market Cap Soars with AI 🤖 // ChatGPT Team for small teams 💬 // Distilling Vision-Language Models 🎬12 Jan 202400:15:03

OpenAI has introduced a new plan called ChatGPT Team, which allows smaller teams to use their latest AI models without needing to know how to code.

Microsoft has overtaken Apple as the largest US company by market value, a rise attributed to its investments in AI and machine learning.

"Transformers are Multi-State RNNs" challenges the traditional view that transformers are conceptually different from recurrent neural networks and provides a potential solution to a major computational issue.

"Distilling Vision-Language Models on Millions of Videos" proposes a method to fine-tune a video-language model from a strong image-language baseline with synthesized instructional data and then use it to auto-label millions of videos to generate high-quality captions. This method could significantly improve the quality of video captioning and retrieval.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:48 OpenAI debuts ChatGPT subscription aimed at small teams

03:14 Microsoft overtakes Apple as largest U.S. company on AI boost

04:51 Neural Network Quantization & Number Formats From First Principles

06:02 Fake sponsor

08:16 The Impact of Reasoning Step Length on Large Language Models

10:06 Transformers are Multi-State RNNs

11:45 Distilling Vision-Language Models on Millions of Videos

13:34 Outro

OpenAI's GPT Store 🤖 // Alexa's generative AI-powered experiences 🗣️ // MagicVideos and Lightning Attention ⚡11 Jan 202400:15:05

OpenAI Lawsuit 🤝 // Volkswagen AI Chatbot 🚗 // Python 3.13 JIT 🐍10 Jan 202400:13:42

This episode covers OpenAI's response to the New York Times lawsuit, Volkswagen's new AI chatbot for cars, and the latest developments in Python and language modeling. The papers discussed showcase the potential of new work like Mixtral of Experts, MoE-Mamba, and FlightLLM to improve language processing and unlock new possibilities for scaling.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:27 OpenAI Fights Back Against New York Times Lawsuit

02:47 Volkswagen brings AI chatbot ChatGPT into its cars, SUVs

04:28 Python 3.13 gets a JIT

05:35 Fake sponsor

07:22 Mixtral of Experts

08:50 MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

10:20 FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGA

12:13 Outro

Mistral Large 2 🌍 // Memphis Supercluster 💻 // Emergence in Complex Systems 🧩26 Jul 202400:14:51

Mistral Large 2 release with advanced features and multilingual support.

Elon Musk's announcement of the Memphis Supercluster for creating the world's most powerful AI.

Discussion of emergence in complex systems and the MINT-1T dataset for training large multimodal models.

Introduction of OpenDevin, an open platform for developing AI agents and MOMAland, a benchmark framework for multi-objective multi-agent reinforcement learning.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:39 Mistral Large 2 Release

03:01 Elon Musk Announces Memphis Supercomputer

04:48 The Puzzle of How Large-Scale Order Emerges in Complex Systems

06:22 Fake sponsor

08:37 MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

10:16 OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

11:53 MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning

13:31 Outro

Apple's MLX 🍎 // Duolingo's AI Cuts ✂️ // DeepSeek LLM's Superior Performance 💪09 Jan 202400:14:35

Apple's new MLX framework for on-device AI could shake up the AI race with its optimized design for Apple silicon and ecosystem of devices.

Duolingo's shift towards using AI to create more content and cutting contractors raises concerns about how AI technology will affect jobs in the long run.

The papers discussed in this episode showcase exciting advancements in open-source language models, including DeepSeek LLM's superior performance compared to GPT-3.5 and Alibaba Group's proposed system for supporting exceptionally long context lengths.

"Self-Contrast" is a new method proposed to improve the reflection capacity of Large Language Models (LLMs) by adaptively exploring diverse solving perspectives and generating a checklist to help LLMs re-examine and eliminate errors or inconsistencies.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:35 Apple ML Research releases MLX for on-device AI

02:53 Duolingo Cuts 10% of Contractors as It Uses More AI to Create App Content

04:26 Attacks on machine learning models

05:37 Fake sponsor

07:22 DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

09:09 Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache

10:51 Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives

13:06 Outro

OpenAI GPT Store Launch 🚪 // DeepMind's Composing LLMs 🤔 // Perplexity AI Natural Language Search 🔍08 Jan 202400:14:17

This episode covers OpenAI's GPT Store launch, Perplexity AI's natural language search engine, and papers proposing new approaches to improve LLMs' reflection capacity and expand their capabilities.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:42 OpenAI’s GPT Store launching next week

03:01 AI-powered search engine Perplexity AI, now valued at $520M, raises $73.6M

04:39 Our 2023 Year in Review

05:56 Fake sponsor

07:49 Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives

09:34 Instruct-Imagen: Image Generation with Multi-modal Instruction

10:59 LLM Augmented LLMs: Expanding Capabilities through Composition

12:48 Outro

Microsoft's Copilot Key ⌨️ // Samsung's Mobile AI 📱 // Photorealistic Avatars 🤖05 Jan 202400:14:10

This episode covers Microsoft's Copilot key for PC keyboards, Samsung's upcoming AI advancements in its smartphone series, a framework for generating photorealistic avatars that gesture according to conversational dynamics, and MIT CSAIL's exploration of how language models learn about the visual world and their potential for training visual representation learning systems.

Contact:  sergi@earkind.com

Timestamps:

00:34 Introduction

01:37 Microsoft wants to add a Copilot key to your PC keyboard

02:59 Galaxy Unpacked 2024: Opening a New Era of Mobile AI

04:53 Efficient LLM inference

06:15 Fake sponsor

08:20 From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

09:50 Incremental FastPitch: Chunk-based High Quality Text to Speech

11:04 A Vision Check-up for Language Models

13:00 Outro
