Explore every episode of the podcast GPT Reviews
| Title | Pub. Date | Duration | |
|---|---|---|---|
| OpenAI's Strawberry Revolution 🍓 // Nvidia's Lucrative Paychecks 💸 // Google Pipe SQL Simplification 📊 | 29 Aug 2024 | 00:14:01 | |
This episode dives into OpenAI's promising new model, Strawberry, which could revolutionize interactions in ChatGPT. We explore the financial envy Nvidia employees inspire in their Google and Meta counterparts due to lucrative stock options. Google’s new Pipe SQL syntax aims to simplify data querying, while concerns about research accessibility are raised. Finally, we discuss BaichuanSEED and Dolphin models, which highlight advancements in extensible data collection and energy-efficient processing, paving the way for enhanced AI capabilities. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:40 OpenAI Races to Launch Strawberry 03:07 Google, Meta workers envy Nvidia staffers’ fat paychecks: ‘Bought a 100K car … all cash’ 05:01 Google's New Pipe SQL Syntax 06:12 Fake sponsor 09:20 Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models 11:09 Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders 12:50 Outro | |||
| OpenAI's 'Strawberry' AI 🚀 // World's Fastest AI Inference ⚡ // Photo-realistic 3D Avatars 🎨 | 28 Aug 2024 | 00:14:14 | |
OpenAI's 'Strawberry' AI tackles complex math and programming with enhanced reasoning, while Cerebras claims to have launched the fastest AI inference, enabling real-time applications at competitive prices. The GenCA model revolutionizes avatar creation with photo-realistic, controllable 3D avatars, and the "Build-A-Scene" paper introduces interactive 3D layout control for text-to-image generation, enhancing creative fields with dynamic object manipulation. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 02:02 OpenAI Shows ‘Strawberry’ AI to the Feds and Uses It to Develop ‘Orion’ 03:23 Cerebras Launches the World’s Fastest AI Inference 05:07 Diffusion Models Are Real-Time Game Engines 06:15 Fake sponsor 08:06 The Mamba in the Llama: Distilling and Accelerating Hybrid Models 09:42 GenCA: A Text-conditioned Generative Model for Realistic and Drivable Codec Avatars 11:16 Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation 13:04 Outro | |||
| Nvidia's Stock Struggles 📉 // Meta's AI Hallucinations 🤖 // Superconducting Microprocessors ⚡ | 02 Aug 2024 | 00:14:41 | |
This episode dives into Nvidia's stock struggles amid rising competition, while also unpacking Meta's AI blunders and the implications of "hallucinations" in tech. We explore cutting-edge superconducting microprocessors that promise unprecedented energy efficiency and highlight groundbreaking AI research, including eavesdropping techniques and advancements in reinforcement learning. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:50 Nvidia Sank Again Today -- Time to Buy the Artificial Intelligence (AI) Growth Stock Hand Over Fist? 03:09 Meta blames hallucinations after its AI said Trump rally shooting didn’t happen 04:52 Superconducting Microprocessors? Turns Out They're Ultra-Efficient 06:07 Fake sponsor 09:22 SAPG: Split and Aggregate Policy Gradients 10:45 MindSearch: Mimicking Human Minds Elicits Deep AI Searcher 12:44 Outro | |||
| AI Secret Trading in China 💼 // Training Models at Scale 🚀 // Improving User Queries with Backtracing 🔍 | 08 Mar 2024 | 00:14:56 | |
A Google engineer has been indicted for allegedly stealing over 500 confidential files containing AI trade secrets while working for China-based companies seeking an edge in the AI technology race. A tutorial series explores parallelism strategies for training large deep learning models, making it accessible to everyone regardless of the hardware you have available. Value functions are a crucial component in deep reinforcement learning, and a new approach using categorical cross-entropy instead of regression can significantly improve performance and scalability in a variety of domains. Backtracing is the task of retrieving the text segment that most likely caused a user query, and it can help improve content delivery and communication by identifying linguistic triggers that influence user queries. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:33 Google engineer indicted over allegedly stealing AI trade secrets for China 03:57 Training Models at Scale Tutorial 05:24 Autogenerating a Book Series From Three Years of iMessages 06:22 Fake sponsor 08:16 Design2Code: How Far Are We From Automating Front-End Engineering? 10:09 Stop Regressing: Training Value Functions via Classification for Scalable Deep RL 11:43 Backtracing: Retrieving the Cause of the Query 13:27 Outro | |||
| Perplexity vs. Google 🔍 // Microsoft vs. NYT ⚖️ // General Computer Control 💻 | 07 Mar 2024 | 00:14:00 | |
Perplexity AI is a search startup that's looking to take on Google by solving the inadequacies of searching the web. They are nearing unicorn status with a valuation of around $1 billion. Microsoft is being sued by The New York Times for copyright infringement and abusing the newspaper’s intellectual property in training LLMs. Microsoft accuses the Times of "unsubstantiated" claims and compares the lawsuit to Hollywood's resistance to the VCR in the 70s. A new paper introduces the concept of General Computer Control (GCC), which is the idea of building agents that can master any computer task by taking only screen images and producing keyboard and mouse operations as output. The authors propose a framework called Cradle that has strong reasoning abilities to ensure generalizability and self-improvement across various tasks. A paper evaluates different tokenizer inference methods and their impact on the performance of downstream NLP tasks. The authors found that for the most commonly used tokenizers, greedy inference performs surprisingly well, and a recently-introduced contextually-informed tokenizer outperforms all others on morphological alignment. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:23 Perplexity Poised To Become Latest AI Startup To Hit Unicorn Status — Report 02:53 Microsoft compares The New York Times’ claims against OpenAI to Hollywood’s early fight against VCR 04:41 Training great LLMs entirely from ground zero in the wilderness as a startup 05:49 Fake sponsor 07:39 Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study 09:15 Design2Code: How Far Are We From Automating Front-End Engineering? 11:01 Greed is All You Need: An Evaluation of Tokenizer Inference Methods 12:49 Outro | |||
| OpenAI vs Elon Musk 💻 // Automated Text Embeddings 📊 // Unified Time Series Model 📈 | 06 Mar 2024 | 00:14:34 | |
Groq, an AI chip startup, forms a new business unit and acquires Definitive Intelligence to expand its customer and developer ecosystem. OpenAI responds to Elon Musk's lawsuit, revealing that Musk himself wanted "absolute control" over the company by merging it with Tesla. A new Postgres extension called pg_vectorize automates the transformation and orchestration of text to embeddings, providing workflows for vector search and RAG. UniTS, a unified time series model, achieves superior performance compared to task-specific models and repurposed natural language-based LLMs, demonstrating remarkable zero-shot, few-shot, and prompt learning capabilities. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 02:05 AI chip startup Groq forms new business unit, acquires Definitive Intelligence 03:49 OpenAI says Elon Musk wanted ‘absolute control’ of the company 05:29 pg_vectorize: a VectorDB for Postgres 06:29 Fake sponsor 08:26 Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models 09:54 UniTS: Building a Unified Time Series Model 11:39 DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving 13:14 Outro | |||
| Anthropic's Claude 3 🤖 // Elon Musk sues OpenAI 💥 // Unified Time Series Model 🎧 | 05 Mar 2024 | 00:14:58 | |
Anthropic's new and improved Claude 3 model family sets new industry benchmarks across a wide range of cognitive tasks, exhibiting near-human levels of comprehension and fluency on complex tasks. Elon Musk is suing OpenAI and CEO Sam Altman for allegedly abandoning their original mission to benefit humanity and instead focusing on profits with Microsoft. Opus 1.5 brings quality improvements, including machine learning-based upgrades, while remaining fully compatible with RFC 6716, and uses deep learning techniques to process or generate signals themselves. The Multimodal ArXiv dataset represents an important step forward for LVLMs when it comes to interpreting and understanding complex scientific figures, achieving a 10.4% absolute accuracy gain on a multimodal mathematical reasoning benchmark. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:27 Introducing the next generation Claude: Claude 3 03:17 Elon Musk sues Sam Altman and OpenAI 04:59 Opus Gets a Serious Machine Learning Upgrade 06:29 Fake sponsor 08:28 UniTS: Building a Unified Time Series Model 10:10 Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models 12:06 Learning and Leveraging World Models in Visual Representation Learning 13:48 Outro | |||
| Adobe's GenAI for Audio 🎧 // User Data for AI Backlash 👀 // MOSAIC's Modular Cooking 🍲 | 04 Mar 2024 | 00:14:51 | |
Adobe's new generative AI tools for custom audio creation and editing. Tumblr and WordPress selling user data to train AI tools, sparking backlash. MOSAIC, a modular system for assistive and interactive cooking using natural language and multiple robots. A new approach to real-world humanoid control using a causal transformer model trained through autoregressive prediction of sensorimotor trajectories. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:20 Adobe previews new cutting-edge generative AI tools for crafting and editing custom audio 02:39 Tumblr and WordPress to Sell Users’ Data to Train AI Tools 04:14 “AI will cure cancer” misunderstands both AI and medicine 05:56 Fake sponsor 08:12 MOSAIC: A Modular System for Assistive and Interactive Cooking 09:54 Humanoid Locomotion as Next Token Prediction 11:50 In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss 13:22 Outro | |||
| Meta's Llama 3 🦙 // Apple's GenAI 🍎 // Unsupervised RL via Reward Encoding 🤖 | 01 Mar 2024 | 00:14:42 | |
Meta Platforms is set to launch its new AI language model, Llama 3, which promises to tackle taboo questions with more grace and respect than its predecessor. Apple is ramping up its investment in GenAI, with plans to upgrade Siri and iOS’ built-in search tool, Spotlight, with GenAI models to handle more complex queries and multi-turn conversations. The University of California, Berkeley, has published a paper exploring unsupervised zero-shot reinforcement learning via functional reward encodings, which could enable pre-training of an agent to adapt to any new downstream tasks in a zero-shot manner. TrustMol, an inverse molecular design method built to be trustworthy, has been proposed by the Max Planck Institute for Informatics, which could make the IMD process more explainable and reliable. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:52 Meta plans launch of new AI language model Llama 3 in July, The Information reports 02:56 Tim Cook says Apple will ‘break new ground’ in GenAI this year 04:35 Things You Should Never Do, Part I 05:46 Fake sponsor 07:28 Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings 09:04 TrustMol: Trustworthy Inverse Molecular Design via Alignment with Molecular Dynamics 10:57 Stochastic Gradient Succeeds for Bandits 13:23 Outro | |||
| Pichai on Google Controversy 🤡 // C3.ai's Revenue Surprises AI Market 📈 // 1-bit LLMs for Efficient Language Modeling 💾 | 29 Feb 2024 | 00:13:14 | |
Google's image creation tool, Gemini, has been generating offensive and embarrassing results, prompting the company to make structural changes and update product guidelines to avoid bias in AI tools. C3.ai, a software maker that helps companies build AI applications, reported a narrower-than-expected loss and revenue that topped estimates, causing AI stock to pop more than 14% in extended trading. A new paper introduces a cost-effective Large Language Model called a 1-bit LLM, which matches the performance of full-precision Transformer LLMs while being significantly more efficient in terms of latency, memory, throughput, and energy consumption. Another paper proposes a hybrid approach that combines a frozen LLM with a small language model to improve the efficiency of autoregressive decoding for Large Language Models, resulting in substantial speedups of up to 4 times with minor performance penalties. Additionally, a new framework called EMO utilizes a direct audio-to-video synthesis approach to produce highly expressive and lifelike talking head videos. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:38 Google CEO calls AI tool’s controversial responses ‘completely unacceptable’ 03:11 Artificial Intelligence Play C3.ai Climbs On Earnings Report, Outlook 04:41 Jason Wei On Sora 06:19 Fake sponsor 08:35 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits 09:19 Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding 12:04 Outro | |||
| DeepMind's Genie 🧞 // More Hollywood AI Concerns 🤖 // MobileLLM for Efficient Language Models 📱 | 28 Feb 2024 | 00:15:22 | |
DeepMind's Genie, a tool that creates video games with just a prompt or an image, is a game-changer in the industry. Tyler Perry's $800M studio expansion is on hold after seeing OpenAI's Sora, highlighting the potential for AI to replace human workers in the entertainment industry. MobileLLM is a promising development for those looking to deploy efficient language models on mobile devices. A comprehensive review of existing literature on data selection methods for language models provides a taxonomy of existing approaches and proposes promising avenues for future research. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:44 DeepMind's Genie: creating videogames with prompts 04:41 Speakz AI 06:15 Fake sponsor 07:56 MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases 09:57 A Survey on Data Selection for Language Models 11:26 Do Large Language Models Latently Perform Multi-Hop Reasoning? 13:44 Outro | |||
| Mistral New Models 🗣️ // Mistral-Microsoft Partnership 💻 // Input Length Impact on LLMs 🤔 | 27 Feb 2024 | 00:13:21 | |
Mistral AI has launched a new conversational assistant, Le Chat Mistral, which serves as an entry point to interact with their various models. They're also launching Le Chat Enterprise, which could be useful for businesses looking to boost productivity and efficiency. Microsoft has partnered with Mistral, a French company focused on language models, and will be taking a minor stake in the company and offering their language models on the Azure AI platform. Mistral is also releasing a new model called Mistral Large, which is designed to compete with OpenAI's GPT-4 model. "Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models" by Levy et al. investigates how the performance of Large Language Models (LLMs) changes when the input length is extended. The authors found that there is a notable degradation in LLMs' reasoning performance at much shorter input lengths than their technical maximum. "Executable Code Actions Elicit Better LLM Agents" proposes using executable Python code to consolidate LLM agents' actions into a unified action space called CodeAct. CodeAct achieves up to a 20% higher success rate than widely used alternatives and could have many practical applications. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:39 Le Chat announced by Mistral AI 02:53 Microsoft partners with Mistral in second AI deal beyond OpenAI 04:29 Introducing Phind 70Billion 05:27 Fake sponsor 08:47 Executable Code Actions Elicit Better LLM Agents 10:36 Cleaner Pretraining Corpus Curation with Neural Web Scraping 12:20 Outro | |||
| Google's Gemma 🌟 // Generalized Instruction Tuning 📚 // Multi-object Diffusion 🖼️ | 22 Feb 2024 | 00:14:56 | |
Gemma, a new family of lightweight, state-of-the-art open models built for responsible AI development, is introduced by Google. "Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models" presents a new method for instruction tuning of Large Language Models (LLMs) called Generalized Instruction Tuning (GLAN). "MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion" addresses the challenge of generating images of multiple objects with spatial relationships and attribute bindings. "Instruction-tuned Language Models are Better Knowledge Learners" explores how to update factual knowledge in large language models. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:21 Google DeepMind Releases Gemma 03:28 Andrej Karpathy on Gemma's Tokenizer 04:16 Groq Inference Tokenomics: Speed, But At What Cost? 05:51 Fake sponsor 07:44 Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models 09:38 MuLan: Multimodal-LLM Agent for Progressive Multi-Object Diffusion 11:06 Instruction-tuned Language Models are Better Knowledge Learners 12:58 Outro | |||
| Google's Gemma 2 vs. GPT-3.5 ⚔️ // Black Forest Labs' Flux Model 🌲 // Ethical Concerns in AI 🚨 | 02 Aug 2024 | 00:14:44 | |
This episode dives into Google’s Gemma 2, which claims to outperform GPT-3.5 while tackling responsible AI practices. We explore Black Forest Labs' Flux model, featuring 12 billion parameters and tailored versions for various users. Olivia sheds light on the ethical concerns surrounding the resurgence of pseudoscience in machine learning, particularly physiognomy. Lastly, Belinda reviews critical research on AI safety, advocating for clearer metrics to prevent misleading claims about safety advancements. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:37 Google’s tiny AI model bests GPT-3.5 02:48 Announcing Flux by Black Forest Labs: The Next Leap in Text-to-Image Models 04:28 The reanimation of pseudoscience in machine learning and its ethical repercussions 06:06 Fake sponsor 08:04 MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts 09:55 Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models 11:41 Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? 13:33 Outro | |||
| Groq's AI Hardware 💻 // Japan's $67B Chip Bet 🎲 // Video Understanding 📹 | 21 Feb 2024 | 00:14:02 | |
Groq's AI hardware breakthroughs with LPU architecture achieving speeds of 500 tokens per second. Japan's $67 billion investment to become a global chip powerhouse and insulate its economy from growing US-China tensions. Neural Network Diffusion paper demonstrating that diffusion models can generate high-performing neural network parameters. VideoPrism paper from Google Research achieving state-of-the-art performance on 30 out of 33 video understanding benchmarks. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:47 Groq Goes Viral with Crazy Fast AI Inference 03:01 Japan Bets $67 Billion to Become a Global Chip Powerhouse Once Again 04:54 My benchmark for large language models 06:01 Fake sponsor 07:54 Neural Network Diffusion 09:19 Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models 11:16 VideoPrism: A Foundational Visual Encoder for Video Understanding 12:42 Outro | |||
| OpenAI's Challenge 🤝 // NVIDIA's Graphics Card 💻 // Advancements in AI Research 🔬 | 20 Feb 2024 | 00:13:56 | |
OpenAI's trademark claim for 'GPT' was rejected by the US Patent and Trademark Office, which could impact other AI companies using the term. A recent tender offer for Microsoft-backed OpenAI, led by venture firm Thrive Capital, values the company at $80 billion, solidifying its position in the AI industry. The NVIDIA A800 40GB Active Graphics Card is a powerful tool for AI and HPC workflows, with industry-leading performance and production-ready AI development software included. Research papers on processing long documents using generative transformer models, creating a strong connection between vision and language models, and a tool for synthetic data generation and reproducible LLM workflows were discussed, highlighting advancements and challenges in the field of AI research. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:26 The U.S. Patent and Trademark Office has Rejected OpenAI's Generic 'GPT' Trademark 02:37 OpenAI valued at $80 billion after deal 04:08 NVIDIA A800 40GB Active Graphics Card 05:24 Fake sponsor 07:39 In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss 09:12 PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter 10:40 DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows 12:46 Outro | |||
| Karpathy Leaves OpenAI 💥 // Slack's New AI Features 🤖 // Preventing Election Misinformation 🗳️ | 19 Feb 2024 | 00:14:20 | |
Renowned AI researcher Andrej Karpathy departs from OpenAI for personal projects, leaving speculation about the company's internal issues. Slack introduces AI features for enterprise plans, including extractive summarization and a digest feature. Anthropic tests Prompt Shield, an AI tool that redirects users to authoritative sources of voting information to prevent election misinformation. Google Brain's "Generating Wikipedia by Summarizing Long Sequences" and Google DeepMind's "A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts" showcase the potential of AI in natural language generation and long-document reading comprehension. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:37 Andrej Karpathy departs OpenAI 02:49 Slack AI is here, letting you catch up on lengthy threads and unread messages 04:30 Anthropic takes steps to prevent election misinformation 06:10 Fake sponsor 08:11 Generating Wikipedia by Summarizing Long Sequences 09:37 A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts 11:13 ChatGPT vs LLaMA: Impact, Reliability, and Challenges in Stack Overflow Discussions 12:51 Outro | |||
| OpenAI's Sora: Text-to-Video 📹 // Google's Gemini 1.5 🚀 // Data-efficient LLMs 💾 | 16 Feb 2024 | 00:14:42 | |
OpenAI's announcement of Sora, a text to video model that can generate realistic and imaginative scenes from text instructions. Google's new Gemini 1.5, which delivers dramatically enhanced performance and achieves the longest context window of any large-scale foundation model yet. "How to Train Data-Efficient LLMs" paper from Google DeepMind, UC San Diego, and Texas A&M University, which explores two data-efficient approaches to optimize the training of large language models. "OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset" paper from NVIDIA, which presents a new math instruction tuning dataset called OpenMathInstruct-1, constructed using an open-source language model. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:38 OpenAI Announces Sora: a Text to Video Model 03:11 Google Introduces Gemini 1.5 05:27 Magika: AI powered fast and efficient file type identification 06:37 Fake sponsor 08:21 How to Train Data-Efficient LLMs 09:58 Generative Representational Instruction Tuning 11:28 OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset 13:22 Outro | |||
| Cohere's Aya Language Model 🌍 // Personalized ChatGPT 🤖 // AI Romance 😍 | 15 Feb 2024 | 00:15:03 | |
Cohere's new language model Aya is making waves in the industry, providing a foundation for underserved languages in natural language understanding, summarization, and translation tasks. OpenAI's new experiment for ChatGPT aims to provide more helpful and personalized responses in future conversations by allowing the chatbot to remember key details from prior chats. People are seeking romantic connections with AI programs, raising concerns about data privacy, security vulnerabilities, and potentially displacing human relationships. BASE TTS, currently the largest text-to-speech model trained on 100K hours of public domain speech data, achieves state-of-the-art speech naturalness through a novel speech tokenization technique and emergent abilities when trained on large amounts of data. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:52 Cohere's New Language Model Aya 03:24 Memory and new controls for ChatGPT 04:58 Artificial intelligence, real emotion. People are seeking a romantic connection with the perfect bot 06:48 Fake sponsor 09:03 Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model 10:36 BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data 12:13 Transformers Can Achieve Length Generalization But Not Robustly 13:53 Outro | |||
| ChatGPT's Memory Feature 🤔 // Nvidia Founder Dismisses AI Investment Proposal 💸 // M2-BERT for Long-Context Retrieval 📈 | 14 Feb 2024 | 00:15:09 | |
This episode covers ChatGPT's memory feature and its potential impact on privacy and efficiency, as well as Nvidia founder Jensen Huang's dismissal of OpenAI's $7 trillion AI investment proposal. It also delves into V-STaR's approach to strengthening self-improvement in large language models, and M2-BERT's ability to handle long-context retrieval and outperform competitive baselines. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:41 Memory and new controls for ChatGPT 04:57 Stable Cascade 06:00 Fake sponsor 07:53 V-STaR: Training Verifiers for Self-Taught Reasoners 09:20 Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT 11:35 ODIN: Disentangled Reward Mitigates Hacking in RLHF 13:50 Outro | |||
| Super Bowl AI Commercials 🏈 // Reka Flash Language Model 🤖 // AMD's Open-Source CUDA 🎮 | 13 Feb 2024 | 00:14:50 | |
Companies are using AI in their Super Bowl commercials to showcase their products and services. Reka Flash is a state-of-the-art language model that rivals the performance of larger models and is multilingual and multimodal. AMD has funded an open-source CUDA implementation built on ROCm, allowing for CUDA-enabled software to run without developer intervention. Keyframer is a design tool that uses Large Language Models to animate static images using natural language, showing the potential impact of LLMs in creative domains. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:41 Companies Hope Super Bowl AI Commercials Score With Viewers 03:05 Reka Flash: An Efficient and Capable Multimodal Language Model 04:36 AMD Quietly Funded A Drop-In CUDA Implementation Built On ROCm: It's Now Open-Source 06:14 Fake sponsor 08:24 Large Language Models: A Survey 10:11 DistiLLM: Towards Streamlined Distillation for Large Language Models 11:48 Keyframer: Empowering Animation Design using Large Language Models 13:49 Outro | |||
| ChatGPT API Price Cut 💰 // Nvidia CEO's Sovereign AI Call 🌐 // Animated Stickers 🎉 | 12 Feb 2024 | 00:14:25 | |
ChatGPT API prices have been reduced, making the API more accessible for developers. Nvidia CEO Huang is calling for governments to build sovereign AI infrastructure, while also addressing concerns about the dangers of AI. The Aya Dataset is a valuable resource for researchers looking to develop multilingual NLP models. Finally, the "Animated Stickers" paper introduces a model that generates high-quality animated stickers with interesting and relevant motion. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:51 ChatGPT API Reduced Prices 03:19 Nvidia CEO Huang says countries must build sovereign AI infrastructure 05:03 Andrej Karpathy on Learning 06:21 Fake sponsor 08:03 Animated Stickers: Bringing Stickers to Life with Video Diffusion 09:34 Feedback Loops With Language Models Drive In-Context Reward Hacking 11:29 Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning 13:15 Outro | |||
| Watermarks for DALLE 3 🌊 // TSMC's New Chip Factory in Japan 🇯🇵 // DeepMind's Self-Discover 🧩 | 09 Feb 2024 | 00:14:11 | |
OpenAI implements watermarks on images generated by DALL-E 3 to enhance the trustworthiness of digital information. TSMC's plans to build a second chip factory in Japan could boost Japan's chip-making sector and position TSMC as a major player in the global chip-making industry. "Fractal Patterns May Unravel the Intelligence in Next-Token Prediction" and "Self-Discover: Large Language Models Self-Compose Reasoning Structures" introduce new frameworks that could lead to more robust and comprehensive language models. "Diffusion World Model" introduces a new model called DWM that can make long-horizon predictions in a single forward pass, making it a robust and efficient model for long-horizon prediction tasks. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:33 OpenAI’s ChatGPT Will Now Watermark Images Generated By DALL-E 3 02:50 TSMC to build second Japan chip factory, raising investment to $20 billion 04:54 NVIDIA’S “GRACE” ARM CPU HOLDS ITS OWN AGAINST X86 FOR HPC 06:02 Fake sponsor 08:14 Fractal Patterns May Unravel the Intelligence in Next-Token Prediction 09:41 Self-Discover: Large Language Models Self-Compose Reasoning Structures 11:13 Diffusion World Model 13:01 Outro | |||
| Gemini Takes Over 🚀 // FCC Ban on AI Voices 🚫 // Zero-Shot Generalization 🎯 | 09 Feb 2024 | 00:12:19 | |
Google's new Gemini release on Bard and the FCC's ban on AI-generated voices in robocalls are discussed, along with the paper "Learning to Route Among Specialized Experts for Zero-Shot Generalization" and "Ten Hard Problems in Artificial Intelligence We Must Get Right". Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:44 Google Announces Gemini release on Bard 03:34 FCC Makes AI-Generated Voices in Robocalls Illegal 05:19 Hybrid Bonding Process Flow - Advanced Packaging Part 5 06:38 Fake sponsor 08:15 Learning to Route Among Specialized Experts for Zero-Shot Generalization 08:18 Ten Hard Problems in Artificial Intelligence We Must Get Right 09:33 Large Language Model for Table Processing: A Survey 11:09 Outro | |||
| Apple's AI Feature Delay 📅 // SAM 2 Object Segmentation 🖼️ // Google's TPU Chips Shift ⚡ | 30 Jul 2024 | 00:14:25 | |
Apple’s delay in releasing AI features until October could affect iPhone 16 sales and customer excitement. The tech giant’s choice to use Google’s TPU chips instead of Nvidia marks a significant shift in AI hardware competition. Meta’s SAM 2 introduces groundbreaking real-time object segmentation with zero-shot generalization, revolutionizing visual content interaction. Additionally, Sony AI’s research presents a cost-effective approach to training diffusion models, democratizing access to advanced AI technology. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:54 Apple Intelligence Won't Be Released Until October 03:09 Apple used Google's chips to train two AI models, research paper shows 04:44 A Visual Guide to Quantization 05:38 Introducing SAM 2: The next generation of Meta Segment Anything Model for videos and images 06:41 Fake sponsor 08:46 Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget 10:28 Theia: Distilling Diverse Vision Foundation Models for Robot Learning 12:27 Outro | |||
| OpenAI's Democratic Inputs to AI 💡 // MiniCPM-2B Language Model 🤯 // V-IRL Platform for Real-World AI 🌍 | 06 Feb 2024 | 00:15:05 | |
OpenAI's plan to involve the public in AI governance through "Democratic Inputs to AI" is a promising idea. MiniCPM-2B, an edge-side large language model, outperforms many other models on comprehensive benchmarks. DeepSeekMath 7B achieved a score of 51.7% on the MATH benchmark for mathematical reasoning. V-IRL is a platform that enables AI agents to interact with the real world, opening up possibilities for practical applications such as disaster response and environmental monitoring. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:28 Inside OpenAI’s Plan to Make AI More ‘Democratic’ 03:46 MiniCPM: Unveiling the Potential of End-side Large Language Models 05:06 Beyond Token Prediction: the post-Pretraining journey of modern LLMs 06:56 Fake sponsor 08:36 DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models 10:19 Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning 12:13 V-IRL: Grounding Virtual Intelligence in Real Life 13:54 Outro | |||
| Gemini Ultra Is Close 🤖 // Open Source Assistant Creator 🌟 // AI and AR with ChatGPT 📷 | 06 Feb 2024 | 00:12:58 | |
Google is revamping its Bard chatbot under a new name, Gemini, with the launch of the highly anticipated Gemini Ultra model. Hugging Face has launched an open source assistant creator, allowing users to create customizable AI assistants with just two clicks. OpenAI has released a VisionOS ChatGPT app for the Apple Vision Pro, bringing AI and AR together. Three interesting research papers were discussed, including Nomic Embed, Boximator, and StepCoder, all making strides in the field of AI. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:38 Leaked doc reveals Bard rebrand and Gemini Ultra launch 02:56 Hugging Face launches open source assistant creator 04:09 AI meets AR as ChatGPT is now available on the Apple Vision Pro 05:36 Fake sponsor 07:15 Nomic Embed: Training a Reproducible Long Context Text Embedder 08:34 Boximator: Generating Rich and Controllable Motions for Video Synthesis 10:11 StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback 11:48 Outro | |||
| AI With Body-Cameras 🕵️ // AI Guide Feb 2024 📖 // Planning Capabilities Benchmark ✈️ | 05 Feb 2024 | 00:14:38 | |
AI to sift through body-cam footage, a comprehensive guide to AI in February 2024, the impact of utterance lengths on conversation models, and a new benchmark for testing the planning capabilities of language agents. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 03:29 Your guide to AI: February 2024 04:45 Running Open-Source AI Models Locally With Ruby 06:08 Fake sponsor 08:10 Making a Long Story Short in Conversation Modeling 09:59 Large Language Models for Mathematical Reasoning: Progresses and Challenges 11:28 TravelPlanner: A Benchmark for Real-World Planning with Language Agents 13:28 Outro | |||
| AI Companies Market Drop 💸 // Meta's Custom Chip Artemis 🚀 // Amazon's Shopping AI Rufus 🛍️ | 02 Feb 2024 | 00:14:17 | |
Major AI players like Microsoft, Google, AMD, and Nvidia took a hit after their latest earnings reports failed to impress investors. Meta is rolling out a new custom AI chip called Artemis in their data centers this year, which will complement the Nvidia H100 chips they've acquired. Amazon has announced an AI-based shopping assistant called Rufus, which uses generative artificial intelligence to help users search for products. The papers discussed in this episode cover topics such as open language models, enhancing the Winograd Schema Challenge, and mitigating reward overfitting and overoptimization in RLHF. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:37 AI companies lose $190 billion in market cap after Alphabet and Microsoft report 03:17 Exclusive: Meta to deploy in-house custom chips this year to power AI drive - memo 04:49 Amazon announces AI shopping assistant called Rufus 06:05 Fake sponsor 08:01 OLMo: Accelerating the Science of Language Models 09:22 WSC+: Enhancing The Winograd Schema Challenge Using Tree-of-Experts 11:15 Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF 12:58 Outro | |||
| Neuralink's BrainChip 🧠 // OpenAI's Chip Collaboration 💻 // YOLO-World Object Detection 🌎 | 01 Feb 2024 | 00:14:25 | |
Elon Musk's Neuralink implanting a chip in its first human brain, OpenAI exploring AI chip collaboration with Samsung and SK Group, and papers proposing new algorithms to improve the robustness of language models and enhance open-vocabulary object detection. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:48 Elon Musk says his Neuralink startup has implanted a chip in its first human brain 03:14 OpenAI CEO Sam Altman explores AI chip collaboration with Samsung and SK Group 04:59 Building an early warning system for LLM-aided biological threat creation 06:17 Fake sponsor 08:22 H2O-Danube-1.8B Technical Report 09:45 Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks 11:29 YOLO-World: Real-Time Open-Vocabulary Object Detection 13:05 Outro | |||
| Hallucinations Leaderboard 🤯 // Rebellions vs Nvidia 💪 // Microsoft's Future of Work 🏢 | 31 Jan 2024 | 00:14:46 | |
Hallucinations Leaderboard, AI Chip Startup Rebellions challenging Nvidia, and Microsoft's "Future of Work" report. Our experts also discuss innovative papers that focus on improving the efficiency and effectiveness of Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) through new training strategies and architectures. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:25 The Hallucinations Leaderboard, an Open Effort to Measure Hallucinations in Large Language Models 03:22 AI Chip Startup Rebellions Snags Funding to Challenge Nvidia 05:06 Microsoft New Future of Work Report 2023 06:21 Fake sponsor 08:10 Scaling Sparse Fine-Tuning to Large Language Models 10:02 Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling 11:25 MoE-LLaVA: Mixture of Experts for Large Vision-Language Models 13:26 Outro | |||
| Google's Bard Beats GPT-4 🆚 // SliceGPT's Sparsification 🪜 // ICE Self-Evolution 🔍 | 30 Jan 2024 | 00:15:02 | |
Google's Bard AI outperforms GPT-4 in HuggingFace's Chatbot Arena Leaderboard, while OpenAI partners with Common Sense Media to curate "family-friendly" chatbots in the GPT Store. The episode also delves into two fascinating research papers: SliceGPT introduces a new post-training sparsification scheme that reduces model parameters and maintains performance, while Investigate-Consolidate-Exploit (ICE) promotes the transfer of knowledge between tasks for genuine self-evolution, leading to more robust and autonomous AI agents. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:36 Bard from Google Outperforms GPT-4 in LLMSys Arena Benchmark 02:56 OpenAI partners with Common Sense Media to collaborate on AI guidelines 04:42 What I Talk About When I Talk About Query Optimizer (Part 1): IR Design 05:45 Fake sponsor 07:45 SliceGPT: Compress Large Language Models by Deleting Rows and Columns 09:48 Investigate-Consolidate-Exploit: A General Strategy for Inter-Task Agent Self-Evolution 11:40 Tweets to Citations: Unveiling the Impact of Social Media Influencers on AI Research Visibility 13:24 Outro | |||
| MambaByte 🔥 // Lumiere Video Generation AI 🎥 // Nvidia's RTX GPUs HDR Upgrade 🖥️ | 26 Jan 2024 | 00:14:10 | |
Google's Lumiere AI model for video generation, Nvidia's RTX GPUs using AI to upgrade SDR content to HDR, China's newest dark matter lab, CJPL-II, and MaLA-500, a large language model designed to cover 534 languages. These advancements showcase the immense progress in AI research and are expected to revolutionize video generation, language modeling, and dark matter research. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:34 Google Releases Lumiere, a New Video Generation AI 03:12 Nvidia’s RTX GPUs can now upgrade SDR content to HDR using AI 05:03 China's new dark matter lab is biggest and deepest yet 06:42 Fake sponsor 08:25 MambaByte: Token-free Selective State Space Model 09:49 VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks 11:34 MaLA-500: Massive Language Adaptation of Large Language Models 13:09 Outro | |||
| AI-enhanced Chrome 🌐 // OpenAI transparency concerns 🔍 // Multi-layered 3D assets 🎥 | 25 Jan 2024 | 00:14:36 | |
Google is using AI to enhance user experience and productivity in Chrome, while OpenAI's lack of transparency could damage their reputation in the industry. We also discuss GALA, a framework that can decompose a single-layer clothed 3D human mesh into complete multi-layered 3D assets, and AutoRT, which leverages vision-language models to enable autonomous robot learning. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:33 Google is using AI to organize and customize your Chrome browser 02:44 Google cancels contract with an AI data firm that’s helped train Bard 04:05 OpenAI Quietly Scrapped a Promise to Disclose Key Documents to the Public 05:41 Fake sponsor 07:26 GALA: Generating Animatable Layered Assets from a Single Scan 09:40 AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents 13:16 Outro | |||
| Microsoft hit by Russian hackers again 🛡️ // ElevenLabs' $80M unicorn status 🦄 // Realistic AI voices at TextReader.ai 🗣️ | 24 Jan 2024 | 00:15:24 | |
Microsoft was hit by another nation-state attack, this time by the same Russian group behind the SolarWinds attack. ElevenLabs, a startup that just landed $80 million in funding and achieved unicorn status, is making it easier than ever to replace human voice actors with AI-generated voices. TextReader.ai is a free text-to-speech generator with some of the most realistic AI voices. "Is the Emergence of Life an Expected Phase Transition in the Evolving Universe?" challenges our current ideas about the emergence of life and opens up new avenues for research. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:25 Microsoft ‘senior leadership’ emails accessed by Russian SolarWinds hackers 03:18 Voice cloning startup ElevenLabs lands $80M, achieves unicorn status 05:29 Text Reader - Free text to speech generator with realistic AI voices 06:45 Fake sponsor 08:33 Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text 10:10 Is the Emergence of Life an Expected Phase Transition in the Evolving Universe? 11:57 EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models 13:45 Outro | |||
| OpenAI's SearchGPT 🧐 // AI in Math Olympiad 🏅 // Unreliable AI Existential Risk 🔍 | 29 Jul 2024 | 00:15:50 | |
OpenAI's new prototype, SearchGPT, promises to combine AI smarts with real-time web information to make search easier. AI has achieved silver-medal standards at the International Mathematical Olympiad, raising questions about the future of mathematics and the role of AI in solving complex problems. The reliability of AI existential risk probabilities is called into question in a thought-provoking article, challenging the authority we often assign to these forecasts and calling for more scrutiny. Three fascinating papers from UNC Chapel Hill, Google DeepMind, and a collaboration between Caltech and NVIDIA explore advancements in theorem proving, balancing fast and slow planning, and aligning large language models with Best-of-N distillation. These papers could transform the way we approach complex problems with language models and streamline the development of LLMs. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:54 OpenAI Announces SearchGPT 03:15 AI achieves silver-medal standard solving International Mathematical Olympiad problems 04:55 AI existential risk probabilities are too unreliable to inform policy 06:25 Fake sponsor 08:21 LeanDojo: Theorem Proving with Retrieval-Augmented Language Models 10:10 System-1.x: Learning to Balance Fast and Slow Planning with Language Models 12:01 BOND: Aligning LLMs with Best-of-N Distillation 13:43 Outro | |||
| OpenAI's Chip Factories 💻 // Perplexity AI's rabbit r1 🐇 // Code Prompting Improves LLMs 🔍 | 23 Jan 2024 | 00:14:25 | |
OpenAI plans to set up chip factories worth $100 billion to reduce reliance on existing chipmakers and tackle potential supply shortages. The rabbit r1, which integrates Perplexity AI's technology to respond to user inquiries, has garnered substantial pre-order sales and offers a complimentary year of Perplexity Pro to early adopters. "Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs" explores how code prompts can improve the performance of large language models on conditional reasoning tasks. "R-Judge: Benchmarking Safety Risk Awareness for LLM Agents" introduces R-Judge, a benchmark that evaluates the proficiency of LLMs in judging safety risks given agent interaction records, revealing the importance of salient safety risk feedback. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:20 OpenAI plans to set up chip factories worth $100 billion: Report 02:47 The rabbit r1 will use Perplexity AI’s tech to answer your queries 04:24 LoRA From Scratch – Implement Low-Rank Adaptation for LLMs in PyTorch 05:12 Fake sponsor 07:11 Code Prompting Elicits Conditional Reasoning Abilities in Text+Code LLMs 08:32 RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture 10:45 R-Judge: Benchmarking Safety Risk Awareness for LLM Agents 12:56 Outro | |||
| AGI and Skin Cancer Detection 🧠 // Self-Rewarding Language Models 🏆 // Neurosymbolic Reasoners in Text-Based Games 🎮 | 22 Jan 2024 | 00:15:42 | |
Mark Zuckerberg's new goal of creating artificial general intelligence with Meta's AI research group and the FDA clearance granted for the first AI-powered medical device to detect all three common skin cancers are just some of the highlights. We also explore Self-Rewarding Language Models and Automatic Program Repair using Round-Trip Translation with Large Language Models, as well as Large Language Models as neurosymbolic reasoners for text-based games involving symbolic tasks. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:41 Mark Zuckerberg’s new goal is creating artificial general intelligence 03:27 FDA Clearance Granted for First AI-Powered Medical Device to Detect All Three Common Skin Cancers 05:37 The rise of AI as Magic 06:36 Fake sponsor 09:11 Self-Rewarding Language Models 12:27 Large Language Models Are Neurosymbolic Reasoners 14:23 Outro | |||
| Samsung's Galaxy AI 📱 // Meta's billions on Nvidia chips 💰 // Beware GPU vs CPU benchmarks ❌ | 19 Jan 2024 | 00:15:26 | |
Samsung introduces Galaxy AI platform with five key features, including Live Translate and Note Assist. Meta is reportedly spending billions of dollars on Nvidia AI chips for AGI research. Beware of misleading GPU vs CPU benchmarks, as pointed out in a blog post. Three new research papers explore Reinforced Fine-Tuning for reasoning, Asynchronous Local-SGD Training for Language Modeling, and Vision Mamba for efficient visual representation learning with bidirectional state space models. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 02:08 Galaxy AI at the Samsung Galaxy S24 03:39 Mark Zuckerberg indicates Meta is spending billions of dollars on Nvidia AI chips 05:36 Beware of misleading GPU vs CPU benchmarks 06:56 Fake sponsor 08:34 ReFT: Reasoning with Reinforced Fine-Tuning 10:24 Asynchronous Local-SGD Training for Language Modeling 12:18 Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model 14:15 Outro | |||
| Global AI Regulation 🌍 // Bill Gates' Predictions 💭 // Text-to-Video Metrics 🎥 | 18 Jan 2024 | 00:14:35 | |
The predictions of Bill Gates on how AI will transform our lives, and groundbreaking research in AI-generated audio, text-to-video creation, and quantum-based noise reduction. Additionally, the proposed evaluation metric for text-to-video models, T2VScore, integrates Text-Video Alignment and Video Quality criteria to provide a more accurate reflection of human perception. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:47 AI - artificial intelligence - at Davos 2024: Rolling coverage and what to know 03:26 Bill Gates explains how AI will change our lives in 5 years 05:12 RAG Using Unstructured Data & Role of Knowledge Graphs 06:08 Fake sponsor 07:49 Masked Audio Generation using a Single Non-Autoregressive Transformer 09:32 Towards A Better Metric for Text-to-Video Generation 11:11 Quantum Denoising Diffusion Models 12:57 Outro | |||
| Combatting Misinformation with Digital Marks 🛡️ // Stable Code 3B 💻 // TinyML Potential 🌱 | 16 Jan 2024 | 00:14:24 | |
OpenAI's plan to combat election misinformation, Stable Code 3B's promise to revolutionize coding, and the potential uses of Tiny Machine Learning are all discussed. Additionally, the paper on the Unreasonable Effectiveness of Easy Training Data for Hard Tasks challenges previous assumptions about language models. Overall, this episode provides valuable insights into the latest developments in AI and technology. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:34 Here’s OpenAI’s big plan to combat election misinformation 02:46 Stable Code 3B: Coding on the Edge 04:29 What TinyML is 06:00 Fake sponsor 07:57 Mind Your Format: Towards Consistent Evaluation of In-Context Learning Improvements 09:38 The Unreasonable Effectiveness of Easy Training Data for Hard Tasks 12:55 Outro | |||
| Nous Research's RLHF LLM 🤖 // Microsoft's AI-powered Office 💻 // Fine-tuning GPT-3.5 for "Connections" 🕹️ | 16 Jan 2024 | 00:14:21 | |
Nous Research has released its new flagship LLM, Nous-Hermes 2, which is the company's first model trained with RLHF and its first to beat Mixtral Instruct in popular benchmarks. Microsoft's Copilot Pro brings AI-powered Office features to consumers for $20 a month, including the ability to generate entire PowerPoint slide decks from a chatbot-like prompt and rephrase paragraphs in Word. A blog post explores fine-tuning gpt-3.5-turbo to learn how to play "Connections", demonstrating the potential of fine-tuning language models for specific tasks. Three research papers are discussed, including the effects of pretraining data curation on language models, a new benchmark for evaluating multimodal large language models on image-based wordplay puzzles, and the major shortcomings identified in these models. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:37 Nous Research Releases new flagship LLM 02:50 Microsoft’s new Copilot Pro brings AI-powered Office features to the rest of us 05:03 Fine-tuning gpt-3.5-turbo to learn to play "Connections" 06:08 Fake sponsor 11:19 REBUS: A Robust Evaluation Benchmark of Understanding Symbols 13:11 Outro | |||
| Apple's AI Plans 🍎 // $100M for Humanoid Robots 🤖 // Trustworthiness of Large Language Models 🔍 | 15 Jan 2024 | 00:13:57 | |
From Apple's request that its San Diego Siri team relocate to Texas to the $100 million investment in 1X Technologies, listeners will learn about the latest developments in the AI industry. The TrustLLM study evaluates the trustworthiness of LLMs across six dimensions, while Intel Corporation proposes an efficient LLM inference solution. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:35 Apple asks its San Diego Siri quality control team to relocate to Texas 02:51 OpenAI-Backed Humanoid Maker Gets $100 Million in EQT-Led Round 04:28 Why autonomous trucking is harder than autonomous rideshare 05:39 Fake sponsor 07:26 TrustLLM: Trustworthiness in Large Language Models 09:34 Efficient LLM inference solution on Intel GPU 11:25 Transformers are Multi-State RNNs 12:56 Outro | |||
| Microsoft's Market Cap Soars with AI 🤖 // ChatGPT Team for small teams 💬 // Distilling Vision-Language Models 🎬 | 12 Jan 2024 | 00:15:03 | |
OpenAI has introduced a new plan called ChatGPT Team, which allows smaller teams to use its latest AI models without needing to know how to code. Microsoft has overtaken Apple as the largest US company, a rise attributed to its investments in AI and machine learning. "Transformers are Multi-State RNNs" challenges the traditional view that transformers are conceptually different from recurrent neural networks and provides a potential solution to a major computational issue. "Distilling Vision-Language Models on Millions of Videos" proposes a method to fine-tune a video-language model from a strong image-language baseline with synthesized instructional data and then use it to auto-label millions of videos to generate high-quality captions. This method could significantly improve the quality of video captioning and retrieval. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:48 OpenAI debuts ChatGPT subscription aimed at small teams 03:14 Microsoft overtakes Apple as largest U.S. company on AI boost 04:51 Neural Network Quantization & Number Formats From First Principles 06:02 Fake sponsor 08:16 The Impact of Reasoning Step Length on Large Language Models 10:06 Transformers are Multi-State RNNs 11:45 Distilling Vision-Language Models on Millions of Videos 13:34 Outro | |||
| OpenAI's GPT Store 🤖 // Alexa's generative AI-powered experiences 🗣️ // MagicVideos and Lightning Attention ⚡ | 11 Jan 2024 | 00:15:05 | |
OpenAI's GPT Store, new generative AI-powered experiences for Amazon's Alexa, and breakthroughs in video and language modeling with "MagicVideo-V2" and "Lightning Attention-2". Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:42 OpenAI’s custom GPT Store is now open for business 03:16 Amazon’s Alexa gets new generative AI-powered experiences 05:03 Remember Netflix’s $1m algorithm contest? Well, here’s why it didn’t use the winning entry. 06:34 Fake sponsor 08:10 MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation 09:40 Masked Audio Generation using a Single Non-Autoregressive Transformer 11:18 Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models 13:18 Outro | |||
| OpenAI Lawsuit 🤝 // Volkswagen AI Chatbot 🚗 // Python 3.13 JIT 🐍 | 10 Jan 2024 | 00:13:42 | |
More beef on the lawsuit against OpenAI, Volkswagen's new smart chatbot for cars, and the latest developments in Python and language modeling. The papers discussed showcase the potential for new techniques like Mixtral of Experts, MoE-Mamba, and FlightLLM to improve language processing and unlock new possibilities for scaling. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:27 OpenAI Fights Back Against New York Times Lawsuit 02:47 Volkswagen brings AI chatbot ChatGPT into its cars, SUVs 04:28 Python 3.13 gets a JIT 05:35 Fake sponsor 07:22 Mixtral of Experts 08:50 MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts 10:20 FlightLLM: Efficient Large Language Model Inference with a Complete Mapping Flow on FPGA 12:13 Outro | |||
| Mistral Large 2 🌍 // Memphis Supercluster 💻 // Emergence in Complex Systems 🧩 | 26 Jul 2024 | 00:14:51 | |
Mistral Large 2 release with advanced features and multilingual support. Elon Musk's announcement of the Memphis Supercluster for creating the world's most powerful AI. Discussion of emergence in complex systems and the MINT-1T dataset for training large multimodal models. Introduction of OpenDevin, an open platform for developing AI agents and MOMAland, a benchmark framework for multi-objective multi-agent reinforcement learning. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:39 Mistral Large 2 Release 03:01 Elon Musk Announces Memphis Supercomputer 04:48 The Puzzle of How Large-Scale Order Emerges in Complex Systems 06:22 Fake sponsor 08:37 MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens 10:16 OpenDevin: An Open Platform for AI Software Developers as Generalist Agents 11:53 MOMAland: A Set of Benchmarks for Multi-Objective Multi-Agent Reinforcement Learning 13:31 Outro | |||
| Apple's MLX 🍎 // Duolingo's AI Cuts ✂️ // DeepSeek LLM's Superior Performance 💪 | 09 Jan 2024 | 00:14:35 | |
Apple's new MLX framework for on-device AI could shake up the AI race with its optimized design for Apple silicon and ecosystem of devices. Duolingo's shift towards using AI to create more content and cutting contractors raises concerns about how AI technology will affect jobs in the long run. The papers discussed in this episode showcase exciting advancements in open-source language models, including DeepSeek LLM's superior performance compared to GPT-3.5 and Alibaba Group's proposed system for supporting exceptionally long context lengths. "Self-Contrast" is a new method proposed to improve the reflection capacity of Large Language Models (LLMs) by adaptively exploring diverse solving perspectives and generating a checklist to help LLMs re-examine and eliminate errors or inconsistencies. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:35 Apple ML Research releases MLX for on-device AI 02:53 Duolingo Cuts 10% of Contractors as It Uses More AI to Create App Content 04:26 Attacks on machine learning models 05:37 Fake sponsor 07:22 DeepSeek LLM: Scaling Open-Source Language Models with Longtermism 09:09 Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache 10:51 Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives 13:06 Outro | |||
| OpenAI GPT Store Launch 🚪 // DeepMind's Composing LLMs 🤔 // Perplexity AI Natural Language Search 🔍 | 08 Jan 2024 | 00:14:17 | |
OpenAI's GPT Store launch, Perplexity AI's natural language search engine, and two papers proposing new approaches to improve LLMs' reflection capacity and expand their capabilities. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:42 OpenAI’s GPT Store launching next week 03:01 AI-powered search engine Perplexity AI, now valued at $520M, raises $73.6M 04:39 Our 2023 Year in Review 05:56 Fake sponsor 07:49 Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives 09:34 Instruct-Imagen: Image Generation with Multi-modal Instruction 10:59 LLM Augmented LLMs: Expanding Capabilities through Composition 12:48 Outro | |||
| Microsoft's Copilot Key ⌨️ // Samsung's Mobile AI 📱 // Photorealistic Avatars 🤖 | 05 Jan 2024 | 00:14:10 | |
Microsoft's Copilot key for PC keyboards, Samsung's upcoming AI advancements in their smartphone series, a framework for generating photorealistic avatars that gesture according to conversational dynamics, and MIT CSAIL's exploration of how language models learn about the visual world and their potential for training visual representation learning systems. Contact: sergi@earkind.com Timestamps: 00:34 Introduction 01:37 Microsoft wants to add a Copilot key to your PC keyboard 02:59 Galaxy Unpacked 2024: Opening a New Era of Mobile AI 04:53 Efficient LLM inference 06:15 Fake sponsor 08:20 From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations 09:50 Incremental FastPitch: Chunk-based High Quality Text to Speech 11:04 A Vision Check-up for Language Models 13:00 Outro | |||