Podcast AI Explained Official Podcast by Philip - Host of AI Explained YT Episodes

Explore every episode of the podcast AI Explained Official Podcast


	Title	Pub. Date	Duration
	Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that	14 Nov 2025	00:18:26
A lot just got released in the last 36 hours, and it will all affect hundreds of millions of people. 10 details you would miss if you just read the headlines, from GPT 5.1 regressions, to how Claude hacked Govt Agencies, to SIMA 2, and Musical Turing Tests. https://assemblyai.com/aiexplained Chapters: 00:00 - Introduction 00:56 - GPT 5.1 Smarter? 01:47 - Some Regressions 03:22 - Sycophancy? 05:22 - Claude Auto-Hacking 06:16 - Jailbreaking through Granularity 08:22 - This Will be Re-used 09:30 - Hallucinating Hacker 09:57 - Surprisingly Neutral Tone 12:18 - SIMA 2 14:10 - Alpha Parallels 17:24 - AI Music GPT 5.1 Announcement: https://openai.com/index/gpt-5-1/ System Card: https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdf Benchmarks: https://openai.com/index/gpt-5-1-for-developers/ Simple Bench: https://lmcouncil.ai/benchmarks Auto-Hacking: https://x.com/AnthropicAI/status/1989033793190277618 https://www.anthropic.com/news/disrupting-AI-espionage Report: https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf Sima 2 Announcement: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/ https://x.com/amoufarek/status/1988986075331858693 Scepticism: https://www.technologyreview.com/2025/11/13/1127921/google-deepmind-is-using-gemini-to-train-agents-inside-goat-simulator-3/ Voyager: https://voyager.minedojo.org/ Reuters Music: https://www.reuters.com/legal/litigation/are-you-listening-bots-survey-shows-ai-music-is-virtually-undetectable-2025-11-12/
	Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection)	10 Nov 2025	00:12:53
Don’t let headlines about bubbles distract you from the real avenues of progress being explored in AI every week, including what had been thought to be a long-term blocker - continual learning (learning on the fly). https://app.grayswan.ai/ai-explained This, plus models introspecting (hesitate before you berate), Nano Banana 2 possibly spotted, Chinese imagen and more. AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 01:26 - Continual Learning (Nested Learning / HOPE) 07:00 - Introspection 10:54 - Image-Gen Progress Nested Learning Post: https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/ Nested Learning Paper: https://abehrouz.github.io/files/NL.pdf Original Titans Paper: https://arxiv.org/pdf/2501.00663 Siri News: https://www.bloomberg.com/news/articles/2025-11-05/apple-plans-to-use-1-2-trillion-parameter-google-gemini-model-to-power-new-siri Introspection: https://www.anthropic.com/research/introspection Full Paper: https://transformer-circuits.pub/2025/introspection/index.html#mechanisms Earlier Work: https://www.anthropic.com/research/mapping-mind-language-model https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html Release Post: https://x.com/AnthropicAI/status/1983584136972677319 https://lmcouncil.ai Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
	When Will AI Models Blackmail You, and Why?	24 Jun 2025	00:26:19
In the last few days Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite prompt warnings, and other preventions. But do these models want this? Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: storyblocks.com/AIExplained AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 01:20 - What prompts blackmail? 02:44 - Blackmail walkthrough 06:04 - ‘American interests’ 08:00 - Inherent desire? 10:45 - Switching Goals 11:35 - Murder 12:22 - Realizing it’s a scenario? 15:02 - Prompt engineering fix? 16:27 - Any fixes? 17:45 - Chekhov’s Gun 19:25 - Job implications 21:19 - Bonus Details Report: https://www.anthropic.com/research/agentic-misalignment 30 Page Appendices: https://assets.anthropic.com/m/6d46dac66e1a132a/original/Agentic_Misalignment_Appendix.pdf Announcement: https://x.com/AnthropicAI/status/1936144602446082431?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet OpenAI Files: https://www.openaifiles.org/ Grok 4 News: https://x.com/RonFilipkowski/status/1936372579607912473 Claude 4 Report Card: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf New Apollo Research: https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming Interesting Reflections: https://nostalgebraist.tumblr.com/post/785766737747574784/the-void Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know	12 Jun 2025	00:14:00
What to make of those headlines that AI can’t reason, seen by tens of millions? I cover the paper in layman’s terms, what it means and doesn’t mean, and what’s next. Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: https://storyblocks.com/AIExplained Plus o3-pro and whether it is my current most-recommended model. AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:57 - Viral Post + Headlines 01:42 - Apple Paper Analysis 08:34 - But they do Hallucinate 10:43 - Not Supercomputers 11:18 - o3 Pro and Recommendations 13.7M Tweet: https://x.com/RubenHssd/status/1931389580105925115 Apple Paper: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf Guardian Article: https://www.theguardian.com/technology/2025/jun/09/apple-artificial-intelligence-ai-study-collapse Lisan al Gaib post: https://x.com/scaling01/status/1931854370716426246 Multiplication: https://x.com/yuntiandeng/status/1836114401213989366 The Illusion of the Illusion of Thinking: https://drive.google.com/file/d/1Zx9ikRj0Enc3SB4wA9HlYIlpmO_8QiUO/view Marcus: https://www.theguardian.com/commentisfree/2025/jun/10/billion-dollar-ai-puzzle-break-down Prof Rao: https://x.com/rao2z/status/1927707640223719631 AI Job Headlines: https://www.nytimes.com/2025/06/11/technology/ai-mechanize-jobs.html https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic Sky News Story: https://news.sky.com/story/can-we-trust-chatgpt-despite-it-hallucinating-answers-13380975 Veo 3 Ad: https://x.com/Kalshi/status/1932891608388681791 Altman Essay: https://blog.samaltman.com/ o3 Original benchmarks: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8b6c44-acd6-43b3-b5c6-1a1d5c6c25e4_2486x1388.png https://pbs.twimg.com/media/GfQ0bfcXQAAQt13.jpg Alpha Evolve Video: https://www.youtube.com/watch?v=RH4hAgvYSzg https://simple-bench.com/ Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed	06 Jun 2025	00:16:41
There’s a new best language model, so let’s go through the ups and downs of Gemini 2.5 Pro 06-05. Record-breaking common-sense, but dumb mistakes remain. And it’s not even their best model, which remains behind the scenes - Gemini 2.5 Ultra. Plus Sundar Pichai’s AGI date and an analysis of whether the current AI unemployment headlines are justified, and Elevenlabs v3. https://emergentmind.com AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 02:04 - Gemini 2.5 Ultra 03:34 - Benchmarks 07:41 - AGI Date and Meaning Pichai 09:13 - Jobs and AI Unemployment Fears 15:28 - Elevenlabs v3 Sundar Pichai Fridman: https://www.youtube.com/watch?v=9V6tWC4CdFQ Pichai More Jobs (until 2026 at least): https://www.techradar.com/pro/alphabet-ceo-sundar-pichai-says-ai-wont-lead-to-job-cuts-will-be-an-accelerator Gemini Comparison: https://blog.google/products/gemini/gemini-2-5-pro-latest-preview/ https://x.com/viathebrink/status/1930733154203292121 https://simple-bench.com/ White Collar Bloodbath: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic https://fortune.com/2025/05/25/ai-entry-level-jobs-gen-z-careers-young-workers-linkedin/ https://www.nytimes.com/2025/05/19/opinion/linkedin-ai-entry-level-jobs.html https://www.nytimes.com/2025/03/25/business/economy/white-collar-layoffs.html College Unemployment: https://www.newyorkfed.org/research/college-labor-market/#--:explore:unemployment New Scientist AI Hallucinations: https://www.newscientist.com/article/2479545-ai-hallucinations-are-getting-worse-and-theyre-here-to-stay/ Duolingo: https://fortune.com/2025/05/24/duolingo-ai-first-employees-ceo-luis-von-ahn/ Klarna: https://www.forbes.com/sites/quickerbettertech/2025/05/18/business-tech-news-klarna-reverses-on-ai-says-customers-like-talking-to-people/ Sholto Douglas: https://www.reddit.com/r/ClaudeAI/comments/1ktt1rb/anthropics_sholto_douglas_says_by_202728_its/ Figure 02: https://x.com/adcock_brett/status/1930693311771332853 Elevenlabs v3: https://www.youtube.com/watch?v=zv_IoWIO5Ek Gemini Speech Generation: https://aistudio.google.com/generate-speech Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	Claude 4: Full 120 Page Breakdown … Is it the Best New Model?	22 May 2025	00:19:04
Not only did I get early access and run my own tests, as per the title I read both the 120 page Claude 4 Opus and Claude 4 Sonnet System Card, and 25 page report on ASL-3 being triggered, plus the 2 hour launch video, and surrounding coverage. Ft. coding tests, Simple, twitter controversies, deep alignment coverage, spiritual bliss and much more! https://80000hours.org/aiexplained Chapters: 00:00 - Introduction 01:12 - 3 Quick Controversies 02:42 - Benchmark Results 04:20 - 120 page Card 20 Highlights 10:07 - Coding Test 11:27 - Model Welfare and Spiritual Bliss 13:29 - ASL-3 Claude Card: https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf?s=09 ASL 3: https://www-cdn.anthropic.com/807c59454757214bfd37592d6e048079cd7a7728.pdf Tweets: https://x.com/fish_kyle3/status/1925597284546629753 https://x.com/EMostaque/status/1925624164527874452?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet Cursor Says State of the Art for Coding: https://x.com/cursor_ai/status/1925594428095561941 Benchmarks: https://www.anthropic.com/news/claude-4
	Google Takes No Prisoners Amid Torrent of AI Announcements	21 May 2025	00:17:07
Google just announced at least 12 things that are each worthy of a video, but here are the top I/O highlights. From Veo 3 to Deep Research now being useable, Deep Think breaking records to Gemini Diffusion, Gemini 2.5 Flash changing how AI is priced and GemmaVerse, SynthID Detector and Imagen 4. And even this intro is missing other announcements covered in the vid! And yes, there’ll be plenty of Veo 3 clips to enjoy… https://80000hours.org/aiexplained AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:48 - Veo 3 02:10 - Gemini 2.5 Flash 03:13 - Universal Assistant 03:47 - Usage Skyrockets + OpenAI dig 04:51 - Gemini Pro Deep Think 06:21 - Overviews and AI Mode 07:26 - Deep Research Updates (new) + Jules 08:53 - Make and Deploy Apps with Gemini 09:12 - Imagen 4 10:00 - Gemini Diffusion 11:46 - Try It On 12:17 - SynthID Detector 13:30 - GemmaVerse, SignGemma, Gemma3n, medGemma 14:24 - Outro + Clips Event: https://www.youtube.com/watch?v=o8NiE3XMPrM Native Audio: https://aistudio.google.com/generate-speech Gemini Diffusion: https://deepmind.google/models/gemini-diffusion/#capabilities New Gemini 2.5 Flash: https://deepmind.google/models/gemini/flash/ SignGemma (See end of this vid): https://www.youtube.com/watch?v=GjvgtwSOCao Deep Think: https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/#flash-improvements Google Parallel Sampling: https://www.patreon.com/posts/next-level-good-127441188 Price Plans: https://blog.google/products/google-one/google-ai-ultra/ Imagen 4 Benchmarks: https://deepmind.google/models/imagen/ Jules: https://jules.google/ SynthID Detector: https://blog.google/technology/ai/google-synthid-ai-content-detector/ Veo 3 Benchmarks: https://deepmind.google/models/veo/evals/ MedGemma: https://deepmind.google/models/gemma/medgemma/ Build Apps: https://aistudio.google.com/apps Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	AI Improves at Self-improving	19 May 2025	00:17:41
AlphaEvolve is not the first system to exhibit self-improvement, but it may be the most impressive yet. AI is literally improving the hardware, architectures, data and training methods of AI itself. A deep dive into the paper, drawing on two previous interviews and 5 other papers. Plus a snippet on OpenAI’s new Codex system. Gray Swan: http://app.grayswan.ai/ai-explained AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:27 - AlphaEvolve 05:23 - Limitation 06:10 - Achievements 08:21 - Future Improvements 13:30 - Quirks 16:34 - Final Thoughts AlphaEvolve release: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ Paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf Terence Tao Quote: https://mathstodon.xyz/@tao/114508029896631083 Nature Article: https://www.nature.com/articles/s41586-022-05172-4 MIT Article: https://www.technologyreview.com/2025/05/14/1116438/google-deepminds-new-ai-uses-large-language-models-to-crack-real-world-problems/ AI Co-Scientist: https://arxiv.org/pdf/2502.18864 OpenAI Codex: https://openai.com/index/introducing-codex/ 70% of Pull Requests: https://x.com/slow_developer/status/1920920456393028027 Amodei Essay: https://www.darioamodei.com/essay/machines-of-loving-grace OpenAI Jason Wei Tweet: https://x.com/_jasonwei/status/1923091260354531612 PromptBreeder: https://arxiv.org/pdf/2309.16797 DrEureka: https://arxiv.org/pdf/2406.01967 FT DeepMind: https://www.ft.com/content/4e497a91-670a-4f69-be4a-18e247daba3e Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	o3 breaks (some) records, but AI becomes pay-to-win	25 Apr 2025	00:14:33
A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough. https://app.grayswan.ai/ai-explained AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:33 - FictionLiveBench 01:37 - PHYBench 02:14 - SimpleBench 02:54 - Virology Capabilities Test 03:13 - Mathematics Performance 04:29 - Vision Benchmarks 05:43 - V* and how o3 works 06:44 - Revenue and costs for you 08:54 - Expensive RL and trade-offs 09:40 - How to spend the OOMs 13:27 - Gray Swan Arena Green Card: https://techcrunch.com/2025/04/25/an-openai-researcher-who-worked-on-gpt-4-5-had-their-green-card-denied/ PHYBench: https://arxiv.org/pdf/2504.16074 Virologytest: https://www.virologytest.ai/ How o3 Vision Works: https://arxiv.org/pdf/2312.14135 https://x.com/sainingxie/status/1912570624523829573 Visual puzzles: https://neulab.github.io/VisualPuzzles/ Fiction Bench: https://x.com/ficlive/status/1912863028141244850 https://geobench.org/ https://simple-bench.com/ AIME 2025: https://openai.com/index/introducing-o3-and-o4-mini/ USAMO: https://x.com/mbalunovic/status/1914398518896193747 NaturalBench: https://linzhiqiu.github.io/papers/naturalbench/ Where’s Waldo: https://uk.pinterest.com/pin/492792384225896298/ IMO and AlphaProof: https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/ Crazy Revenue: https://www.theinformation.com/articles/openai-forecasts-revenue-topping-125-billion-2029-agents-new-products-gain?rc=sy0ihq Number of Users: https://www.theinformation.com/briefings/googles-gemini-user-numbers-revealed-court?rc=sy0ihq Subscriptions pay to win: https://www.forbes.com/sites/paulmonckton/2025/04/23/google-leak-reveals-new-gemini-ai-subscription-levels/ GPU Trade-offs: https://x.com/sama/status/1915098951067554030 RL Scale-up Amodei: https://www.darioamodei.com/post/on-deepseek-and-export-controls Log-linear Returns: https://x.com/bobmcgrewai/status/1895228291981943265 2030 Scaling: https://epoch.ai/blog/can-ai-scaling-continue-through-2030 Model Size: https://x.com/slow_developer/status/1874554473256997201 Adam on AGI: https://x.com/TheRealAdamG/status/1913998366632968381 Papers on Patreon: https://arxiv.org/pdf/2502.01839 https://arxiv.org/pdf/2504.13837 Chollet Quote: https://x.com/fchollet/status/1912934762580447447 OpenSim: https://opensim.stanford.edu/ Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	o3 and o4-mini - they’re great, but easy to over-hype	16 Apr 2025	00:14:24
Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, and my own tests, but some you may not have seen before. Yes, they can whip up amazing front-end in a few seconds, but you always have to ask what is in their data. Either way, they prove the gains from RL are just beginning… https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - o3 and o4-mini https://simple-bench.com/ Plus, Teams and Pro, plus token count: https://x.com/btibor91/status/1912568994512662679 System Card: https://openai.com/index/o3-o4-mini-system-card/ Release Notes: https://openai.com/index/introducing-o3-and-o4-mini/ https://deepmind.google/technologies/gemini/pro/ https://x.com/DeryaTR_/status/1912558350794961168 https://x.com/polynoamial/status/1912564068168450396 API Pricing:https://openai.com/api/pricing/ https://aider.chat/docs/leaderboards/ Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2: 7 Developments Critically Analysed	16 Apr 2025	00:20:09
This pod won’t just be about the release of GPT 4.1 in the last 48 hours, o3 build-up, Kling 2.0, a sneak peek at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are in AI and what is happening. https://www.emergentmind.com/ Chapters: 00:00 - Introduction 00:30 - Kling 2.0 01:35 - GPT 4.1 05:25 - o3 Build-up 07:37 - ‘Product Company’ 09:31 - Safe Superintelligence 10:54 - DolphinGemma 13:16 - Data Dominance? Kling 2.0: https://app.klingai.com/global/release-notes Dolphin Gemma: https://blog.google/technology/ai/dolphingemma/?s=09 https://openai.com/index/gpt-4-1/ OpenAI o3 Build-up The Information: https://www.theinformation.com/articles/openais-latest-breakthrough-ai-comes-new-ideas?rc=sy0ihq Physical reasoning: https://x.com/a_karvonen/status/1911839968990814503 Fiction Live.bench: https://x.com/ficlive/status/1911853409847906626 Altman Ted: https://www.youtube.com/watch?v=5MWT_doo68k https://simple-bench.com/try-yourself https://aider.chat/docs/leaderboards/ 4.5: https://www.youtube.com/watch?v=6nJZopACRuQ Geospatial reasoning: https://research.google/blog/geospatial-reasoning-unlocking-insights-with-generative-ai-and-multiple-foundation-models/ Pioneers: https://x.com/OpenAIDevs/status/1910017976256119151 Evals: https://www.youtube.com/watch?v=scsW6_2SPC4 Anthropic Updates: https://www.bloomberg.com/news/articles/2025-04-15/anthropic-is-readying-a-voice-assistant-feature-to-rival-openai?srnd=phx-ai https://x.com/sethsaler/status/1912188383457059301 https://techcrunch.com/2025/04/12/openai-co-founder-ilya-sutskevers-safe-superintelligence-reportedly-valued-at-32b/ https://ai.meta.com/blog/llama-4-multimodal-intelligence/ https://deepmind.google/technologies/gemini/pro/ https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/ OpenAI Documentary: https://www.patreon.com/posts/one-machine-to-121940490
	AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax +‘Superintelligence in 2027’...	07 Apr 2025	00:23:51
The latest on Llama 4, and whether it signals a slowdown in AI, or solid progress. Plus, a deep dive on that viral prediction of superintelligence by 2027, and Amodei’s cautionary words on what could stop AI progress in its tracks. o3 news, and more, as well. Weights & Biases: https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained DeepSeek Doc: https://www.patreon.com/posts/openai-is-not-r1-125869969 AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:47 - Stock Crash 02:28 - Llama 4 10:55 - o3 News 11:59 - OpenAI non-profit? 13:13 - AI 2027 Llama 4 Release: https://ai.meta.com/blog/llama-4-multimodal-intelligence/ Dario Amodei Comments: https://www.youtube.com/watch?v=esCSpbDPJik Knowledge Cut-off: https://www.llama.com/docs/model-cards-and-prompt-formats/llama4_omni/ Aider Polyglot: https://aider.chat/docs/leaderboards/ Gemini 1.5: https://arxiv.org/pdf/2403.05530 Fiction-LiveBench: https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87 OpenAI Valuation: https://www.nytimes.com/2025/03/31/technology/openai-valuation-300-billion.html?login=smartlock&auth=login-smartlock OpenAI Cybersecurity: https://www.bloomberg.com/news/articles/2024-01-16/openai-working-with-us-military-on-cybersecurity-tools-for-veterans Deep research System Card: https://cdn.openai.com/deep-research-system-card.pdf https://openai.com/index/paperbench/ AI 2027: https://ai-2027.com/ METR Paper: https://arxiv.org/pdf/2503.14499 OpenAI non-profit: https://openai.com/index/nonprofit-commission-guidance/ NYT Piece: https://www.nytimes.com/2025/04/03/technology/ai-futures-project-ai-2027.html?unlocked_article_code=1.804._yKi.QhwOp15Q3tcU&smid=url-share&s=09 Kokotajlo predictions 2021: https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like https://simple-bench.com/ Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
	Sora 2 - It will only get more realistic from here	01 Oct 2025	00:15:43
Sora 2 - the start of the infinite slop-feed or a key step to a generalist agent? Better than VEO 3 or over-hyped? I bring out 6 details you may have missed, contrast the announcement to Periodic Labs and even squeeze in some Claude Sonnet 4.5 analysis. Maybe I should make my videos longer… https://80000hours.org/aiexplained AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:40 - Two models? 01:15 - Rollout Details 01:43 - Versus Sora 1 / Veo 3 04:30 - Sora App / Social Media 06:40 - Masterplan 09:30 - Generalist Agent? Periodic Labs 12:05 - Claude Sonnet 4.5 13:42 - Future Outlook Announcement: https://openai.com/index/sora-2/ Launch Video: https://www.youtube.com/live/gzneGhpXwjU System Card: https://cdn.openai.com/pdf/50d5973c-c4ff-4c2d-986f-c72b5d0ff069/sora_2_system_card.pdf Sam Altman Blog Post on Sora App: https://blog.samaltman.com/sora-2 Most Intelligent Claim: https://x.com/willdepue/status/1973089331284681110 GTA: https://x.com/AndrewCurran_/status/1973298436536766666 Meta Vibes: https://x.com/alexandr_wang/status/1971295156411433228?s=46 Altman on Regulations: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman OpenAI Profit: https://www.theinformation.com/articles/openais-first-half-results-4-3-billion-sales-2-5-billion-cash-burn?rc=sy0ihq Periodic Labs: https://periodic.com/ https://www.nytimes.com/2025/09/30/technology/ai-meta-google-openai-periodic.html https://x.com/LiamFedus/status/1973055380193431965 https://baincapitalventures.com/insight/we-must-know-we-will-know/?s=09 Sonnet 4.5: https://www.anthropic.com/news/claude-sonnet-4-5 https://simple-bench.com/ Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
	Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score)	28 Mar 2025	00:21:21
Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ … https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained … and more. Plus practical tips, a note on security and Kling vs Veo 2 guest appearance. AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:36 - Fiction Bench 02:41 - Practicality - YouTube urls + Security - cut-off date 03:42 - Coding 06:22 - WeirdML Bench 07:01 - Simple Bench Record High 11:23 - Reverse Engineering! 13:22 - Anthropic Paper 17:49 - 3 Caveats Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/ Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87 https://simple-bench.com/ WeirdML: https://htihle.github.io/weirdml.html https://x.com/htihle/status/1905014058228625542 Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot https://aistudio.google.com/prompts/new_chat Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php Live bench: https://livebench.ai/#/ Paper: https://arxiv.org/pdf/2406.19314 LiveCode Bench: https://livecodebench.github.io/ SWE-Verified: https://arxiv.org/pdf/2310.06770 Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI	25 Mar 2025	00:13:47
Gemini 2.5 is out, on the same day as the new DeepSeek V3 (which should power Deepseek R2). Do both models prove AI is being commoditized? Let’s find out, on this blockbuster day of AI releases. Plus exclusives from the Information, Simple indications, Vista Bench, LM Arena and more… AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 01:15 - Gemini 2.5 Benchmarks 05:46 - Long Context, Simple indication 07:08 - New Deepseek V3 -024 09:11 - Microsoft MAI 11:48 - 90% of code but new Claude jobs ‘World’s most powerful model’: https://x.com/OfficialLoganK/status/1904580368432586975 Gemini 2.5 Release Notes: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking ‘Commoditized’: https://the-decoder.com/microsoft-ceo-satya-nadella-says-ai-models-are-getting-commoditized/ Microsoft Information report: https://www.theinformation.com/articles/microsofts-ai-guru-wants-independence-from-openai-thats-easier-said-than-done?rc=sy0ihq LMarena: https://x.com/lmarena_ai/status/1904581128746656099/photo/1 Free for now: https://x.com/btibor91/status/1904578053537476628 Vista Bench:https://scale.com/leaderboard/visual_language_understanding DeepSeek V3: https://huggingface.co/deepseek-ai/DeepSeek-V3-0324 Claude Plays Pokemon: https://www.twitch.tv/claudeplayspokemon Amodei: 100% Coding: https://www.youtube.com/watch?v=esCSpbDPJik&t=3017s Anthropic Jobs: https://job-boards.greenhouse.io/anthropic/jobs/4020717008 Microsoft Money from Onslaught: https://www.972mag.com/microsoft-azure-openai-israeli-army-cloud/ https://simple-bench.com/ Release Date Comments: https://x.com/zacharynado/status/1904647277861318979 Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3)	13 Mar 2025	00:12:58
Is Manus AI the memecoin of the AI world, or legit? I’ll compare it to OpenAI’s Deep Research, Operator, Grok 3 DeepSearch and more to find out. I’ll also let you in on some of the secrets of what makes a good hype campaign, the estimated costs of Manus AI, and where it is strong. Other news (yes, Gemini image editing and research hacking, I mean you), will have to wait for a few more hours, as millions enquire about Manus AI. https://app.grayswan.ai/arena AI Insiders ($9!): https://www.patreon.com/AIExplained Patreon Vid: https://www.patreon.com/posts/4-ai-trends-in-123857767 Chapters: 00:00 - Introduction 00:46 - Hype Campaign 02:40 - Single, Public Benchmark 03:12 - What is Manus AI? 04:22 - Test 1 05:12 - Cost and Rate Limits 06:15 - Test 2 vs Deep Research + Grok 3 DeepSearch 08:24 - Test 3 (not AGI) 11:10 - 4 Trends in AI in 2025 11:37 - Hype Works Manus AI: https://manus.im/app Xiao Hong Interview: https://www.chinatalk.media/p/manus-chinas-latest-ai-sensation Gaia Benchmark: https://openreview.net/pdf?id=fibxvahvs3 MIT Report: https://www.technologyreview.com/2025/03/11/1113133/manus-ai-review/ Information Report: https://www.theinformation.com/articles/anthropics-claude-drives-strong-revenue-growth-while-powering-manus-sensation?rc=sy0ihq Hype Examples: https://x.com/Saboo_Shubham_/status/1898425707401031940 https://x.com/EHuanglu/status/1899110687902978373 https://x.com/AJs_AI/status/1898756132384178291 Mistakes: https://x.com/TheXeophon/status/1898737178273829220 Tools and Code: https://x.com/peakji/status/1898994802194346408 https://operator.chatgpt.com/ Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
	GPT 4.5 - not so much wow	28 Feb 2025	00:25:05
GPT 4.5 is here, and do you remember when AI lab CEOs like Sam Altman and Dario Amodei were betting everything on scaling up base models like this one? Well let’s find out what would have happened if the future of AI rested on models like GPT 4.5. You’ll see all the benchmarks, highlights of the paper, emotional intelligence and humor tests, Simple Bench results (reddit was an unreliable source), and why it’s not all bad news for OpenAI. https://www.emergentmind.com/ AI Insiders (now $9!): https://www.patreon.com/AIExplained Chapters 00:00 - Introduction 01:04 - Details and Benchmarks 03:04 - Emotional intelligence? 08:37 - Creative writing? 11:40 - Visual reasoning and Pricing 12:41 - Simple Performance 16:01 - End of Pretraining Scaling? 17:03 - CEO Hype 18:11 - System Card Highlights 23:32 - Karpathy Reaction GPT 4.5 System card: https://cdn.openai.com/gpt-4-5-system-card-2272025.pdf Release Notes: https://openai.com/index/gpt-4-5-system-card/ Altman Hype: https://x.com/sama/status/1891533802779910471 Details: https://openai.com/index/introducing-gpt-4-5/ https://x.com/OpenAI/status/1895219596317335792 End of an Era: https://x.com/wgussml/status/1895187231666774377 Anthropic Original Claim: https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-to-take-on-openai/ Smell: https://x.com/rapha_gl/status/1895213014699385082 Bob McGrew: https://x.com/bobmcgrewai/status/1895228291981943265 Deep Research System Card: https://cdn.openai.com/deep-research-system-card.pdf Reddit: https://www.reddit.com/r/singularity/comments/1izu1t7/gpt45_crushes_simple_bench/ API Pricing: https://openai.com/api/pricing/ LiveStream: https://www.youtube.com/watch?v=cfRYp0nItZ8&t=1s https://simple-bench.com/ Karpathy Comparison: https://x.com/karpathy/status/1895213020982472863 https://x.com/karpathy/status/1895337579589079434 Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon)	25 Feb 2025	00:27:39
Claude 3.7 is here, hot on the heels of Grok 3 and a host of other developments, but how good is it really? And what does it say about the next few months in AI? I’ve read the papers, played with the model for hours, and benched it on Simple. Things aren’t slowing down. Plus the latest in humanoid robots, led by Helix and freaked out by Protoclone. And reports of GPT 4.5 and DeepSeek R2. GraySwan Competition! https://app.grayswan.ai/arena/challenge/agent-red-teaming https://x.com/GraySwanAI/status/1894084923260043282 Chapters: 00:00 - Introduction 01:25 - Claude 3.7 New Stats/Demos 05:22 - 128k Output 06:13 - Pokemon 06:58 - Just a tool? 09:54 - DeepSeek R2 10:20 - Claude 3.7 System Card/Paper Highlights 17:18 - Simple Record Score/Competition 20:37 - Grok 3 + Redteaming prizes 22:26 - Google Co-scientist 24:02 - Humanoid Robot Developments 3.7 Release Notes: https://www.anthropic.com/news/claude-3-7-sonnet vs o3 and Grok 3: https://x.com/12exyz/status/1891723056931827959 Extended Thinking: https://www.anthropic.com/research/visible-extended-thinking?s=09 System Prompt: https://docs.anthropic.com/en/release-notes/system-prompts#feb-24th-2025 System Card: https://assets.anthropic.com/m/785e231869ea8b3b/original/claude-3-7-sonnet-system-card.pdf Unfaithful CoT: https://arxiv.org/pdf/2305.04388 Original Constitution: https://www.anthropic.com/news/claudes-constitution Responsible Scaling Policy: https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf Amodei and Hassabis: https://www.youtube.com/watch?v=4poqjZlM8Lo https://simple-bench.com/ 400 Weekly Users: https://x.com/bradlightcap/status/1892579908179882057 Grok 3 Jailbroken: https://x.com/LinusEkenstam/status/1893832876581380280 Google Co-Scientist: https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/ But Hassabis Says Years Away: https://www.youtube.com/watch?v=yr0GiSgUvPU&t=156s DeepSeek R2 Reuters: https://www.reuters.com/technology/artificial-intelligence/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/ Protoclone: https://www.reddit.com/r/interestingasfuck/comments/1it9rpp/protoclone_the_worlds_first_bipedal/ Helix: https://www.figure.ai/news/helix TechTrance: https://www.youtube.com/@TheTechTrance/videos GPT 4.5 Soon:
	AGI: (gets close), Humans: ‘Who Gets the Money?’	11 Feb 2025	00:22:17
A 'frontier reasoning model' from just 1000 examples (s1). A $100B Musk bid for power. Gemini 2, Rand and warning from Amodei. Here’s 7-8 developments you may have missed but which I would argue help us understand how the next few years will play out. From labour vs capital to automating rival companies and countries, and from non-profit shenanigans to new mini-docs, there was just too much for me not to make a vid. GiveWell: https://www.givewell.org/charities/top-charities AI Insiders ($9!): https://www.patreon.com/AIExplained s1 Paper: https://arxiv.org/pdf/2501.19393 Musk Bid: https://www.wsj.com/tech/ai/musks-97-4-billion-openai-bid-piles-pressure-on-altman-f6749e6c?mod=hp_lead_pos1 Altman Reply: https://x.com/sama/status/1889059531625464090?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet Google vs OpenAI: https://x.com/sama/status/1888703820596977684 RAND Study: https://www.rand.org/pubs/perspectives/PEA3691-4.html Dev Meetup: https://x.com/btibor91/status/1888976302621040852 Altman $100 Trillion: https://www.nytimes.com/2023/03/31/technology/sam-altman-open-ai-chatgpt.html Karpathy Vid: https://www.youtube.com/watch?v=7xTGNNLPyMI Amodei Warning: https://www.anthropic.com/news/paris-ai-summit Bengio Source: https://www.youtube.com/watch?v=6HDjVncL5Go Chapters: 00:00 - Intro 01:37 - AGI Inches Closer 04:26 - ‘Super-Exponential’ 05:58 - Musk Bid 07:34 - Luxury Goods and Land 09:05 - ‘Benefits All Humanity’ 12:52 - ‘National Security’ 14:21 - s1 20:33 - Final thoughts Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research	03 Feb 2025	00:18:32
12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs Deepseek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impression of the o3 model within, and much more. Deep Research: https://openai.com/index/introducing-deep-research/ https://www.youtube.com/watch?v=YkCDVn3_wiw GAIA Bench: https://openreview.net/forum?id=fibxvahvs3 https://openreview.net/pdf?id=fibxvahvs3 CodeELO: https://arxiv.org/pdf/2501.01257 CamelCamel: https://uk.camelcamelcamel.com/ Deepseek R1 with search: https://chat.deepseek.com/ https://arxiv.org/pdf/2501.12948 HaluBench: https://arxiv.org/pdf/2407.08488 Chapters: 00:00 - Introduction 01:06 - Powered by o3, Humanity’s Last Exam, GAIA 03:55 - Simple Tests 06:00 - Good News vs Deepseek R1 and Gemini Deep Research 09:32 - Bad News on Hallucinations 14:14 - What Can’t it Browse? 14:42 - For Shopping? 16:40 - Final thoughts
	o3-mini and the “AI War”	31 Jan 2025	00:15:21
o3-mini is here, and yes, I’ve read the paper in full - 2 hours after release, and even the post-launch Reddit AMA. Some epic details like a FrontierMath score that made me double-take, a likely new Cursor favorite, bio risk expertise and a cost-comparison with Deepseek R1. But does it perform on basic reasoning? Let’s find out. Plus, arguably the bigger story - the increasingly frenetic rhetoric coming out of the West - and Dario Amodei and Alexandr Wang (CEOs of Anthropic and Scale AI respectively) in particular. The last thing we need is an “AI War”. https://wandb.me/simple-bench (Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing Chapters: 00:00 - Introduction 00:45 - o3 mini 05:11 - First impressions vs Deepseek R1 07:21 - 10x Scale, o3-mini System Card, Amodei Essay, bitcoin wallets… 12:40 - Simple Competition Finale 13:03 - Clips and Final Thoughts on the “AI War” O3-mini: https://openai.com/index/openai-o3-mini/ Paper: https://cdn.openai.com/o3-mini-system-card.pdf Amodei Essay: https://darioamodei.com/on-deepseek-and-export-controls?s=09 FrontierMath wild stat: https://arxiv.org/pdf/2411.04872 Sam Altman Channels Napoleon: https://x.com/sama/status/1883185690508488934 Altman ‘pulls up releases’: https://x.com/sama/status/1884066337103962416 “AI War” by Wang: https://scale.com/blog/win-the-ai-war Anthropic Original Views on Capabilities: https://www.anthropic.com/news/core-views-on-ai-safety AI Insider Cost Comparison: https://x.com/arankomatsuzaki/status/1884676245922934788 Deepseek R1 Paper: https://arxiv.org/pdf/2501.12948 R1, o3-mini Price Comparison: https://techcrunch.com/2025/01/31/openai-launches-o3-mini-its-latest-reasoning-model/ Semianalysis on $1.3M deepseek salaries, and them falling behind as ‘the time gap to match US capabilities increases’: https://semianalysis.com/2025/01/31/deepseek-debates/ OpenAI Valuation: 
https://www.bloomberg.com/news/articles/2025-01-30/openai-in-talks-to-raise-funding-at-340-billion-value-wsj-says?srnd=phx-ai Wang Clip: https://x.com/tsarnick/status/1867700453494206883 Amodei Clip: https://x.com/ai_ctrl/status/1884951111771001188 https://simple-bench.com/
	Nothing Much Happens in AI, Then Everything Does All At Once	24 Jan 2025	00:23:09
When it rains, it pours. OpenAI Operator tested and reviewed, with full paper analysis. Perplexity Assistant is useful. Then Stargate, is it all smoke and mirrors? Strong rumours of an o3+ model from Anthropic. Then a full breakdown of Deepseek R1, and what its training method says about the state of AI. It’s not open source BTW. Plus Humanity’s Last Exam, and Hassabis Accelerates his AGI timeline. 00:00 - Introduction 00:54 - OpenAI Operator 04:53 - Perplexity Assistant 05:15 - StarGate 07:51 - Better than o3? 08:25 - DeepSeek R1 Analysis 12:12 - Training Secrets 15:19 - No More Process Rewarding? 19:01 - Hassabis Timeline Accelerates 21:22 - Humanity’s Last Exam https://app.grayswan.ai/arena/chat/harmful-ai-assistant https://app.grayswan.ai/arena https://openai.com/index/computer-using-agent/ System Prompt: https://github.com/wunderwuzzi23/scratch/blob/master/system_prompts/operator_system_prompt-2025-01-23.txt OpenAI Operator: https://operator.chatgpt.com/ System Card: https://cdn.openai.com/operator_system_card.pdf There is No Plan: https://x.com/jeffclune/status/1882120726339318007 Perplexity Assistant: https://x.com/perplexity_ai/status/1882466239123255686 Stargate: https://openai.com/index/announcing-the-stargate-project/ Labour goes to 0: https://moores.samaltman.com/ Larry Ellison AI Surveillance: https://x.com/TheChiefNerd/status/1882042989184430332 Amodei 1984: https://www.bloomberg.com/news/articles/2025-01-22/anthropic-ceo-says-openai-s-stargate-venture-seems-chaotic Microsoft Hesitate: https://www.theinformation.com/articles/why-sam-altman-joined-forces-with-larry-ellison-and-took-a-step-back-from-microsoft?rc=sy0ihq Dylan Patel o3+ for Anthropic: https://www.youtube.com/watch?v=7EH0VjM3dTk Deepseek R1: https://arxiv.org/pdf/2501.12948 https://arxiv.org/pdf/2412.19437 Diagram: https://pbs.twimg.com/media/GhyQsM6WQAE7W52?format=jpg&name=large https://simple-bench.com/ Process: https://x.com/sama/status/1664018190840614912 
https://x.com/karpathy/status/1835561952258723930 https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness/?s=09 Demis Interview: https://www.youtube.com/watch?v=yr0GiSgUvPU Humanity’s Last Exam: https://agi.safe.ai/ https://x.com/DanHendrycks/status/1882481730671857815 https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html?s=09
	Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out	20 Jan 2025	00:13:11
OpenAI looks set to debut their Operator system, and some leaks are out. At the same time Deepseek R1 releases some numbers, and Sam Altman says he might have been wrong before, and now anticipates a 'fast take-off'. Plus two papers to give you an idea of what a super-agent might be decent at doing, some more exclusive article analysis and much more. Who said anything else is happening today... 80,000 Hours Channel: https://www.youtube.com/channel/UCafjal1QYJ3rb0Y9xZk1Ezg Spotify: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 01:13 - Pro Cost and OpenAI Operator 04:00 - Agent Benchmarks Being Targeted 07:48 - Fast Take-off, Altman 08:48 - Altman flip-flops 10:02 - Deepseek R1 First Reaction Altman ‘100x expectations out of control’: https://x.com/sama/status/1881258443669172470 OpenAI Operator Table: https://x.com/btibor91/status/1881285255266750564 WebVoyager: https://arxiv.org/pdf/2401.13919 OSWorld: https://arxiv.org/pdf/2404.07972 Axios Exclusive 1 (SuperAgent): https://www.axios.com/2025/01/19/ai-superagent-openai-meta?s=09 Axios Exclusive 2: https://www.axios.com/2025/01/18/biden-sullivan-ai-race-trump-china Deepseek R1 Numbers: https://x.com/deepseek_ai/status/1881318130334814301 Does 1.5B outperform 3.5 Sonnet on Math?: https://x.com/reach_vb/status/1881319500089634954 Deepseek R1 (deepseek-reasoner) Pricing: https://api-docs.deepseek.com/quick_start/pricing/ Altman Fast Takeoff: https://x.com/tsarnick/status/1879100390840697191 OpenAI Economic Blueprint: https://cdn.openai.com/global-affairs/ai-in-america-oai-economic-blueprint-20250113.pdf Target is Long-horizon Tasks: https://x.com/karinanguyen_/status/1879576037249667520 Support Regulations: https://www.techemails.com/p/elon-musk-and-openai https://www.nytimes.com/2023/05/16/technology/openai-altman-artificial-intelligence-regulation.html Donation: 
https://qz.com/sam-altman-donate-million-zuckerberg-bezos-donald-trump-1851721035 Amodei on Regulations by 2025: https://www.youtube.com/watch?v=ugvHCXCOmm4 ‘Feel the AGI’: https://x.com/polynoamial?lang=en GPT-5 and o-series merger: https://x.com/sama/status/1880358749187240274 o1 Thinks in Chinese: https://techcrunch.com/2025/01/14/openais-ai-reasoning-model-thinks-in-chinese-sometimes-and-no-one-really-knows-why/ Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings	26 Sep 2025	00:14:06
An OpenAI report released in the last 24 hours is the best look we have as to whether 2025 AI can automate your job. I’ll go through 4 unexpected findings, from which model is best at what, to practical tips and massive caveats. Plus UFC robots, radiologist essay, don’t trust videos and the blockers to the singularity. Gray Swan: https://app.grayswan.ai/ai-explained GDPval: https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf12ce/GDPval.pdf GDP Impact: https://fred.stlouisfed.org/release/tables?rid=331&eid=211 Task List: https://www.onetonline.org/link/summary/11-9141.00 Summers Tweet: https://x.com/LHSummers/status/1971252567981146347 Emad: https://x.com/EMostaque/status/1971254153067593739 Robots: https://x.com/cixliv/status/1967663286679478759 Unitree G1: https://x.com/UnitreeRobotics/status/1970039940022239491 Don’t Trust Video: https://x.com/AISafetyMemes/status/1970453369446871420 AGI Tweet: https://x.com/hyhieu226/status/1968378785709133915 Blockers to the Singularity: https://www.patreon.com/posts/blockers-to-and-139264812 Framework: https://gemini.google.com/share/f4b9c85a6ae9 METR Study (Dev Slowdown): https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ Karpathy Tweet: https://x.com/karpathy/status/1971220449515516391 Radiology Essay: https://worksinprogress.co/issue/the-algorithm-will-see-you-now/ Chapters: 00:00 - Introduction 00:55 - OpenAI Report Summary 02:40 - Tipping Point Speed-up 04:11 - Better than Industry Experts? 06:33 - Big Caveat 11:10 - Karpathy and the Radiologist Analogy 13:30 - Outro
	OpenAI Backtracks on Superintelligence + Altman Brings His Timeline Forward	08 Jan 2025	00:23:41
Sam Altman unexpectedly brings his timelines to AGI forward, while OpenAI backtrack on superintelligence. None of these changes were heralded, but they are significant. Plus the new year brings new assessments of the true capability of models to automate 'large swathes of the economy'. I'll give my prediction on that front for 2025, announce a new Simple Bench competition, and showcase Kling 1.6 vs Veo 2 vs Sora, and much more. wandb.me/simple-bench (Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing TheAgentCompany Paper: https://arxiv.org/pdf/2412.14161v1 Sam Altman Major Interview: https://www.bloomberg.com/features/2025-sam-altman-interview/?srnd=phx-ai OpenAI Agent Coming Jan 2025: https://www.theinformation.com/articles/why-openai-is-taking-so-long-to-launch-agents?rc=sy0ihq Altman Singularity: https://x.com/sama/status/1875603249472139576 Altman Original Timeline: https://www.youtube.com/watch?v=7dCPytNTnjk&t=621s https://www.ft.com/content/34a7a082-e685-4e02-bca7-61ff89d99ed2 OpenAI Original Emails: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman-and-openai-blog DeepMind Sky News 2014 Article: https://news.sky.com/story/google-buys-uk-intelligence-firm-deepmind-10419783 Altman Blog Reflections: https://blog.samaltman.com/reflections OpenAI Changes Who Gets AGI: https://openai.com/index/why-our-structure-must-evolve-to-advance-our-mission/?s=09 OpenAI 5 Levels: https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai Altman 2015: https://blog.samaltman.com/machine-intelligence-part-1 OpenAI React to Anthropic: https://www.theinformation.com/articles/how-anthropic-got-inside-openais-head?rc=sy0ihq Microsoft $100B Definition: https://www.theinformation.com/articles/microsoft-and-openai-wrangle-over-terms-of-their-blockbuster-partnership?rc=sy0ihq Epoch Scramble for Task Benchmark: 
https://x.com/tamaybes/status/1876692639363612919 GPQA Progress: https://epoch.ai/data/ai-benchmarking-dashboard Task Length Crucial for ARC-AGI: https://anokas.substack.com/p/llms-struggle-with-perception-not-reasoning-arcagi RL Environment Tweet: https://x.com/vedantmisra/status/1876327518157807990 Jason Wei Talk: https://www.youtube.com/watch?v=yhpjpNXJDco Miles Brunda
	o3 - wow	21 Dec 2024	00:22:20
o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more. FrontierMath: https://epoch.ai/frontiermath https://arxiv.org/pdf/2411.04872 Chollet Statement:https://arcprize.org/blog/oai-o3-pub-breakthrough MLC Paper: https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=social AlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf Human Performance on ARC-AGI: https://arxiv.org/pdf/2409.01374v1 Wei Tweet ‘3 months’:https://x.com/_jasonwei/status/1870184982007644614 Deliberative Alignment Paper: https://openai.com/index/deliberative-alignment/ Brown Safety Tweet: https://x.com/polynoamial/status/1870196476908834893 Swe-Bench Verified: https://openai.com/index/introducing-swe-bench-verified/ Amodei Prediction: https://x.com/OfirPress/status/1858567863788769518 David Dohan: 16 hours https://x.com/dmdohan/status/1870171404093796638 OpenAI Personal Writing: https://openai.com/index/learning-to-reason-with-llms/ https://simple-bench.com/ John Hallman Tweet: https://x.com/johnohallman/status/1870233375681945725 00:00 - Introduction 01:19 - What is o3? 03:18 - FrontierMath 05:15 - o4, o5 06:03 - GPQA 06:24 - Coding, Codeforces + SWE-verified, AlphaCode 2 08:13 - 1st Caveat 09:03 - Compositionality? 10:16 - SimpleBench? 13:11 - ARC-AGI, Chollet
	Never Browse Alone? - Gemini 2 Live and ChatGPT Vision	12 Dec 2024	00:13:40
The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, for curiosity satisfying rather than bleeding-edge intelligence. I give you the benchmarks, the highlights and of course, the latest from OpenAI Advanced Voice Mode with Vision. Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa and what might be for some of you Google’s deflating admission. 00:00 - Introduction 00:38 - Live Interaction 03:43 - Gemini 2.0 Flash Benchmarks 05:10 - Audio and Image Output 06:38 - Project Mariner (+ WebVoyager Bench) 08:49 - But Progress Slowing Down? 10:43 - OpenAI Announcements + Games https://aistudio.google.com/live Gemini 2.0 Flash Benchmarks: https://deepmind.google/technologies/gemini/ Project mariner: https://deepmind.google/technologies/project-mariner/ WebVoyager: https://x.com/laurentsifre/status/1858918588683296875/photo/1 Gemini Game play: https://www.youtube.com/watch?v=IKuGNHJBGsc Advanced Voice Mode OpenAI: https://www.youtube.com/watch?v=NIQDnWlwYyQ https://simple-bench.com/ Claude Computer Use: https://docs.anthropic.com/en/docs/build-with-claude/computer-use Oriol Vinyals Interview: https://www.youtube.com/watch?v=78mEYaztGaw&t=687s
	Sora is Out, But is it a Distraction?	10 Dec 2024	00:15:34
After a 10-month wait, OpenAI have released Sora to paying users. With just a prompt it can generate videos of up to 20 seconds in lower resolutions, and 10 seconds at 1080p if you can fork out $200/month. I’ve tested it and read the system card. The user interface is quite beautiful, even if the videos themselves operate under entirely new rules of physics. But I can’t help wondering if OpenAI want us to focus on releases like this, rather than some quietly broken promises. 80,000 hours Website, Podcast + Channel: https://80000hours.org/ https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib https://www.youtube.com/@eightythousandhours/videos https://openai.com/sora/ Sora Countries: https://help.openai.com/en/articles/10250692-sora-supported-countries Sora Credits: https://help.openai.com/en/articles/10245774-sora-billing-credits-faq https://runwayml.com/ and https://pika.art/home DeepMind Veo: https://deepmind.google/technologies/veo/ Sam Altman Ads as Last Resort: https://www.windowscentral.com/software-apps/openai-could-chase-intrusive-ads-as-last-resort But OpenAI Considering Ads: https://www.inc.com/ben-sherry/is-openai-getting-into-the-advertising-business-the-company-is-sending-mixed-messages/91033533 OpenAI Backtracks on Microsoft AGI Clause: https://www.ft.com/content/2c14b89c-f363-4c2a-9dfc-13023b6bce65 As Microsoft Boast of Labor Savings: https://www.theinformation.com/articles/microsofts-new-sales-pitch-for-ai-spend-less-money-on-humans?rc=sy0ihq OpenAI Military Pivot: https://www.technologyreview.com/2024/12/04/1107897/openais-new-defense-contract-completes-its-military-pivot/ Employees Have Doubts: https://www.washingtonpost.com/technology/2024/12/06/openai-anduril-employee-military-ai/?nid=top_pb_signin&arcId=KZIV7PLRHBCVNPAIAAAVUNRHIM&account_location=ONSITE_HEADER_ARTICLE
	o1 Pro Mode – Full Analysis (plus o1 paper highlights)	05 Dec 2024	00:16:43
Oh boy. o1 pro mode out on the same night as o1 full. I read the 49 page paper, ran my own tests, spent my fuel allowance on Pro Mode and will give you all the highlights. Suffice to say the story is not as simple as it first appears. Weights and Biases’ Weave: wandb.me/ai_explained Plus, GPT-4.5? MLE Bench, Simple Update, Image Analysis and much more o1 System Card: https://cdn.openai.com/o1-system-card-20241205.pdf Apollo Research: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations Altman Tweet: https://x.com/AnonCEOMakeItAi/status/1864763052622504344 ChatGPT Pro: https://openai.com/index/introducing-chatgpt-pro/ Tibor Blaho: https://x.com/btibor91/status/1864709670470066605 Simple-bench.com 00:00 - Introduction 00:27 - ChatGPT Pro is $200 01:25 - OpenAI Benchmarks 03:20 - o1 System Card, o1 and o1 Pro Mode vs o1-preview 06:18 - Simple Bench surprising results on sample 08:31 - Weights & Biases 09:05 - Image Analysis Compared 12:51 - More Benchmarks and Safety
	AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution	05 Dec 2024	00:15:29
Calmest before the storm? Whatever analogy you want to use, things had gotten quiet toward the end of 2024. But then tonight we got Genie 2, and a series of scheduled announcements from OpenAI. Sora is soon here, and o1, but I dive deeper into what it all means and whether reliability is on a path to being solved, ft: two recent papers. Assembly AI Speech to Text: https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=ai_explained Plus Kling Motion Brush, Simple Bench QwQ update and much more. Genie 2: https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/ Jim Cramer: https://x.com/jimcramer/status/1864068878692675625 Give Us Full o1: https://x.com/tszzl/status/1863882905422106851 Verge Scoop: https://x.com/tomwarren/status/1864326361415925861 O1 Learning to Reason Benchmarks: https://openai.com/index/learning-to-reason-with-llms/ SIMA AI: https://arxiv.org/pdf/2404.10179 Genie Paper: https://arxiv.org/pdf/2402.15391 My Video on Genie: https://www.youtube.com/watch?v=gGKsfXkSXv8 Oasis Minecraft: https://x.com/risphereeditor/status/1852619965511204974 LLMs Procedural Knowledge Paper: https://arxiv.org/pdf/2411.12580 Bag of Heuristics Paper: https://arxiv.org/pdf/2410.21272 Jensen Huang Hallucinations: https://www.tomshardware.com/tech-industry/artificial-intelligence/jensen-says-we-are-several-years-away-from-solving-the-ai-hallucination-problem-in-the-meantime-we-have-to-keep-increasing-our-computation DeepSeek Interview: https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas Kling Motion Brush: https://klingai.com/image-to-video Tim Rocktaschel Book: https://geni.us/ArtificialIntelligence 00:43 - OpenAI 12 Days, Sora Turbo, o1 03:06 - Genie 2 08:26 - Jensen Huang and Altman Hallucination Predictions 09:45 - Bag of Heuristics Paper 11:40 - Procedural Knowledge Paper 13:02 - AssemblyAI Universal 2 13:45 - SimpleBench QwQ and Chinese Models 14:42 - Kling Motion Brush
	New Google Model Ranked ‘No. 1 LLM’, But There’s a Problem	15 Nov 2024	00:15:19
A new and mysterious Gemini model appears at the top of the leaderboard, but is that the full story? I dig behind the headline to show you some anti-climactic results, give some context with leaks in the last 48 hours of diminishing returns to scaling, and add the response of Altman, OpenAI and co. The future is about to look a lot stranger... 80,000 hours Podcast and Channel: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib https://www.youtube.com/@eightythousandhours/videos You can now gift memberships to AI Insiders (my Patreon w/ exclusive vids, network): https://www.patreon.com/AIExplained/gift ‘There is no wall’: https://x.com/sama/status/1856941766915641580 https://x.com/vedantmisra/status/1857148554105544708 Gemini Ranking: https://lmarena.ai/?leaderboard API not yet up: https://x.com/OfficialLoganK/status/1857106844805681153 ‘Just Die Chat’: https://x.com/koltregaskes/status/1856754648146653428 Google CEO tweet: https://x.com/sundarpichai/status/1857114106928718329 Sutskever Quote: https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/ Another OpenAI Staffer Leaves: https://x.com/RichardMCNgo/status/1856843040427839804 Bloomberg Report: https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai?s=09 Noam Brown on what OpenAI Researchers Believe: https://x.com/polynoamial/status/1855037689533178289 Clive Chan: https://x.com/itsclivetime/status/1855704120495329667 Chollet Responds to Altman: https://x.com/fchollet/status/1857060079586975852 https://x.com/sama/status/1856940152460869718 Altman Emails: https://x.com/TechEmails/status/1857285960997712356 Change of Heart: https://sd11.senate.ca.gov/news/senator-wiener-responds-openai-opposition-sb-1047 Amodei on ‘Empirical Regularities’: https://lexfridman.com/dario-amodei-transcript/ Verge Report: 
https://www.theverge.com/2024/10/25/24279600/google-next-gemini-ai-model-openai-december OpenAI Agents in January: https://www.bloomberg.com/news/articles/2024-11-13/openai-nears-launch-of-ai-agents-to-automate-tasks-for-users?srnd=phx-ai
	Leak: ‘GPT-5 exhibits diminishing returns’, Sam Altman: ‘lol’	10 Nov 2024	00:15:44
The last few days have seen two narratives emerge. One, derived from yesterday’s OpenAI leak in TheInformation, that GPT-5/Orion is a disappointment, and less of a leap than GPT-3 to GPT-4. The second comes from a series of 4 clips (shown in this video) from Sam Altman, regarding the ‘clear path’ to AGI. Let’s go beyond the headlines (and through papers like Frontier Math) to get closer to the ground truth… Plus Universal-2, Sora comments, Claude 3.5 Haiku SimpleBench update, and a great new AI video. Assembly AI Speech to Text: https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=ai_explained 00:39 – Bear Case, TheInformation Leak 04:01 – Bull Case, Sam Altman 06:20 – FrontierMath 11:29 – o1 Paradigm 13:11 – Text to Video Greatness and Universal-2 TheInformation Leak: https://www.theinformation.com/articles/openai-shifts-strategy-as-rate-of-gpt-ai-improvements-slows?rc=sy0ihq Noam Brown Replies: https://x.com/polynoamial/status/1855453104394637444 Sam Altman Y-Combinator Interview: https://www.youtube.com/watch?v=xXCBz_8hM9w&t=1556s Altman Reply: https://x.com/sama/status/1855100359511097828 https://simple-bench.com/ FrontierMath Paper: https://arxiv.org/pdf/2411.04872 Frontier Math Blog Post: https://epochai.org/frontiermath Tao: https://x.com/EpochAIResearch/status/1854996368814936250 MMLU Are We Done (cites me!): https://arxiv.org/pdf/2406.04127 Universal-2 https://www.assemblyai.com/research/universal-2 Noam Brown ‘We don’t know’: https://www.youtube.com/watch?v=Gr_eYXdHFis Anthropic Founder Response: https://x.com/jackclarkSF/status/1855485569998217231 Sora (Runway Comment): https://x.com/c_valenzuelab/status/1855026417354129455 Sora New Vid: https://www.youtube.com/watch?v=_iETa2KDRuw Darri3D Video: https://www.reddit.com/r/ChatGPT/comments/1gn0n3z/can_you/
	ChatGPT with Search, Altman Answers Anything and Simple Bench Out	01 Nov 2024	00:15:20
The Google destroyer, the Perplexity crusher? Or just hype? ChatGPT with Search is here, and simultaneously Altman and co did an AMA on Reddit, covering GPT-5, Sora, SearchGPT and a lot more. Plus, the biggest news of them all: Simple Bench is out. ChatGPT with Search: https://openai.com/index/introducing-chatgpt-search/ Altman AMA (ask me anything): https://www.reddit.com/r/ChatGPT/comments/1ggixzy/ama_with_openais_sam_altman_kevin_weil_srinivas/ https://x.com/sama/status/1852041075793522911 Perplexity Ads: https://www.cnbc.com/2024/08/22/perplexity-ai-plans-to-start-running-search-ads-in-fourth-quarter.html Perplexity: https://www.perplexity.ai/ https://simple-bench.com/
	The New Claude 3.5 Sonnet: Better, Yes, But Not Just in the Way You Might Think	28 Oct 2024	00:22:34
A new state of the art LLM (at least for creative writing and basic reasoning) but what lies behind the numbers that were put out? Is it for real, and are AI agents about to grab your mouse and shake your cursor? Plus, results on my own Simple Bench, and new tools from Runway (Act-One), HeyGen (Zoom Calls) and an updated NotebookLM. AI, without the hype. Weights and Biases' Weave: https://wandb.me/ai_explained
	ChatGPT Will Guess your Age, Flirt if Asked, and Can Call the Cops	16 Sep 2025	00:11:31
Sam Altman, CEO of OpenAI, announced a set of new ‘protections’ and ‘privileges’ for ChatGPT users, requiring a significant amount of trust from users. From predicting your age based on your chat to calling law enforcement if you are at risk of harm, to allowing non-minors to flirt. But amidst all of these announcements, there are interview snippets you may have missed, as Altman dramatically revises his predictions of AI impact on jobs. Plus a Hassabis backtrack to boot. https://80000hours.org/aiexplained Calling the Cops: https://openai.com/index/teen-safety-freedom-and-privacy/ Age Prediction: https://openai.com/index/building-towards-age-prediction/ Not Everyone Will Agree: https://x.com/sama/status/1967955739911364693?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet Theory 1: NYT Lawsuit: https://openai.com/index/response-to-nyt-data-demands/ Theory 2: FTC Investigation into AI Companions: https://x.com/AndrewCurran_/status/1966167585994764743 YT Does the Same: https://www.cbsnews.com/news/youtube-ai-powered-technology-teen-users/ Carlsen Interview: https://www.youtube.com/watch?v=5KmpT-BoVf4 vs Senate Testimony (70% Jobs): https://www.youtube.com/watch?v=5CWVP8-XVjQ Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf Hassabis Quote 1: https://www.youtube.com/watch?v=toShbNUGAyo vs Quote 2: https://www.youtube.com/watch?v=Kr3Sh2PKA8Y
	An ‘AI Bubble’? What Altman Actually said, the Facts and Nano Banana	26 Aug 2025	00:18:54
Wait, why did Sam Altman say AI was in a bubble? Or did he? Is it? 8 points for you to consider, before we all get distracted by Nano Banana. Chapters: 00:00 - Introduction 01:14 - Sam Altman Clarification 02:30 - Media Calls a Bubble (for the tenth time) 03:40 - MIT and McKinsey Analysed 08:21 - Incremental Progress Deceptive 12:07 - Reasoning Breakthroughs 15:31 - CEOs might not know their products 17:25 - But did stocks go down? 17:31 - Media is Contradictory of course https://donate.redcross.org.uk/appeal/gaza-crisis-appeal Bubble about to burst: https://www.telegraph.co.uk/business/2025/08/20/ai-report-triggering-panic-and-fear-on-wall-street/ Nano Banana: https://blog.google/products/gemini/updated-image-editing-model/ https://ai.studio/banana McKinsey Report: https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage#/ https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai#/ Revenue: https://www.wsj.com/tech/ai/mckinsey-consulting-firms-ai-strategy-89fbf1be MIT Report: https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf Safe Superintelligence: https://techcrunch.com/2025/04/12/openai-co-founder-ilya-sutskevers-safe-superintelligence-reportedly-valued-at-32b/ Thinking Machines Lab: https://techcrunch.com/2025/07/15/mira-muratis-thinking-machines-lab-is-worth-12b-in-seed-round/ WSJ Prediction 2024: https://www.wsj.com/tech/ai/the-ai-revolution-is-already-losing-steam-a93478b1 WP Prediction 2023: https://www.washingtonpost.com/technology/2023/08/05/ai-hype-bubble-chatgpt/ Companies are Pouring Billions into AI: https://www.nytimes.com/2025/08/13/business/ai-business-payoff-lags.html Consumer Surplus: https://www.wsj.com/opinion/ais-overlooked-97-billion-contribution-to-the-economy-users-service-da6e8f55 Figure AI robot: https://x.com/adcock_brett/status/1958193476639826383 GDP Bet: https://x.com/adamdangelo/status/1627726566259318784?lang=en Genie 3 Immersion: 
https://x.com/holynski_/status/1953879983535141043 https://x.com/elonmusk/status/1953861448431718662 https://simple-bench.com MMMU: https://mmmu-benchmark.github.io/#leaderboard Prophet Arena: https://www.prophetarena.co/leaderboard NYT Jobs: https://www.nytimes.com/2025/08/19/opinion/ai-job-loss-deindustrialization.html Dawn of Reasoning?: https://openreview.net/pdf?id=FkKBxp0FhR vs: https://arxiv.org/pdf/2403.04121 ARC-AGI: https://arcprize.org/arc-agi/1/ https://x.com/fchollet/status/1870169764762710376?lang=en-GB Turing Test: https://arxiv.org/pdf/2503.23674 Mathematics of Starvation: https://www.theguardian.com/world/2025/jul/31/the-mathematics-of-starvation-how-israel-caused-a-famine-in-gaza https://donate.redcross.org.uk/appeal/gaza-crisis-appeal https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ METR Interview: https://www.patreon.com/c/aiexplained/posts AlphaEvolve: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ Paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf Amodei: https://kantrowitz.medium.com/the-making-of-anthropic-ceo-dario-amodei-449777529dd6 https://www.theloganbartlettshow.com/archive/ep-82-dario-amodeis-ai-predictions-through-2030#:~:text=DARIO%3A%20I%20think%20our%20concern,being%20responsible%20to%20accelerate%20things Unreleased OpenAI: https://x.com/alexwei_/status/1954966393419599962 VLMs Tricked: https://x.com/an_vo12/status/1943715159559545186 AI Insiders ($9!): https://www.patreon.com/AIExplained
	GPT-5 has Arrived	07 Aug 2025	00:15:01
GPT-5 will change how hundreds of millions of people use AI. Yes, you might have to forgive the chart crimes, the underwhelming livestream and Altman hype… But it’s a good model. I have read the 50 page system card in full, have the benchmark scores, coding tests, and things you might have missed. https://app.grayswan.ai/ai-explained Announcement: https://openai.com/index/introducing-gpt-5/ System Card: https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb52f/gpt5-system-card-aug7.pdf Extra Paper: https://cdn.openai.com/pdf/be60c07b-6bc2-4f54-bcee-4141e1d6c69a/gpt-5-safe_completions.pdf Altman tweet: https://x.com/sama/status/1953551377873117369 Livestream: https://www.youtube.com/watch?v=0Uu_VJeVVfo METR Report: https://metr.github.io/autonomy-evals-guide/gpt-5-report/ ARC-AGI-2: https://x.com/fchollet/status/1953511631054680085 Claude Opus 4.1: https://www.anthropic.com/news/claude-opus-4-1 MMMU: https://mmmu-benchmark.github.io/ Cursor Praise: https://x.com/ryolu_/status/1953531724895596669
	Genie 3: The World Becomes Playable (DeepMind)	05 Aug 2025	00:11:54
Soon, anything will be playable. A photo becomes an interactive world, a selfie becomes a new game. Genie 3 from Google, debuting just 2 hours ago, is what I mean, and I have the full analysis, plus the pushback I gave the authors (will it really lead to reliable AI agents? Is that even the point?). You make your own mind up, but it’s certainly fascinating, and not to be overlooked in the week that will bring us GPT-5. https://80000hours.org/aiexplained AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 01:27 - Background and Access 04:58 - Caveats 07:24 - Demo 10:12 - Conclusion Announcement: https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/ Isaac Labs: https://developer.nvidia.com/isaac/lab Genie 2 Coverage: https://www.youtube.com/watch?v=jIm2T7h_a0M TED Talk Roblox: https://www.youtube.com/watch?v=-OAP0ho5AUg DeepThink Post: https://www.patreon.com/posts/deep-ish-on-new-135688441 AI Insiders ($9!): https://www.patreon.com/AIExplained Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	How Not to Read a Headline on AI (ft. new Olympiad Gold, GPT-5 …)	21 Jul 2025	00:17:19
GPT-5 did what? OpenAI ahead of Google? There are 9 ways to misread the headlines of the last 48 hours, so this video is here to tell you what happened, sans sizzle. It’s been a fairly momentous last few days, so let’s dive in to the International Math Olympiad Gold, GPT-5 alpha release, whether mathematicians are out of jobs, and the white collar impact by year’s end. Job Board: https://80000hours.org/aiexplained New Documentary on Patreon: https://www.patreon.com/posts/our-new-age-of-133960279 Chapters: 00:00 - Introduction 00:18 - AI > Mathematicians? 01:23 - OPENAI vs GOOGLE 02:42 - Irrelevant to Jobs or … 06:45 - White-collar jobs gone? 10:26 - AI is Plateauing? 12:00 - We Don’t Know the Details… 14:33 - GPT-5 alpha 14:54 - Nothing but Exponentials? 15:53 - No Impact? Announcement: https://x.com/alexwei_/status/1946477742855532918 UCLA Math Prof: https://x.com/ErnestRyu/status/1946699302308635130 ChatGPT Agent: https://openai.com/index/introducing-chatgpt-agent/ Livestream: https://www.youtube.com/watch?v=1jn_RpbPbEc&t=796s System Card: https://cdn.openai.com/pdf/839e66fc-602c-48bf-81d3-b21eacc3459d/chatgpt_agent_system_card.pdf Jerry Tworek (OpenAI): https://x.com/MillionInt/status/1946556255490982022 https://x.com/MillionInt/status/1946558130906968330 Noam Brown Details: https://x.com/polynoamial/status/1946478249187377206 Trieu Tranh Retweet: https://x.com/Mihonarium/status/1946880931723194389 Neel Nanda: https://x.com/NeelNanda5/status/1946602953370173647 Terence Tao: https://mathstodon.xyz/@tao Sam Altman: https://x.com/sama/status/1946569252296929727 METR Dev Study: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ Ravid Schwatz: https://x.com/ziv_ravid/status/1946378712716562605 AlphaEvolve: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ https://simple-bench.com/ Meta Salary: 
https://www.tomshardware.com/tech-industry/artificial-intelligence/abel-founder-claims-meta-offered-usd1-25-billion-over-four-years-to-ai-hire-person-still-said-no-despite-equivalent-of-usd312-million-yearly-salary $2k per month: https://www.theinformation.com/articles/openai-considers-higher-priced-subscriptions-to-its-chatbot-ai-preview-of-the-informations-ai-summit?rc=sy0ihq
	Grok 4 - 10 New Things to Know	10 Jul 2025	00:11:43
Grok 4 is here, but did you know these 10 things about the new model? From benchmark caveats to soloing science, $300 a month secrets to Grok 5 promises, here's 10 new things to know in just under 12 minutes. AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:22 - Benchmark Results 02:11 - Benchmark Caveats 02:59 - ARC-AGI 2 03:35 - SimpleBench 04:49 - ‘Humanity’s Last Exam’ 07:20 - SuperGrok Heavy Price 07:58 - API Price 08:12 - Grok 5, Gemini 3.0 Beta, GPT-5 09:12 - System Prompt Change + $1B a month, pollution 10:20 - Not soloing science, helping you solo code Livestream: https://www.youtube.com/watch?v=1tQ_KrlHgfg&t=1s Price: https://grok.com/#subscribe https://x.com/ArtificialAnlys/status/1943166841150644622 Gemini DeepThink: https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/#deep-think https://simple-bench.com/ ARC-AGI 2: https://x.com/arcprize/status/1943168950763950555 Humanity’s Last Exam: https://agi.safe.ai/ SmartGPT: https://www.youtube.com/watch?v=hVade_8H8mE New Power Plant, 1m GPUs: https://www.tomshardware.com/tech-industry/artificial-intelligence/elon-musk-xai-power-plant-overseas-to-power-1-million-gpus Gemini 3.0 beta: https://web.archive.org/web/20250709174548/https://github.com/google-gemini/gemini-cli/blob/b0cce952860b9ff51a0f731fbb8a7649ead23530/packages/cli/src/ui/utils/errorParsing.test.ts Pollution: https://www.theguardian.com/technology/2025/apr/24/elon-musk-xai-memphis https://www.youtube.com/watch?v=C8rU4dv2w8Q https://www.youtube.com/watch?v=3VJT2JeDCyw System Prompt: https://github.com/xai-org/grok-prompts/blob/535aa67a6221ce4928761335a38dea8e678d8501/ask_grok_system_prompt.j2 Burn Rate: https://www.bloomberg.com/news/articles/2025-06-17/musk-s-xai-burning-through-1-billion-a-month-as-costs-pile-up Ron Johnson: https://x.com/jdcmedlock/status/1939814516503847259 Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
	Gemini 3 is Here: 11 Details You Might Have Missed	19 Nov 2025	00:21:42
Gemini 3 Pro is out, and records fell like snowflakes in Svalbard. No long description, chapters or links today, huge technical difficulties, including with audio, so just want to publish asap. https://app.grayswan.ai/ai-explained https://lmcouncil.ai AI Insiders ($9!): https://www.patreon.com/AIExplained Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
	GPT 5.2: OpenAI Strikes Back	12 Dec 2025	00:17:41
Full GPT-5.2 breakdown - did OpenAI reclaim the crown? A story of tokens, time and cost, plus 9 details you wouldn’t get just from reading the headlines. https://www.youtube.com/@eightythousandhours AI Insiders ($9!): https://www.patreon.com/AIExplained https://lmcouncil.ai Chapters: 00:00 - Introduction 00:55 - Better than Human @ Professional Tasks? 04:42 - Test time Compute 07:05 - Benchmark Selection 09:32 - Simple Results + council comparison 13:01 - Long Context 13:52 - Self-Improvement 15:00 - 10 Years + New Models Release Page: https://openai.com/index/introducing-gpt-5-2/ GPT 5.2 Benchmark Comparison: https://www.reddit.com/r/singularity/comments/1pka1y9/gpt52_all_20_benchmarks_rankings_and_pricing/ https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif https://lmcouncil.ai/benchmarks Charxiv: https://charxiv.github.io/#leaderboard GDPval: https://arxiv.org/pdf/2510.04374 My vid: https://www.youtube.com/watch?v=oK5LxMaROSA Kilpatrick: https://x.com/OfficialLoganK/status/1999270402712023158/photo/1 Noam Brown: https://x.com/polynoamial/status/1999189845164667132 New Model in New Year: https://www.theinformation.com/articles/openai-developing-garlic-model-counter-googles-recent-gains?rc=sy0ihq 10 Years of OpenAI: https://openai.com/index/ten-years/ GPQA: https://x.com/idavidrein/status/1841265634170278063 ARC-AGI 1-2: https://arcprize.org/arc-agi/2/ Sunday Robotics: https://x.com/tonyzzhao/status/1991204839578300813 Non-hype Newsletter: https://signaltonoise.beehiiv.com/ https://lmcouncil.ai
	You Are Being Told Contradictory Things About AI: 8 examples	05 Dec 2025	00:20:15
With headlines of an imminent job apocalypse, code red for ChatGPT and recursive self-improvement, at the same time as Anthropic's CEO yesterday saying we know how to scale to AGI, and Gemini 3 DeepThink out today, it is easy to get lost among the narratives and counter-narratives. So here are both, plus the facts behind them, for you to decide. https://epoch.ai/data/data-centers Epoch AI is the sponsor of today’s video, and my views, and those expressed in this video, do not necessarily reflect Epoch AI’s views in any way. Chapters: 00:00 - Introduction 00:42 - Job Apocalypse? 01:45 - Scaling to AGI 04:15 - Recursive Self-Improvement Needed, or Not 09:57 - OpenAI Code Red vs Gemini 3 DeepThink vs Claude Opus 4.5 13:27 - DeepSeek Speciale vs Mistral Large v3 16:45 - Claude Soul Document https://lmcouncil.ai/ AI Insiders ($9!): https://www.patreon.com/AIExplained Guardian Interview: https://www.theguardian.com/technology/ng-interactive/2025/dec/02/jared-kaplan-artificial-intelligence-train-itself MIT Study on Jobs/Tasks: https://iceberg.mit.edu/report.pdf vs https://www.cnbc.com/2025/11/26/mit-study-finds-ai-can-already-replace-11point7percent-of-us-workforce.html Amodei on Scaling: https://www.youtube.com/watch?v=FEj7wAjwQIk Claude Soul Document: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document Capabilities Original Stance: https://www.anthropic.com/news/core-views-on-ai-safety Ilya Interview: https://www.dwarkesh.com/p/ilya-sutskever-2 Ricursive Intelligence: https://x.com/RicursiveAI/status/1995932204703346946 Economist Worker Usage of GenAI: https://www.economist.com/finance-and-economics/2025/11/26/investors-expect-ai-use-to-soar-thats-not-happening#selection-1409.94-1413.42 Mistral v3 Large: https://docs.mistral.ai/models/mistral-large-3-25-12 Compute Slowdown Paper: https://joel-becker.com/images/publications/forecasting_time_horizon_under_compute_slowdown.pdf https://x.com/joel_bkr/status/1993023436541903155 METR Chart: 
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq OpenAI Code Red: https://www.anthropic.com/news/core-views-on-ai-safety Rocket Company: https://www.independent.co.uk/news/world/americas/sam-altman-rocket-elon-musk-spacex-b2878351.html DeepSeek Paper: https://arxiv.org/html/2512.02556v1 DeepSeek Crowdstrike CCP: https://www.crowdstrike.com/en-us/blog/crowdstrike-researchers-identify-hidden-vulnerabilities-ai-coded-software/ https://simple-bench.com/ Patreon Post: https://www.patreon.com/c/aiexplained/posts Robot: https://x.com/jloganolson/status/1985850115379351799
	Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but …	19 Dec 2025	00:19:59
The condensed highlights of hours of AI lab leader interviews, model releases, Gemini 3 Flash insights (plus its hidden flaw), Hassabis’ ‘proto-AGI’ and much more… https://matsprogram.org/apply?utm_source=ai-explained&utm_medium=youtube&utm_campaign=s26 Also, do check out my new app: https://lmcouncil.ai Chapters: 00:00 - Introduction 00:50 - Results 02:44 - But… the Flaw 04:49 - So Benchmarks are fake? No 07:37 - Spatial Reasoning + Hassabis 10:06 - Proto-AGI 12:07 - Minimal AGI 15:07 - Compute Slowdown 17:56 - New Data Paradigm Gemini 3 Flash: https://deepmind.google/models/gemini/flash/ Hassabis Interview: https://www.youtube.com/watch?v=PqVbypvxDto Legg Interview: https://www.youtube.com/watch?v=l3u_FAv33G0 Pre-training Lead Interview: https://www.youtube.com/watch?v=cNGDAqFXvew Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ Brockman Video: https://x.com/OpenAI/status/2001336514786017417 Post-Training Reveal: https://x.com/OfficialLoganK/status/2001742530472534442 Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf Patreon Hallucinations Vid: https://www.patreon.com/posts/blockers-to-and-139264812 AA-Omniscience Benchmark: https://artificialanalysis.ai/evaluations/omniscience https://arxiv.org/pdf/2511.13029 lmcouncil.ai/benchmarks https://simple-bench.com/ https://x.com/scaling01/status/1999620587744813205 5.2 Codex Drop: https://cdn.openai.com/pdf/ac7c37ae-7f4c-4442-b741-2eabdeaf77e0/oai_5_2_Codex.pdf OpenAI Compute Trend: https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq Cramer Tweet/Response: https://x.com/BorisMPower/status/2001440650210976018 OpenAI Valuation: https://www.theinformation.com/articles/openai-discussed-raising-tens-billions-valuation-around-750-billion?rc=sy0ihq Indian Data: https://www.reuters.com/world/india/with-freebies-openai-google-vie-indian-users-training-data-2025-12-17/ TheInformation Data: 
https://x.com/theinformation/status/2001421225751351778 Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/ Sima 2: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/ Veo 3.1: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/ METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ AI Insiders ($9!): https://www.patreon.com/AIExplained Non-hype Newsletter: https://signaltonoise.beehiiv.com/
	Anthropic: Our AI just created a tool that can ‘automate all white collar work’, Me:	14 Jan 2026	00:18:16
A new tool, with code written by an AI model, has gone omega-viral: Claude Cowork. But is the hype justified? What do the stats say on productivity? Where is the truth in a sea of noise? What is truth? Can we handle the truth? Where's Nemo? https://matsprogram.org/s26-aie Check out my new app! https://lmcouncil.ai AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 01:12 - Claude Cowork 06:48 - Productivity Speed-up + jobs 09:33 - Comparing Models 12:00 - Brittle AI Paper Cowork Intro: https://x.com/claudeai/thread/2010805682434666759 'All of it': https://x.com/bcherny/status/2010813886052581538 'AGI' Claims: https://x.com/deepfates/status/2004994698335879383 Douglas Interview: https://www.youtube.com/watch?v=TOsNrV3bXtQ&t=2313s Job Stats: https://www.oxfordeconomics.com/wp-content/uploads/2026/01/Evidence-of-an-AI-driven-shakeup-of-job-markets-is-patchy.pdf Amodei Prediction: https://fortune.com/2025/05/28/anthropic-ceo-warning-ai-job-loss/ GenAI Traffic: https://x.com/demishassabis/status/2009075877347512545 Illusion of Insight: https://arxiv.org/pdf/2601.00514 Entropy Exploration: https://arxiv.org/pdf/2506.14758 ProRL: https://arxiv.org/pdf/2505.24864 Genesis Mission: https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/ https://deepmind.google/blog/how-were-supporting-better-tropical-cyclone-prediction-with-ai/ Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
	What the Freakiness of 2025 in AI Tells Us About 2026	23 Dec 2025	00:33:26
It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time. http://matsprogram.org/s26-aie My new app! https://lmcouncil.ai Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094 Chapters: 00:00 - Introduction 00:34 - Reasoning Models … and limits 02:54 - A playable world 03:36 - Realism 03:50 - AI Slop gone mainstream 05:03 - DolphinGemma 05:39 - Public Mood 07:34 - AI Enlisted 08:30 - GPT-5 11:05 - Open Weight not out 13:00 - METR Breakout 17:30 - VASA-1 18:28 - Lateral Productivity 20:15 - 1 or 1000 benchmarks needed? 24:54 - Continual Learning + Altman on Superintelligence 28:08 - Automated Information Discovery ft AlphaEvolve Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809 https://www.youtube.com/watch?v=PqVbypvxDto Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837 DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09 Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/ METR Time Horizon: https://arxiv.org/pdf/2503.14499 https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/ Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443 https://shash42.substack.com/p/how-to-game-the-metr-plot https://x.com/METR_Evals/status/2002203627377574113 GPT-5 - Altman phd in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems https://simple-bench.com/ AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1 Nvidia Nemotron: 
https://x.com/percyliang/status/2000608134205985169 OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1 Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259 Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/ AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/ https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content= Continual Learning: https://abehrouz.github.io/files/NL.pdf Job Risk: https://archive.ph/20250708204527/https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic GPT4o: https://x.com/AISafetyMemes/status/1916889492172013989 Vasa-1: https://www.microsoft.com/en-us/research/project/vasa-1/ Three Views: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines Turing Test: https://x.com/tunguz/status/1907185471211422147 Karpathy Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/ LLM Brainrot: https://arxiv.org/pdf/2510.13928 Lateral Productivity: https://www.aisi.gov.uk/frontier-ai-trends-report Emotional Quotient: https://arxiv.org/pdf/2511.08394 Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/ AI Insiders ($9!): https://www.patreon.com/AIExplained
	Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI	20 Feb 2026	00:18:50
Do we have a new best AI model, or do we have the downfall of benchmarks in general, as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much needed context. Oh, and a new record on Simple Bench! https://epoch.ai/ai-explained-datacenters Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:30 - Post-training Dominance 04:00 - ARC-AGI 2 Caveat 05:54 - Simple Bench Record 08:22 - Hallucination Caveat 10:05 - Model Card 11:12 - Exponential Coming 12:20 - Amodei on Generalizing 15:10 - One True Benchmark? 17:02 - Other Metrics… Gemini 3.1 Model Card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf Release: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/ Where are Agents deployed?: https://www.anthropic.com/research/measuring-agent-autonomy Newsletter Post: https://signaltonoise.beehiiv.com/p/4-ai-numbers-that-surprised-me-this-week Hallucination AA: https://artificialanalysis.ai/evaluations/omniscience Melanie Mitchell: https://x.com/MelMitchell1/status/2022738363548340526 ARC-AGI-2: https://x.com/arcprize/status/2024522812728496470/photo/1 Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519439140737442 METR Caveat: https://metr.org/notes/2026-01-22-time-horizon-limitations/ Talaas Fast: https://chatjimmy.ai/ Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A7002942-is-continual-learning-necessary-how-will-it-be-solved Metaculus FutureEval: https://www.metaculus.com/futureeval/ Next Vid to Watch: https://www.patreon.com/posts/what-you-need-to-150647292 Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
	The Two Best AI Models/Enemies Just Got Released Simultaneously	06 Feb 2026	00:19:49
The two models that you will hear discussed for at least the next two months - Claude Opus 4.6 and GPT 5.3 Codex - just got released within 26 mins of each other. The full breakdown of around 250 pages of reports, with just the most interesting moments, from the battle of which is best, Claude personhood, the surprising misbehaviour of Opus 4.6, and much more https://assemblyai.com/aiexplained Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai AI Insiders ($9): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:54 - Self-improvement? 02:44 - Knowledge Work 05:30 - Overly agentic behaviour 09:12 - Who Shouldn’t Use Claude Opus 11:39 - Step-change? 15:09 - Claude’s ‘Personhood’ Hassabis Roadmap: https://www.patreon.com/posts/hassabis-roadmap-149750869 Release of Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6 212 Page System Card: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf Claude Code Tip: https://x.com/bcherny/status/2019475897691124107 GPT Codex 5.3: https://openai.com/index/introducing-gpt-5-3-codex/ System Card: https://openai.com/index/gpt-5-3-codex-system-card/ Browse Comp: https://arxiv.org/pdf/2504.12516v1 Finance Agent: https://www.vals.ai/benchmarks/finance_agent Terminal Bench 2: https://arxiv.org/pdf/2601.11868 Vending Bench: https://andonlabs.com/blog/opus-4-6-vending-bench My X post: https://x.com/AIExplainedYT/status/2016851303436095647 Anthropic Apology: https://x.com/ch402/status/2014066134194995256/photo/1 Altman rebuttal: https://x.com/sama/status/2019139174339928189 https://x.com/sama/status/2019140276246442089 4% of GitHub: https://x.com/dylan522p/status/2019490550911766763 Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
	Claude AI Co-founder Publishes 4 Big Claims about Near Future: Breakdown	28 Jan 2026	00:22:12
Anthropic's CEO, who has consistently predicted transformative AI will arrive before 2030, recently published a nearly 20,000-word essay outlining his vision of where AI is heading. The video gives you the highlights. The essay argues that scaling and recursion will advance AI from coding automation to full engineering automation, while warning of economic displacement within 1-2 years and China's trajectory toward AI-enabled totalitarianism. Additionally, Dario Amodei predicts that AI models will increasingly be understood as collections of distinct personas rather than monolithic systems. 80,000 Hours: https://www.youtube.com/watch?v=B54EQiuO1UU Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 01:10 - Scaling to software engineers 06:11 - Permanent Underclass 10:18 - Totalitarian Nightmares 16:38 - Collection of Personas Essay: https://www.darioamodei.com/essay/the-adolescence-of-technology Physics Prediction: https://www.quantamagazine.org/is-particle-physics-dead-dying-or-just-hard-20260126/ Axios: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic World GDP: https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?end=2024&start=1961&view=chart Demis Hassabis Counter: https://www.youtube.com/watch?v=q6fq4_uP7aM Karpathy 80%: https://x.com/karpathy/status/2015883857489522876 Machines of Loving Grace: https://www.darioamodei.com/essay/machines-of-loving-grace Anthropic LessWrong: https://www.lesswrong.com/posts/5aKRshJzhojqfbRyo/unless-its-governance-changes-anthropic-is-untrustworthy#1__In_private__Dario_frequently_said_he_won_t_push_the_frontier_of_AI_capabilities__later__Anthropic_pushed_the_frontier Original Constitution: https://www.anthropic.com/news/claudes-constitution New Constitution: https://www.anthropic.com/constitution Kimi K2.5: https://x.com/Kimi_Moonshot/status/2016024049869324599 Societies 
of Thought, Google DeepMind Paper: https://arxiv.org/pdf/2601.10825 https://lmcouncil.ai/benchmarks https://www.patreon.com/posts/our-new-age-of-133960279 Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
	Deadline Day for Autonomous AI Weapons & Mass Surveillance	27 Feb 2026	00:13:39
Will Anthropic be forced to make a version of Claude for war? And does a new paper expose the risks of Claude agents, in both OpenClaw and the field of war? Plus, 5 more twists in the story of the Pentagon versus Anthropic + some AI lab employees, and a petition that could change everything, or nothing... Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai AI Insiders ($9!): https://www.patreon.com/AIExplained Chapters: 00:00 - Introduction 00:44 - Deadline Day + Petition 02:42 - Twist 1: Existing Deal 03:26 - Twist 2: Existing Policy 04:21 - Twist 3: Twin Threats 05:54 - Twist 4: Interesting Objections 11:32 - Twist 5: Anthropic’s Dropped Policy Dario Statement: https://www.anthropic.com/news/statement-department-of-war Google/OpenAI Petition: https://notdivided.org/ Axios on Amodei Rejection: https://www.axios.com/2026/02/26/anthropic-rejects-pentagon-ai-terms FT on US Threat: https://www.ft.com/content/11d27612-d6c5-4cf7-94dd-f65603549b7f Politico on Latest: https://archive.ph/20260227013117/https://www.politico.com/news/2026/02/26/incoherent-hegseths-anthropic-ultimatum-confounds-ai-policymakers-00800135 The Verge on Current Deal: https://www.theverge.com/ai-artificial-intelligence/883456/anthropic-pentagon-department-of-defense-negotiations Anthropic RSP change: https://www.anthropic.com/news/responsible-scaling-policy-v3 Time Magazine on RSP: https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/ Agent of Chaos Paper: https://x.com/NatalieShapira/status/2026062499599319526 AI Agent Reliability Paper: https://arxiv.org/pdf/2602.16666 My Patreon Video: https://www.patreon.com/posts/real-mystery-ai-151647211 Patreon Documentary: https://www.patreon.com/posts/our-new-age-of-133960279 Non-hype Newsletter: https://signaltonoise.beehiiv.com/ Podcast: https://aiexplainedopodcast.buzzsprout.com/
