
Explore every episode of the AI Explained Official Podcast

Dive into the complete episode list for AI Explained Official Podcast. Each episode is cataloged with detailed descriptions, making it easy to find and explore specific topics. Keep track of all episodes from your favorite podcast and never miss a moment of insightful content.


Title | Pub. Date | Duration
Is GPT-5.1 Really an Upgrade? But Models Can Auto-Hack Govts, so … there’s that | 14 Nov 2025 | 00:18:26

A lot just got released in the last 36 hours, and it will all affect hundreds of millions of people. 10 details you would miss if you just read the headlines, from GPT 5.1 regressions, to how Claude hacked Govt Agencies, to SIMA 2, and Musical Turing Tests.

https://assemblyai.com/aiexplained

Chapters:
00:00 - Introduction

00:56 - GPT 5.1 Smarter?

01:47 - Some Regressions

03:22 - Sycophancy?

05:22 - Claude Auto-Hacking 

06:16 - Jailbreaking through Granularity

08:22 - This Will be Re-used

09:30 - Hallucinating Hacker

09:57 - Surprisingly Neutral Tone

12:18 - SIMA 2

14:10 - Alpha Parallels

17:24 - AI Music



GPT 5.1 Announcement: https://openai.com/index/gpt-5-1/

System Card: https://cdn.openai.com/pdf/4173ec8d-1229-47db-96de-06d87147e07e/5_1_system_card.pdf

Benchmarks: https://openai.com/index/gpt-5-1-for-developers/

Simple Bench: https://lmcouncil.ai/benchmarks


Auto-Hacking: https://x.com/AnthropicAI/status/1989033793190277618

https://www.anthropic.com/news/disrupting-AI-espionage

Report: https://assets.anthropic.com/m/ec212e6566a0d47/original/Disrupting-the-first-reported-AI-orchestrated-cyber-espionage-campaign.pdf



Sima 2 Announcement: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

https://x.com/amoufarek/status/1988986075331858693

Scepticism: https://www.technologyreview.com/2025/11/13/1127921/google-deepmind-is-using-gemini-to-train-agents-inside-goat-simulator-3/

Voyager: https://voyager.minedojo.org/


Reuters Music: https://www.reuters.com/legal/litigation/are-you-listening-bots-survey-shows-ai-music-is-virtually-undetectable-2025-11-12/


Bubble or No Bubble, AI Keeps Progressing (ft. Relentless Learning + Introspection) | 10 Nov 2025 | 00:12:53

Don’t let headlines about bubbles distract you from the real avenues of progress being explored in AI every week, including what had been thought to be a long-term blocker - continual learning (learning on the fly). 

https://app.grayswan.ai/ai-explained

This, plus models introspecting (hesitate before you berate), Nano Banana 2 possibly spotted, Chinese imagen and more.

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:26 - Continual Learning (Nested Learning / HOPE)
07:00 - Introspection
10:54 - Image-Gen Progress

Nested Learning Post: https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/

Nested Learning Paper: https://abehrouz.github.io/files/NL.pdf

Original Titans Paper: https://arxiv.org/pdf/2501.00663

Siri News: https://www.bloomberg.com/news/articles/2025-11-05/apple-plans-to-use-1-2-trillion-parameter-google-gemini-model-to-power-new-siri

Introspection: https://www.anthropic.com/research/introspection

Full Paper: https://transformer-circuits.pub/2025/introspection/index.html#mechanisms

Earlier Work: https://www.anthropic.com/research/mapping-mind-language-model
https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

Release Post: https://x.com/AnthropicAI/status/1983584136972677319

https://lmcouncil.ai 



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

When Will AI Models Blackmail You, and Why? | 24 Jun 2025 | 00:26:19

In the last few days Anthropic have released an impressively honest account of how all models blackmail, no matter what goal they have, and despite prompt warnings and other preventative measures. But do these models *want* this?

Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: storyblocks.com/AIExplained


AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:20 - What prompts blackmail?
02:44 - Blackmail walkthrough 
06:04 - ‘American interests’
08:00 - Inherent desire?
10:45 - Switching Goals
11:35 - Murder
12:22 - Realizing it’s a scenario? 
15:02 - Prompt engineering fix?
16:27 - Any fixes?
17:45 - Chekhov’s Gun
19:25 - Job implications
21:19 - Bonus Details

Report: https://www.anthropic.com/research/agentic-misalignment
30 Page Appendices: https://assets.anthropic.com/m/6d46dac66e1a132a/original/Agentic_Misalignment_Appendix.pdf
Announcement: https://x.com/AnthropicAI/status/1936144602446082431?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
OpenAI Files: https://www.openaifiles.org/
Grok 4 News: https://x.com/RonFilipkowski/status/1936372579607912473
Claude 4 Report Card: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf
New Apollo Research: https://www.apolloresearch.ai/blog/more-capable-models-are-better-at-in-context-scheming
Interesting Reflections: https://nostalgebraist.tumblr.com/post/785766737747574784/the-void


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Apple’s ‘AI Can’t Reason’ Claim Seen By 13M+, What You Need to Know | 12 Jun 2025 | 00:14:00

What to make of those headlines that AI can’t reason, seen by tens of millions? I cover the paper in layman’s terms, what it means and doesn’t mean, and what’s next. 

Thanks to Storyblocks for sponsoring this video! Download unlimited stock media at one set price with Storyblocks: https://storyblocks.com/AIExplained

Plus o3-pro and whether it is my current most-recommended model.

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:57 - Viral Post + Headlines
01:42 - Apple Paper Analysis
08:34 - But they do Hallucinate 
10:43 - Not Supercomputers
11:18 - o3 Pro and Recommendations 


13.7M Tweet: https://x.com/RubenHssd/status/1931389580105925115

Apple Paper: https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf

Guardian Article: https://www.theguardian.com/technology/2025/jun/09/apple-artificial-intelligence-ai-study-collapse

Lisan al Gaib post: https://x.com/scaling01/status/1931854370716426246

Multiplication: https://x.com/yuntiandeng/status/1836114401213989366

The Illusion of the Illusion of Thinking: https://drive.google.com/file/d/1Zx9ikRj0Enc3SB4wA9HlYIlpmO_8QiUO/view

Marcus: https://www.theguardian.com/commentisfree/2025/jun/10/billion-dollar-ai-puzzle-break-down

Prof Rao: https://x.com/rao2z/status/1927707640223719631

AI Job Headlines: https://www.nytimes.com/2025/06/11/technology/ai-mechanize-jobs.html
https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

Sky News Story: https://news.sky.com/story/can-we-trust-chatgpt-despite-it-hallucinating-answers-13380975

Veo 3 Ad: https://x.com/Kalshi/status/1932891608388681791

Altman Essay: https://blog.samaltman.com/

o3 Original benchmarks: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b8b6c44-acd6-43b3-b5c6-1a1d5c6c25e4_2486x1388.png

https://pbs.twimg.com/media/GfQ0bfcXQAAQt13.jpg

Alpha Evolve Video: https://www.youtube.com/watch?v=RH4hAgvYSzg

https://simple-bench.com/


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

AI Accelerates: New Gemini Model + AI Unemployment Stories Analysed | 06 Jun 2025 | 00:16:41

There’s a new best language model, so let’s go through the ups and downs of Gemini 2.5 Pro 06-05. Record-breaking common-sense, but dumb mistakes remain. And it’s not even their best model, which remains behind the scenes - Gemini 2.5 Ultra. Plus Sundar Pichai’s AGI date and an analysis of whether the current AI unemployment headlines are justified, and Elevenlabs v3.


https://emergentmind.com


AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
02:04 - Gemini 2.5 Ultra 
03:34 - Benchmarks
07:41 - AGI Date and Meaning Pichai
09:13 - Jobs and AI Unemployment Fears
15:28 - Elevenlabs v3

Sundar Pichai Fridman: https://www.youtube.com/watch?v=9V6tWC4CdFQ

Pichai More Jobs (until 2026 at least): https://www.techradar.com/pro/alphabet-ceo-sundar-pichai-says-ai-wont-lead-to-job-cuts-will-be-an-accelerator

Gemini Comparison: https://blog.google/products/gemini/gemini-2-5-pro-latest-preview/
https://x.com/viathebrink/status/1930733154203292121

https://simple-bench.com/

White Collar Bloodbath: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic
https://fortune.com/2025/05/25/ai-entry-level-jobs-gen-z-careers-young-workers-linkedin/
https://www.nytimes.com/2025/05/19/opinion/linkedin-ai-entry-level-jobs.html
https://www.nytimes.com/2025/03/25/business/economy/white-collar-layoffs.html

College Unemployment: https://www.newyorkfed.org/research/college-labor-market/#--:explore:unemployment

New Scientist AI Hallucinations: https://www.newscientist.com/article/2479545-ai-hallucinations-are-getting-worse-and-theyre-here-to-stay/

Duolingo: https://fortune.com/2025/05/24/duolingo-ai-first-employees-ceo-luis-von-ahn/
Klarna: https://www.forbes.com/sites/quickerbettertech/2025/05/18/business-tech-news-klarna-reverses-on-ai-says-customers-like-talking-to-people/

Sholto Douglas: https://www.reddit.com/r/ClaudeAI/comments/1ktt1rb/anthropics_sholto_douglas_says_by_202728_its/

Figure 02: https://x.com/adcock_brett/status/1930693311771332853

Elevenlabs v3: https://www.youtube.com/watch?v=zv_IoWIO5Ek

Gemini Speech Generation: https://aistudio.google.com/generate-speech


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Claude 4: Full 120 Page Breakdown … Is it the Best New Model? | 22 May 2025 | 00:19:04

Not only did I get early access and run my own tests, but as per the title I read both the 120 page Claude 4 Opus and Claude 4 Sonnet System Card, and the 25 page report on ASL-3 being triggered, plus the 2 hour launch video, and surrounding coverage. Ft. coding tests, Simple, twitter controversies, deep alignment coverage, spiritual bliss and much more!


https://80000hours.org/aiexplained


Chapters: 

00:00 - Introduction
01:12 - 3 Quick Controversies

02:42 - Benchmark Results 

04:20 - 120 page Card 20 Highlights

10:07 - Coding Test
11:27 - Model Welfare and Spiritual Bliss

13:29 -  ASL-3

Claude Card: https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf?s=09
ASL 3: https://www-cdn.anthropic.com/807c59454757214bfd37592d6e048079cd7a7728.pdf

Tweets: https://x.com/fish_kyle3/status/1925597284546629753

https://x.com/EMostaque/status/1925624164527874452?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet


Cursor Says State of the Art for Coding: https://x.com/cursor_ai/status/1925594428095561941


Benchmarks: https://www.anthropic.com/news/claude-4



Google Takes No Prisoners Amid Torrent of AI Announcements | 21 May 2025 | 00:17:07

Google just announced at least 12 things that are each worthy of a video, but here are the top I/O highlights. From Veo 3 to Deep Research now being useable, Deep Think breaking records to Gemini Diffusion, Gemini 2.5 Flash changing how AI is priced and GemmaVerse, SynthID Detector and Imagen 4. And even this intro is missing other announcements covered in the vid! And yes, there’ll be plenty of Veo 3 clips to enjoy…

https://80000hours.org/aiexplained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:48 - Veo 3
02:10 - Gemini 2.5 Flash
03:13 - Universal Assistant
03:47 - Usage Skyrockets + OpenAI dig
04:51 - Gemini Pro Deep Think
06:21 - Overviews and AI Mode
07:26 - Deep Research Updates (new) + Jules 
08:53 - Make and Deploy Apps with Gemini
09:12 - Imagen 4 
10:00 - Gemini Diffusion
11:46 - Try It On
12:17 - SynthID Detector
13:30 - GemmaVerse, SignGemma, Gemma3n, medGemma
14:24 - Outro + Clips

Event: https://www.youtube.com/watch?v=o8NiE3XMPrM
Native Audio: https://aistudio.google.com/generate-speech
Gemini Diffusion: https://deepmind.google/models/gemini-diffusion/#capabilities 
New Gemini 2.5 Flash: https://deepmind.google/models/gemini/flash/
SignGemma (See end of this vid): https://www.youtube.com/watch?v=GjvgtwSOCao
Deep Think: https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/#flash-improvements
Google Parallel Sampling: https://www.patreon.com/posts/next-level-good-127441188

Price Plans: https://blog.google/products/google-one/google-ai-ultra/
Imagen 4 Benchmarks: https://deepmind.google/models/imagen/
Jules: https://jules.google/
SynthID Detector: https://blog.google/technology/ai/google-synthid-ai-content-detector/
Veo 3 Benchmarks: https://deepmind.google/models/veo/evals/
MedGemma: https://deepmind.google/models/gemma/medgemma/
Build Apps: https://aistudio.google.com/apps


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

AI Improves at Self-improving | 19 May 2025 | 00:17:41

AlphaEvolve is not the first system to exhibit self-improvement, but it may be the most impressive yet. AI is literally improving the hardware, architectures, data and training methods of AI itself. A deep dive into the paper, drawing on two previous interviews and 5 other papers. Plus a snippet on OpenAI’s new Codex system.

Gray Swan: http://app.grayswan.ai/ai-explained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:27 - AlphaEvolve
05:23 - Limitation
06:10 - Achievements
08:21 - Future Improvements
13:30 - Quirks
16:34 - Final Thoughts

AlphaEvolve release: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/

Paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf

Terence Tao Quote: https://mathstodon.xyz/@tao/114508029896631083

Nature Article: https://www.nature.com/articles/s41586-022-05172-4
MIT Article: https://www.technologyreview.com/2025/05/14/1116438/google-deepminds-new-ai-uses-large-language-models-to-crack-real-world-problems/
AI Co-Scientist: https://arxiv.org/pdf/2502.18864

OpenAI Codex: https://openai.com/index/introducing-codex/


70% of Pull Requests: https://x.com/slow_developer/status/1920920456393028027

Amodei Essay: https://www.darioamodei.com/essay/machines-of-loving-grace

OpenAI Jason Wei Tweet: https://x.com/_jasonwei/status/1923091260354531612

PromptBreeder: https://arxiv.org/pdf/2309.16797
DrEureka: https://arxiv.org/pdf/2406.01967

FT DeepMind: https://www.ft.com/content/4e497a91-670a-4f69-be4a-18e247daba3e



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

o3 breaks (some) records, but AI becomes pay-to-win | 25 Apr 2025 | 00:14:33

A green card, o3 vs Gemini 2.5, 6 Benchmarks and a whole bunch of my thoughts on what on earth is happening in AI, from here to 2030. Plus, how AI is becoming pay-to-win, and why. Crazy times, 14 mins probably wasn’t enough.

https://app.grayswan.ai/ai-explained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:33 - FictionLiveBench
01:37 - PHYBench
02:14 - SimpleBench
02:54 - Virology Capabilities Test
03:13 - Mathematics Performance
04:29 - Vision Benchmarks
05:43 - V* and how o3 works
06:44 - Revenue and costs for you
08:54 - Expensive RL and trade-offs 
09:40 - How to spend the OOMs
13:27 - Gray Swan Arena

Green Card: https://techcrunch.com/2025/04/25/an-openai-researcher-who-worked-on-gpt-4-5-had-their-green-card-denied/
PHYBench: https://arxiv.org/pdf/2504.16074
Virologytest: https://www.virologytest.ai/
How o3 Vision Works: https://arxiv.org/pdf/2312.14135 https://x.com/sainingxie/status/1912570624523829573
Visual puzzles: https://neulab.github.io/VisualPuzzles/
Fiction Bench: https://x.com/ficlive/status/1912863028141244850
https://geobench.org/
https://simple-bench.com/
AIME 2025: https://openai.com/index/introducing-o3-and-o4-mini/
USAMO: https://x.com/mbalunovic/status/1914398518896193747
NaturalBench: https://linzhiqiu.github.io/papers/naturalbench/
Where’s Waldo: https://uk.pinterest.com/pin/492792384225896298/
IMO and AlphaProof: https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/
Crazy Revenue: https://www.theinformation.com/articles/openai-forecasts-revenue-topping-125-billion-2029-agents-new-products-gain?rc=sy0ihq
Number of Users: https://www.theinformation.com/briefings/googles-gemini-user-numbers-revealed-court?rc=sy0ihq
Subscriptions pay to win: https://www.forbes.com/sites/paulmonckton/2025/04/23/google-leak-reveals-new-gemini-ai-subscription-levels/
GPU Trade-offs: https://x.com/sama/status/1915098951067554030
RL Scale-up Amodei: https://www.darioamodei.com/post/on-deepseek-and-export-controls
Log-linear Returns: https://x.com/bobmcgrewai/status/1895228291981943265
2030 Scaling: https://epoch.ai/blog/can-ai-scaling-continue-through-2030
Model Size: https://x.com/slow_developer/status/1874554473256997201
Adam on AGI: https://x.com/TheRealAdamG/status/1913998366632968381
Papers on Patreon: https://arxiv.org/pdf/2502.01839
https://arxiv.org/pdf/2504.13837
Chollet Quote: https://x.com/fchollet/status/1912934762580447447
OpenSim: https://opensim.stanford.edu/


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

o3 and o4-mini - they’re great, but easy to over-hype | 16 Apr 2025 | 00:14:24

Critical analysis of the two most powerful new models behind ChatGPT, o3 and o4-mini. Not just the system cards, benchmarks, and my own tests, but some you may not have seen before. Yes, they can whip up amazing front-end in a few seconds, but you always have to ask what is in their data. Either way, they prove the gains from RL are just beginning…

https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

AI Insiders ($9!): https://www.patreon.com/AIExplained


Chapters:
00:00 - o3 and o4-mini


https://simple-bench.com/

Plus, Teams and Pro,  plus token count: https://x.com/btibor91/status/1912568994512662679

System Card: https://openai.com/index/o3-o4-mini-system-card/

Release Notes: https://openai.com/index/introducing-o3-and-o4-mini/

https://deepmind.google/technologies/gemini/pro/

https://x.com/DeryaTR_/status/1912558350794961168

https://x.com/polynoamial/status/1912564068168450396

API Pricing: https://openai.com/api/pricing/

https://aider.chat/docs/leaderboards/


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

‘Speaking Dolphin’ to AI Data Dominance, 4.1 + Kling 2: 7 Developments Critically Analysed | 16 Apr 2025 | 00:20:09

This pod won’t just be about the release of GPT 4.1 in the last 48 hours, o3 build-up, Kling 2.0, a sneak peek at the next OpenAI model, or even the new Dolphin language tool. It will be about 7 such stories that contextualise where we are in AI and what is happening.

https://www.emergentmind.com/


Chapters: 

00:00 - Introduction

00:30 - Kling 2.0

01:35 - GPT 4.1

05:25 - o3 Build-up

07:37 - ‘Product Company’

09:31 - Safe Superintelligence

10:54 - DolphinGemma

13:16 - Data Dominance?


Kling 2.0: https://app.klingai.com/global/release-notes


Dolphin Gemma: https://blog.google/technology/ai/dolphingemma/?s=09


https://openai.com/index/gpt-4-1/


OpenAI o3 Build-up The Information: https://www.theinformation.com/articles/openais-latest-breakthrough-ai-comes-new-ideas?rc=sy0ihq


Physical reasoning: https://x.com/a_karvonen/status/1911839968990814503


Fiction Live.bench: https://x.com/ficlive/status/1911853409847906626


Altman Ted: https://www.youtube.com/watch?v=5MWT_doo68k


https://simple-bench.com/try-yourself


https://aider.chat/docs/leaderboards/


4.5: https://www.youtube.com/watch?v=6nJZopACRuQ


Geospatial reasoning: https://research.google/blog/geospatial-reasoning-unlocking-insights-with-generative-ai-and-multiple-foundation-models/


Pioneers: https://x.com/OpenAIDevs/status/1910017976256119151

Evals: https://www.youtube.com/watch?v=scsW6_2SPC4

Anthropic Updates: https://www.bloomberg.com/news/articles/2025-04-15/anthropic-is-readying-a-voice-assistant-feature-to-rival-openai?srnd=phx-ai

https://x.com/sethsaler/status/1912188383457059301


https://techcrunch.com/2025/04/12/openai-co-founder-ilya-sutskevers-safe-superintelligence-reportedly-valued-at-32b/

https://ai.meta.com/blog/llama-4-multimodal-intelligence/

https://deepmind.google/technologies/gemini/pro/

https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/

https://blog.google/products/google-cloud/ironwood-tpu-age-of-inference/

OpenAI Documentary: https://www.patreon.com/posts/one-machine-to-121940490

AI CEO: ‘Stock Crash Could Stop AI Progress’, Llama 4 Anti-climax +‘Superintelligence in 2027’... | 07 Apr 2025 | 00:23:51

The latest on Llama 4, and whether it signals a slowdown in AI, or solid progress. Plus, a deep dive on that viral prediction of superintelligence by 2027, and Amodei’s cautionary words on what could stop AI progress in its tracks. o3 news, and more, as well.

Weights & Biases: https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained


DeepSeek Doc: https://www.patreon.com/posts/openai-is-not-r1-125869969

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:47 - Stock Crash 
02:28 - Llama 4
10:55 - o3 News
11:59 - OpenAI non-profit?
13:13 - AI 2027

Llama 4 Release: https://ai.meta.com/blog/llama-4-multimodal-intelligence/

Dario Amodei Comments: https://www.youtube.com/watch?v=esCSpbDPJik

Knowledge Cut-off: https://www.llama.com/docs/model-cards-and-prompt-formats/llama4_omni/

Aider Polyglot: https://aider.chat/docs/leaderboards/

Gemini 1.5: https://arxiv.org/pdf/2403.05530

Fiction-LiveBench: https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/oQdzQvKHw8JyXbN87

OpenAI Valuation: https://www.nytimes.com/2025/03/31/technology/openai-valuation-300-billion.html?login=smartlock&auth=login-smartlock

OpenAI Cybersecurity: https://www.bloomberg.com/news/articles/2024-01-16/openai-working-with-us-military-on-cybersecurity-tools-for-veterans

Deep research System Card: https://cdn.openai.com/deep-research-system-card.pdf

https://openai.com/index/paperbench/

AI 2027: https://ai-2027.com/

METR Paper: https://arxiv.org/pdf/2503.14499

OpenAI non-profit: https://openai.com/index/nonprofit-commission-guidance/

NYT Piece: https://www.nytimes.com/2025/04/03/technology/ai-futures-project-ai-2027.html?unlocked_article_code=1.804._yKi.QhwOp15Q3tcU&smid=url-share&s=09

Kokotajlo predictions 2021: https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like

https://simple-bench.com/


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

Sora 2 - It will only get more realistic from here | 01 Oct 2025 | 00:15:43

Sora 2 - the start of the infinite slop-feed or a key step to a generalist agent? Better than VEO 3 or over-hyped? I bring out 6 details you may have missed, contrast the announcement to Periodic Labs and even squeeze in some Claude Sonnet 4.5 analysis. Maybe I should make my videos longer…

https://80000hours.org/aiexplained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:40 - Two models?
01:15 - Rollout Details
01:43 - Versus Sora 1 / Veo 3
04:30 - Sora App / Social Media
06:40 - Masterplan
09:30 - Generalist Agent? Periodic Labs
12:05 - Claude Sonnet 4.5
13:42 - Future Outlook

Announcement: https://openai.com/index/sora-2/
Launch Video: https://www.youtube.com/live/gzneGhpXwjU
System Card: https://cdn.openai.com/pdf/50d5973c-c4ff-4c2d-986f-c72b5d0ff069/sora_2_system_card.pdf
Sam Altman Blog Post on Sora App: https://blog.samaltman.com/sora-2

Most Intelligent Claim: https://x.com/willdepue/status/1973089331284681110
GTA: https://x.com/AndrewCurran_/status/1973298436536766666

Meta Vibes: https://x.com/alexandr_wang/status/1971295156411433228?s=46

Altman on Regulations: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman
OpenAI Profit: https://www.theinformation.com/articles/openais-first-half-results-4-3-billion-sales-2-5-billion-cash-burn?rc=sy0ihq

Periodic Labs: https://periodic.com/
https://www.nytimes.com/2025/09/30/technology/ai-meta-google-openai-periodic.html
https://x.com/LiamFedus/status/1973055380193431965
https://baincapitalventures.com/insight/we-must-know-we-will-know/?s=09

Sonnet 4.5: https://www.anthropic.com/news/claude-sonnet-4-5
https://simple-bench.com/


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

Gemini 2.5 Pro - It’s a Smart Chatbot … (New Simple High Score) | 28 Mar 2025 | 00:21:21

Gemini gets a new record on Simple Bench, and several other benchmarks. I’ll go deep to explore its nuances, including how it deceptively reverse engineers answers, does better on certain coding benchmarks than others, may have a universal ‘conceptual language’ …

https://weave-docs.wandb.ai/?utm_source=sponsorship&utm_medium=simple_bench&utm_campaign=ai_explained

… and more. Plus practical tips, a note on security and Kling vs Veo 2 guest appearance.


AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:36 - Fiction Bench
02:41 - Practicality - YouTube urls + Security - cut-off date
03:42 - Coding 
06:22 - WeirdML Bench
07:01 - Simple Bench Record High 
11:23 - Reverse Engineering!
13:22 - Anthropic Paper
17:49 - 3 Caveats

Gemini 2.5 Updated: https://deepmind.google/technologies/gemini/

Fiction Live Bench: https://fiction.live/stories/Fiction-liveBench-Feb-19-2025/oQdzQvKHw8JyXbN87

https://simple-bench.com/

WeirdML: https://htihle.github.io/weirdml.html
https://x.com/htihle/status/1905014058228625542

Anthropic Thoughts: https://www.anthropic.com/research/tracing-thoughts-language-model
https://transformer-circuits.pub/2025/attribution-graphs/biology.html#dives-cot

https://aistudio.google.com/prompts/new_chat

Search Study: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php

Live bench: https://livebench.ai/#/
Paper: https://arxiv.org/pdf/2406.19314

LiveCode Bench: https://livecodebench.github.io/

SWE-Verified: https://arxiv.org/pdf/2310.06770


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Did AI Just Get Commoditized? Gemini 2.5, New DeepSeek V3, & Microsoft vs OpenAI | 25 Mar 2025 | 00:13:47

Gemini 2.5 is out, on the same day as the new DeepSeek V3 (which should power Deepseek R2). Do both models prove AI is being commoditized? Let’s find out, on this blockbuster day of AI releases. Plus exclusives from the Information, Simple indications, Vista Bench, LM Arena and more…

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters: 
00:00 - Introduction
01:15 - Gemini 2.5 Benchmarks
05:46 - Long Context, Simple indication
07:08 - New DeepSeek V3-0324
09:11 - Microsoft MAI
11:48 - 90% of code but new Claude jobs

‘World’s most powerful model’: https://x.com/OfficialLoganK/status/1904580368432586975

Gemini 2.5 Release Notes: https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/#gemini-2-5-thinking

‘Commoditized’: https://the-decoder.com/microsoft-ceo-satya-nadella-says-ai-models-are-getting-commoditized/

Microsoft Information report: https://www.theinformation.com/articles/microsofts-ai-guru-wants-independence-from-openai-thats-easier-said-than-done?rc=sy0ihq

LMarena: https://x.com/lmarena_ai/status/1904581128746656099/photo/1

Free for now: https://x.com/btibor91/status/1904578053537476628

Vista Bench: https://scale.com/leaderboard/visual_language_understanding

DeepSeek V3: https://huggingface.co/deepseek-ai/DeepSeek-V3-0324

Claude Plays Pokemon: https://www.twitch.tv/claudeplayspokemon
Amodei: 100% Coding: https://www.youtube.com/watch?v=esCSpbDPJik&t=3017s

Anthropic Jobs: https://job-boards.greenhouse.io/anthropic/jobs/4020717008

Microsoft Money from Onslaught: https://www.972mag.com/microsoft-azure-openai-israeli-army-cloud/

https://simple-bench.com/

Release Date Comments: https://x.com/zacharynado/status/1904647277861318979


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Manus AI - The Calm Before the Hypestorm … (vs Deep Research + Grok 3) | 13 Mar 2025 | 00:12:58

Is Manus AI the memecoin of the AI world, or legit? I’ll compare it to OpenAI’s Deep Research, Operator, Grok 3 DeepSearch and more to find out. I’ll also let you in on some of the secrets of what makes a good hype campaign, the estimated costs of Manus AI, and where it is strong. Other news (yes, Gemini image editing and research hacking, I mean you), will have to wait for a few more hours, as millions enquire about Manus AI.

https://app.grayswan.ai/arena

AI Insiders ($9!): https://www.patreon.com/AIExplained
Patreon Vid: https://www.patreon.com/posts/4-ai-trends-in-123857767

Chapters:
00:00 - Introduction
00:46 - Hype Campaign
02:40 - Single, Public Benchmark 
03:12 - What is Manus AI?
04:22 - Test 1
05:12 - Cost and Rate Limits
06:15 - Test 2 vs Deep Research + Grok 3 DeepSearch
08:24 - Test 3 (not AGI)
11:10 - 4 Trends in AI in 2025
11:37 - Hype Works

Manus AI: https://manus.im/app

Xiao Hong Interview: https://www.chinatalk.media/p/manus-chinas-latest-ai-sensation

Gaia Benchmark: https://openreview.net/pdf?id=fibxvahvs3
MIT Report: https://www.technologyreview.com/2025/03/11/1113133/manus-ai-review/

Information Report: https://www.theinformation.com/articles/anthropics-claude-drives-strong-revenue-growth-while-powering-manus-sensation?rc=sy0ihq

Hype Examples: https://x.com/Saboo_Shubham_/status/1898425707401031940
https://x.com/EHuanglu/status/1899110687902978373
https://x.com/AJs_AI/status/1898756132384178291

Mistakes: https://x.com/TheXeophon/status/1898737178273829220

Tools and Code: https://x.com/peakji/status/1898994802194346408

https://operator.chatgpt.com/




Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

GPT 4.5 - not so much wow | 28 Feb 2025 | 00:25:05

GPT 4.5 is here, and do you remember when AI lab CEOs like Sam Altman and Dario Amodei were betting everything on scaling up base models like this one? Well let’s find out what would have happened if the future of AI rested on models like GPT 4.5. You’ll see all the benchmarks, highlights of the paper, emotional intelligence and humor tests, Simple Bench results (reddit was an unreliable source), and why it’s not all bad news for OpenAI.

https://www.emergentmind.com/

AI Insiders (now $9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:04 - Details and Benchmarks
03:04 - Emotional intelligence? 
08:37 - Creative writing?
11:40 - Visual reasoning and Pricing
12:41 - Simple Performance
16:01 - End of Pretraining Scaling?
17:03 - CEO Hype
18:11 - System Card Highlights
23:32 - Karpathy Reaction

GPT 4.5 System card: https://cdn.openai.com/gpt-4-5-system-card-2272025.pdf
Release Notes: https://openai.com/index/gpt-4-5-system-card/
Altman Hype: https://x.com/sama/status/1891533802779910471
Details: https://openai.com/index/introducing-gpt-4-5/ https://x.com/OpenAI/status/1895219596317335792
End of an Era: https://x.com/wgussml/status/1895187231666774377
Anthropic Original Claim: https://techcrunch.com/2023/04/06/anthropics-5b-4-year-plan-to-take-on-openai/
Smell: https://x.com/rapha_gl/status/1895213014699385082
Bob McGrew: https://x.com/bobmcgrewai/status/1895228291981943265
Deep Research System Card: https://cdn.openai.com/deep-research-system-card.pdf
Reddit: https://www.reddit.com/r/singularity/comments/1izu1t7/gpt45_crushes_simple_bench/
API Pricing: https://openai.com/api/pricing/
LiveStream: https://www.youtube.com/watch?v=cfRYp0nItZ8&t=1s
https://simple-bench.com/


Karpathy Comparison: https://x.com/karpathy/status/1895213020982472863
https://x.com/karpathy/status/1895337579589079434


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Claude 3.7 is More Significant than its Name Implies (ft DeepSeek R2 + GPT 4.5 coming soon) | 25 Feb 2025 | 00:27:39

Claude 3.7 is here, hot on the heels of Grok 3 and a host of other developments, but how good is it really? And what does it say about the next few months in AI? I’ve read the papers, played with the model for hours, and benched it on Simple. Things aren’t slowing down. Plus the latest in humanoid robots, led by Helix and freaked out by Protoclone. And reports of GPT 4.5 and DeepSeek R2.


GraySwan Competition! https://app.grayswan.ai/arena/challenge/agent-red-teaming

https://x.com/GraySwanAI/status/1894084923260043282


Chapters:

00:00 - Introduction

01:25 - Claude 3.7 New Stats/Demos 

05:22 - 128k Output

06:13 - Pokemon

06:58 - Just a tool? 

09:54 - DeepSeek R2

10:20 - Claude 3.7 System Card/Paper Highlights 

17:18 - Simple Record Score/Competition

20:37 - Grok 3 + Redteaming prizes

22:26 - Google Co-scientist

24:02 - Humanoid Robot Developments


3.7 Release Notes: https://www.anthropic.com/news/claude-3-7-sonnet

vs o3 and Grok 3: https://x.com/12exyz/status/1891723056931827959

Extended Thinking: https://www.anthropic.com/research/visible-extended-thinking?s=09

System Prompt: https://docs.anthropic.com/en/release-notes/system-prompts#feb-24th-2025

System Card: https://assets.anthropic.com/m/785e231869ea8b3b/original/claude-3-7-sonnet-system-card.pdf

Unfaithful CoT: https://arxiv.org/pdf/2305.04388

Original Constitution: https://www.anthropic.com/news/claudes-constitution

Responsible Scaling Policy: https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf

Amodei and Hassabis: https://www.youtube.com/watch?v=4poqjZlM8Lo

https://simple-bench.com/

400 Weekly Users: https://x.com/bradlightcap/status/1892579908179882057

Grok 3 Jailbroken: https://x.com/LinusEkenstam/status/1893832876581380280

Google Co-Scientist: https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/

But Hassabis Says Years Away: https://www.youtube.com/watch?v=yr0GiSgUvPU&t=156s

DeepSeek R2 Reuters: https://www.reuters.com/technology/artificial-intelligence/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/

Protoclone: https://www.reddit.com/r/interestingasfuck/comments/1it9rpp/protoclone_the_worlds_first_bipedal/

Helix: https://www.figure.ai/news/helix

TechTrance: https://www.youtube.com/@TheTechTrance/videos

GPT 4.5 Soon:

AGI: (gets close), Humans: ‘Who Gets the Money?’ | 11 Feb 2025 | 00:22:17

A 'frontier reasoning model' from just 1000 examples (s1). A $100B Musk bid for power. Gemini 2, RAND and a warning from Amodei. Here are 7-8 developments you may have missed but which I would argue help us understand how the next few years will play out. From labour vs capital to automating rival companies and countries, and from non-profit shenanigans to new mini-docs, there was just too much for me not to make a vid.

GiveWell: https://www.givewell.org/charities/top-charities

AI Insiders ($9!): https://www.patreon.com/AIExplained

s1 Paper: https://arxiv.org/pdf/2501.19393
Musk Bid: https://www.wsj.com/tech/ai/musks-97-4-billion-openai-bid-piles-pressure-on-altman-f6749e6c?mod=hp_lead_pos1
Altman Reply: https://x.com/sama/status/1889059531625464090?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet
Google vs OpenAI: https://x.com/sama/status/1888703820596977684
RAND Study: https://www.rand.org/pubs/perspectives/PEA3691-4.html
Dev Meetup: https://x.com/btibor91/status/1888976302621040852
Altman $100 Trillion: https://www.nytimes.com/2023/03/31/technology/sam-altman-open-ai-chatgpt.html
Karpathy Vid: https://www.youtube.com/watch?v=7xTGNNLPyMI
Amodei Warning: https://www.anthropic.com/news/paris-ai-summit
Bengio Source: https://www.youtube.com/watch?v=6HDjVncL5Go

Chapters:
00:00 - Intro
01:37 - AGI Inches Closer
04:26 - ‘Super-Exponential’
05:58 - Musk Bid
07:34 - Luxury Goods and Land
09:05 - ‘Benefits All Humanity’
12:52 - ‘National Security’
14:21 - s1
20:33 - Final thoughts


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Deep Research by OpenAI - The Ups and Downs vs DeepSeek R1 Search + Gemini Deep Research | 03 Feb 2025 | 00:18:32

12 hours ago Deep Research was unveiled, and I’ve tested it thoroughly, including vs DeepSeek R1 with search, Gemini Deep Research and even R1 in Perplexity. It’s a notable step forward, with one big caveat. I’ll go through all the benchmark figures, my initial impression of the o3 model within, and much more.

Deep Research: https://openai.com/index/introducing-deep-research/

https://www.youtube.com/watch?v=YkCDVn3_wiw


GAIA Bench: https://openreview.net/forum?id=fibxvahvs3

https://openreview.net/pdf?id=fibxvahvs3

CodeELO: https://arxiv.org/pdf/2501.01257

CamelCamel: https://uk.camelcamelcamel.com/

Deepseek R1 with search: https://chat.deepseek.com/

https://arxiv.org/pdf/2501.12948

HaluBench: https://arxiv.org/pdf/2407.08488


Chapters:

00:00 - Introduction

01:06 - Powered by o3, Humanity’s Last Exam, GAIA

03:55 - Simple Tests 

06:00 - Good News vs Deepseek R1 and Gemini Deep Research

09:32 - Bad News on Hallucinations 

14:14 - What Can’t it Browse?

14:42 - For Shopping?

16:40 - Final thoughts



o3-mini and the “AI War” | 31 Jan 2025 | 00:15:21

o3-mini is here, and yes, I’ve read the paper in full - 2 hours after release, and even the post-launch Reddit AMA. Some epic details like a FrontierMath score that made me double-take, a likely new Cursor favorite, bio risk expertise and a cost-comparison with DeepSeek R1. But does it perform on basic reasoning? Let’s find out. Plus, arguably the bigger story - the increasingly frenetic rhetoric coming out of the West - and Dario Amodei and Alexandr Wang (CEOs of Anthropic and Scale AI respectively) in particular. The last thing we need is an “AI War”.


https://wandb.me/simple-bench


(Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing


Chapters: 

00:00 - Introduction

00:45 - o3 mini

05:11 - First impressions vs Deepseek R1

07:21 - 10x Scale, o3-mini System Card, Amodei Essay, bitcoin wallets…

12:40 - Simple Competition Finale

13:03 - Clips and Final Thoughts on the “AI War”



O3-mini: https://openai.com/index/openai-o3-mini/

Paper: https://cdn.openai.com/o3-mini-system-card.pdf

Amodei Essay: https://darioamodei.com/on-deepseek-and-export-controls?s=09

FrontierMath wild stat: https://arxiv.org/pdf/2411.04872

Sam Altman Channels Napoleon: https://x.com/sama/status/1883185690508488934

Altman ‘pulls up releases’: https://x.com/sama/status/1884066337103962416

“AI War” by Wang: https://scale.com/blog/win-the-ai-war

Anthropic Original Views on Capabilities: https://www.anthropic.com/news/core-views-on-ai-safety

AI Insider Cost Comparison: https://x.com/arankomatsuzaki/status/1884676245922934788

Deepseek R1 Paper: https://arxiv.org/pdf/2501.12948

R1, o3-mini Price Comparison: https://techcrunch.com/2025/01/31/openai-launches-o3-mini-its-latest-reasoning-model/

Semianalysis on $1.3M DeepSeek salaries, and them falling behind as ‘the time gap to match US capabilities increases’: https://semianalysis.com/2025/01/31/deepseek-debates/

OpenAI Valuation: https://www.bloomberg.com/news/articles/2025-01-30/openai-in-talks-to-raise-funding-at-340-billion-value-wsj-says?srnd=phx-ai

Wang Clip: https://x.com/tsarnick/status/1867700453494206883

Amodei Clip: https://x.com/ai_ctrl/status/1884951111771001188

https://simple-bench.com/



Nothing Much Happens in AI, Then Everything Does All At Once | 24 Jan 2025 | 00:23:09

When it rains, it pours. OpenAI Operator tested and reviewed, with full paper analysis. Perplexity Assistant is useful. Then Stargate, is it all smoke and mirrors? Strong rumours of an o3+ model from Anthropic. Then a full breakdown of DeepSeek R1, and what its training method says about the state of AI. It’s not open source BTW. Plus Humanity’s Last Exam, and Hassabis Accelerates his AGI timeline.

00:00 - Introduction

00:54 - OpenAI Operator

04:53 - Perplexity Assistant 

05:15 - StarGate

07:51 - Better than o3?

08:25 - DeepSeek R1 Analysis

12:12 - Training Secrets

15:19 - No More Process Rewarding?

19:01 - Hassabis Timeline Accelerates

21:22 - Humanity’s Last Exam


https://app.grayswan.ai/arena/chat/harmful-ai-assistant

https://app.grayswan.ai/arena

https://openai.com/index/computer-using-agent/

System Prompt: https://github.com/wunderwuzzi23/scratch/blob/master/system_prompts/operator_system_prompt-2025-01-23.txt


OpenAI Operator: https://operator.chatgpt.com/

System Card: https://cdn.openai.com/operator_system_card.pdf


There is No Plan: https://x.com/jeffclune/status/1882120726339318007


Perplexity Assistant: https://x.com/perplexity_ai/status/1882466239123255686


Stargate: https://openai.com/index/announcing-the-stargate-project/

Labour goes to 0: https://moores.samaltman.com/

Larry Ellison AI Surveillance: https://x.com/TheChiefNerd/status/1882042989184430332

Amodei 1984: https://www.bloomberg.com/news/articles/2025-01-22/anthropic-ceo-says-openai-s-stargate-venture-seems-chaotic

Microsoft Hesitate: https://www.theinformation.com/articles/why-sam-altman-joined-forces-with-larry-ellison-and-took-a-step-back-from-microsoft?rc=sy0ihq


Dylan Patel o3+ for Anthropic: https://www.youtube.com/watch?v=7EH0VjM3dTk


Deepseek R1: https://arxiv.org/pdf/2501.12948

https://arxiv.org/pdf/2412.19437

Diagram: https://pbs.twimg.com/media/GhyQsM6WQAE7W52?format=jpg&name=large

https://simple-bench.com/

Process: https://x.com/sama/status/1664018190840614912

https://x.com/karpathy/status/1835561952258723930

https://openai.com/index/trading-inference-time-compute-for-adversarial-robustness/?s=09

Demis Interview: https://www.youtube.com/watch?v=yr0GiSgUvPU

Humanity’s Last Exam: 

https://agi.safe.ai/

https://x.com/DanHendrycks/status/1882481730671857815

https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html?s=09



Altman Expects a ‘Fast Take-off’, ‘Super-Agent’ Debuting Soon and DeepSeek R1 Out | 20 Jan 2025 | 00:13:11

OpenAI looks set to debut their Operator system, and some leaks are out. At the same time Deepseek R1 releases some numbers, and Sam Altman says he might have been wrong before, and now anticipates a 'fast take-off'.  Plus two papers to give you an idea of what a super-agent might be decent at doing, some more exclusive article analysis and much more. Who said anything else is happening today...

80,000 Hours Channel: https://www.youtube.com/channel/UCafjal1QYJ3rb0Y9xZk1Ezg
Spotify: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
01:13 - Pro Cost and OpenAI Operator
04:00 - Agent Benchmarks Being Targeted
07:48 - Fast Take-off, Altman
08:48 - Altman flip-flops
10:02 - Deepseek R1 First Reaction

Altman ‘100x expectations out of control’: https://x.com/sama/status/1881258443669172470
OpenAI Operator Table: https://x.com/btibor91/status/1881285255266750564
WebVoyager: https://arxiv.org/pdf/2401.13919
OSWorld: https://arxiv.org/pdf/2404.07972
Axios Exclusive 1 (SuperAgent): https://www.axios.com/2025/01/19/ai-superagent-openai-meta?s=09
Axios Exclusive 2: https://www.axios.com/2025/01/18/biden-sullivan-ai-race-trump-china
Deepseek R1 Numbers: https://x.com/deepseek_ai/status/1881318130334814301
Does 1.5B outperform 3.5 Sonnet on Math?: https://x.com/reach_vb/status/1881319500089634954
Deepseek R1 (deepseek-reasoner) Pricing: https://api-docs.deepseek.com/quick_start/pricing/
Altman Fast Takeoff: https://x.com/tsarnick/status/1879100390840697191
OpenAI Economic Blueprint: https://cdn.openai.com/global-affairs/ai-in-america-oai-economic-blueprint-20250113.pdf
Target is Long-horizon Tasks: https://x.com/karinanguyen_/status/1879576037249667520
Support Regulations: https://www.techemails.com/p/elon-musk-and-openai
https://www.nytimes.com/2023/05/16/technology/openai-altman-artificial-intelligence-regulation.html
Donation: https://qz.com/sam-altman-donate-million-zuckerberg-bezos-donald-trump-1851721035
Amodei on Regulations by 2025: https://www.youtube.com/watch?v=ugvHCXCOmm4
‘Feel the AGI’: https://x.com/polynoamial?lang=en
GPT-5 and o-series merger: https://x.com/sama/status/1880358749187240274
o1 Thinks in Chinese: https://techcrunch.com/2025/01/14/openais-ai-reasoning-model-thinks-in-chinese-sometimes-and-no-one-really-knows-why/



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

OpenAI Tests if GPT-5 Can Automate Your Job - 4 Unexpected Findings | 26 Sep 2025 | 00:14:06

An OpenAI report released in the last 24 hours is the best look we have as to whether 2025 AI can automate your job. I’ll go through 4 unexpected findings, from which model is best at what, to practical tips and massive caveats. Plus UFC robots, radiologist essay, don’t trust videos and the blockers to the singularity. 


Gray Swan: https://app.grayswan.ai/ai-explained



GDPval: https://cdn.openai.com/pdf/d5eb7428-c4e9-4a33-bd86-86dd4bcf12ce/GDPval.pdf


GDP Impact: https://fred.stlouisfed.org/release/tables?rid=331&eid=211

Task List: https://www.onetonline.org/link/summary/11-9141.00

Summers Tweet: https://x.com/LHSummers/status/1971252567981146347

Emad: https://x.com/EMostaque/status/1971254153067593739


Robots: https://x.com/cixliv/status/1967663286679478759

Unitree G1: https://x.com/UnitreeRobotics/status/1970039940022239491


Don’t Trust Video: https://x.com/AISafetyMemes/status/1970453369446871420


AGI Tweet: https://x.com/hyhieu226/status/1968378785709133915


Blockers to the Singularity: https://www.patreon.com/posts/blockers-to-and-139264812


Framework: https://gemini.google.com/share/f4b9c85a6ae9


METR Study (Dev Slowdown): https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/


Karpathy Tweet: https://x.com/karpathy/status/1971220449515516391

Radiology Essay: https://worksinprogress.co/issue/the-algorithm-will-see-you-now/


Chapters:

00:00 - Introduction

00:55 - OpenAI Report Summary

02:40 - Tipping Point Speed-up

04:11 - Better than Industry Experts?

06:33 - Big Caveat

11:10 - Karpathy and the Radiologist Analogy

13:30 - Outro

OpenAI Backtracks on Superintelligence + Altman Brings His Timeline Forward | 08 Jan 2025 | 00:23:41

Sam Altman unexpectedly brings his timelines to AGI forward, while OpenAI backtrack on superintelligence. None of these changes were heralded, but they are significant. Plus the new year brings new assessments of the true capability of models to automate 'large swathes of the economy'. I'll give my prediction on that front for 2025, announce a new Simple Bench competition, and showcase Kling 1.6 vs Veo 2 vs Sora, and much more.

wandb.me/simple-bench

(Colab): https://colab.research.google.com/drive/1AVijcPnEkl8Gy_754XbRdG5m7Q5-9slg?usp=sharing


TheAgentCompany Paper: https://arxiv.org/pdf/2412.14161v1

Sam Altman Major Interview: https://www.bloomberg.com/features/2025-sam-altman-interview/?srnd=phx-ai

OpenAI Agent Coming Jan 2025: https://www.theinformation.com/articles/why-openai-is-taking-so-long-to-launch-agents?rc=sy0ihq

Altman Singularity: https://x.com/sama/status/1875603249472139576

Altman Original Timeline: https://www.youtube.com/watch?v=7dCPytNTnjk&t=621s

https://www.ft.com/content/34a7a082-e685-4e02-bca7-61ff89d99ed2

OpenAI Original Emails: https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-email-archives-from-musk-v-altman-and-openai-blog

DeepMind Sky News 2014 Article: https://news.sky.com/story/google-buys-uk-intelligence-firm-deepmind-10419783

Altman Blog Reflections: https://blog.samaltman.com/reflections

OpenAI Changes Who Gets AGI: https://openai.com/index/why-our-structure-must-evolve-to-advance-our-mission/?s=09

OpenAI 5 Levels: https://www.bloomberg.com/news/articles/2024-07-11/openai-sets-levels-to-track-progress-toward-superintelligent-ai

Altman 2015: https://blog.samaltman.com/machine-intelligence-part-1

OpenAI React to Anthropic: https://www.theinformation.com/articles/how-anthropic-got-inside-openais-head?rc=sy0ihq

Microsoft $100B Definition: https://www.theinformation.com/articles/microsoft-and-openai-wrangle-over-terms-of-their-blockbuster-partnership?rc=sy0ihq
Epoch Scramble for Task Benchmark: https://x.com/tamaybes/status/1876692639363612919

GPQA Progress: https://epoch.ai/data/ai-benchmarking-dashboard

Task Length Crucial for ARC-AGI: https://anokas.substack.com/p/llms-struggle-with-perception-not-reasoning-arcagi

RL Environment Tweet: https://x.com/vedantmisra/status/1876327518157807990

Jason Wei Talk: https://www.youtube.com/watch?v=yhpjpNXJDco

Miles Brunda

o3 - wow | 21 Dec 2024 | 00:22:20

o3 isn’t one of the biggest developments in AI for 2+ years because it beats a particular benchmark. It is so because it demonstrates a reusable technique through which almost any benchmark could fall, and at short notice. I’ll cover all the highlights, benchmarks broken, and what comes next. Plus, the costs OpenAI didn’t want us to know, Genesis, ARC-AGI 2, Gemini-Thinking, and much more. 


FrontierMath: https://epoch.ai/frontiermath

https://arxiv.org/pdf/2411.04872

Chollet Statement: https://arcprize.org/blog/oai-o3-pub-breakthrough

MLC Paper: 

https://www.scientificamerican.com/article/new-training-method-helps-ai-generalize-like-people-do/?utm_campaign=socialflow&utm_source=twitter&utm_medium=social

AlphaCode 2: https://storage.googleapis.com/deepmind-media/AlphaCode2/AlphaCode2_Tech_Report.pdf

Human Performance on ARC-AGI: https://arxiv.org/pdf/2409.01374v1

Wei Tweet ‘3 months’: https://x.com/_jasonwei/status/1870184982007644614

Deliberative Alignment Paper: https://openai.com/index/deliberative-alignment/

Brown Safety Tweet: https://x.com/polynoamial/status/1870196476908834893

Swe-Bench Verified: https://openai.com/index/introducing-swe-bench-verified/

Amodei Prediction: https://x.com/OfirPress/status/1858567863788769518

David Dohan: 16 hours https://x.com/dmdohan/status/1870171404093796638

OpenAI Personal Writing: https://openai.com/index/learning-to-reason-with-llms/

https://simple-bench.com/

John Hallman Tweet: https://x.com/johnohallman/status/1870233375681945725


00:00 - Introduction

01:19 - What is o3?

03:18 - FrontierMath

05:15 - o4, o5

06:03 - GPQA

06:24 - Coding, Codeforces + SWE-verified, AlphaCode 2

08:13 - 1st Caveat

09:03 - Compositionality?

10:16 - SimpleBench?

13:11 - ARC-AGI, Chollet



Never Browse Alone? - Gemini 2 Live and ChatGPT Vision | 12 Dec 2024 | 00:13:40

The ‘Gemini 2 Era’ begins … with screen-sharing? But really, it’s a great free tool, for satisfying curiosity rather than bleeding-edge intelligence. I give you the benchmarks, the highlights and of course, the latest from OpenAI Advanced Voice Mode with Vision.

Plus Deep Research in Gemini Advanced, Simple Bench updates, Santa and what might be for some of you Google’s deflating admission. 


00:00 - Introduction

00:38 - Live Interaction 

03:43 - Gemini 2.0 Flash Benchmarks 

05:10 - Audio and Image Output

06:38 - Project Mariner (+ WebVoyager Bench)

08:49 - But Progress Slowing Down?

10:43 - OpenAI Announcements + Games



https://aistudio.google.com/live

Gemini 2.0 Flash Benchmarks: https://deepmind.google/technologies/gemini/

Project Mariner: https://deepmind.google/technologies/project-mariner/

WebVoyager: https://x.com/laurentsifre/status/1858918588683296875/photo/1

Gemini Game play: https://www.youtube.com/watch?v=IKuGNHJBGsc

Advanced Voice Mode OpenAI: https://www.youtube.com/watch?v=NIQDnWlwYyQ

https://simple-bench.com/

Claude Computer Use: https://docs.anthropic.com/en/docs/build-with-claude/computer-use

Oriol Vinyals Interview: https://www.youtube.com/watch?v=78mEYaztGaw&t=687s



Sora is Out, But is it a Distraction? | 10 Dec 2024 | 00:15:34

After a 10-month wait, OpenAI have released Sora to paying users. With just a prompt it can generate videos of up to 20 seconds in lower resolutions, and 10 seconds at 1080p if you can fork out $200/month. I’ve tested it and read the system card. The user interface is quite beautiful, even if the videos themselves operate under entirely new rules of physics. But I can’t help wondering if OpenAI want us to focus on releases like this, rather than some quietly broken promises.



80,000 hours Website, Podcast + Channel: 

https://80000hours.org/

https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib https://www.youtube.com/@eightythousandhours/videos


https://openai.com/sora/


Sora Countries: https://help.openai.com/en/articles/10250692-sora-supported-countries

Sora Credits: https://help.openai.com/en/articles/10245774-sora-billing-credits-faq

https://runwayml.com/ and https://pika.art/home 


DeepMind Veo: https://deepmind.google/technologies/veo/


Sam Altman Ads as Last Resort: https://www.windowscentral.com/software-apps/openai-could-chase-intrusive-ads-as-last-resort


But OpenAI Considering Ads: https://www.inc.com/ben-sherry/is-openai-getting-into-the-advertising-business-the-company-is-sending-mixed-messages/91033533


OpenAI Backtracks on Microsoft AGI Clause: https://www.ft.com/content/2c14b89c-f363-4c2a-9dfc-13023b6bce65


As Microsoft Boast of Labor Savings: https://www.theinformation.com/articles/microsofts-new-sales-pitch-for-ai-spend-less-money-on-humans?rc=sy0ihq


OpenAI Military Pivot: https://www.technologyreview.com/2024/12/04/1107897/openais-new-defense-contract-completes-its-military-pivot/


Employees Have Doubts: https://www.washingtonpost.com/technology/2024/12/06/openai-anduril-employee-military-ai/?nid=top_pb_signin&arcId=KZIV7PLRHBCVNPAIAAAVUNRHIM&account_location=ONSITE_HEADER_ARTICLE



o1 Pro Mode – Full Analysis (plus o1 paper highlights) | 05 Dec 2024 | 00:16:43

Oh boy. o1 pro mode out on the same night as o1 full. I read the 49 page paper, ran my own tests, spent my fuel allowance on Pro Mode and will give you all the highlights. Suffice to say the story is not as simple as it first appears. 

Weights and Biases’ Weave: wandb.me/ai_explained

Plus, GPT-4.5? MLE Bench, Simple Update, Image Analysis and much more 

 

o1 System Card: https://cdn.openai.com/o1-system-card-20241205.pdf

Apollo Research: https://www.apolloresearch.ai/research/scheming-reasoning-evaluations

Altman Tweet: https://x.com/AnonCEOMakeItAi/status/1864763052622504344

ChatGPT Pro: https://openai.com/index/introducing-chatgpt-pro/

Tibor Blaho: https://x.com/btibor91/status/1864709670470066605

Simple-bench.com 

 

00:00 - Introduction

00:27 - ChatGPT Pro is $200

01:25 - OpenAI Benchmarks

03:20 - o1 System Card, o1 and o1 Pro Mode vs o1-preview

06:18 - Simple Bench surprising results on sample

08:31 - Weights & Biases

09:05 - Image Analysis Compared

12:51 - More Benchmarks and Safety

AI Breaks Its Silence: OpenAI’s ‘Next 12 Days’, Genie 2, and a Word of Caution | 05 Dec 2024 | 00:15:29

Calmest before the storm? Whatever analogy you want to use, things had gotten quiet toward the end of 2024. But then tonight we got Genie 2, and a series of scheduled announcements from OpenAI. Sora is soon here, and o1, but I dive deeper into what it all means and whether reliability is on a path to being solved, ft: two recent papers.

Assembly AI Speech to Text: https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=ai_explained 

Plus Kling Motion Brush, Simple Bench QwQ update and much more.


Genie 2: https://deepmind.google/discover/blog/genie-2-a-large-scale-foundation-world-model/

Jim Cramer: https://x.com/jimcramer/status/1864068878692675625

Give Us Full o1: https://x.com/tszzl/status/1863882905422106851

Verge Scoop: https://x.com/tomwarren/status/1864326361415925861

O1 Learning to Reason Benchmarks: https://openai.com/index/learning-to-reason-with-llms/

SIMA AI: https://arxiv.org/pdf/2404.10179

Genie Paper: https://arxiv.org/pdf/2402.15391

My Video on Genie: https://www.youtube.com/watch?v=gGKsfXkSXv8

Oasis Minecraft: https://x.com/risphereeditor/status/1852619965511204974

LLMs Procedural Knowledge Paper: https://arxiv.org/pdf/2411.12580

Bag of Heuristics Paper: https://arxiv.org/pdf/2410.21272

Jensen Huang Hallucinations: https://www.tomshardware.com/tech-industry/artificial-intelligence/jensen-says-we-are-several-years-away-from-solving-the-ai-hallucination-problem-in-the-meantime-we-have-to-keep-increasing-our-computation

DeepSeek Interview: https://www.chinatalk.media/p/deepseek-ceo-interview-with-chinas

Kling Motion Brush: https://klingai.com/image-to-video


Tim Rocktaschel Book: https://geni.us/ArtificialIntelligence


00:43 - OpenAI 12 Days, Sora Turbo, o1

03:06 - Genie 2

08:26 - Jensen Huang and Altman Hallucination Predictions

09:45 - Bag of Heuristics Paper

11:40 - Procedural Knowledge Paper

13:02 - AssemblyAI Universal 2

13:45 - SimpleBench QwQ and Chinese Models

14:42 - Kling Motion Brush



New Google Model Ranked ‘No. 1 LLM’, But There’s a Problem | 15 Nov 2024 | 00:15:19

A new and mysterious Gemini model appears at the top of the leaderboard, but is that the full story? I dig behind the headline to show you some anti-climactic results, give some context with leaks in the last 48 hours of diminishing returns to scaling, and add the response of Altman, OpenAI and co. The future is about to look a lot stranger...


80,000 hours Podcast and Channel: https://open.spotify.com/show/2WzJwXWBDnn4iZ7odKwDib
https://www.youtube.com/@eightythousandhours/videos          

 

You can now gift memberships to AI Insiders (my Patreon w/ exclusive vids, network): https://www.patreon.com/AIExplained/gift


‘There is no wall’: https://x.com/sama/status/1856941766915641580

https://x.com/vedantmisra/status/1857148554105544708

Gemini Ranking: https://lmarena.ai/?leaderboard

API not yet up: https://x.com/OfficialLoganK/status/1857106844805681153

‘Just Die Chat’: https://x.com/koltregaskes/status/1856754648146653428

Google CEO tweet: https://x.com/sundarpichai/status/1857114106928718329

Sutskever Quote: https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/

Another OpenAI Staffer Leaves: https://x.com/RichardMCNgo/status/1856843040427839804

Bloomberg Report: https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai?s=09

Noam Brown on what OpenAI Researchers Believe: https://x.com/polynoamial/status/1855037689533178289

Clive Chan: https://x.com/itsclivetime/status/1855704120495329667

Chollet Responds to Altman: https://x.com/fchollet/status/1857060079586975852

https://x.com/sama/status/1856940152460869718

Altman Emails: https://x.com/TechEmails/status/1857285960997712356

Change of Heart: https://sd11.senate.ca.gov/news/senator-wiener-responds-openai-opposition-sb-1047

Amodei on ‘Empirical Regularities’: https://lexfridman.com/dario-amodei-transcript/

Verge Report: https://www.theverge.com/2024/10/25/24279600/google-next-gemini-ai-model-openai-december

OpenAI Agents in January: https://www.bloomberg.com/news/articles/2024-11-13/openai-nears-launch-of-ai-agents-to-automate-tasks-for-users?srnd=phx-ai

Leak: ‘GPT-5 exhibits diminishing returns’, Sam Altman: ‘lol’ | 10 Nov 2024 | 00:15:44

The last few days have seen two narratives emerge. One, derived from yesterday’s OpenAI leak in TheInformation, that GPT-5/Orion is a disappointment, and less of a leap than GPT-3 to GPT-4. The second comes from a series of 4 clips (shown in this video) from Sam Altman, regarding the ‘clear path’ to AGI. Let’s go beyond the headlines (and through papers like Frontier Math) to get closer to the ground truth…

Plus Universal-2, Sora comments, Claude 3.5 Haiku SimpleBench update, and a great new AI video.


Assembly AI Speech to Text: https://www.assemblyai.com/?utm_source=youtube&utm_medium=influencer&utm_campaign=ai_explained 

 

00:39 – Bear Case, TheInformation Leak

04:01 – Bull Case, Sam Altman

06:20 – FrontierMath

11:29 – o1 Paradigm

13:11 – Text to Video Greatness and Universal-2 

 

TheInformation Leak: https://www.theinformation.com/articles/openai-shifts-strategy-as-rate-of-gpt-ai-improvements-slows?rc=sy0ihq

Noam Brown Replies: https://x.com/polynoamial/status/1855453104394637444

Sam Altman Y-Combinator Interview: https://www.youtube.com/watch?v=xXCBz_8hM9w&t=1556s

Altman Reply: https://x.com/sama/status/1855100359511097828

https://simple-bench.com/

FrontierMath Paper: https://arxiv.org/pdf/2411.04872

Frontier Math Blog Post: https://epochai.org/frontiermath

Tao: https://x.com/EpochAIResearch/status/1854996368814936250

MMLU Are We Done (cites me!): https://arxiv.org/pdf/2406.04127

Universal-2: https://www.assemblyai.com/research/universal-2

Noam Brown ‘We don’t know’: https://www.youtube.com/watch?v=Gr_eYXdHFis

Anthropic Founder Response: https://x.com/jackclarkSF/status/1855485569998217231

Sora (Runway Comment): https://x.com/c_valenzuelab/status/1855026417354129455

Sora New Vid: https://www.youtube.com/watch?v=_iETa2KDRuw

Darri3D Video: https://www.reddit.com/r/ChatGPT/comments/1gn0n3z/can_you/

ChatGPT with Search, Altman Answers Anything and Simple Bench Out | 01 Nov 2024 | 00:15:20

The Google destroyer, the Perplexity crusher? Or just hype? ChatGPT with Search is here, and simultaneously Altman and co did an AMA on Reddit, covering GPT-5, Sora, SearchGPT and a lot more. Plus, the biggest news of them all: Simple Bench is out.

ChatGPT with Search: https://openai.com/index/introducing-chatgpt-search/

Altman AMA (ask me anything): https://www.reddit.com/r/ChatGPT/comments/1ggixzy/ama_with_openais_sam_altman_kevin_weil_srinivas/

https://x.com/sama/status/1852041075793522911

Perplexity Ads: https://www.cnbc.com/2024/08/22/perplexity-ai-plans-to-start-running-search-ads-in-fourth-quarter.html

Perplexity: https://www.perplexity.ai/

https://simple-bench.com/

The New Claude 3.5 Sonnet: Better, Yes, But Not Just in the Way You Might Think | 28 Oct 2024 | 00:22:34

A new state of the art LLM (at least for creative writing and basic reasoning) but what lies behind the numbers that were put out? Is it for real, and are AI agents about to grab your mouse and shake your cursor?

Plus, results on my own Simple Bench, and new tools from Runway (Act-One), HeyGen (Zoom Calls) and an updated NotebookLM. AI, without the hype.

Weights and Biases' Weave: https://wandb.me/ai_explained

ChatGPT Will Guess your Age, Flirt if Asked, and Can Call the Cops | 16 Sep 2025 | 00:11:31

Sam Altman, CEO of OpenAI, announced a set of new ‘protections’ and ‘privileges’ for ChatGPT users, requiring a significant amount of trust from users: from predicting your age based on your chat, to calling law enforcement if you are at risk of harm, to allowing non-minors to flirt. But amidst all of these announcements, there are interview snippets you may have missed, as Altman dramatically revises his predictions of AI impact on jobs. Plus a Hassabis backtrack to boot.

https://80000hours.org/aiexplained


Calling the Cops: https://openai.com/index/teen-safety-freedom-and-privacy/


Age Prediction: https://openai.com/index/building-towards-age-prediction/


Not Everyone Will Agree: https://x.com/sama/status/1967955739911364693?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Etweet


Theory 1: NYT Lawsuit: https://openai.com/index/response-to-nyt-data-demands/


Theory 2: FTC Investigation into AI Companions: https://x.com/AndrewCurran_/status/1966167585994764743


YT Does the Same: https://www.cbsnews.com/news/youtube-ai-powered-technology-teen-users/


Carlsen Interview: https://www.youtube.com/watch?v=5KmpT-BoVf4


vs Senate Testimony (70% Jobs): https://www.youtube.com/watch?v=5CWVP8-XVjQ


Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf


Hassabis Quote 1: https://www.youtube.com/watch?v=toShbNUGAyo


vs Quote 2: https://www.youtube.com/watch?v=Kr3Sh2PKA8Y


An ‘AI Bubble’? What Altman Actually said, the Facts and Nano Banana26 Aug 202500:18:54

Wait, why did Sam Altman say AI was in a bubble? Or did he? Is it? 8 points for you to consider, before we all get distracted by Nano Banana.

Chapters:
00:00 - Introduction
01:14 - Sam Altman Clarification
02:30 - Media Calls a Bubble (for the tenth time)
03:40 - MIT and McKinsey Analysed
08:21 - Incremental Progress Deceptive
12:07 - Reasoning Breakthroughs
15:31 - CEOs might not know their products
17:25 - But did stocks go down?
17:31 - Media is Contradictory of course


https://donate.redcross.org.uk/appeal/gaza-crisis-appeal


Bubble about to burst: https://www.telegraph.co.uk/business/2025/08/20/ai-report-triggering-panic-and-fear-on-wall-street/

Nano Banana: https://blog.google/products/gemini/updated-image-editing-model/
https://ai.studio/banana

McKinsey Report: https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage#/
https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai#/
Revenue: https://www.wsj.com/tech/ai/mckinsey-consulting-firms-ai-strategy-89fbf1be

MIT Report: https://mlq.ai/media/quarterly_decks/v0.1_State_of_AI_in_Business_2025_Report.pdf

Safe Superintelligence: https://techcrunch.com/2025/04/12/openai-co-founder-ilya-sutskevers-safe-superintelligence-reportedly-valued-at-32b/

Thinking Machines Lab: https://techcrunch.com/2025/07/15/mira-muratis-thinking-machines-lab-is-worth-12b-in-seed-round/

WSJ Prediction 2024: https://www.wsj.com/tech/ai/the-ai-revolution-is-already-losing-steam-a93478b1
WP Prediction 2023: https://www.washingtonpost.com/technology/2023/08/05/ai-hype-bubble-chatgpt/

Companies are Pouring Billions into AI: https://www.nytimes.com/2025/08/13/business/ai-business-payoff-lags.html

Consumer Surplus: https://www.wsj.com/opinion/ais-overlooked-97-billion-contribution-to-the-economy-users-service-da6e8f55
Figure AI robot: https://x.com/adcock_brett/status/1958193476639826383

GDP Bet: https://x.com/adamdangelo/status/1627726566259318784?lang=en

Genie 3 Immersion: https://x.com/holynski_/status/1953879983535141043

https://x.com/elonmusk/status/1953861448431718662
https://simple-bench.com
MMMU: https://mmmu-benchmark.github.io/#leaderboard 
Prophet Arena: https://www.prophetarena.co/leaderboard

NYT Jobs: https://www.nytimes.com/2025/08/19/opinion/ai-job-loss-deindustrialization.html

Dawn of Reasoning?: https://openreview.net/pdf?id=FkKBxp0FhR
vs: https://arxiv.org/pdf/2403.04121

ARC-AGI: https://arcprize.org/arc-agi/1/
https://x.com/fchollet/status/1870169764762710376?lang=en-GB

Turing Test: https://arxiv.org/pdf/2503.23674

Mathematics of Starvation: https://www.theguardian.com/world/2025/jul/31/the-mathematics-of-starvation-how-israel-caused-a-famine-in-gaza
https://donate.redcross.org.uk/appeal/gaza-crisis-appeal

https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

METR Interview: https://www.patreon.com/c/aiexplained/posts

AlphaEvolve: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
Paper: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf

Amodei: https://kantrowitz.medium.com/the-making-of-anthropic-ceo-dario-amodei-449777529dd6
https://www.theloganbartlettshow.com/archive/ep-82-dario-amodeis-ai-predictions-through-2030#:~:text=DARIO%3A%20I%20think%20our%20concern,being%20responsible%20to%20accelerate%20things
Unreleased OpenAI: https://x.com/alexwei_/status/1954966393419599962

VLMs Tricked: https://x.com/an_vo12/status/1943715159559545186



AI Insiders ($9!): https://www.patreon.com/AIExplained

GPT-5 has Arrived07 Aug 202500:15:01

GPT-5 will change how hundreds of millions of people use AI. Yes, you might have to forgive the chart crimes, the underwhelming livestream and Altman hype… But it’s a good model. I have read the 50 page system card in full, have the benchmark scores, coding tests, and things you might have missed.

https://app.grayswan.ai/ai-explained

Announcement: https://openai.com/index/introducing-gpt-5/

System Card: https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb52f/gpt5-system-card-aug7.pdf

Extra Paper: https://cdn.openai.com/pdf/be60c07b-6bc2-4f54-bcee-4141e1d6c69a/gpt-5-safe_completions.pdf

Altman tweet: https://x.com/sama/status/1953551377873117369

Livestream: https://www.youtube.com/watch?v=0Uu_VJeVVfo

METR Report: https://metr.github.io/autonomy-evals-guide/gpt-5-report/

ARC-AGI-2: https://x.com/fchollet/status/1953511631054680085

Claude Opus 4.1: https://www.anthropic.com/news/claude-opus-4-1

MMMU: https://mmmu-benchmark.github.io/

Cursor Praise: https://x.com/ryolu_/status/1953531724895596669


Genie 3: The World Becomes Playable (DeepMind)05 Aug 202500:11:54

Soon, anything will be playable. A photo becomes an interactive world, a selfie becomes a new game. Genie 3 from Google, debuting just 2 hours ago, is what I mean, and I have the full analysis, plus the pushback I gave the authors (will it really lead to reliable AI agents? Is that even the point?). You make your own mind up, but it’s certainly fascinating, and not to be overlooked in the week that will bring us GPT-5.

https://80000hours.org/aiexplained

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters: 
00:00 - Introduction
01:27 - Background and Access
04:58 - Caveats
07:24 - Demo
10:12 - Conclusion

Announcement: https://deepmind.google/discover/blog/genie-3-a-new-frontier-for-world-models/

Isaac Labs: https://developer.nvidia.com/isaac/lab

Genie 2 Coverage: https://www.youtube.com/watch?v=jIm2T7h_a0M

TED Talk Roblox: https://www.youtube.com/watch?v=-OAP0ho5AUg

DeepThink Post: https://www.patreon.com/posts/deep-ish-on-new-135688441



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

How Not to Read a Headline on AI (ft. new Olympiad Gold, GPT-5 …)21 Jul 202500:17:19

GPT-5 did what? OpenAI ahead of Google? There are 9 ways to misread the headlines of the last 48 hours, so this video is here to tell you what happened, sans sizzle. It’s been a fairly momentous last few days, so let’s dive into the International Math Olympiad Gold, GPT-5 alpha release, whether mathematicians are out of jobs, and the white collar impact by year’s end.


Job Board: https://80000hours.org/aiexplained


New Documentary on Patreon: https://www.patreon.com/posts/our-new-age-of-133960279

Chapters: 
00:00 - Introduction
00:18 - AI > Mathematicians?

01:23 - OPENAI vs GOOGLE

02:42 - Irrelevant to Jobs or …

06:45 - White-collar jobs gone?

10:26 - AI is Plateauing?

12:00 - We Don’t Know the Details…

14:33 - GPT-5 alpha

14:54 - Nothing but Exponentials?

15:53 - No Impact?


Announcement: https://x.com/alexwei_/status/1946477742855532918


UCLA Math Prof: https://x.com/ErnestRyu/status/1946699302308635130


ChatGPT Agent: https://openai.com/index/introducing-chatgpt-agent/

Livestream: https://www.youtube.com/watch?v=1jn_RpbPbEc&t=796s
System Card: https://cdn.openai.com/pdf/839e66fc-602c-48bf-81d3-b21eacc3459d/chatgpt_agent_system_card.pdf


Jerry Tworek (OpenAI): https://x.com/MillionInt/status/1946556255490982022

https://x.com/MillionInt/status/1946558130906968330


Noam Brown Details: https://x.com/polynoamial/status/1946478249187377206


Trieu Trinh Retweet: https://x.com/Mihonarium/status/1946880931723194389


Neel Nanda: https://x.com/NeelNanda5/status/1946602953370173647


Terence Tao: https://mathstodon.xyz/@tao


Sam Altman: https://x.com/sama/status/1946569252296929727


METR Dev Study: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/


Ravid Shwartz-Ziv: https://x.com/ziv_ravid/status/1946378712716562605


AlphaEvolve: https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/


https://simple-bench.com/


Meta Salary: https://www.tomshardware.com/tech-industry/artificial-intelligence/abel-founder-claims-meta-offered-usd1-25-billion-over-four-years-to-ai-hire-person-still-said-no-despite-equivalent-of-usd312-million-yearly-salary


$2k per month: https://www.theinformation.com/articles/openai-considers-higher-priced-subscriptions-to-its-chatbot-ai-preview-of-the-informations-ai-summit?rc=sy0ihq


Grok 4 - 10 New Things to Know10 Jul 202500:11:43

Grok 4 is here, but did you know these 10 things about the new model? From benchmark caveats to soloing science, $300 a month secrets to Grok 5 promises, here are 10 new things to know in just under 12 minutes.

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:22 - Benchmark Results
02:11 - Benchmark Caveats
02:59 - ARC-AGI 2 
03:35 - SimpleBench
04:49 - ‘Humanity’s Last Exam’
07:20 - SuperGrok Heavy Price
07:58 - API Price
08:12 - Grok 5, Gemini 3.0 Beta, GPT-5
09:12 - System Prompt Change + $1B a month, pollution
10:20 - Not soloing science, helping you solo code

Livestream: https://www.youtube.com/watch?v=1tQ_KrlHgfg&t=1s

Price: https://grok.com/#subscribe
https://x.com/ArtificialAnlys/status/1943166841150644622

Gemini DeepThink: https://blog.google/technology/google-deepmind/google-gemini-updates-io-2025/#deep-think

https://simple-bench.com/

ARC-AGI 2: https://x.com/arcprize/status/1943168950763950555

Humanity’s Last Exam: https://agi.safe.ai/

SmartGPT: https://www.youtube.com/watch?v=hVade_8H8mE

New Power Plant, 1m GPUs: https://www.tomshardware.com/tech-industry/artificial-intelligence/elon-musk-xai-power-plant-overseas-to-power-1-million-gpus

Gemini 3.0 beta: https://web.archive.org/web/20250709174548/https://github.com/google-gemini/gemini-cli/blob/b0cce952860b9ff51a0f731fbb8a7649ead23530/packages/cli/src/ui/utils/errorParsing.test.ts

Pollution: https://www.theguardian.com/technology/2025/apr/24/elon-musk-xai-memphis
https://www.youtube.com/watch?v=C8rU4dv2w8Q
https://www.youtube.com/watch?v=3VJT2JeDCyw

System Prompt: https://github.com/xai-org/grok-prompts/blob/535aa67a6221ce4928761335a38dea8e678d8501/ask_grok_system_prompt.j2

Burn Rate: https://www.bloomberg.com/news/articles/2025-06-17/musk-s-xai-burning-through-1-billion-a-month-as-costs-pile-up

Ron Johnson: https://x.com/jdcmedlock/status/1939814516503847259



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

Gemini 3 is Here: 11 Details You Might Have Missed19 Nov 202500:21:42

Gemini 3 Pro is out, and records fell like snowflakes in Svalbard. 

No long description, chapters or links today, huge technical difficulties, including with audio, so just want to publish asap.


https://app.grayswan.ai/ai-explained


https://lmcouncil.ai
AI Insiders ($9!): https://www.patreon.com/AIExplained



Non-hype Newsletter: https://signaltonoise.beehiiv.com/
Podcast: https://aiexplainedopodcast.buzzsprout.com/

GPT 5.2: OpenAI Strikes Back12 Dec 202500:17:41

Full GPT-5.2 breakdown - did OpenAI reclaim the crown? A story of tokens, time and cost, plus 9 details you wouldn’t get just from reading the headlines.

https://www.youtube.com/@eightythousandhours



AI Insiders ($9!): https://www.patreon.com/AIExplained
https://lmcouncil.ai

Chapters:
00:00 - Introduction
00:55 - Better than Human @ Professional Tasks?
04:42 - Test time Compute
07:05 - Benchmark Selection
09:32 - Simple Results + council comparison
13:01 - Long Context
13:52 - Self-Improvement
15:00 - 10 Years + New Models

Release Page: https://openai.com/index/introducing-gpt-5-2/

GPT 5.2 Benchmark Comparison: https://www.reddit.com/r/singularity/comments/1pka1y9/gpt52_all_20_benchmarks_rankings_and_pricing/
https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
https://lmcouncil.ai/benchmarks

Charxiv: https://charxiv.github.io/#leaderboard

GDPval: https://arxiv.org/pdf/2510.04374
My vid: https://www.youtube.com/watch?v=oK5LxMaROSA

Kilpatrick: https://x.com/OfficialLoganK/status/1999270402712023158/photo/1

Noam Brown: https://x.com/polynoamial/status/1999189845164667132

New Model in New Year: https://www.theinformation.com/articles/openai-developing-garlic-model-counter-googles-recent-gains?rc=sy0ihq

10 Years of OpenAI: https://openai.com/index/ten-years/

GPQA: https://x.com/idavidrein/status/1841265634170278063

ARC-AGI 1-2: https://arcprize.org/arc-agi/2/

Sunday Robotics: https://x.com/tonyzzhao/status/1991204839578300813


Non-hype Newsletter: https://signaltonoise.beehiiv.com/


https://lmcouncil.ai

You Are Being Told Contradictory Things About AI: 8 examples05 Dec 202500:20:15

With headlines of an imminent job apocalypse, code red for ChatGPT and recursive self-improvement, at the same time as Anthropic's CEO yesterday saying we know how to scale to AGI, and Gemini 3 DeepThink out today, it is easy to get lost among the narratives and counter-narratives. So here are both, plus the facts behind them, for you to decide.


https://epoch.ai/data/data-centers

Epoch AI is the sponsor of today’s video, and my views, and those expressed in this video, do not necessarily reflect Epoch AI’s views in any way.


Chapters: 
00:00 - Introduction
00:42 - Job Apocalypse?
01:45 - Scaling to AGI
04:15 - Recursive Self-Improvement Needed, or Not
09:57 - OpenAI Code Red vs Gemini 3 DeepThink vs Claude Opus 4.5
13:27 - DeepSeek Speciale vs Mistral Large v3
16:45 - Claude Soul Document

https://lmcouncil.ai/

AI Insiders ($9!): https://www.patreon.com/AIExplained



Guardian Interview: https://www.theguardian.com/technology/ng-interactive/2025/dec/02/jared-kaplan-artificial-intelligence-train-itself

MIT Study on Jobs/Tasks: https://iceberg.mit.edu/report.pdf
vs https://www.cnbc.com/2025/11/26/mit-study-finds-ai-can-already-replace-11point7percent-of-us-workforce.html

Amodei on Scaling: https://www.youtube.com/watch?v=FEj7wAjwQIk
Claude Soul Document: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document

Capabilities Original Stance: https://www.anthropic.com/news/core-views-on-ai-safety

Ilya Interview: https://www.dwarkesh.com/p/ilya-sutskever-2

Ricursive Intelligence: https://x.com/RicursiveAI/status/1995932204703346946

Economist Worker Usage of GenAI: https://www.economist.com/finance-and-economics/2025/11/26/investors-expect-ai-use-to-soar-thats-not-happening#selection-1409.94-1413.42

Mistral v3 Large: https://docs.mistral.ai/models/mistral-large-3-25-12

Compute Slowdown Paper: https://joel-becker.com/images/publications/forecasting_time_horizon_under_compute_slowdown.pdf
https://x.com/joel_bkr/status/1993023436541903155

METR Chart: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/

https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq

OpenAI Code Red: https://www.anthropic.com/news/core-views-on-ai-safety
Rocket Company: https://www.independent.co.uk/news/world/americas/sam-altman-rocket-elon-musk-spacex-b2878351.html

DeepSeek Paper: https://arxiv.org/html/2512.02556v1

DeepSeek Crowdstrike CCP: https://www.crowdstrike.com/en-us/blog/crowdstrike-researchers-identify-hidden-vulnerabilities-ai-coded-software/

https://simple-bench.com/

Patreon Post: https://www.patreon.com/c/aiexplained/posts

Robot: https://x.com/jloganolson/status/1985850115379351799

Gemini Exponential, Demis Hassabis' ‘Proto-AGI’ coming, but …19 Dec 202500:19:59

The condensed highlights of hours of AI lab leader interviews, model releases, Gemini 3 Flash insights (plus its hidden flaw), Hassabis’ ‘proto-AGI’ and much more…

https://matsprogram.org/apply?utm_source=ai-explained&utm_medium=youtube&utm_campaign=s26  

Also, do check out my new app: https://lmcouncil.ai

Chapters: 
00:00 - Introduction
00:50 - Results
02:44 - But… the Flaw
04:49 - So Benchmarks are fake? No
07:37 - Spatial Reasoning + Hassabis
10:06 - Proto-AGI
12:07 - Minimal AGI
15:07 - Compute Slowdown
17:56 - New Data Paradigm

Gemini 3 Flash: https://deepmind.google/models/gemini/flash/

Hassabis Interview: https://www.youtube.com/watch?v=PqVbypvxDto
Legg Interview: https://www.youtube.com/watch?v=l3u_FAv33G0
Pre-training Lead Interview: https://www.youtube.com/watch?v=cNGDAqFXvew
Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ
Brockman Video: https://x.com/OpenAI/status/2001336514786017417
Post-Training Reveal: https://x.com/OfficialLoganK/status/2001742530472534442

Hallucinations Paper: https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
Patreon Hallucinations Vid: https://www.patreon.com/posts/blockers-to-and-139264812
AA-Omniscience Benchmark: https://artificialanalysis.ai/evaluations/omniscience
https://arxiv.org/pdf/2511.13029


https://lmcouncil.ai/benchmarks
https://simple-bench.com/
https://x.com/scaling01/status/1999620587744813205

5.2 Codex Drop: https://cdn.openai.com/pdf/ac7c37ae-7f4c-4442-b741-2eabdeaf77e0/oai_5_2_Codex.pdf

OpenAI Compute Trend: https://www.theinformation.com/articles/openais-350-billion-computing-cost-problem?rc=sy0ihq

Cramer Tweet/Response: https://x.com/BorisMPower/status/2001440650210976018

OpenAI Valuation: https://www.theinformation.com/articles/openai-discussed-raising-tens-billions-valuation-around-750-billion?rc=sy0ihq

Indian Data: https://www.reuters.com/world/india/with-freebies-openai-google-vie-indian-users-training-data-2025-12-17/

TheInformation Data: https://x.com/theinformation/status/2001421225751351778

Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/
Sima 2: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/
Veo 3.1: https://deepmind.google/blog/sima-2-an-agent-that-plays-reasons-and-learns-with-you-in-virtual-3d-worlds/

METR: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/


AI Insiders ($9!): https://www.patreon.com/AIExplained


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Anthropic: Our AI just created a tool that can ‘automate all white collar work’, Me:14 Jan 202600:18:16

A new tool, with code written by an AI model, has gone omega-viral: Claude Cowork. But is the hype justified? What do the stats say on productivity? Where is the truth in a sea of noise? What is truth? Can we handle the truth? Where's Nemo?

https://matsprogram.org/s26-aie


Check out my new app! https://lmcouncil.ai

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters: 
00:00 - Introduction
01:12 - Claude Cowork
06:48 - Productivity Speed-up + jobs
09:33 - Comparing Models
12:00 - Brittle AI Paper

Cowork Intro: https://x.com/claudeai/thread/2010805682434666759

'All of it': https://x.com/bcherny/status/2010813886052581538

'AGI' Claims: https://x.com/deepfates/status/2004994698335879383

Douglas Interview: https://www.youtube.com/watch?v=TOsNrV3bXtQ&t=2313s

Job Stats: https://www.oxfordeconomics.com/wp-content/uploads/2026/01/Evidence-of-an-AI-driven-shakeup-of-job-markets-is-patchy.pdf
Amodei Prediction: https://fortune.com/2025/05/28/anthropic-ceo-warning-ai-job-loss/

GenAI Traffic: https://x.com/demishassabis/status/2009075877347512545

Illusion of Insight: https://arxiv.org/pdf/2601.00514
Entropy Exploration: https://arxiv.org/pdf/2506.14758
ProRL: https://arxiv.org/pdf/2505.24864

Genesis Mission: https://www.whitehouse.gov/presidential-actions/2025/11/launching-the-genesis-mission/
https://deepmind.google/blog/how-were-supporting-better-tropical-cyclone-prediction-with-ai/


Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

What the Freakiness of 2025 in AI Tells Us About 202623 Dec 202500:33:26

It’s probably not possible to satisfactorily condense 12 months’ worth of weird progress in AI, as well as predictions for the year to come, into one video. But I’m gonna try anyway because it has been a very strange time.

http://matsprogram.org/s26-aie


My new app! https://lmcouncil.ai


Patreon Interview: https://www.patreon.com/posts/robot-in-your-27-146376094

Chapters:
00:00 - Introduction
00:34 - Reasoning Models … and limits
02:54 - A playable world
03:36 - Realism
03:50 - AI Slop gone mainstream
05:03 - DolphinGemma
05:39 - Public Mood
07:34 - AI Enlisted
08:30 - GPT-5
11:05 - Open Weight not out
13:00 - METR Breakout
17:30 - VASA-1
18:28 - Lateral Productivity
20:15 - 1 or 1000 benchmarks needed?
24:54 - Continual Learning + Altman on Superintelligence
28:08 - Automated Information Discovery ft AlphaEvolve


Hassabis on Generality: https://x.com/demishassabis/status/2003097405026193809
https://www.youtube.com/watch?v=PqVbypvxDto

Gemini 3: https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/gemini_3_table_final_HLE_Tools_on.gif
Reasoning Trade-offs: https://arxiv.org/pdf/2504.13837

DolphinGemma: https://blog.google/technology/ai/dolphingemma/?s=09

Genie 3: https://deepmind.google/blog/genie-3-a-new-frontier-for-world-models/

METR Time Horizon: https://arxiv.org/pdf/2503.14499
https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Flaws: https://x.com/ShashwatGoel7/status/2002369517499105443
https://shash42.substack.com/p/how-to-game-the-metr-plot
https://x.com/METR_Evals/status/2002203627377574113

GPT-5 - Altman PhD in everything: https://edition.cnn.com/2025/08/14/business/chatgpt-rollout-problems

https://simple-bench.com/

AI Slop: https://www.youtube.com/watch?v=I_3vxoJDD9k
https://www.theguardian.com/technology/2025/dec/16/boost-for-artists-in-ai-copyright-battle-as-only-3-per-cent-back-uk-active-opt-out-plan

Survey: https://x.com/SearchlightInst/status/2001057144842387920/photo/1

Nvidia Nemotron: https://x.com/percyliang/status/2000608134205985169

OpenAI Compute Flywheel: https://x.com/OpenAI/status/2001363007209914399/photo/1
Altman Interview: https://www.youtube.com/watch?v=2P27Ef-LLuQ

AI in Govt: https://x.com/jdcmedlock/status/1939814516503847259

Benchmark Gaming: https://techcrunch.com/2025/04/07/meta-exec-denies-the-company-artificially-boosted-llama-4s-benchmark-scores/

AlphaEvolve: https://deepmind.google/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/AlphaEvolve.pdf?utm_source=deepmind.google&utm_medium=referral&utm_campaign=gdm&utm_content=
Continual Learning: https://abehrouz.github.io/files/NL.pdf

Job Risk: https://archive.ph/20250708204527/https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

GPT4o: https://x.com/AISafetyMemes/status/1916889492172013989

Vasa-1: https://www.microsoft.com/en-us/research/project/vasa-1/

Three Views: https://www.lesswrong.com/posts/K2D45BNxnZjdpSX2j/ai-timelines
Turing Test: https://x.com/tunguz/status/1907185471211422147

Karpathy Year in Review: https://karpathy.bearblog.dev/year-in-review-2025/

LLM Brainrot: https://arxiv.org/pdf/2510.13928

Lateral Productivity: https://www.aisi.gov.uk/frontier-ai-trends-report

Emotional Quotient: https://arxiv.org/pdf/2511.08394

Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/


AI Insiders ($9!): https://www.patreon.com/AIExplained

Gemini 3.1 Pro and the Downfall of Benchmarks: Welcome to the Vibe Era of AI20 Feb 202600:18:50

Do we have a new best AI model, or do we have the downfall of benchmarks in general, as a way of capturing machine intelligence? Full breakdown of Gemini 3.1 Pro, guest-starring the new Sonnet 4.6, plus analysis from 7 papers/posts that will give you much-needed context. Oh, and a new record on Simple Bench!

https://epoch.ai/ai-explained-datacenters


Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

AI Insiders ($9!): https://www.patreon.com/AIExplained


Chapters:
00:00 - Introduction
00:30 - Post-training Dominance
04:00 - ARC-AGI 2 Caveat
05:54 - Simple Bench Record
08:22 - Hallucination Caveat
10:05 - Model Card
11:12 - Exponential Coming
12:20 - Amodei on Generalizing
15:10 - One True Benchmark?
17:02 - Other Metrics…

Gemini 3.1 Model Card: https://storage.googleapis.com/deepmind-media/Model-Cards/Gemini-3-1-Pro-Model-Card.pdf

Release: https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-pro/

Where are Agents deployed?: https://www.anthropic.com/research/measuring-agent-autonomy

Newsletter Post: https://signaltonoise.beehiiv.com/p/4-ai-numbers-that-surprised-me-this-week

Hallucination AA: https://artificialanalysis.ai/evaluations/omniscience

Melanie Mitchell: https://x.com/MelMitchell1/status/2022738363548340526
ARC-AGI-2: https://x.com/arcprize/status/2024522812728496470/photo/1

Chollet on Agentic Coding and ML: https://x.com/fchollet/status/2024519439140737442

METR Caveat: https://metr.org/notes/2026-01-22-time-horizon-limitations/

Taalas Fast: https://chatjimmy.ai/

Amodei Interview Continual learning: https://www.dwarkesh.com/p/dario-amodei-2?open=false#%C2%A7002942-is-continual-learning-necessary-how-will-it-be-solved

Metaculus FutureEval: https://www.metaculus.com/futureeval/

Next Vid to Watch: https://www.patreon.com/posts/what-you-need-to-150647292



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

The Two Best AI Models/Enemies Just Got Released Simultaneously06 Feb 202600:19:49

The two models that you will hear discussed for at least the next two months - Claude Opus 4.6 and GPT 5.3 Codex - just got released within 26 mins of each other. The full breakdown of around 250 pages of reports, with just the most interesting moments, from the battle of which is best, Claude personhood, the surprising misbehaviour of Opus 4.6, and much more.

https://assemblyai.com/aiexplained

Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

AI Insiders ($9): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:54 - Self-improvement?
02:44 - Knowledge Work
05:30 - Overly agentic behaviour
09:12 - Who Shouldn’t Use Claude Opus
11:39 - Step-change?
15:09 - Claude’s ‘Personhood’

Hassabis Roadmap: https://www.patreon.com/posts/hassabis-roadmap-149750869

Release of Opus 4.6: https://www.anthropic.com/news/claude-opus-4-6
212 Page System Card: https://www-cdn.anthropic.com/0dd865075ad3132672ee0ab40b05a53f14cf5288.pdf
Claude Code Tip: https://x.com/bcherny/status/2019475897691124107


GPT Codex 5.3: https://openai.com/index/introducing-gpt-5-3-codex/

System Card: https://openai.com/index/gpt-5-3-codex-system-card/

Browse Comp: https://arxiv.org/pdf/2504.12516v1
Finance Agent: https://www.vals.ai/benchmarks/finance_agent
Terminal Bench 2: https://arxiv.org/pdf/2601.11868
Vending Bench: https://andonlabs.com/blog/opus-4-6-vending-bench

My X post: https://x.com/AIExplainedYT/status/2016851303436095647

Anthropic Apology: https://x.com/ch402/status/2014066134194995256/photo/1

Altman rebuttal: https://x.com/sama/status/2019139174339928189
https://x.com/sama/status/2019140276246442089

4% of GitHub: https://x.com/dylan522p/status/2019490550911766763



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

Claude AI Co-founder Publishes 4 Big Claims about Near Future: Breakdown28 Jan 202600:22:12

Anthropic's CEO, who has consistently predicted transformative AI will arrive before 2030, recently published a nearly 20,000-word essay outlining his vision of where AI is heading. The video gives you the highlights. The essay argues that scaling and recursion will advance AI from coding automation to full engineering automation, while warning of economic displacement within 1-2 years and China's trajectory toward AI-enabled totalitarianism. Additionally, Dario Amodei predicts that AI models will increasingly be understood as collections of distinct personas rather than monolithic systems.

80,000 Hours: https://www.youtube.com/watch?v=B54EQiuO1UU



Check out my fast-growing (!) app, free to use, and code INSIDER15 for Pro: https://lmcouncil.ai

AI Insiders ($9!): https://www.patreon.com/AIExplained


Chapters:
00:00 - Introduction
01:10 - Scaling to software engineers
06:11 - Permanent Underclass
10:18 - Totalitarian Nightmares
16:38 - Collection of Personas

Essay: https://www.darioamodei.com/essay/the-adolescence-of-technology

Physics Prediction: https://www.quantamagazine.org/is-particle-physics-dead-dying-or-just-hard-20260126/

Axios: https://www.axios.com/2025/05/28/ai-jobs-white-collar-unemployment-anthropic

World GDP: https://data.worldbank.org/indicator/NY.GDP.MKTP.KD.ZG?end=2024&start=1961&view=chart

Demis Hassabis Counter: https://www.youtube.com/watch?v=q6fq4_uP7aM

Karpathy 80%: https://x.com/karpathy/status/2015883857489522876

Machines of Loving Grace: https://www.darioamodei.com/essay/machines-of-loving-grace

Anthropic LessWrong: https://www.lesswrong.com/posts/5aKRshJzhojqfbRyo/unless-its-governance-changes-anthropic-is-untrustworthy#1__In_private__Dario_frequently_said_he_won_t_push_the_frontier_of_AI_capabilities__later__Anthropic_pushed_the_frontier

Original Constitution: https://www.anthropic.com/news/claudes-constitution

New Constitution: https://www.anthropic.com/constitution

Kimi K2.5: https://x.com/Kimi_Moonshot/status/2016024049869324599

Societies of Thought, Google DeepMind Paper: https://arxiv.org/pdf/2601.10825

https://lmcouncil.ai/benchmarks

https://www.patreon.com/posts/our-new-age-of-133960279



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/

Deadline Day for Autonomous AI Weapons & Mass Surveillance27 Feb 202600:13:39

Will Anthropic be forced to make a version of Claude for war? And does a new paper expose the risks of Claude agents, in both OpenClaw and the field of war? Plus, 5 more twists in the story of the Pentagon versus Anthropic + some AI lab employees, and a petition that could change everything, or nothing...


Check out my fast-growing (!) app, free to use, and code INSIDER15 for paid tiers: https://lmcouncil.ai

AI Insiders ($9!): https://www.patreon.com/AIExplained

Chapters:
00:00 - Introduction
00:44 - Deadline Day + Petition
02:42 - Twist 1: Existing Deal
03:26 - Twist 2: Existing Policy
04:21 - Twist 3: Twin Threats
05:54 - Twist 4: Interesting Objections
11:32 - Twist 5: Anthropic’s Dropped Policy


Dario Statement: https://www.anthropic.com/news/statement-department-of-war

Google/OpenAI Petition: https://notdivided.org/

Axios on Amodei Rejection: https://www.axios.com/2026/02/26/anthropic-rejects-pentagon-ai-terms

FT on US Threat: https://www.ft.com/content/11d27612-d6c5-4cf7-94dd-f65603549b7f

Politico on Latest: https://archive.ph/20260227013117/https://www.politico.com/news/2026/02/26/incoherent-hegseths-anthropic-ultimatum-confounds-ai-policymakers-00800135

The Verge on Current Deal: https://www.theverge.com/ai-artificial-intelligence/883456/anthropic-pentagon-department-of-defense-negotiations

Anthropic RSP change: https://www.anthropic.com/news/responsible-scaling-policy-v3

Time Magazine on RSP: https://time.com/7380854/exclusive-anthropic-drops-flagship-safety-pledge/

Agent of Chaos Paper: https://x.com/NatalieShapira/status/2026062499599319526

AI Agent Reliability Paper: https://arxiv.org/pdf/2602.16666

My Patreon Video: https://www.patreon.com/posts/real-mystery-ai-151647211

Patreon Documentary: https://www.patreon.com/posts/our-new-age-of-133960279 



Non-hype Newsletter: https://signaltonoise.beehiiv.com/

Podcast: https://aiexplainedopodcast.buzzsprout.com/
