Why AlphaEvolve Is Already Obsolete: When AI Discovers The Next Transformer | Machine Learning Street Talk Podcast

March 15, 2026

Here's something that'll blow your mind: AlphaEvolve, the AI framework everyone's talking about, might already be outdated. Robert Lange of Sakana AI argues that systems like AlphaEvolve can only optimize fixed problems and struggle to innovate on their own. That's where Shinka Evolve comes in, combining large language models with evolutionary algorithms to actually invent new problems on the fly. The architecture is clever: islands of programs, LLMs used as mutation operators, and a UCB bandit that selects between models like GPT-5 and Gemini. The results? State-of-the-art solutions with far fewer evaluations, even top finishes in programming contests. But here's the catch: when LLMs run entirely on their own, they tend to repeat what they already know. As Robert points out, evolution doesn't need to think outside the box, just recombine useful bits. Looking ahead, he predicts a seismic shift in scientific research, with AI uncovering ideas humans can't even imagine yet. This is just the beginning.


Robert Lange, founding researcher at Sakana AI, joins Tim to discuss Shinka Evolve — a framework that combines LLMs with evolutionary algorithms to do open-ended program search. The core claim: systems like AlphaEvolve can optimize solutions to fixed problems, but real scientific progress requires co-evolving the problems themselves.

In this episode:

  • Why AlphaEvolve gets stuck: it needs a human to hand it the right problem. Shinka Evolve tries to invent new problems automatically, drawing on ideas from POET, PowerPlay, and MAP-Elites quality-diversity search.

  • The architecture of Shinka Evolve: an archive of programs organized as islands, LLMs used as mutation operators, and a UCB bandit that adaptively selects between frontier models (GPT-5, Sonnet 4.5, Gemini) mid-run. The credit-assignment problem across models turns out to be genuinely hard.

  • Concrete results: state-of-the-art circle packing with dramatically fewer evaluations, second place in an AtCoder competitive programming challenge, evolved load-balancing loss functions for mixture-of-experts models, and agent scaffolds for AIME math benchmarks.

  • Are these systems actually thinking outside the box, or are they parasitic on their starting conditions? When LLMs run autonomously, "nothing interesting happens." Robert pushes back with the stepping-stone argument: evolution doesn't need to extrapolate, just recombine usefully.

  • The AI Scientist question: can automated research pipelines produce real science, or just workshop-level slop that passes surface-level review? Robert is honest that the current version is more co-pilot than autonomous researcher.

  • Where this lands in 5-20 years: Robert's prediction that scientific research will be fundamentally transformed, and Tim's thought experiment about alien mathematical artifacts that no human could have conceived.
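The adaptive model selection described in the architecture bullet can be illustrated with a standard UCB1 bandit, where each arm is a candidate LLM mutation operator. This is a minimal sketch, not Shinka Evolve's actual implementation: the reward definition (whether a mutated program improved fitness) and the simulated success rates are hypothetical, and the episode notes that real credit assignment across models is genuinely hard.

```python
import math
import random

class UCB1Bandit:
    """UCB1 bandit over a set of arms (here: candidate LLM mutation operators)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}    # times each arm was selected
        self.totals = {a: 0.0 for a in self.arms}  # cumulative reward per arm

    def select(self):
        # Try each arm once before applying the UCB1 formula.
        for arm in self.arms:
            if self.counts[arm] == 0:
                return arm
        t = sum(self.counts.values())

        def ucb(arm):
            mean = self.totals[arm] / self.counts[arm]
            bonus = math.sqrt(2 * math.log(t) / self.counts[arm])
            return mean + bonus

        return max(self.arms, key=ucb)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward

# Toy simulation: pretend each model produces a fitness-improving mutation
# with a different (unknown to the bandit) success probability.
random.seed(0)
true_rates = {"gpt-5": 0.7, "sonnet-4.5": 0.4, "gemini": 0.2}
bandit = UCB1Bandit(true_rates)
for _ in range(500):
    model = bandit.select()
    reward = 1.0 if random.random() < true_rates[model] else 0.0
    bandit.update(model, reward)

print(bandit.counts)  # the highest-rate arm should dominate the pull counts
```

The exploration bonus shrinks as an arm accumulates pulls, so the bandit keeps sampling weaker models occasionally while routing most mutation requests to whichever model has been paying off mid-run.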


Link to the Full Episode: https://www.youtube.com/watch?v=EInEmGaMRLc

Spotify

Apple Podcasts
submitted by /u/44th--Hokage