The Era of Secret LLM Evals

March 4, 2026

Byrne Hobart argues that companies are now quietly evaluating large language models without disclosing the results, keeping their true capabilities hidden from rivals. These private evals let firms benchmark and improve their models without tipping their hand, and the advantage goes beyond the technology itself: secret evaluation shapes how companies compete for talent, commoditize complementary AI tools, and manage capacity. Anthropic, for instance, is experimenting with capacity management to keep its models responsive and efficient. By keeping evals private, firms avoid revealing how they are improving or what they plan next, a strategic choice that could reshape how AI competition plays out as the race heats up.

Plus! Commoditizing the Complement; Talent Density; Thermostatic State Capacity and Anthropic; Merchant Banking; Negawatts