AIs can generate near-verbatim copies of novels from training data

February 24, 2026

Here's something that might surprise you: AI models can now produce near-verbatim copies of bestselling novels purely from their training data. Why does this matter? It challenges the industry's claim that these systems don't store copyrighted works. According to Melissa Heikkilä, writing in the Financial Times, recent studies show that models from giants like OpenAI, Google, and Meta have memorized far more of these works than previously thought. That undermines the classic defense that the systems "learn" from copyrighted content rather than storing it. Legal experts warn this could seriously affect ongoing copyright lawsuits, since it blurs the line between learning from data and outright copying. And if these models can generate verbatim copies, what does that mean for creators and publishers? As Heikkilä reports, we may need a whole new way of thinking about AI and intellectual property. Keep an eye on this space; it's not going away anytime soon.

The world’s top AI models can be prompted to generate near-verbatim copies of bestselling novels, raising fresh questions about the industry’s claim that its systems do not store copyrighted works.

A series of recent studies has shown that large language models from OpenAI, Google, Meta, Anthropic, and xAI memorize far more of their training data than previously thought.

AI and legal experts told the FT this “memorization” ability could have serious ramifications on AI groups’ battle against dozens of copyright lawsuits around the world, as it undermines their core defense that LLMs “learn” from copyrighted works but do not store copies.

