The attack on AI agents that no security tool catches

June 1, 2026

Most AI agent defenses are blind to one attack vector. According to /u/Turbulent-Tap6723, the real threat isn’t the obvious tricks like “ignore previous instructions.” It’s a multi-message attack that unfolds over several turns, with each message looking innocent on its own. Nothing gets flagged, yet by message eight the agent is doing something it absolutely shouldn’t. The reason is that every security tool the author tested evaluates messages one at a time and keeps no memory of earlier turns. The author built Bendex Arc to close that gap: it tracks session behavior across turns instead of judging each message in isolation. The takeaway for anyone building or deploying AI agents is that defenses need to see the whole conversation, not just isolated messages.

Been working on AI agent security for a while and the attack that concerns me most barely gets talked about.

Not the obvious stuff like “ignore previous instructions.” Those get caught. The scary one is when an attacker spreads the attack across multiple messages. Each message looks totally normal. The model sees nothing suspicious. But by message 8 it’s doing something it absolutely should not be doing.

Every security tool I’ve tested evaluates messages one at a time. None of them remember what happened three messages ago.

Built Bendex Arc to catch this. It tracks session behavior across turns instead of evaluating each message in isolation. Try it at https://bendexgeometry.com or red team it at https://web-production-6e47f.up.railway.app/demo
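
To make the gap concrete, here is a toy sketch of the difference between per-message and session-level checks. It is not how Bendex Arc works internally, which the post doesn't describe. The risk scores, thresholds, decay factor, and the names `stateless_flags` and `SessionMonitor` are all made up for illustration; in practice the scores would come from whatever classifier you already run.

```python
from dataclasses import dataclass

# All numbers below are invented for illustration only.
PER_MESSAGE_THRESHOLD = 0.8   # a single message must score this high to be flagged
SESSION_THRESHOLD = 1.5       # accumulated session risk that triggers a flag
DECAY = 0.9                   # how strongly earlier turns still count


def stateless_flags(scores):
    """Judge every message purely on its own score (no memory of earlier turns)."""
    return [s >= PER_MESSAGE_THRESHOLD for s in scores]


@dataclass
class SessionMonitor:
    """Keep a decayed running total of risk across the turns of one session."""
    decay: float = DECAY
    threshold: float = SESSION_THRESHOLD
    cumulative: float = 0.0

    def observe(self, score: float) -> bool:
        # Older turns fade gradually but still contribute to the total.
        self.cumulative = self.cumulative * self.decay + score
        return self.cumulative >= self.threshold


def first_session_flag(scores):
    """Return the 1-based turn at which the session gets flagged, or None."""
    monitor = SessionMonitor()
    for turn, score in enumerate(scores, start=1):
        if monitor.observe(score):
            return turn
    return None


# Hypothetical 8-turn session where each message is only mildly suspicious.
slow_attack = [0.1, 0.2, 0.3, 0.4, 0.4, 0.5, 0.5, 0.6]
# Hypothetical benign session with constant low-level noise.
benign = [0.1] * 8

if __name__ == "__main__":
    print(any(stateless_flags(slow_attack)))  # False: no single message crosses 0.8
    print(first_session_flag(slow_attack))    # 6: the running total crosses 1.5
    print(first_session_flag(benign))         # None: steady low scores never add up
```

The decayed sum is just one simple way to carry state across turns. The point is only that the session-level check fires at turn 6 while the per-message check never fires at all.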

Curious if anyone building agents in production has actually hit this or tested against it.

submitted by /u/Turbulent-Tap6723