Been working on AI agent security for a while and the attack that concerns me most barely gets talked about.
Not the obvious stuff like “ignore previous instructions.” Those get caught. The scary one is when an attacker spreads the attack across multiple messages. Each message looks totally normal. The model sees nothing suspicious. But by message 8 it’s doing something it absolutely should not be doing.
Every security tool I’ve tested evaluates messages one at a time. None of them remember what happened three messages ago.
Built Bendex Arc to catch this. It tracks session behavior across turns instead of evaluating each message in isolation. Try it at https://bendexgeometry.com or red team it at https://web-production-6e47f.up.railway.app/demo
Curious if anyone building agents in production has actually hit this or tested against it.
[link] [comments]