One thing that’s always felt weird to me about prompt injection defenses is that they usually evaluate one message at a time.
But a lot of the attacks I’m seeing don’t really work that way.
A webpage says something subtle. A tool result reinforces it. An email adds another nudge. Nothing looks obviously malicious on its own, but a few turns later the agent is heading somewhere it definitely shouldn’t.
That was the motivation behind Arc Gate.
Instead of looking at each message in isolation, it keeps track of what’s happening across the entire session. It also treats different sources differently. A system prompt, a user message, a webpage, and a tool output shouldn’t all have the same authority just because they ended up in the same context window.
The goal isn’t just to catch bad prompts. It’s to stop agents from taking actions based on instructions hidden inside untrusted data.
I’m curious whether other people building agents think this is the right direction, or if I’m overthinking a problem that existing approaches already solve.
[link] [comments]