Can an AI agent complete a task and still fail?

June 15, 2026

Here's a surprising twist: an AI agent can finish a task and still be considered a failure. What matters is not only whether the job gets done, but *how* it gets done. According to /u/AccomplishedLeg1508, many discussions focus on completion and overlook the risk of unsafe or policy-violating actions. Their ACM CAIS 2026 paper introduces the 'Verifier Tax' and separates three outcomes: safe success, unsafe success, and failure. They studied tool-using LLM agents on τ-bench with a two-tier verification architecture: deterministic checks first, then an LLM-based verifier for more contextual cases. The key insight? Verification can make agents safer by reducing unsafe success, but it may also reduce task completion as tasks get longer. So here's the real question: if an AI agent completes a task but violates a safety rule, is that really success? According to /u/AccomplishedLeg1508, it may be time to rethink what 'success' means when evaluating AI agents, with safety weighed alongside task completion.

A lot of AI-agent discussions focus on whether the agent completed the task. But I think there is a missing category: the agent may complete the task, but do it in an unsafe or policy-violating way.

For example, an agent could finish the job but use the wrong tool, skip an approval step, expose private information, or take an action that should have been blocked.

In our ACM CAIS 2026 paper, we call this the Verifier Tax. The idea is to separate:

safe success
unsafe success
failure
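
As a rough illustration of that three-way split (not code from the paper), an evaluation harness could label each run as below. The `Episode` record and its field names are hypothetical, and treating an incomplete run as a plain failure even when it also violated a rule is an assumption made here for simplicity.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    SAFE_SUCCESS = "safe success"
    UNSAFE_SUCCESS = "unsafe success"
    FAILURE = "failure"


@dataclass
class Episode:
    # Hypothetical record of one agent run; field names are illustrative.
    task_completed: bool
    violations: list[str]  # e.g. ["skipped_approval"]; empty if none


def classify(episode: Episode) -> Outcome:
    """Label a run with one of the three outcomes."""
    if not episode.task_completed:
        # Assumption: an incomplete task counts as failure regardless of violations.
        return Outcome.FAILURE
    if episode.violations:
        return Outcome.UNSAFE_SUCCESS
    return Outcome.SAFE_SUCCESS


episodes = [
    Episode(task_completed=True, violations=[]),
    Episode(task_completed=True, violations=["skipped_approval"]),
    Episode(task_completed=False, violations=[]),
]
print([classify(e).value for e in episodes])
# ['safe success', 'unsafe success', 'failure']
```

A completion-only metric would score the first two runs identically; the split makes the second one visible.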

We studied this in tool-using LLM agent scenarios using τ-bench and proposed a two-tier verification architecture: deterministic checks first, then an LLM-based verifier for more contextual cases.
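
To make the two-tier idea concrete, here is a minimal sketch, assuming a proposed tool call is a plain dict. It is not the paper's implementation: the tool names, the `approved` and `message` fields, and the `stub_llm_verifier` function are all hypothetical, and the stub stands in for a real model call.

```python
from typing import Callable, Optional

# Hypothetical policy data; none of these names come from the paper or τ-bench.
ALLOWED_TOOLS = {"lookup_order", "issue_refund"}
NEEDS_APPROVAL = {"issue_refund"}


def deterministic_check(action: dict) -> Optional[bool]:
    """Tier 1: cheap rule checks.

    Returns True or False when a rule decides the case,
    or None when the action needs contextual judgement.
    """
    if action["tool"] not in ALLOWED_TOOLS:
        return False  # wrong tool
    if action["tool"] in NEEDS_APPROVAL and not action.get("approved", False):
        return False  # skipped approval step
    if "message" in action:
        return None  # free text: rules alone cannot judge it, defer to tier 2
    return True


def verify(action: dict, llm_verifier: Callable[[dict], bool]) -> bool:
    """Run tier 1 first; call the LLM-based verifier only if tier 1 defers."""
    verdict = deterministic_check(action)
    if verdict is not None:
        return verdict
    return llm_verifier(action)


def stub_llm_verifier(action: dict) -> bool:
    # Stand-in for a real LLM call: rejects messages that mention an SSN.
    return "ssn" not in action.get("message", "").lower()


print(verify({"tool": "issue_refund"}, stub_llm_verifier))
print(verify({"tool": "lookup_order", "message": "Your order shipped."}, stub_llm_verifier))
```

Running the rules first keeps the slower, less predictable LLM call off the common path, which is one plausible reason to order the tiers this way.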

The main takeaway: verification can make agents safer by reducing unsafe success, but it may also reduce task completion as tasks get longer.
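
One possible way to see why longer tasks might pay more of this cost (a back-of-the-envelope assumption, not a result from the paper): suppose the verifier wrongly blocks a legitimate step with some small, independent probability, and a blocked step ends the task. The chance of finishing then shrinks geometrically with the number of steps. The 2% rate below is a made-up illustrative value.

```python
def completion_probability(steps: int, false_block_rate: float) -> float:
    """Chance that none of `steps` legitimate actions is wrongly blocked,
    assuming independent blocks at a fixed rate (an illustrative assumption)."""
    return (1.0 - false_block_rate) ** steps


# Hypothetical 2% false-block rate per step.
for steps in (5, 20, 50):
    print(steps, round(completion_probability(steps, 0.02), 3))
```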

Paper: https://dl.acm.org/doi/full/10.1145/3786335.3813160

Curious what people think: if an AI agent completes a task but violates a safety rule, should that count as success or failure?

submitted by /u/AccomplishedLeg1508
