Not All Agent Work Is Equal. Here’s How to Tell the Difference

One of the most useful frameworks I’ve seen lately came from one of our most AI-forward engineers, Bian Jiang, at our weekly Engineering All Hands at Attentive.

He shared how he approaches working with AI agents and how much scrutiny the outcome of an agent interaction deserves. I’d been thinking about this but hadn’t articulated it as clearly as he did. Not all agent interactions are equal, and the difference between good and bad outcomes often comes down to one thing: your self-awareness about what you do and don’t know about the task you’re handing to the agent.

He grouped agent interactions into four scenarios.

✅ I know how to do it, but I don’t want to.

This is the cleanest case. You understand the work deeply. You’re handing it to the agent because it’s tedious, time-consuming, or repetitive, not because you’re lost or don’t know how to do it.

The key here is that you can fully verify the output generated by the agent. You’re not guessing whether it’s right. You know.

Example

Running a database schema migration. You know the data, and the code that depends on it, deeply. You can judge every line the agent changes. If it gets something wrong, you’ll catch it.

This is the highest-trust delegation scenario. The agent does the work. You verify and ship.

✅ I don’t know how to do it, but I want to learn.

You’re not an expert yet, but you’re following along as the agent reasons and executes the task. Turning on reasoning and streaming is key here. You can see the agent “thinking” out loud. It is really cool.

This is one of the most underrated scenarios for working with agents. Instead of blocking on a knowledge gap or asking someone to do it for you, you let the agent do the work while you watch and learn.

Example

Updating a build pipeline to add a load testing verification step. You don’t know every detail yet, but you can follow what the agent is doing and learn as you go.

At the end of the task, you understand how it was done and you can then verify the output. But the bonus here is that it helped you learn by “doing it” with the agent.

✅ I don’t know how to do it, but I know good from bad when I see it.

You’re not well versed in a certain domain, but you have taste and judgment about the output. That’s enough.

Example

I can’t use Figma to create designs. I don’t know how to build a nice logo or icon. But I know whether I like what the agent created and I can iterate on it.

This mode opens up a huge surface area. Engineers who have never touched design can quickly create something useful on their own. Product managers who don’t write code can prototype interfaces. The bar isn’t expertise. It’s judgment and taste.

🛑 The one to avoid: I don’t know how to do it, and I can’t tell if the output is good or bad.

This is the dangerous zone. Not because agents are unreliable, but because you have no way to catch it when they’re wrong.

Avoid

I don’t understand how the personalization service works. I asked the agent to add a new feature to it. I have no way to judge whether the code is correct or whether it breaks anything downstream.

In this mode, you are not supervising the agent. You are just hoping. That’s not agentic-first work. That’s flying blind, and you are not being a great partner to your team, who will probably have to deal with the fallout.
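
The four scenarios above can be read as a small decision table over a few honest self-assessment questions. Here is a minimal Python sketch of that reading. Every name in it (`Scenario`, `classify`, `safe_to_delegate`, and the parameter names) is my own illustration, not something Bian presented, and the order of the checks is an assumption.

```python
from enum import Enum


class Scenario(Enum):
    KNOW_HOW = "I know how to do it, but I don't want to"
    LEARNING = "I don't know how to do it, but I want to learn"
    TASTE = "I don't know how to do it, but I know good from bad"
    BLIND = "I don't know how to do it, and I can't judge the output"


def classify(knows_how: bool, following_along: bool, can_judge_output: bool) -> Scenario:
    """Map self-assessment answers to one of the four scenarios.

    Hypothetical helper: the check order (expertise first, then learning,
    then taste) is an assumption made for illustration.
    """
    if knows_how:
        return Scenario.KNOW_HOW
    if following_along:
        return Scenario.LEARNING
    if can_judge_output:
        return Scenario.TASTE
    return Scenario.BLIND


def safe_to_delegate(scenario: Scenario) -> bool:
    """Only the 'flying blind' scenario is off limits."""
    return scenario is not Scenario.BLIND
```

For the personalization service example, `classify(False, False, False)` lands on `Scenario.BLIND`, and `safe_to_delegate` returns `False`. The code is trivial on purpose: the hard part is answering the three questions honestly.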

Impact Radius of a Mistake

Even within those three safe categories, not all mistakes cost the same. So I am adding a second axis to Bian’s agentic scenarios: who gets impacted if the agent gets it wrong under your supervision?

What if it:

Breaks the agent user’s own workflow or automation: This is very low stakes (as long as there is no data loss). They can revert or go back to doing it manually, and fix it later.
Impacts an internal team: You need to understand the output well enough to roll it back. Have a plan to recover quickly.
Impacts many people internally: You need to be able to judge the agent’s output very well, including whether it’s safe, before merging it. And you need a tested rollback path, not just an idea of one.
Impacts external customers: If you can’t verify the output with confidence and you can’t test it, don’t do it. Full stop.
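
The list above amounts to a pre-merge checklist that gets stricter as the impact radius grows. A minimal Python sketch of it follows. The names (`Radius`, `Readiness`, `REQUIRED`, `missing_checks`) are hypothetical, the mapping only mirrors what the list says, and it assumes the lowest tier involves no data loss.

```python
from dataclasses import dataclass
from enum import Enum


class Radius(Enum):
    OWN_WORKFLOW = "the agent user's own workflow or automation"
    INTERNAL_TEAM = "an internal team"
    MANY_INTERNAL = "many people internally"
    EXTERNAL_CUSTOMERS = "external customers"


@dataclass
class Readiness:
    """What you can honestly claim before shipping the agent's output."""
    can_judge_output: bool = False   # you understand and can verify the output
    has_recovery_plan: bool = False  # you know how you would recover quickly
    rollback_tested: bool = False    # the rollback path has actually been exercised
    can_test_change: bool = False    # you can test the change before release


# Checks required per radius, taken from the list above.
# Assumption: OWN_WORKFLOW needs nothing extra only when no data loss is possible.
REQUIRED = {
    Radius.OWN_WORKFLOW: [],
    Radius.INTERNAL_TEAM: ["can_judge_output", "has_recovery_plan"],
    Radius.MANY_INTERNAL: ["can_judge_output", "rollback_tested"],
    Radius.EXTERNAL_CUSTOMERS: ["can_judge_output", "can_test_change"],
}


def missing_checks(radius: Radius, readiness: Readiness) -> list[str]:
    """Return the required checks that are not yet satisfied for this radius."""
    return [name for name in REQUIRED[radius] if not getattr(readiness, name)]
```

If `missing_checks` returns a non-empty list, that is the signal to stop. For example, `missing_checks(Radius.EXTERNAL_CUSTOMERS, Readiness(can_judge_output=True))` reports that `can_test_change` is still missing, which matches the "don’t do it" rule for customer-facing changes.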

—

The exciting thing about agents is that they extend what’s possible for everyone. Not just the most senior engineers. Not just the people with the most time.

But that access comes with a responsibility: know which scenario you’re in before you delegate a task to an agent. Know what you can and can’t verify. And know who gets impacted if the agent gets it wrong.

Self-awareness is key here.

Thanks, Bian, for sharing your approach with us. 🙏