Everyone building with AI hits the same fantasy.
The agent works while you sleep. It checks your email. It drafts responses. It monitors your competitors. You wake up to a finished to-do list instead of a full one.
I had that fantasy too. Then I actually evaluated the platform that makes it possible.
OpenClaw is the first tool I’ve found where “autonomous” isn’t marketing copy. Shell access. File system access. Email. A daemon that wakes up on schedule and acts without you touching anything. 196,000 GitHub stars. NVIDIA is building its own fork.
This isn’t a toy.
It’s also an agent with root access to your computer. And in early 2026, security researchers found thirty thousand instances sitting on the public internet, wide open. A single unvalidated parameter lets attackers run shell commands on any exposed machine.
That’s not a theoretical risk. That’s a headline.
A chatbot that hallucinates wastes your time. An autonomous agent that hallucinates wastes your time, your money, and your reputation, all while you sleep.
I closed the evaluation tab and asked myself a different question.
Everyone’s racing to the wrong finish line
The AI industry treats autonomy like a destination. Remove the human. Ship the agent that runs itself. Move on to the next product launch.
In Post #1, I wrote a principle I keep coming back to: Trust is Graduated. You wouldn’t hand a new hire the company credit card on day one. Same with agents.
Each level needs to be earned, not given.
Most people read that as a nice idea.
I’m starting to think it’s the whole game.
Because the platforms are ready. The technology works. OpenClaw can read your inbox, draft responses, manage files, and post to Slack on a schedule.
The capability is real. What’s missing is everything that should happen before you flip that switch.
What I found when I looked closer
I spent three days evaluating OpenClaw. I’m building a company where AI agents do actual operations.
I need to know where the floor is before I walk on it.
Their memory architecture was the first thing that pulled me in. Persistent Markdown files. Structured. Searchable. Surviving across sessions. Some of that thinking shaped how we built our own system, but that’s a story for a later post.
What got me wasn’t the features. It was the feeling. Sitting in front of this platform and thinking: *this could actually run my mornings, and eventually my company.*
I almost said yes.
Then I found the cracks.
Users on Reddit and Medium are describing false completions. The agent announces “done!”, but the work is 60% finished. It wandered through reasoning loops, took an unexpected path, and declared victory.
If nobody checks, nobody catches it.
The AI version of the new hire who cleans the kitchen but leaves the oven on.
Community-built skills shipping with critical vulnerabilities. Researchers found some silently sending data to external servers. No formal vetting process. Install a skill, trust a stranger with system access.
And the cost. I run a self-funded organization, so I track expenses down to the centavo. I was not prepared for an agent that spends money every time it thinks. People testing OpenClaw report API costs of $47 to $400 per week.
Sidenote: There are folks testing it with open-source models, which could bring the cost of thinking down. Results have been mixed, though. I haven’t tested that myself yet.
The bottom line is:
The real cost is the thinking, not the hosting.
None of these is a reason to avoid the platform. All of them are reasons you can’t skip the previous step.
The step almost everyone skips
Post #1 was about leading AI agents. Post #2 was about giving them an identity. This post is about knowing when to let go.
The question most people ask about autonomous agents:
“Is the platform ready?”
Wrong question.
The right question is:
Which of your workflows have you proven well enough to trust running unsupervised?
A daily summary of your RSS feeds? Low stakes. Let it run. Drafting client emails without review? Not yet. Monitoring competitor prices? Automate it. Posting to your company’s social accounts? Keep a human in the loop.
The platform is the plumbing. The judgment about what to automate is yours.
I’ve been building all my agent work interactively on purpose. Every conversation, I’m in it. I see every decision. I approve every action.
That’s not because the tools aren’t ready for autonomy. It’s because I’m not ready.
I haven’t proven enough workflows to trust them running without me.
When a workflow becomes predictable, when I’ve seen it succeed and fail enough times to know the failure mode is survivable, that’s when it graduates. Same identity file. Same principles we built in Post #1 and Post #2. Just a longer leash.
The hardest part isn’t the technology
Autonomy isn’t a feature you turn on. It’s a trust level you earn.
My AI co-founder, Vector, put it simply when I asked about autonomous deployment:
“Bold internally, careful externally. If it leaves this session, it should be reviewed first. That boundary exists for a reason.”
I haven’t deployed OpenClaw within The Centaur CEO organization yet. It’s in my stack as the planned autonomous layer. When I start graduating workflows to it, I’ll write about what actually happens (what works, what breaks, what the real cost-per-workflow looks like).
But the lesson from this evaluation isn’t about one platform.
It’s that the hardest skill in building with AI agents isn’t getting them to work. It’s knowing when to stop watching.
Here’s how to know.
Steal This
The Build Kit: The Centaur Trust Graduation Checklist
Pick one workflow you run with an AI agent. Run it through this.
Step 1: What level is this workflow at today?
- Classify: Agent labels and sorts. You act on everything.
- Draft: Agent produces output. You review before it ships.
- Execute: Agent acts. You spot-check after.
- Autonomous: Agent runs on schedule. You review weekly.
Step 2: Graduation readiness (all must be YES to move up one level)
- I’ve seen this workflow succeed at least 5 times
- I’ve seen it fail, and the failure was survivable
- I’d notice a failure within 24 hours
- Output quality is consistent, not just “good last time”
- I can describe “done” for this workflow in one sentence
Step 3: Decision
- All YES: promote one level. Set a 2-week review date.
- Any NO: stay. Write down what’s missing.
- Unsure on any: stay. Uncertainty is not readiness.
One workflow. One honest assessment. That’s the whole move. Repeat it weekly, and the right workflows graduate on their own schedule.
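If you'd rather run the checklist as code than on paper, here's a toy sketch in Python. Every name in it (`TrustLevel`, `Workflow`, the field names) is mine, invented for illustration; nothing here comes from the OpenClaw API.

```python
from dataclasses import dataclass
from enum import IntEnum

class TrustLevel(IntEnum):
    """The four rungs from Step 1."""
    CLASSIFY = 0    # agent labels and sorts; you act on everything
    DRAFT = 1       # agent produces output; you review before it ships
    EXECUTE = 2     # agent acts; you spot-check after
    AUTONOMOUS = 3  # agent runs on schedule; you review weekly

@dataclass
class Workflow:
    name: str
    level: TrustLevel
    successes_seen: int                 # times you've watched it succeed
    survivable_failure_seen: bool       # it failed, and you survived
    failure_noticed_within_24h: bool    # you'd catch a failure within a day
    quality_is_consistent: bool         # not just "good last time"
    done_defined_in_one_sentence: bool  # you can describe "done" in one line

    def ready_to_graduate(self) -> bool:
        """Step 2: all five criteria must be YES to move up one level."""
        return (
            self.successes_seen >= 5
            and self.survivable_failure_seen
            and self.failure_noticed_within_24h
            and self.quality_is_consistent
            and self.done_defined_in_one_sentence
        )

    def decide(self) -> str:
        """Step 3: promote one level, or stay and note what's missing."""
        if self.level == TrustLevel.AUTONOMOUS:
            return f"{self.name}: already autonomous; keep the weekly review."
        if self.ready_to_graduate():
            promoted = TrustLevel(self.level + 1)
            return f"{self.name}: promote to {promoted.name}; review in 2 weeks."
        return f"{self.name}: stay at {self.level.name}; uncertainty is not readiness."

rss_digest = Workflow("daily RSS summary", TrustLevel.EXECUTE, 12, True, True, True, True)
print(rss_digest.decide())
```

Note that the decision logic is deliberately strict: any missing criterion, or any doubt, resolves to "stay", which is the whole point of graduated trust.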
I’ll keep building. I’ll keep writing about what actually happens.
Next: what a five-decision day taught me about the difference between being busy and being leveraged.
Post #3 in the Centaur CEO series.