april 24, 2026 · security

Kage: A Pentest Sandbox for Agents

Kage is a sandbox that lets an AI agent run a full security engagement end-to-end: recon, testing, verification, filtering, and reporting, all inside a throwaway environment. The goal is to give the agent the same tools and workflow a human would use, but in a way that's safe to run anywhere, even on your laptop.

by Shadan · engineer, workers io

The first time I tried Kage on a very well-known AI company, it found an issue that earned me a bounty.

I just gave it a domain and let it run. When I came back, the report was waiting.

Kage had found a PyPI upload token sitting in a file in one of the company’s public repos. This is very serious, by the way.

A PyPI upload token is the key that lets anyone publish new versions of a public Python package. If an attacker gets that token, they can ship a malicious update that looks normal. Anyone who runs pip install next time gets the malware on their machine.
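To make the leak concrete, here is a minimal sketch of how a scanner might flag token-shaped strings in a repo file. This is not Kage's actual detection code. The `pypi-` prefix is how PyPI API tokens begin; the body pattern and the minimum length of 50 are illustrative assumptions, and `find_pypi_tokens` is a hypothetical helper.

```python
import re

# PyPI API tokens start with the literal prefix "pypi-".
# Assumption: the body is URL-safe base64; the minimum length of 50 is a guess
# chosen only to avoid matching short strings like "pypi-upload".
PYPI_TOKEN_RE = re.compile(r"pypi-[A-Za-z0-9_-]{50,}")


def find_pypi_tokens(text: str) -> list[tuple[int, str]]:
    """Return (line_number, redacted_token) pairs for token-shaped strings."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in PYPI_TOKEN_RE.finditer(line):
            token = match.group(0)
            # Redact so the scanner's own output never repeats the secret.
            hits.append((line_number, token[:9] + "..." + token[-4:]))
    return hits


if __name__ == "__main__":
    # A fake .pypirc-style snippet with a dummy token body.
    sample = "[pypi]\nusername = __token__\npassword = pypi-" + "A" * 60 + "\n"
    print(find_pypi_tokens(sample))
```

A pattern match only says a string looks like a token. Whether it is still live is a separate check, which is why the verification step matters.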

Before I even looked at the report, Kage had already done the heavy lifting. It checked the PyPI API to confirm the token was valid, generated a Python script to demonstrate that a malicious package could be published, assigned a severity score of 9.8 out of 10, and drafted a suggested fix.

My only input was the domain.

I don’t know why most companies don’t take security seriously. I’ve tested Kage on a bunch of them. Every single one had serious stuff: DB access, admin takeover, leaked tokens.

A single PyPI upload token is all it takes for an attacker to distribute malware to every user of that company’s package. It’s a good example of how much impact a single credential can have.

I mean, it’s good to ship features fast, but leak user data once and you’re done. Look at the last few months: Lovable, Next.js, Axios. Doesn’t matter if you’re a tiny startup or a Fortune 500.

That’s why we made Kage open-source and straightforward to use. You don’t need to be a security expert—just provide a target, let it run, and review the report when it’s done.

You can run Kage in your own environment whenever you ship new features. A penetration test is essentially a simulated attack on your application to identify vulnerabilities before an attacker does. Companies often pay security professionals for this kind of assessment, but with Kage, you can perform these checks yourself, right from your laptop.

Here’s how that PyPI report actually got built. Most of the work is done by the agent. The interesting part is the judge—the piece that throws away most of what the agent finds.

Why this is hard to hand off

Pentest tooling is a real mess. Most of it is built for Kali (a flavor of Linux that comes preloaded with hacker tools), written in six different programming languages, and you really don’t want any of it on your laptop.

Try installing wfuzz next to ffuf on the same machine. Add a clean Python install on top. You’ll be debugging dependencies before you even get started.

An even bigger challenge is the quality of the output. In one recent manual review I helped with, a static scanner generated 47 findings, but only three were legitimate issues. The rest of the effort was spent filtering out false positives.

If you give an agent the same toolbox and let it run, you get the same kind of output: fifty leads, three real ones, and no idea which three. A report like that is worse than no report at all. The next one just gets ignored.

Kage addresses both of these issues. Each run operates in its own isolated Kali container, so any tool conflicts are contained. Findings are processed through a pipeline that verifies, filters, and ranks them before they are included in the final report.

The judge is where Kage earns its keep.

Every finding the agent produces gets four questions, in order. If any one fails, the finding is dropped before it’s even scored.

Is there a real attack path? Not “this could in theory.” A specific action an attacker takes that gets them to a place they shouldn’t be.

Can the attacker actually reach it? If the bug sits behind auth they don’t have, or an edge rule that blocks the request, it doesn’t matter how interesting it looks.

Is something already stopping it? A working CSRF token, a rate limit, and input sanitization that already fires. If the defense is in place, the bug has nothing to do.

Would a stranger believe the writeup? I read every report assuming the reader hasn’t been in our standup. The judge does the same. If the writeup needs context that only the agent has, it gets cut.

The rule is to drop, not to downgrade. A weak High that turns into a Low is still a weak finding. Weak findings diminish trust, and after a noisy report, the next one gets less attention.

Every drop is logged with a one-line reason. If you ever wonder why a borderline finding didn’t make the cut, the answer is right there. It’s the first file I open.
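The four questions and the drop-don't-downgrade rule can be sketched as an ordered list of gates. This is a simplified illustration, not Kage's implementation: the `Finding` fields, the `GATES` table, and the log format are all hypothetical, and in practice each answer would come from the agent's analysis rather than a boolean flag.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    # Hypothetical fields, one per question the judge asks.
    title: str
    attack_path: Optional[str]  # a specific attacker action, or None
    reachable: bool             # the attacker can actually get to it
    mitigated: bool             # an existing defense already stops it
    self_contained: bool        # the writeup stands without extra context


# Each gate is (name, predicate, drop reason). Order matters.
GATES: list[tuple[str, Callable[[Finding], bool], str]] = [
    ("attack_path", lambda f: bool(f.attack_path), "no concrete attack path"),
    ("reachable", lambda f: f.reachable, "attacker cannot reach it"),
    ("unmitigated", lambda f: not f.mitigated, "existing defense already stops it"),
    ("believable", lambda f: f.self_contained, "writeup needs context the reader lacks"),
]


def judge(findings: list[Finding]) -> tuple[list[Finding], list[str]]:
    """Keep findings that pass every gate; log one line per drop."""
    kept, drop_log = [], []
    for finding in findings:
        for _name, passes, reason in GATES:
            if not passes(finding):
                # Drop at the first failed gate. There is no downgrade path.
                drop_log.append(f"DROP {finding.title}: {reason}")
                break
        else:
            kept.append(finding)
    return kept, drop_log
```

Note that nothing in `judge` touches severity. A finding either survives all four gates or leaves with a reason attached.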

What a run actually does

A run has five stages.

It starts with recon. Kage maps the target: which subdomains exist, which endpoints are live, and which code has been pushed to public repos. Just looking, no probing yet.

Then the specialists go to work. Different agents fan out in parallel, each looking for a specific kind of bug — broken auth, broken access control, injection, business logic flaws. They overreport on purpose. Filtering comes later.

Every promising lead becomes a small repro script: a few lines of code that try to recreate the bug. The script runs three times in clean sessions. Same outcome three times, or the lead gets dropped. Inconsistent results are the easiest way to spot a false positive.
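A minimal sketch of the three-run check, assuming each repro script can be modeled as a function that takes a fresh session and returns whether the bug reproduced. The name `verify_lead` and the dict used as a stand-in session are hypothetical. I also assume here that all three runs must confirm the bug, since a consistent "not reproduced" is not a finding either.

```python
from typing import Callable


def verify_lead(repro: Callable[[dict], bool], runs: int = 3) -> bool:
    """Run a repro in `runs` clean sessions; keep the lead only if all confirm."""
    results = []
    for _ in range(runs):
        session: dict = {}  # stand-in for a clean session: no cookies, no state
        results.append(bool(repro(session)))
    # Any disagreement between runs means the lead is treated as a false positive.
    return all(results)
```

Running every attempt, even after one fails, keeps the record of all outcomes available if you want to log them.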

Then the judge filters what survives.

Then the report gets written.

Overreporting is by design. Filtering belongs in the judge, where it’s deliberate and logged. I don’t want a tester quietly dropping something just because it didn’t feel strong.

Where chains come in

Take CSRF — a bug that tricks your browser into submitting a form on your behalf. Forgettable on its own. Put it on a password-change endpoint that doesn’t ask for your old password, though, and that’s account takeover.

Take an open redirect — a URL that bounces visitors wherever it’s told to bounce them. Just noise alone. Drop it inside an OAuth login flow (the dance behind “Sign in with Google”), and now you’re stealing session tokens.

Before the judge evaluates findings one by one, another agent walks through the verified findings and tries to combine them.

Some of this is pattern matching. An open redirect that lets you steal an OAuth token. A server tricked into hitting its own cloud metadata. A stored script that runs the next time an admin opens the page.

Most of it is just asking, for each finding, what an attacker can do next.

When a chain fits, the pieces collapse into one finding. Three mediums become one critical. One story. One impact that fits what would actually happen.
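The pattern-matching half of chaining might look something like this sketch. The `CHAIN_PATTERNS` table, the kind labels, and the severities assigned to each chain are illustrative assumptions, not Kage's real rules, and a real implementation would also check that the pieces sit on the same flow rather than matching on kind alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    kind: str       # hypothetical label, e.g. "csrf"
    location: str
    severity: str


@dataclass
class Chain:
    title: str
    severity: str
    parts: list


# Hypothetical patterns: the set of finding kinds that combine into one chain.
CHAIN_PATTERNS = {
    frozenset({"csrf", "password_change_without_old_password"}):
        ("Account takeover via CSRF on password change", "critical"),
    frozenset({"open_redirect", "oauth_flow"}):
        ("Token theft via open redirect in OAuth flow", "critical"),
}


def collapse_chains(findings):
    """Merge findings that match a known pattern; return (chains, leftovers)."""
    remaining = list(findings)
    chains = []
    for kinds, (title, severity) in CHAIN_PATTERNS.items():
        parts = [f for f in remaining if f.kind in kinds]
        # Only collapse when every piece of the pattern is present.
        if {f.kind for f in parts} == kinds:
            chains.append(Chain(title, severity, parts))
            remaining = [f for f in remaining if f not in parts]
    return chains, remaining
```

Leftovers go on to be evaluated on their own. The collapsed chain carries its parts with it, so the report can still show each step.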

How you talk to it

You pick one of three modes.

Black box. Just a URL. Kage works the target from outside, the way an external attacker would.

Grey box. A URL plus two accounts with different roles. Now Kage can check access control as real users see it. Can the regular account reach an admin action? Can one user access another’s data?

White box. Add the source. Before any traffic goes out, Kage reads the code for trust boundaries, auth flows, and sensitive endpoints, then briefs each tester with that context.

The pipeline is the same in all three modes. Only the briefing changes. The more Kage knows up front, the sharper the findings.
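One way to picture the three modes is as a function of what you hand over. This is a hypothetical sketch, not Kage's interface: the `Engagement` class and its field names are made up for illustration, and it assumes that supplying source code is what marks a run as white box.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Engagement:
    url: str
    accounts: list = field(default_factory=list)  # credentials with different roles
    source_dir: Optional[str] = None              # path to the application source

    @property
    def mode(self) -> str:
        # Assumption: more inputs means a richer briefing, same pipeline.
        if self.source_dir:
            return "white box"
        if len(self.accounts) >= 2:
            return "grey box"
        return "black box"
```

The mode only decides what goes into the briefing. Everything downstream runs the same way.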

Each run lives in its own Kali container, tied to the working directory. Same directory, same container. Come back a week later, and you’re right back where you left off. Different directory, different container. Two engagements never share state. When the run ends, the container is gone.
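A simple way to get "same directory, same container" is to derive the container name from the directory path. The sketch below shows that idea under stated assumptions: the `kage-` prefix, the 12-character digest, and the `container_name` helper are hypothetical, and Kage may map directories to containers differently.

```python
import hashlib
from pathlib import Path


def container_name(workdir: str, prefix: str = "kage") -> str:
    """Derive a stable container name from the resolved working directory."""
    # Resolve first so "./engagement" and its absolute path map to the same name.
    resolved = str(Path(workdir).resolve())
    digest = hashlib.sha256(resolved.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}-{digest}"
```

Because the name is a pure function of the path, there is no registry to keep in sync. Two directories get two names, and the same directory always gets the same one.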

Try it

Kage doesn’t make the API calls or interact with the target directly. That part is handled by Claude, or whichever LLM-powered coding agent you’re using. The agent receives instructions from Kage, performs the security tests, and analyzes the results. Kage runs the environment, assembles the right tools and testers, filters out the noise, and pulls the findings together in a report. That’s the part that used to eat my day.

npx skills add workersio/spec

Then point it at a target:

/kage https://target.com

Runs take a while. When one finishes, you’ll find an audit-report.md in the results directory, sorted by severity, with a working proof-of-concept for every finding. If you want to know how the judge made a call, the log is right next to it.