Workload

How to write a workload that finds useful reliability bugs.

A workload is a small program or command that exercises behavior your users depend on. It should do real work, check that the result is correct, and fail clearly when the system breaks a rule.

The goal is not to write every test case by hand. The goal is to describe one important behavior well enough that Workers IO can run it repeatedly under different conditions.

What a good workload does

A good workload has three parts:

Setup: create the data, users, queues, files, or service state the workflow needs.
Actions: drive the behavior your users care about.
Checks: verify the result and exit non-zero when it is wrong.
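
The three parts can be laid out as separate functions. The sketch below is illustrative only: the in-memory dict stands in for a real service, and the names `setup`, `run_actions`, `find_lost_writes`, and `main` are hypothetical, not part of Workers IO.

```python
def setup():
    """Setup: create the state the workflow needs.

    Assumption: a plain dict stands in for a real service or database.
    """
    return {}


def run_actions(store):
    """Actions: write five records and return the keys that were accepted."""
    accepted = []
    for i in range(5):
        key = f"rec_{i}"
        store[key] = i
        accepted.append(key)
    return accepted


def find_lost_writes(store, accepted):
    """Checks: every accepted write must still be present in the store."""
    return [key for key in accepted if key not in store]


def main():
    """Run setup, actions, and checks; return the process exit code."""
    store = setup()
    accepted = run_actions(store)
    lost = find_lost_writes(store, accepted)
    if lost:
        print(f"INVARIANT FAIL no_lost_writes missing={lost}")
        return 1
    print(f"INVARIANT PASS no_lost_writes count={len(accepted)}")
    return 0
```

To use this as a workload entry point, add `import sys` and end the file with `sys.exit(main())` so the return value becomes the exit status.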

Good workloads are usually small. They cover one workflow or one subsystem boundary, not the whole product.

Examples:

A user signs up and receives access.
A payment is charged exactly once.
A queued job is eventually processed.
A sync completes without losing records.
A retry succeeds without duplicating state.

Check invariants

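An invariant is a rule that must always hold. Prefer checking invariants to matching one exact transcript of events: under different conditions the order and timing of events can change, but the rule should still hold.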

Good invariants:

No lost writes.
No duplicate IDs.
No double charge.
Balances never go negative.
Every accepted job finishes once.
A retry does not create extra state.

When an invariant fails, print the important context and exit with a non-zero status code.
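
As a minimal sketch, invariants such as "every accepted job finishes once" can be checked from two lists of IDs without caring about event order. The function name `check_job_invariants`, the invariant names, and the input shape are assumptions made for illustration.

```python
from collections import Counter


def check_job_invariants(accepted_ids, finished_ids):
    """Return a list of failure lines; an empty list means every rule held.

    accepted_ids: IDs of jobs the system accepted.
    finished_ids: one entry per observed completion, in any order.
    """
    failures = []
    accepted = set(accepted_ids)
    finish_counts = Counter(finished_ids)

    # No lost jobs: every accepted job must finish at least once.
    lost = sorted(accepted - set(finish_counts))
    if lost:
        failures.append(f"INVARIANT FAIL no_lost_jobs missing={lost}")

    # No duplicates: no job may finish more than once.
    duplicated = sorted(job_id for job_id, n in finish_counts.items() if n > 1)
    if duplicated:
        failures.append(f"INVARIANT FAIL finished_once duplicated={duplicated}")

    # No extra state: only accepted jobs may finish.
    unexpected = sorted(set(finish_counts) - accepted)
    if unexpected:
        failures.append(f"INVARIANT FAIL only_accepted_finish unexpected={unexpected}")

    return failures
```

A workload would print each returned line and exit non-zero if the list is not empty. Because the check compares sets and counts, a run where jobs finish in a different order still passes.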

Example workload

This example drives a checkout workflow and checks that an order is paid exactly once.

import json
import os
import sys
import urllib.request

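# Base URL of the system under test. A missing APP_URL raises KeyError,
# so the workload fails before doing any work.
BASE_URL = os.environ["APP_URL"]


def post(path, body):
    """POST a JSON body to the app and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=data,
        headers={"content-type": "application/json"},
        method="POST",
    )
    # urlopen raises on connection errors, timeouts, and HTTP error statuses.
    # An unhandled exception ends the workload with a non-zero exit code.
    with urllib.request.urlopen(req, timeout=10) as res:
        return json.loads(res.read())


# Action: drive the checkout workflow once.
order = post("/checkout", {"user_id": "u_123", "sku": "starter"})
order_id = order.get("id")

# Check: the order must be paid.
if order.get("status") != "paid":
    print(f"INVARIANT FAIL checkout_paid expected=paid actual={order.get('status')} order_id={order_id}")
    sys.exit(1)

# Check: the order must be charged exactly once.
if order.get("charge_count") != 1:
    print(f"INVARIANT FAIL charged_once expected=1 actual={order.get('charge_count')} order_id={order_id}")
    sys.exit(1)

print(f"INVARIANT PASS checkout order_id={order_id}")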

The exact language does not matter. The important shape is: run the workflow, check the rule, print useful output, and fail clearly.

Good vs bad workloads

Good:

Create a user, submit a checkout request, wait for the order to settle,
then verify the order is paid and exactly one charge exists.

Bad:

Run the whole end-to-end suite and fail if any snapshot changes.

The good workload has a clear behavior and a clear correctness rule. The bad workload is too broad, hard to debug, and likely to fail for reasons unrelated to reliability.

Good:

Create 20 jobs, let workers process them, then verify every accepted job
finished once and no job ID appears twice.

Bad:

Sleep for 5 minutes, then check whether the queue looks empty.

The good workload checks the state that matters. The bad workload relies on wall-clock waiting and gives little information when it fails.
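
One way to replace a fixed sleep is to poll the state that matters until it is ready or a deadline passes. This is a sketch, not a Workers IO API: `wait_until` and its parameters are hypothetical, and the predicate is whatever check your workload needs, such as "all 20 jobs are marked finished".

```python
import time


def wait_until(predicate, timeout_s=30.0, interval_s=0.5):
    """Poll predicate until it returns a truthy value or the deadline passes.

    Returns the truthy value on success, or None if the deadline passed.
    """
    # time.monotonic() is not affected by system clock adjustments.
    deadline = time.monotonic() + timeout_s
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)
```

When `wait_until` returns None, the workload can print which condition never became true and exit non-zero, instead of guessing from a queue that "looks empty".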

Output

Print enough information to debug a failure from logs:

The invariant name.
The expected result.
The actual result.
The IDs or keys involved.

Keep output concise. Logs should explain the failure without forcing someone to rerun the workload just to understand what happened.
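
A small helper can keep failure lines consistent across checks. The sketch below assumes a single-line `key=value` format; the name `failure_line` and the field layout are illustrative choices, not a format Workers IO requires.

```python
def failure_line(invariant, expected, actual, **ids):
    """Build one log line with the invariant name, expected and actual
    results, and any IDs or keys passed as keyword arguments."""
    fields = [f"expected={expected!r}", f"actual={actual!r}"]
    # Sort the IDs so the same failure always produces the same line.
    fields += [f"{key}={value}" for key, value in sorted(ids.items())]
    return f"INVARIANT FAIL {invariant} " + " ".join(fields)
```

For example, `print(failure_line("charged_once", 1, 2, order_id="o_1"))` writes a single line that names the rule, shows both values, and identifies the order.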

Command and file path

When you run a simulation, Workers IO needs the command and the workload file path.

Example:

wio simulate create <project-id> \
  --command "python3 .workers/workloads/checkout.py" \
  --workload-path ".workers/workloads/checkout.py" \
  --depth 1

The workload path is captured with the run so you can inspect the exact file used for that result.