The Audited Agentic Workflow

Agentic AI workflows have a trust problem. The faster an agent moves — generating files, opening PRs, calling APIs — the harder it becomes to know what it did, why it did it, and whether it should have. Speed and auditability feel like they’re in tension.

They don’t have to be. The key is designing the workflow so that every handoff between agent and human produces a durable record, and every consequential action requires a human to cross a gate first.

The pattern

The Audited Agentic Workflow is a three-stage chain: agent drafts, humans approve, automation publishes. Each stage has a clear input, a clear output, and a clear owner.

The agent’s job is to do the time-consuming translation work — converting natural language intent into structured output, running analysis, and surfacing impact in a form humans can review quickly. The human’s job is to read the output, decide if it’s correct, and approve or push back. Automation’s job is to execute the approved output reliably, with pre-flight checks that stop it if the human step was skipped.

What makes this audited rather than just sequential is that each stage leaves a trace. The agent’s output is committed to Git. The human’s decision is a PR approval with a timestamp and an identity. The automation’s execution is a version-controlled publish with a before/after record. At any point, you can reconstruct exactly what was requested, what the agent produced, what a human approved, and what was activated.
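To make "each stage leaves a trace" concrete, here is a minimal sketch of the three records modelled as data and joined into one timeline. The class, field names, and stage labels are hypothetical illustrations, not taken from the platform-hub repo; in practice the records live in Git history, PR metadata, and the publish log.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TraceEvent:
    """One durable record left by a stage (all field names are hypothetical)."""
    stage: str        # "agent", "human", or "automation"
    actor: str        # who or what produced the record
    reference: str    # e.g. a commit SHA, a PR approval, a published version
    at: datetime


def reconstruct(events: list[TraceEvent]) -> list[str]:
    """Return the trail in chronological order as human-readable lines."""
    ordered = sorted(events, key=lambda e: e.at)
    return [f"{e.at.isoformat()} {e.stage}: {e.actor} -> {e.reference}" for e in ordered]


def is_complete(events: list[TraceEvent]) -> bool:
    """A trail is complete only if all three stages left a record."""
    return {"agent", "human", "automation"} <= {e.stage for e in events}
```

A trail that fails `is_complete` is a signal that a stage ran without leaving its record, which is exactly the situation the pattern is meant to rule out.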

Applied to Octopus Platform Hub policies

The concrete implementation of this pattern in the platform-hub repo works as follows.

A policy request arrives as a GitHub issue using a structured issue template. The requester describes what to enforce in plain English and names the projects in scope. That’s the input — unambiguous enough for an agent to act on, human-readable enough for a reviewer to understand.

The Policy Author agent picks up the issue, runs four phases (intent parsing, OCL authoring, what-if analysis against live Octopus data, report generation), and opens a pull request. The PR contains two things: the OCL policy file committed to .octopus/policies/, and the what-if report committed to octopus-agent/output-<timestamp>/reports/ and posted as a PR comment.

A human reads the what-if report. It lists every in-scope project as Pass, Will Be Affected, or Unable to Evaluate Statically, with specific reasons. If the impact is acceptable, the PR is merged. If not, the reviewer requests changes and the agent iterates.
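A reviewer skimming a long report mostly wants two things: the count per outcome and the list of projects that are not a clean Pass. The sketch below shows one way to compute both. It assumes the report has already been parsed into a mapping of project name to status; that mapping, the function names, and the sample project names in the usage note are hypothetical. Only the three status labels come from the report described above.

```python
from collections import Counter

# The three outcomes named in the what-if report.
STATUSES = ("Pass", "Will Be Affected", "Unable to Evaluate Statically")


def summarize(results: dict[str, str]) -> Counter:
    """Count projects per status; reject any status the report does not define."""
    unknown = sorted(set(results.values()) - set(STATUSES))
    if unknown:
        raise ValueError(f"unrecognized status values: {unknown}")
    return Counter(results.values())


def needs_attention(results: dict[str, str]) -> list[str]:
    """Projects a reviewer should look at first: anything that is not a clean Pass."""
    return sorted(p for p, s in results.items() if s != "Pass")
```

For example, `needs_attention({"web-api": "Pass", "billing": "Will Be Affected"})` returns `["billing"]`. Treating "Unable to Evaluate Statically" as needing attention is a design choice in this sketch: an unknown impact is not the same as no impact.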

After the merge, the Policy Publisher agent activates. It runs its own pre-flight checks — verifying the OCL file exists, is not disabled, is committed, and has been pushed to the remote default branch — before calling the Octopus REST API to publish and activate the policy. It returns a summary table showing the previous version, the new version, and the activation status.
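The four pre-flight checks can be sketched as a gate that either lets the publish call through or returns the reasons it was blocked. This is an illustration of the idea, not the Policy Publisher's actual code: the `PolicyFileState` shape, the function names, and the messages are assumptions, and the `publish` callable stands in for the real Octopus REST API call.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PolicyFileState:
    """Facts gathered about the OCL file before publishing (hypothetical shape)."""
    exists: bool
    disabled: bool
    committed: bool
    pushed_to_default_branch: bool


def preflight_failures(state: PolicyFileState) -> list[str]:
    """Return a reason for every failed check; an empty list means safe to publish."""
    if not state.exists:
        # Nothing else can be checked meaningfully without the file.
        return ["OCL file not found under .octopus/policies/"]
    failures = []
    if state.disabled:
        failures.append("policy is marked disabled")
    if not state.committed:
        failures.append("OCL file has uncommitted changes; commit them first")
    if not state.pushed_to_default_branch:
        failures.append("commit is not on the remote default branch; push or merge first")
    return failures


def publish_if_ready(state: PolicyFileState, publish: Callable[[], None]) -> list[str]:
    """Call publish() only when every check passes; otherwise return the reasons."""
    failures = preflight_failures(state)
    if not failures:
        publish()
    return failures
```

Returning every failure at once, rather than stopping at the first, lets the operator fix everything in one pass.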

flowchart TD
    I["GitHub Issue\nPolicy intent + project scope"] --> A["Policy Author Agent"]
    A --> O["OCL file committed\n.octopus/policies/policy_name.ocl"]
    A --> W["What-if report committed\noctopus-agent/output-timestamp/reports/"]
    O --> P["Pull Request opened\nwhat-if report posted as PR comment"]
    W --> P
    P --> R{"Human Review"}
    R -->|"Needs changes"| A
    R -->|"Approved + merged"| M["Merge to main"]
    M --> U["Policy Publisher Agent\npre-flight checks"]
    U --> X["Publish + Activate\nOctopus Platform Hub"]

Why the gates are non-negotiable

The human review step and the publisher’s pre-flight checks aren’t overhead — they’re what makes the rest of the workflow trustworthy.

Without the review gate, an agent that misinterprets scope could activate a policy that blocks deployments across more projects than intended. Rego inside a policy can look correct and still have subtle issues: an overly broad default evaluate := true that catches runbook runs the policy wasn’t meant to cover, or a slug mismatch because the agent used a display name instead of the actual environment slug. The what-if report surfaces these before anyone is affected. The PR is where you catch them.
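The slug mismatch in particular is easy to check mechanically. The sketch below assumes you already have a mapping of environment slug to display name (for instance, fetched from Octopus beforehand) and a list of the environment values the policy references. The function name and both data shapes are hypothetical.

```python
from typing import Optional


def check_environment_slugs(
    referenced: list[str], environments: dict[str, str]
) -> dict[str, Optional[str]]:
    """Find references that are not real slugs.

    `environments` maps slug -> display name. For each unrecognized reference,
    the value is the slug whose display name matches it, or None if nothing does.
    """
    by_display_name = {name.lower(): slug for slug, name in environments.items()}
    problems: dict[str, Optional[str]] = {}
    for ref in referenced:
        if ref not in environments:
            problems[ref] = by_display_name.get(ref.lower())
    return problems
```

An empty result means every reference is a known slug. A non-empty result with a suggested slug points at the display-name mistake described above; a `None` suggestion means the reference matches nothing at all.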

Without the publisher’s pre-flight checks, a common failure mode emerges: someone authors a policy locally, forgets to push before running the publisher, and the Octopus API silently publishes the previous committed version. The policy appears to activate but enforces the wrong content. The publisher’s hard stop on unpushed commits closes that gap — if the check fails, it tells you exactly why and what to do.
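One way such an unpushed-commit check could be implemented is with `git merge-base --is-ancestor`, which exits 0 when the first commit is an ancestor of the second and 1 when it is not. This is a sketch under assumptions rather than the publisher's actual implementation: the function names are hypothetical, the default branch is assumed to be `main` on a remote named `origin`, and the remote-tracking ref is assumed to be fresh (run `git fetch` first).

```python
import subprocess
from typing import Callable, Sequence


def _git_exit_code(args: Sequence[str]) -> int:
    """Run a git subcommand in the current directory and return its exit code."""
    return subprocess.run(["git", *args], capture_output=True).returncode


def head_is_on_remote(
    remote_ref: str = "origin/main",
    run: Callable[[Sequence[str]], int] = _git_exit_code,
) -> bool:
    """True if HEAD is reachable from remote_ref, i.e. nothing local is unpushed.

    Relies on `git merge-base --is-ancestor`, which exits 0 when the first
    commit is an ancestor of the second and 1 when it is not.
    """
    code = run(["merge-base", "--is-ancestor", "HEAD", remote_ref])
    if code == 0:
        return True
    if code == 1:
        return False
    # Any other exit code means git itself failed (bad ref, not a repository).
    raise RuntimeError(f"git could not compare HEAD with {remote_ref} (exit {code})")
```

The `run` parameter exists so the decision logic can be exercised without a real repository. Raising on unexpected exit codes, instead of returning `False`, keeps a broken environment from being mistaken for a simple "not pushed yet".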

The agent is fast. The gates are what keep fast from becoming reckless.

The broader principle

This pattern is not specific to Octopus policies. The same structure — agent produces a reviewable artifact, human approves at a PR gate, automation executes with pre-flight checks — applies wherever you have a workflow that benefits from AI acceleration but requires human accountability before consequential action.

The audit trail it produces isn’t a compliance checkbox. It’s what lets you move faster with confidence. When something goes wrong — and eventually something will — you can trace exactly what was requested, what the agent decided, what a human approved, and what was activated. That’s the foundation for improving the workflow, not just debugging an incident.

AI drafts. Humans approve. Automation publishes. The value is in holding that sequence.

This post is licensed under CC BY 4.0 by the author.