What this means
A guardrail is a technical constraint embedded in the agent's design. It might prevent the agent from accessing certain data categories, block outputs that contain specific content types, cap the financial value of transactions the agent can initiate, or halt execution entirely when a defined anomaly condition is met. Guardrails operate without human involvement — they are enforced by the system.
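To make the idea concrete, here is a minimal Python sketch of system-enforced guardrails covering two of the constraints above: blocked data categories and a transaction value cap. Every name and limit in it (`ProposedAction`, `enforce_guardrails`, `BLOCKED_DATA_CATEGORIES`, the AUD 5,000 cap) is hypothetical, chosen for illustration rather than taken from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Set

# Hypothetical limits; real values should come from your own risk assessment.
BLOCKED_DATA_CATEGORIES = {"health", "tax_file_number"}
MAX_TRANSACTION_AUD = 5_000.00


class GuardrailViolation(Exception):
    """Raised when a proposed action breaches a hard constraint."""


@dataclass
class ProposedAction:
    tool: str
    amount_aud: float = 0.0
    data_categories: Set[str] = field(default_factory=set)


def enforce_guardrails(action: ProposedAction) -> None:
    """Block the action if it breaches any constraint; no human is consulted."""
    blocked = action.data_categories & BLOCKED_DATA_CATEGORIES
    if blocked:
        raise GuardrailViolation(f"blocked data categories: {sorted(blocked)}")
    if action.amount_aud > MAX_TRANSACTION_AUD:
        raise GuardrailViolation(
            f"amount {action.amount_aud:.2f} exceeds cap {MAX_TRANSACTION_AUD:.2f}"
        )


enforce_guardrails(ProposedAction(tool="create_po", amount_aud=1_200.0))  # passes
try:
    enforce_guardrails(ProposedAction(tool="create_po", amount_aud=9_000.0))
except GuardrailViolation as exc:
    print(exc)  # amount 9000.00 exceeds cap 5000.00
```

Because the check raises rather than asks, it behaves the same at one call a day or ten thousand; the sketch assumes the agent runtime calls it before every tool invocation.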
An approval flow is a structured pause: the agent reaches a decision point, determines that human authorisation is required, and suspends execution while routing the pending action to a reviewer. The reviewer approves, modifies or rejects. The agent then proceeds based on the decision.
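The three reviewer outcomes can be modelled explicitly so the agent never has to guess what a decision means. This sketch assumes a hypothetical `PendingAction`/`ReviewerDecision` data model and shows how approve, modify and reject each translate into what the agent executes next.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, Optional


class Decision(Enum):
    APPROVE = "approve"
    MODIFY = "modify"
    REJECT = "reject"


@dataclass(frozen=True)
class PendingAction:
    tool: str
    params: Dict


@dataclass(frozen=True)
class ReviewerDecision:
    decision: Decision
    reviewer: str
    # Only meaningful for MODIFY: the parameters the reviewer wants instead.
    modified_params: Optional[Dict] = None


def resolve(action: PendingAction, review: ReviewerDecision) -> Optional[PendingAction]:
    """Return the action the agent should execute, or None if it was rejected."""
    if review.decision is Decision.APPROVE:
        return action
    if review.decision is Decision.MODIFY:
        if review.modified_params is None:
            raise ValueError("a MODIFY decision must supply modified_params")
        return replace(action, params=review.modified_params)
    return None  # REJECT: the agent must not execute anything


action = PendingAction("send_email", {"to": "ops@example.com"})
edited = resolve(action, ReviewerDecision(Decision.MODIFY, "reviewer-1",
                                          {"to": "finance@example.com"}))
print(edited.params)  # {'to': 'finance@example.com'}
```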
Both mechanisms are necessary. Guardrails handle predictable risk categories at scale without human cost. Approval flows handle situations that are consequential enough, novel enough or context-dependent enough that algorithmic constraint is insufficient.
Why it matters for business
Agentic AI without controls is an operational liability. An agent that can send emails, update CRM records, modify pricing, or trigger purchase orders without appropriate constraints introduces the same risks as giving a junior employee unrestricted system access without supervision.
Australia's regulatory environment reinforces the commercial case. Under the Privacy Act 1988 and the Australian Privacy Principles, organisations remain accountable for how personal information is handled, including when an agent processes it automatically. The Australian Government's proposed mandatory guardrails for AI in high-risk settings also target high-risk automated decision-making directly. Organisations that design approval flows and guardrails proactively are building compliance infrastructure, not just operational safety nets.
How it works technically
Guardrails are implemented at several layers of the stack:
- Input guardrails: Filters applied to the task or prompt before the agent begins reasoning. A topic classifier might redirect certain request types away from an autonomous agent entirely.
- Execution guardrails: Tool permission scopes that enforce least-privilege access; rate limits that prevent runaway API calls; value caps on financial operations.
- Output guardrails: Validators that check agent outputs against defined schemas, content policies or business rules before the output takes effect, whether it is written to a system, sent to a user or passed to the next agent.
- Circuit breakers: Monitoring logic that detects anomalous behaviour patterns (unexpected tool call frequency, outputs that deviate significantly from expected structure) and suspends the agent until a human has reviewed what happened.
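As one illustration of the circuit-breaker layer, the sketch below trips when tool calls exceed a threshold within a sliding time window, then stays open until a human resets it. The class name, thresholds and injectable clock are assumptions for illustration; a production breaker would usually watch other anomaly signals too, such as output structure.

```python
import time
from collections import deque
from typing import Callable, Deque


class CircuitBreaker:
    """Suspends an agent when tool calls exceed a rate threshold.

    Once tripped, it stays open until a human calls reset().
    """

    def __init__(self, max_calls: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock
        self._calls: Deque[float] = deque()
        self.open = False  # True means the agent is suspended

    def record_call(self) -> bool:
        """Record one tool call; return False if the agent must stop."""
        if self.open:
            return False
        now = self._clock()
        self._calls.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while self._calls and now - self._calls[0] > self.window_seconds:
            self._calls.popleft()
        if len(self._calls) > self.max_calls:
            self.open = True
            return False
        return True

    def reset(self) -> None:
        """Called by a human reviewer after investigating the anomaly."""
        self._calls.clear()
        self.open = False


fake_now = [0.0]
breaker = CircuitBreaker(max_calls=3, window_seconds=10, clock=lambda: fake_now[0])
print([breaker.record_call() for _ in range(4)])  # [True, True, True, False]
```

The injectable `clock` lets the behaviour be tested without real waiting; in production the default monotonic clock is used.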
Approval flows require orchestration-layer support: the ability to pause a workflow mid-execution, serialise its state, surface the pending action to a reviewer through an appropriate interface, and resume execution with the reviewer's decision incorporated.
A well-designed approval flow also captures the reviewer's reasoning, creating an audit trail that supports both compliance reporting and model improvement.
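Here is a minimal sketch of the pause, serialise and resume cycle, assuming workflow state fits in JSON and that the orchestrator keeps the serialised string somewhere durable while the reviewer decides. `WorkflowState`, `pause` and `resume` are hypothetical names; real orchestration frameworks provide their own checkpointing mechanisms. The audit entry records the reviewer's reason, not just the outcome.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class WorkflowState:
    workflow_id: str
    step: int
    pending_action: Dict
    agent_reasoning: str  # surfaced to the reviewer alongside the action
    decision: Optional[str] = None


def pause(state: WorkflowState) -> str:
    """Serialise the suspended workflow so it can be stored until review."""
    return json.dumps(asdict(state))


def resume(serialised: str, approved: bool, reviewer: str, reason: str,
           audit_log: List[Dict]) -> WorkflowState:
    """Restore the workflow, record the decision with its reasoning, and advance.

    The orchestrator is assumed to execute pending_action only when
    decision == "approved".
    """
    state = WorkflowState(**json.loads(serialised))
    state.decision = "approved" if approved else "rejected"
    audit_log.append({
        "workflow_id": state.workflow_id,
        "step": state.step,
        "action": state.pending_action,
        "decision": state.decision,
        "reviewer": reviewer,
        "reason": reason,  # why, not just yes/no
        "decided_at": datetime.now(timezone.utc).isoformat(),
    })
    state.step += 1
    return state


log: List[Dict] = []
saved = pause(WorkflowState("wf-42", 3, {"tool": "update_price", "sku": "A1"},
                            "Competitor price drop detected"))
resumed = resume(saved, approved=True, reviewer="alice",
                 reason="Within margin policy", audit_log=log)
print(resumed.step, resumed.decision)  # 4 approved
```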
Practical implementation considerations
The starting point for designing guardrails is a risk taxonomy for the agent's capabilities. For each tool the agent can call, assess: what is the worst-case outcome if this tool is called incorrectly? Is it reversible? Who bears the consequence? That taxonomy drives the guardrail design — tools with high worst-case impact get the tightest constraints.
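The taxonomy questions map naturally onto a small data structure and a policy function. The tiers and mapping rules below are hypothetical examples of how worst-case impact, reversibility and who bears the consequence might drive the control chosen; real thresholds should come from the risk assessment itself.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class ToolRisk:
    tool: str
    worst_case: str
    impact: Impact
    reversible: bool
    consequence_bearer: str  # e.g. "business", "customer", "third party"


def control_for(risk: ToolRisk) -> str:
    """Map a tool's risk profile to a control tier (hypothetical policy)."""
    if risk.impact is Impact.HIGH and not risk.reversible:
        return "human approval required"
    # Medium impact, or harm that lands outside the business, gets hard limits.
    if risk.impact >= Impact.MEDIUM or risk.consequence_bearer != "business":
        return "guardrail: hard limits plus logging"
    return "guardrail: logging only"


refund = ToolRisk("issue_refund", "refund paid to wrong account",
                  Impact.HIGH, reversible=False, consequence_bearer="business")
print(control_for(refund))  # human approval required
```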
Approval flows should be designed with the reviewer's experience in mind. A reviewer who receives a notification with no context for why the agent paused, no visibility into its reasoning, and no clear description of the proposed action cannot make a reliable decision. The approval interface must present the agent's planned action, the reasoning behind it, and the relevant context — in a format that enables a genuine decision, not a rubber stamp.
In Edison AI's agentic implementation work, a common finding is that approval flow volume must be estimated before deployment. If approval queues are expected to receive more items than reviewers can process within operational SLAs, either the guardrail thresholds need adjustment or additional reviewer capacity is required. Neither adjustment is a reason to skip approval flows, but both must be planned for.
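The volume estimate is simple arithmetic: expected approvals per day multiplied by review time, divided by the review time each reviewer actually has available. The function name and figures below are illustrative assumptions, not benchmarks.

```python
import math


def reviewers_needed(actions_per_day: float, approval_rate: float,
                     minutes_per_review: float,
                     review_minutes_per_reviewer: float) -> int:
    """Estimate the reviewer headcount needed to keep pace with average load.

    approval_rate: fraction of agent actions expected to hit an approval threshold.
    review_minutes_per_reviewer: time each reviewer can realistically spend on
    reviews per day, not their full working day.
    """
    queued_per_day = actions_per_day * approval_rate
    workload_minutes = queued_per_day * minutes_per_review
    return math.ceil(workload_minutes / review_minutes_per_reviewer)


# Illustrative figures only: 2,000 actions/day, 8% need approval,
# 4 minutes per review, 120 review minutes per reviewer per day.
print(reviewers_needed(2000, 0.08, 4, 120))  # 6
```

This covers average load only. Meeting an SLA during peaks needs extra headroom, so treat the result as a floor rather than a target.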
Common mistakes
- Designing guardrails only for known failure modes: Guardrails must also account for adversarial inputs (prompt injection attempts) and unexpected combinations of valid inputs that produce harmful outputs.
- No expiry on approval requests: Agents that park a pending action indefinitely while waiting for review can leave downstream systems in an inconsistent state. Approval requests need time limits and a default action if the review lapses.
- Approval flows that only capture yes/no decisions: A binary approval misses the opportunity to capture context about why the reviewer decided as they did — context that is valuable for improving the model and demonstrating governance.
- Treating guardrails as set-and-forget: Agent behaviour evolves as models update and the business context changes. Guardrails must be reviewed on a scheduled cadence.
- Conflating guardrails with content moderation: Content moderation governs what the agent's outputs say. Guardrails govern operational risk and the scope of actions the agent can take. Both are necessary; neither substitutes for the other.
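The approval-expiry mistake in particular is straightforward to prevent in code. This sketch gives each hypothetical `ApprovalRequest` a time-to-live and a default action, and defaults to rejection (failing closed) on the assumption that doing nothing is the safer outcome; some workflows may need a different default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ApprovalRequest:
    action_id: str
    created_at: datetime
    ttl: timedelta
    default_on_expiry: str = "reject"  # fail closed unless policy says otherwise
    decision: Optional[str] = None     # set when a reviewer responds


def effective_decision(req: ApprovalRequest, now: datetime) -> Optional[str]:
    """Return the reviewer's decision, the default if the request has lapsed,
    or None if it is still legitimately pending."""
    if req.decision is not None:
        return req.decision
    if now >= req.created_at + req.ttl:
        return req.default_on_expiry
    return None


t0 = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
req = ApprovalRequest("po-17", created_at=t0, ttl=timedelta(hours=4))
print(effective_decision(req, t0 + timedelta(hours=1)))  # None
print(effective_decision(req, t0 + timedelta(hours=5)))  # reject
```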
What leaders should do next
Before any agentic deployment, produce a controls register: list each tool the agent can access, the maximum consequence of that tool being called incorrectly, and the guardrail or approval flow mechanism designed to address that consequence. Review this register with your risk and compliance function. Treat it as a living document that is updated whenever the agent's tool set or operating context changes.
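Parts of the register can be checked automatically. Assuming the register is kept in a machine-readable form, a check like the hypothetical `uncovered_tools` below can run whenever the agent's tool set changes and flag any tool that has no registered control.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ControlEntry:
    tool: str
    max_consequence: str
    mechanism: Optional[str]  # guardrail or approval flow; None means uncovered


def uncovered_tools(register: List[ControlEntry], agent_tools: List[str]) -> List[str]:
    """List the tools the agent can call that lack a registered control."""
    covered = {entry.tool for entry in register if entry.mechanism}
    return sorted(tool for tool in agent_tools if tool not in covered)


register = [
    ControlEntry("send_email", "email to wrong recipient", "recipient allow-list"),
    ControlEntry("issue_refund", "refund paid to wrong account", None),
]
print(uncovered_tools(register, ["send_email", "issue_refund", "update_crm"]))
# ['issue_refund', 'update_crm']
```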
Edison AI designs and ships AI agents and workflow automation built around how your business actually runs.