The Canonical Dimensions
Seven canonical dimensions. Two co-equal halves.
The score isn't a single number pulled from thin air. It's a composite of two co-equal sub-components, Process Stability and Execution Risk, each rolling up from weighted behavioral dimensions. Every interaction penalty defined in the model pairs one Process Stability dimension with one Execution Risk dimension, catching the "looks stable but execution is risky" patterns that a linear combination would miss.
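A minimal sketch of that composite, assuming every dimension is scored 0-100 (higher = more stable or safer). All weights, thresholds, dimension keys, and the single penalty pairing shown are illustrative placeholders, not the production formula.

```python
PROCESS_STABILITY_WEIGHTS = {
    "consistency": 0.40, "ui_stability": 0.35, "repetition_frequency": 0.25,
}
EXECUTION_RISK_WEIGHTS = {
    "complexity": 0.30, "input_structure": 0.25,
    "exception_rate": 0.25, "compliance": 0.20,
}
# Each interaction penalty pairs one dimension from each half.
INTERACTION_PENALTIES = [
    ("consistency", "exception_rate", 5.0),  # hypothetical pairing
]

def composite_score(stability: dict[str, float], risk: dict[str, float]) -> float:
    ps = sum(w * stability[d] for d, w in PROCESS_STABILITY_WEIGHTS.items())
    er = sum(w * risk[d] for d, w in EXECUTION_RISK_WEIGHTS.items())
    base = 0.5 * ps + 0.5 * er  # co-equal halves
    # "Looks stable but execution is risky": penalize when a stability
    # dimension scores high while its paired risk dimension scores low.
    penalty = sum(
        p for s_dim, r_dim, p in INTERACTION_PENALTIES
        if stability[s_dim] >= 80 and risk[r_dim] <= 40
    )
    return max(0.0, base - penalty)
```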
Process Stability
Can an agent run this workflow predictably?
Consistency
Does the workflow follow the same sequence every execution, or do the same inputs produce different paths? A high Consistency score means agents can predict what comes next; a low score means they'll hit decision points without pattern support.
Matters because: Consistent execution is the precondition for safe automation. Inconsistent workflows require agent reasoning at every branch — which is where judgment errors happen.
UI Stability
Do the applications the workflow touches change layouts, selectors, or field names? Do rework loops, repeated failed actions, or navigation stalls appear in the observed execution? Instability flags (rework loops, failed actions, loading stalls, hesitation pauses, path deviations, and validation failures) aggregate into this dimension, as sketched below.
Matters because: Agents that hardcode selectors or assume UI layouts break the moment a vendor ships an update. UI stability scoring surfaces the workflows where automation will age the fastest.
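A hypothetical sketch of that aggregation from per-window flag counts. The flag names mirror the list above; the weights and the 0-100 mapping are invented for illustration.

```python
FLAG_WEIGHTS = {
    "rework_loop": 3.0, "failed_action": 2.5, "loading_stall": 1.0,
    "hesitation_pause": 0.5, "path_deviation": 2.0, "validation_failure": 2.5,
}

def ui_stability_score(flag_counts: dict[str, int], sessions: int) -> float:
    # Weighted instability burden per observed session.
    burden = sum(FLAG_WEIGHTS.get(flag, 1.0) * n for flag, n in flag_counts.items())
    burden /= max(sessions, 1)
    # Map burden onto 0-100 (higher = more stable); the 20.0 scaling
    # factor is an arbitrary assumption chosen for illustration.
    return max(0.0, 100.0 - 20.0 * burden)

print(ui_stability_score({"rework_loop": 9, "loading_stall": 14}, sessions=142))
```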
Repetition Frequency
How often does the workflow run across the observation window, and how consistent is that frequency? Higher observed volume means a stronger statistical basis for the score itself, and a bigger operational footprint for automation to impact.
Matters because: Low-frequency workflows don't pay back an automation investment and don't accumulate enough observation for a confident score. Repetition frequency is the ROI and evidence-density dimension rolled into one.
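A small sketch of how volume and frequency consistency might be measured from per-day execution counts. The coefficient-of-variation choice is an assumption, not the product's metric.

```python
from statistics import mean, pstdev

def repetition_profile(daily_counts: list[int]) -> dict[str, float]:
    # daily_counts: executions observed on each day of the window.
    avg = mean(daily_counts)
    # Coefficient of variation: low = steady cadence, high = bursty.
    cv = pstdev(daily_counts) / avg if avg else float("inf")
    return {"total_runs": float(sum(daily_counts)), "runs_per_day": avg, "frequency_cv": cv}

print(repetition_profile([5, 4, 6, 5, 0, 0, 5]))  # weekday-heavy week
```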
Execution Risk
If automation misbehaves, how bad is the damage?
Complexity
How many applications does the workflow touch, and how are handoffs between them structured? Single-application workflows are simpler to automate; workflows that span multiple systems with informal handoffs (copy-paste, re-entry, manual lookups) carry higher integration complexity and more failure modes.
Matters because: Every cross-application handoff is a place where automation can lose context. Complexity is the primary predictor of integration brittleness.
Input Structure
Are the inputs a workflow receives well-structured (typed fields, consistent schemas, validated formats) or unstructured (free-text notes, mixed formats, implicit relationships)? Structured inputs produce predictable agent reasoning; unstructured inputs require judgment that agents often get wrong.
Matters because: Agents are only as reliable as their inputs. A well-modeled workflow with messy data is still an agent hazard.
Exception Rate
What fraction of observed executions take an exception path (escalation, retry, human-judgment loop, timeout, validation failure)? The exception set is where business risk concentrates: 5% of cases can produce 95% of the legal, financial, or customer exposure.
Matters because: Happy-path automation looks great in pilot and breaks in production. Exception rate scoring is the difference between an automation that works at demo scale and one that survives real operations.
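A hypothetical sketch of the two quantities in play, exception rate and exposure concentration. The session fields are invented for this illustration.

```python
def exception_stats(sessions: list[dict]) -> dict[str, float]:
    # Each session dict: {"exception": bool, "exposure": float}.
    exceptions = [s for s in sessions if s["exception"]]
    rate = len(exceptions) / len(sessions)
    total_exposure = sum(s["exposure"] for s in sessions) or 1.0
    # The "5% of cases, 95% of exposure" check: how much of the total
    # exposure sits on exception-path sessions.
    concentration = sum(s["exposure"] for s in exceptions) / total_exposure
    return {"exception_rate": rate, "exposure_concentration": concentration}
```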
Compliance
Does the workflow touch regulated data elements (PII, PHI, PCI, financial instruments)? Do cross-system flows create audit gaps? Does any step currently encode a human-judgment regulatory decision that should not be delegated to an agent? Every element compounds the blast radius of a misjudged automation.
Matters because: Compliance failures aren't efficiency problems — they're legal and regulatory problems. Automation with the wrong guardrails in a regulated workflow can produce liability that dwarfs the efficiency gain.
Diagnostic Lenses
The companions that explain AR
Both lenses appear alongside AR in every readout. Neither is a decision signal on its own; they explain why a workflow's AR is what it is, and where the time goes if you decide to redesign instead of automate.
Variant Entropy
Shannon entropy of the variant frequency distribution, computed from the actual distribution of execution paths across observed sessions: low entropy when a handful of variants dominate, high entropy when execution fragments into many distinct paths.
Claims Intake · 30 days: 142 sessions · 15 distinct variants · H = 0.62 · top 3 variants cover 87% of sessions · long tail is 12 rare variants.
Why it matters: an H of 0.62 is why AR is 74 rather than higher. The long-tail variants mean the agent will periodically face branches outside the dominant three. Entropy is a diagnostic signal, not a peer to AR.
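For reference, a sketch of the computation. It assumes entropy is normalized by log2 of the variant count so the value lands in [0, 1], an inference consistent with H = 0.62 across 15 variants, not something the readout confirms.

```python
import math
from collections import Counter

def variant_entropy(session_paths: list[str]) -> float:
    counts = Counter(session_paths)
    n = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 0.0  # a single path carries no branching uncertainty
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(k)  # normalize to [0, 1] -- an assumption

# Three dominant variants plus a 12-variant long tail, 142 sessions:
paths = (["A"] * 60 + ["B"] * 40 + ["C"] * 24
         + [f"rare{i}" for i in range(12)] + [f"rare{i}" for i in range(6)])
print(round(variant_entropy(paths), 2))  # ~0.59, in the H = 0.62 range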
Flow Efficiency
Active work time divided by total elapsed time per session. Active = typing, clicking, waiting for system response. Non-active = idle waits for approvals, cross-system response delays, manual lookups, and rework loops.
Claims Intake · 30 days: avg session 12:04 · active 8:13 (68%) · wait 2:45 · rework 1:06 · biggest single-step stall: prior-auth lookup averaging 1:38.
Why it matters: 32% of elapsed time is wait and rework — a redesign opportunity orthogonal to automation. A workflow can have high AR and still carry meaningful Flow Efficiency gains from process simplification.
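A sketch of the ratio, assuming sessions arrive as labeled (segment, seconds) pairs. The segment labels are illustrative; "active" follows the definition above.

```python
# "Active" per the definition above: typing, clicking, waiting for a
# system response. Everything else counts against flow efficiency.
ACTIVE_LABELS = {"typing", "clicking", "system_response"}

def flow_efficiency(segments: list[tuple[str, float]]) -> float:
    # segments: (label, seconds) pairs covering one session end to end.
    total = sum(seconds for _, seconds in segments)
    active = sum(seconds for label, seconds in segments if label in ACTIVE_LABELS)
    return active / total if total else 0.0

# Claims Intake-shaped session: 8:13 active of 12:04 elapsed.
session = [("typing", 280.0), ("clicking", 120.0), ("system_response", 93.0),
           ("idle_wait", 165.0), ("rework", 66.0)]
print(f"{flow_efficiency(session):.0%}")  # 68%
```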