The Canonical Dimensions
Seven canonical dimensions. Two co-equal halves.
The score isn't a single number pulled from thin air. It's a composite of two co-equal sub-components, Process Stability and Execution Risk, each rolling up from weighted behavioral dimensions. Every interaction penalty defined in the model pairs one Process Stability dimension with one Execution Risk dimension, catching the "looks stable but execution is risky" patterns that a linear combination would miss.
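A minimal sketch of that composite, assuming every dimension is scored 0-100 (higher = more stable or safer). All weights, thresholds, dimension keys, and the single penalty pairing shown are illustrative placeholders, not the production formula.

```python
PROCESS_STABILITY_WEIGHTS = {
    "consistency": 0.40, "ui_stability": 0.35, "repetition_frequency": 0.25,
}
EXECUTION_RISK_WEIGHTS = {
    "complexity": 0.30, "input_structure": 0.25,
    "exception_rate": 0.25, "compliance": 0.20,
}
# Each interaction penalty pairs one dimension from each half.
INTERACTION_PENALTIES = [
    ("consistency", "exception_rate", 5.0),  # hypothetical pairing
]

def composite_score(stability: dict[str, float], risk: dict[str, float]) -> float:
    ps = sum(w * stability[d] for d, w in PROCESS_STABILITY_WEIGHTS.items())
    er = sum(w * risk[d] for d, w in EXECUTION_RISK_WEIGHTS.items())
    base = 0.5 * ps + 0.5 * er  # co-equal halves
    # "Looks stable but execution is risky": penalize when a stability
    # dimension scores high while its paired risk dimension scores low.
    penalty = sum(
        p for s_dim, r_dim, p in INTERACTION_PENALTIES
        if stability[s_dim] >= 80 and risk[r_dim] <= 40
    )
    return max(0.0, base - penalty)
```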
Process Stability
Can an agent run this workflow predictably?
Consistency
Does the workflow follow the same sequence every execution, or do the same inputs produce different paths? A high Consistency score means agents can predict what comes next; a low score means they'll hit decision points without pattern support.
Matters because: Consistent execution is the precondition for safe automation. Inconsistent workflows require agent reasoning at every branch — which is where judgment errors happen.
UI Stability
Do the applications the workflow touches change layouts, selectors, or field names? Do rework loops, repeated failed actions, or navigation stalls appear in the observed execution? Instability flags (rework loops, failed actions, loading stalls, hesitation pauses, path deviations, and validation failures) aggregate into this dimension, as sketched below.
Matters because: Agents that hardcode selectors or assume UI layouts break the moment a vendor ships an update. UI stability scoring surfaces the workflows where automation will age the fastest.
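A hypothetical sketch of that aggregation from per-window flag counts. The flag names mirror the list above; the weights and the 0-100 mapping are invented for illustration.

```python
FLAG_WEIGHTS = {
    "rework_loop": 3.0, "failed_action": 2.5, "loading_stall": 1.0,
    "hesitation_pause": 0.5, "path_deviation": 2.0, "validation_failure": 2.5,
}

def ui_stability_score(flag_counts: dict[str, int], sessions: int) -> float:
    # Weighted instability burden per observed session.
    burden = sum(FLAG_WEIGHTS.get(flag, 1.0) * n for flag, n in flag_counts.items())
    burden /= max(sessions, 1)
    # Map burden onto 0-100 (higher = more stable); the 20.0 scaling
    # factor is an arbitrary assumption chosen for illustration.
    return max(0.0, 100.0 - 20.0 * burden)

print(ui_stability_score({"rework_loop": 9, "loading_stall": 14}, sessions=142))
```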
Repetition Frequency
How often does the workflow run across the observation window, and how consistent is that frequency? Higher observed volume means a stronger statistical basis for the score itself, and a bigger operational footprint for automation to impact.
Matters because: Low-frequency workflows don't pay back an automation investment and don't accumulate enough observation for a confident score. Repetition frequency is the ROI and evidence-density dimension rolled into one.
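A small sketch of how volume and frequency consistency might be measured from per-day execution counts. The coefficient-of-variation choice is an assumption, not the product's metric.

```python
from statistics import mean, pstdev

def repetition_profile(daily_counts: list[int]) -> dict[str, float]:
    # daily_counts: executions observed on each day of the window.
    avg = mean(daily_counts)
    # Coefficient of variation: low = steady cadence, high = bursty.
    cv = pstdev(daily_counts) / avg if avg else float("inf")
    return {"total_runs": float(sum(daily_counts)), "runs_per_day": avg, "frequency_cv": cv}

print(repetition_profile([5, 4, 6, 5, 0, 0, 5]))  # weekday-heavy week
```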
Execution Risk
If automation misbehaves, how bad is the damage?
Complexity
How many applications does the workflow touch, and how are handoffs between them structured? Single-application workflows are simpler to automate; workflows that span multiple systems with informal handoffs (copy-paste, re-entry, manual lookups) carry higher integration complexity and more failure modes.
Matters because: Every cross-application handoff is a place where automation can lose context. Complexity is the primary predictor of integration brittleness.
Input Structure
Are the inputs a workflow receives well-structured (typed fields, consistent schemas, validated formats) or unstructured (free-text notes, mixed formats, implicit relationships)? Structured inputs produce predictable agent reasoning; unstructured inputs require judgment that agents often get wrong.
Matters because: Agents are only as reliable as their inputs. A well-modeled workflow with messy data is still an agent hazard.
Exception Rate
What fraction of observed executions take an exception path (escalation, retry, human-judgment loop, timeout, validation failure)? The exception set is where business risk concentrates: 5% of cases can produce 95% of the legal, financial, or customer exposure.
Matters because: Happy-path automation looks great in pilot and breaks in production. Exception rate scoring is the difference between an automation that works at demo scale and one that survives real operations.
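A hypothetical sketch of the two quantities in play, exception rate and exposure concentration. The session fields are invented for this illustration.

```python
def exception_stats(sessions: list[dict]) -> dict[str, float]:
    # Each session dict: {"exception": bool, "exposure": float}.
    exceptions = [s for s in sessions if s["exception"]]
    rate = len(exceptions) / len(sessions)
    total_exposure = sum(s["exposure"] for s in sessions) or 1.0
    # The "5% of cases, 95% of exposure" check: how much of the total
    # exposure sits on exception-path sessions.
    concentration = sum(s["exposure"] for s in exceptions) / total_exposure
    return {"exception_rate": rate, "exposure_concentration": concentration}
```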
Compliance
Does the workflow touch regulated data elements (PII, PHI, PCI, financial instruments)? Do cross-system flows create audit gaps? Does any step currently encode a human-judgment regulatory decision that should not be delegated to an agent? Every element compounds the blast radius of a misjudged automation.
Matters because: Compliance failures aren't efficiency problems — they're legal and regulatory problems. Automation with the wrong guardrails in a regulated workflow can produce liability that dwarfs the efficiency gain.
Diagnostic Lenses
The companions that explain AR
Both lenses appear alongside AR in every readout. Neither is a decision signal on its own; they explain why a workflow's AR is what it is, and where the time goes if you decide to redesign instead of automate.
Variant Entropy
Shannon entropy of the variant frequency distribution, computed from the actual distribution of execution paths across observed sessions: low entropy when a handful of variants dominate, high entropy when execution fragments into many distinct paths.
Claims Intake · 30 days: 142 sessions · 15 distinct variants · H = 0.62 · top 3 variants cover 87% of sessions · long tail is 12 rare variants.
Why it matters: an H of 0.62 is why AR is 74 rather than higher. The long-tail variants mean the agent will periodically face branches outside the dominant three. Entropy is a diagnostic signal, not a peer to AR.
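For reference, a sketch of the computation. It assumes entropy is normalized by log2 of the variant count so the value lands in [0, 1], an inference consistent with H = 0.62 across 15 variants, not something the readout confirms.

```python
import math
from collections import Counter

def variant_entropy(session_paths: list[str]) -> float:
    counts = Counter(session_paths)
    n = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 0.0  # a single path carries no branching uncertainty
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(k)  # normalize to [0, 1] -- an assumption

# Three dominant variants plus a 12-variant long tail, 142 sessions:
paths = (["A"] * 60 + ["B"] * 40 + ["C"] * 24
         + [f"rare{i}" for i in range(12)] + [f"rare{i}" for i in range(6)])
print(round(variant_entropy(paths), 2))  # ~0.59, in the H = 0.62 range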
Flow Efficiency
Active work time divided by total elapsed time per session. Active = typing, clicking, waiting for system response. Non-active = idle waits for approvals, cross-system response delays, manual lookups, and rework loops.
Claims Intake · 30 days: avg session 12:04 · active 8:13 (68%) · wait 2:45 · rework 1:06 · biggest single-step stall: prior-auth lookup averaging 1:38.
Why it matters: 32% of elapsed time is wait and rework — a redesign opportunity orthogonal to automation. A workflow can have high AR and still carry meaningful Flow Efficiency gains from process simplification.
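A sketch of the ratio, assuming sessions arrive as labeled (segment, seconds) pairs. The segment labels are illustrative; "active" follows the definition above.

```python
# "Active" per the definition above: typing, clicking, waiting for a
# system response. Everything else counts against flow efficiency.
ACTIVE_LABELS = {"typing", "clicking", "system_response"}

def flow_efficiency(segments: list[tuple[str, float]]) -> float:
    # segments: (label, seconds) pairs covering one session end to end.
    total = sum(seconds for _, seconds in segments)
    active = sum(seconds for label, seconds in segments if label in ACTIVE_LABELS)
    return active / total if total else 0.0

# Claims Intake-shaped session: 8:13 active of 12:04 elapsed.
session = [("typing", 280.0), ("clicking", 120.0), ("system_response", 93.0),
           ("idle_wait", 165.0), ("rework", 66.0)]
print(f"{flow_efficiency(session):.0%}")  # 68%
```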