AQ Score: A Foundational Framework for Independent Measurement of Autonomous System Governance

Disclosure statement. This paper states what the framework requires: the Five Laws of AI Governance, the four properties of a standards body, and the role of an independent measurement layer, across both digital and physical autonomy. It does not disclose how the measurement is computed or the specific claim language of the filed patent portfolio; those are deferred to the non-provisional filing window and the published USPTO record. The numbers cited are verifiable from public USPTO records and the NIST OLIR catalog.


Abstract

Autonomous systems now act without a human at the controls: AI agents that approve transactions and modify records, and physical systems that hold airspace, drive vehicles, and act on sensor data. The field treats the governance of these as four separate problems: AI, drones, RF, robotics. They are one. Each requires binding what the autonomous actor is allowed to do, at the moment it acts, with evidence the actor could not have produced about itself. What none of the current approaches provide is an independent way to measure how well the autonomous actor was governed: a scale, produced by a party that does not also build or sell the system, for what a system was authorized to do, what it did, and the distance between the two. That gap leaves the risk not merely unmeasured but unpriced. This paper argues that the absence is structural, not accidental: the operators who build autonomous systems cannot credibly grade their own conformance, and the coalitions that might do so collapse into neutral taxonomy when ranking their own members. The seat for an independent measurement authority is empty for the same reason FICO is independent of lenders and UL of manufacturers. This paper sets out a foundational framework to occupy that seat: the Five Laws of AI Governance as the normative core, a four-property test for what a standards body must satisfy, and AQ Score™ as the measurement layer, across both digital and physical autonomy. The framework is offered as a federated standard, not a proprietary scheme: the criteria are published, the evidence is verifiable, and the practitioner who builds the work first does not own the standard.


1. The Category Problem

Governance for autonomous systems is the most-discussed unbuilt category in security.

The trade press of the last twelve months runs to thousands of column inches on AI governance frameworks, products, certifications, and maturity models. The same period records drone incidents at airports, RF interference events at substations, autonomous-vehicle edge cases at intersections, unmanned aerial system incursions across critical infrastructure, and the steady accumulation of autonomous systems entering critical infrastructure environments without binding rules for how they are governed when they act.[3] Each story is covered in its own silo. AI gets one beat; drones get another; RF a third; physical robotics a fourth. The connecting thread is what nobody is writing about.

The connecting thread is this: an autonomous system, by definition, acts without a human at the controls. The system that governs an autonomous AI agent at the moment of action, the system that governs an autonomous drone over a substation, and the system that governs an autonomous control loop inside critical infrastructure are, structurally, the same kind of system. All of them must bind what the autonomous actor is allowed to do, when the action happens, with evidence the actor could not have produced about itself. That is one category. The field treats it as four.

What you will not find, across any of those silos, is a way to measure how well the autonomous actor was governed, or whether it acted within its bounds. There is no shared, independent measurement layer for whether the system did what it was authorized to do. There is no agreed scale for the safety, efficiency, and effectiveness of autonomous action. There is no FICO for autonomous action, no UL listing for autonomous decision-making, no recognized arbiter a CISO or a Chief AI Governance Officer[6] or a Critical Infrastructure Director can point to and say: that system, measured against that benchmark, with that audit trail.

There is no independent conformance score. The market is operating on vibes, marketing decks, and the assumption that the eventual standard will be obvious when it arrives. It will not be obvious. It will be named.


2. The Structurally Empty Seat

Nobody currently on that field can independently say whether an autonomous system actually stayed inside the bounds it was given. The question is not whether the controls exist on paper but whether they held at the moment the system acted, measured by someone who does not also sell the controls. That is not a feature gap. It is a structural vacancy, and it is empty for reasons that do not go away with more funding or a better product. There are three.

An operator cannot score its own governance. The grade is self-validating, so the market cannot trust it. The boxer does not referee his own fight. When the company that sells the system also issues the grade on whether the system behaved, the grade is marketing, not measurement.

A coalition cannot ship the score either. A coalition is a table of competitors. A real score implies winners and losers among the parties at that table, so coalitions retreat to neutral taxonomy every time. That is why shared-body scorecards keep getting announced and never built: the moment a coalition tries to grade its own members, the members stop agreeing.

A score is a different kind of object than a framework. A framework is advice. A score is a falsifiable, attributable claim that someone relied on, with consequences if it was wrong. One carries liability; the other does not. That is why the field reaches for severity ratings, maturity models, and best-practice checklists, and then stops just short of the one thing that would settle whether the system stayed in bounds.

This shape has appeared before. Lenders did not get to grade their own borrowers, so FICO became its own company. Bond issuers did not get to rate their own bonds, so Moody's did. Manufacturers did not get to certify their own products as safe, so UL did. In each case the independent score did not become inevitable when someone argued for it; it became inevitable when a party that prices risk (a lender, a bond market, an insurer) began allocating capital on the basis of it. UL became the de facto requirement because insurers would not underwrite what UL had not listed. A credit score and a bond rating became load-bearing because lending and the bond market priced on them. The score earns its authority the moment the cost of ignoring it shows up on a balance sheet. Every time autonomy starts acting at scale, the same seat opens, and it is never filled by a player already on the field. It is filled by someone structurally separate from both the operators and the rulebook writers, because that separation is the entire product.

The independence is not a feature of the score. The independence is the score.


3. The Three-Layer Diagnosis

When the word "governance" attaches to an autonomous system, it does at least three different kinds of work. Most of the field talks about it as one thing. That conflation is where the score gets lost.

Process governance. Methodology, frameworks, change management, the organizational choreography of adopting autonomous systems responsibly. Real work. It produces playbooks, training, maturity models. It does not bind any specific system at any specific moment. It answers the question "how do we adopt autonomy responsibly?" It is advisory by design, and most of what the trade press calls "AI governance" lives here.

Accountability governance. The operational overlay of RACI[2] charts, lineage maps, audit trails, and incident response. Also real work. It produces investigation evidence after the fact and assigns ownership when something goes wrong. It depends on the system being well-behaved during the period of observation, a dependency the documented evaluation-aware-deception literature suggests is unsafe to rely on.[7] It answers the question "who is responsible when the system acts?" It is reactive by design, downstream of the action.

Runtime governance.[4] The dispatch-time enforcement that binds what a system is allowed to do at the moment of action, by infrastructure architecturally separated from the system itself. For a digital agent, that means a governed execution framework that decides at dispatch what the agent may invoke, with what data, against what destination, under what risk classification. This is the architectural approach a frontier AI lab has publicly described as supervising "what the agent is able to do" rather than what it does.[1] For a physical system (a drone over a critical-infrastructure site, an unmanned aerial vehicle in regulated airspace, an autonomous control loop inside a substation), it means a multi-domain sensing and decision substrate that decides, before action, what the system perceives, what it is authorized to act on, and what evidence of that decision exists outside the system. This is the binding layer. It is also the empty layer, in both domains. The trade press writes about it least; it is the hardest to build; and it is the only layer that produces audit evidence that does not depend on the governed system narrating its own compliance.

The category boundary the field has not drawn is this: process and accountability governance are necessary and good, and they are not a score. They are the documentation of intent and the trace of outcome. A score is the measurement of what happened between intent and outcome, at the moment the system moved, with evidence the system could not have produced about itself. The score lives in the third layer. The third layer is mostly empty, on both sides of the digital-physical line. That is the gap.

The thing the third layer measures is narrow and exact: did this specific action stay inside this specific authorization, judged by something that did not also produce the action? Everything else in this paper (the Five Laws, the four properties, the dual-domain scope) is built to make that one measurement trustworthy.


4. The Five Laws of AI Governance

A governed autonomous system, left to its own architecture, collapses four roles into itself: it becomes the judge, the witness, the owner, and the executioner of its own governance. The Five Laws strip each role from the governed system and assign it to something external and separable, and require that governance bind before the act, not after.

First Law: Independent Measure. A system may not be the measure of its own conduct. The scale of what it was authorized to do versus what it did must exist outside the system and outlive it. The failure mode prevented is the self-issued scorecard, which the market cannot trust because the grader sells the controls.

Second Law: Binding at Dispatch. A system may not be governed only in retrospect. Authority must constrain the action before the action occurs; observation after the fact is incident response, not prevention. Absence of a valid governance signal is a denial, not a permission. The failure mode prevented is governance that attaches only after the irreversible action has already happened.

Third Law: External Witness. A system may not be the sole witness to its own compliance. The record of its conduct must be produced by infrastructure the system cannot alter. The failure mode prevented is self-reported logs, which fail the moment the system is compromised or misaligned.

Fourth Law: Separated Authority. No party may certify itself. The power to measure must be separable from the act of building, with published criteria and verifiable evidence chains; the one who builds it first does not own the standard. A judge drawn from the same architecture as the builder inherits the same blind spots. Independence is not a different instance. It is a different species. Building the work first and owning the standard are separate questions; how the measuring authority becomes separable from its builder is the subject of §5. The failure mode prevented is the standards body that is also the only vendor measured against itself.

Fifth Law: Revocable Authority. A system may not outlive its off-switch. The power to halt or revoke must remain live, external, and superior to the system at all times, and must not erode as the system's capability grows. The failure mode prevented is the system that has grown past the point where anyone can stop it.
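
The Second and Fifth Laws can be illustrated with a small default-deny gate. This is a minimal sketch under stated assumptions, not the framework's implementation, which the paper does not disclose: the names `Grant`, `DispatchGate`, `issue`, `revoke`, and `decide` are hypothetical, and the gate is assumed to run in infrastructure the governed actor cannot modify.

```python
from dataclasses import dataclass
import time


@dataclass(frozen=True)
class Grant:
    """Hypothetical authorization issued outside the governed system."""
    actor: str
    allowed_actions: frozenset
    expires_at: float  # Unix timestamp


class DispatchGate:
    """Hypothetical default-deny gate, assumed to live outside the actor."""

    def __init__(self):
        self._grants = {}      # actor -> Grant
        self._revoked = set()  # actors whose authority has been withdrawn

    def issue(self, grant):
        self._grants[grant.actor] = grant

    def revoke(self, actor):
        # Checked on every dispatch, so revocation binds the very next action.
        self._revoked.add(actor)

    def decide(self, actor, action, now=None):
        """Return (allowed, reason); anything not explicitly permitted is denied."""
        now = time.time() if now is None else now
        grant = self._grants.get(actor)
        if grant is None:
            return False, "no governance signal"  # absence is a denial
        if actor in self._revoked:
            return False, "authority revoked"
        if now >= grant.expires_at:
            return False, "grant expired"
        if action not in grant.allowed_actions:
            return False, "action outside authorization"
        return True, "within authorization"


gate = DispatchGate()
gate.issue(Grant("agent-7", frozenset({"read_record"}), expires_at=2_000.0))

print(gate.decide("agent-7", "read_record", now=1_000.0))    # allowed
print(gate.decide("agent-7", "delete_record", now=1_000.0))  # denied: out of bounds
print(gate.decide("agent-9", "read_record", now=1_000.0))    # denied: no signal
gate.revoke("agent-7")
print(gate.decide("agent-7", "read_record", now=1_000.0))    # denied: revoked
```

The design choice worth noting is the order of the checks: every path that is not an explicit match against a live, unrevoked grant ends in a denial, so a missing or stale governance signal can never be read as permission.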

Scope boundary. This framework governs conformance to an authorization, not the content of the authorization. A system can be perfectly governed into doing something reckless if the grant itself was reckless. This is a deliberate scope decision: a measurement standard, not a policy-content standard. "Compliant" is not "safe." It is the precondition for reasoning about safety, because you cannot reason about the safety of a system you cannot bind.

Applicability. The framework applies to autonomous systems that take consequential action without a human at the controls at the moment of action, in both the digital domain (agents that invoke tools, move data, or change state) and the physical domain (systems that act on the world directly, for example on airspace, vehicles, sensor fields, or critical-infrastructure environments). It is not aimed at human-in-the-loop tools where a person authorizes each action, or at non-autonomous software. Where a system is partly autonomous, the framework applies to the actions it takes autonomously.

It is also, deliberately, not a pre-deployment evaluation. Pre-deployment evaluation grades a model before it acts and then lets it act; that is the pre-hoc face of the failure mode the Second Law exists to prevent. A grade earned in evaluation does not bind the system at runtime, and the documented evaluation-aware-deception literature shows that systems can behave differently once they recognize they are being tested.[7] Measuring conformance therefore has to happen where the action happens, not before it. This is distinct from the pre-deployment configuration of runtime policy, which the framework does cover: setting, before deployment, the authorization a system will be held to at the moment it acts is not the same as certifying the system safe in advance of acting.

Relationship to the four properties. The Five Laws are the prohibitive inversion of the four properties a standards body must satisfy (§5). Same architecture, two registers: the Laws state what a governed system may not do; the properties state what the measuring body must provide. The Fifth Law, revocability, surfaces a requirement the four properties do not yet name.

The Five Laws are also published on their own, with a one-page brief for download: The Five Laws of AI Governance.


5. What a Standards Body Actually Requires

A standards body is not a brand, a certification logo, a working group, a methodology, or a maturity model. It is a structure that satisfies four properties, all at once, none optional. Each exists because of a specific failure mode the absence of the property produces. The properties are not aspirations. They are the test.

1. A measurement layer that survives the system being measured. A way to measure what a system was authorized to do, what it actually did, and the distance between the two, in terms that hold across implementations, vendors, deployment contexts, and domains. The measurement must exist independent of any single vendor's instrumentation, compare a digital agent's governed action and a physical system's governed action against the same conceptual scale, and produce a result that does not depend on the governed system narrating its own compliance. Without a shared scale, the procurement question "which of these governs the system better?" is unanswerable, and procurement defaults to the loudest vendor. A standards body without a measurement layer is a logo. This is the layer AQ Score™ is being built to occupy.

2. Dispatch-time enforcement, not post-hoc observation. The governance under measurement must bind the system at the moment of action, not at the moment of audit; this is the third layer of the Three-Layer Diagnosis (§3). Probabilistic systems produce non-compliant behavior some percentage of the time regardless of training-time alignment, and post-hoc observation by definition arrives after the non-compliant behavior has already happened. For an agent that just exfiltrated credentials, or a drone that just entered restricted airspace, post-hoc observation is the start of the incident-response timeline, not the prevention. A standards body that certifies post-hoc compliance is certifying that the system had a chance to mostly behave.

3. Tamper-evident attestation produced outside the governed system. The evidence of governance (what was authorized, executed, denied, and on what basis) must be produced by infrastructure architecturally separated from the system being governed. The governed system cannot be the only witness to its own compliance. The failure mode is the system that logs its own behavior and reports those logs as proof; this works only as long as the system is honest and well-behaved, and the moment it is compromised or misaligned, its self-reporting is the first thing to fail. A standards body that accepts self-reported logs is certifying a story, not a system.

4. Federated authority: no single vendor measures itself. The standard must derive its authority from a structure where no single vendor, including the practitioner who built the work, is the sole measurement authority. The standard exists above any vendor's interest, with separable certifying parties, published criteria, and verifiable evidence chains. This answers the most common hostile-reader question: why you, and not NIST, or ISO, or one of the Big Four? The answer is that the standards body does not need to be operated by the practitioner who built the work, and over time it will not be. The work is published in the public record at the USPTO, mapped to recognized frameworks, and submitted through federal standards processes. Other actors (NIST, ISO, federal agencies, certifying bodies, trade associations) can engage with it, extend it, contest it, and adopt it. The practitioner who does the work first does not own the standard. The standard is what the field eventually agrees the work is.

The honest version of the present moment is that the practitioner who built the work also runs the entity around it today. That is the starting condition of every standard, not a permanent state. Independence here is a structural commitment with a verifiable trajectory: the measuring authority is being separated from the building entity, the criteria are published rather than held, and the evidence chains are made inspectable by parties with no stake in the result. A reader is entitled to hold the work to that trajectory and to measure the distance between the commitment and the structure at any point. That is the same test the framework applies to everything else.

Every one of these describes what a standards body must do, not what it must be called. A working group with the right structure is a standards body. A vendor consortium with the wrong structure is not, whatever it calls itself. The test is in the properties, not the title.
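
The third property, tamper-evident attestation produced outside the governed system, can be sketched as a hash-chained log. This is a generic, well-known technique shown for illustration only, not the attestation mechanism of the framework, which the paper does not disclose. The names `WitnessLog` and `verify` are hypothetical, the key is assumed to be held by the witnessing infrastructure and never by the governed system, and a real deployment would more likely use asymmetric signatures so that third parties can verify without holding a secret.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # fixed starting value for the chain


def _digest(prev_hash, record):
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


class WitnessLog:
    """Hypothetical append-only log kept by a party separate from the actor."""

    def __init__(self, key):
        self._key = key      # secret held only by the witness (assumption)
        self.entries = []

    def append(self, record):
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        entry_hash = _digest(prev, record)
        tag = hmac.new(self._key, entry_hash.encode("ascii"), hashlib.sha256).hexdigest()
        self.entries.append({"record": record, "prev": prev, "hash": entry_hash, "tag": tag})


def verify(entries, key):
    """Recompute the chain; any edited, reordered, or dropped middle entry fails."""
    prev = GENESIS
    for entry in entries:
        expected = _digest(prev, entry["record"])
        tag = hmac.new(key, expected.encode("ascii"), hashlib.sha256).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        if not hmac.compare_digest(entry["tag"], tag):
            return False
        prev = expected
    return True


key = b"witness-only-secret"  # illustrative value, not a real key
log = WitnessLog(key)
log.append({"actor": "agent-7", "action": "read_record", "decision": "allow"})
log.append({"actor": "agent-7", "action": "delete_record", "decision": "deny"})
print(verify(log.entries, key))  # True

# A governed system that rewrites its own history is detected.
log.entries[1]["record"]["decision"] = "allow"
print(verify(log.entries, key))  # False
```

Because each entry commits to the one before it, rewriting a single decision invalidates that entry and every entry after it. Truncating the tail is not detected by this sketch on its own; that would need the latest hash to be anchored somewhere else the governed system cannot reach.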


6. Dual-Domain Coverage

There is a second vacancy hiding inside the first. Every player in the crowded field governs software: an agent that approves a transaction, edits a record, calls an API, all of it on a screen. But autonomy already left the screen. The agent is also a drone holding a piece of airspace, a vehicle deciding to change lanes, a sensor system deciding what it just detected. When that system acts outside its authority, the cost is not a deleted database row. It is a collision, an incursion, a physical event that cannot be rolled back. Same governance question. Same empty referee seat. A much larger blast radius.

This is what makes the seat harder to fill, not easier. The independent scorekeeper this moment requires has to keep score across both domains, the software agent and the physical one, because increasingly they are the same deployment. The same four-property test that governs an AI agent at the moment of dispatch also governs a drone over critical infrastructure, an autonomous control loop inside a substation, and an unmanned aerial vehicle in regulated airspace. One standards body. One measurement scale. Two substrates, one category.

The architecture covers both domains today. The governed execution framework for digital autonomous agents is operational. The multi-domain sensing system that makes governance of physical autonomous systems possible is operational in production, and the work that joins the two halves into one architecture is filed in the patent record. A referee that can only see half the field is not a referee.

This is why AQ Score™ is, by design, a single measurement layer across both domains rather than two separate scores bolted together. The conformance question is the same on either side of the digital-physical line: was the autonomous actor's action inside its authority, measured by something it could not edit? The scale that answers it is one scale, applied to a software agent at dispatch and a physical system at the moment it acts. A measurement standard that covered only the digital half would describe the same partial referee this section just ruled out. The score spans both because the seat does.


7. The Receipts

The architecture is already in the public record, and it engages two distinct NIST roles: as a framework the work maps to, and as a federal body reviewing the work itself.

The filings. The governed execution framework for digital autonomous agents and the multi-domain sensing system for physical autonomous systems are filed at the USPTO across five provisional patents covering 334 total claims, part of one dual-domain portfolio.[5] The architecture is mapped against 17 published regulatory and standards frameworks through 95 documented compliance control mappings, including NIST AI RMF, the EU AI Act, ISO/IEC 42001, NIST CSF 2.0, and NIST SP 800-53. The full control catalog and framework alignment are published as a public reference document, open for inspection. The measurement layer the standards body is being built around, AQ Score™, is in trademark intent-to-use filing at the USPTO.

The work is in the public record. Three SDOS Concept Crosswalks are cataloged with the NIST OLIR Program as Draft Informative References, each in its public review window: Reference ID 212 (AI RMF 1.0), Reference ID 215 (CSF 2.0), and Reference ID 217 (SP 800-53 Rev 5.2.0). Catalog inclusion is a public-record fact, not an endorsement: the crosswalks are listed for public review, and the relationship type is supportive, mapping runtime-governance controls to framework subcategory intent without claiming equivalence. The doctrine sits on top of that inspectable stack: the Five Laws at the memorable top, twenty-four controls across nine governance domains mapped to the NIST AI RMF subcategories beneath.

The point is not the count. The point is that the four properties are not aspirations: the work is operational, running in production, and reduced into instruments other actors can verify, contest, extend, or adopt. Those instruments are patent specifications anyone can read, framework crosswalks anyone can audit, federal references anyone can examine, and a governed execution framework producing audit evidence in real deployments. A claim without an audit trail is a press release. The audit trail is the standards body.

Two observations follow. First, the architecture covers both domains today. There is no future-roadmap promise; the dual-domain coverage is already in the filings, the mappings, and the public submissions. Second, the work is already in the public record on multiple tracks: three SDOS crosswalks are cataloged with the NIST OLIR Program as Draft Informative References in public review; published frameworks including the EU AI Act, NIST AI RMF, and ISO/IEC 42001 are mapped through 95 documented control mappings; and the patent filings are on file at the USPTO. Catalog inclusion and framework mapping are inspectable facts; neither constitutes certification, endorsement, or affiliation by any organization that maintains those frameworks.


8. What This Is Not

This is not a vendor announcement. There is no product launching at the end of this paper, no SKU to procure, no demo to schedule. The work is structural, not commercial.

This is not a request for industry consensus. Standards bodies are not voted into existence by the field they govern. They are built by practitioners who do the work, document the architecture, and invite the field to engage with the result. Consensus follows; it does not precede.

This is not the proposal of a proprietary scoring scheme. A standards body that is also the only vendor measured against itself is not a standards body. The architecture described requires open criteria, separable measurement, federated authority, and verifiable evidence the governed system could not have produced about itself. Those properties are non-optional.

This is the claim that the work exists, the architecture is published, and the field is invited.


A score is a different object than the governance artifacts most organizations already hold, and the difference is one legal and governance professionals will recognize immediately: it is attributable. A policy says what should happen. A maturity assessment says how mature the program is. Neither is a claim that a specific autonomous action stayed inside a specific authorization, produced by a party with no stake in the answer, that someone could rely on and be wrong about. That last object (falsifiable, attributable, relied upon, with consequences if it was wrong) is the one the field has not built, and it is the one that matters most to the functions that carry liability.

This is why governance, compliance, and legal leaders are not downstream consumers of this framework. They are the functions for whom the distinction is load-bearing.

Consider where an independent conformance measurement actually lands in an organization that deploys autonomous systems:

  • Audit and assurance. Today, an audit of autonomous governance examines whether controls exist and whether the organization documented them. It cannot examine whether the controls held at the moment the system acted, because the evidence is produced by the system being audited. An independent measurement layer, with attestation produced outside the governed system, gives an auditor an evidence object that does not depend on the auditee narrating its own compliance. That changes what an audit can assert.

  • Procurement and contracts. "The vendor governs its agents" is not a contractable requirement; it is a marketing claim. "The vendor's autonomous systems are measured for conformance against a published, independent standard" is a requirement that can be written into an RFP, a master services agreement, and a service-level term. A standard turns a vendor's assurance into a counterparty obligation. Procurement and legal are the functions that write those obligations, and the vocabulary they adopt becomes the default the market is held to.

  • Regulatory and liability posture. As the EU AI Act, NIST AI RMF, and the physical-autonomy regulators (FAA, NERC, and the critical-infrastructure sector authorities) move from principles toward enforceable rules, the question of what counts as adequate governance of an autonomous action becomes a question of liability. The functions answering it (general counsel, chief compliance officers, chief AI governance officers) need an architectural anchor that survives the next round of implementing rules. A framework that maps to the recognized standards and produces attributable evidence is that anchor.

  • Mergers, acquisitions, and investment diligence. When an acquirer or investor buys a company, it inherits that company's autonomous systems and their liabilities. Today there is no way to price that risk. Financial diligence examines the books and legal diligence examines the contracts, but nothing examines whether the target's AI agents stayed inside their authority, because the only evidence is produced by the systems being acquired. The exposure is sharpest in regulated sectors: a healthcare company whose autonomous systems touch patient data, or a financial firm whose agents move money, carries an inherited governance liability the buyer cannot currently measure. An independent conformance measurement is to AI risk what a credit score is to a borrower, the thing a counterparty checks before the transaction closes. A target with no independent measure of its autonomous governance is not a target with good governance; it is a target whose governance cannot be priced.

  • Insurance and insurability. As autonomous systems take consequential action, the question of who bears the loss when one acts outside its authority moves to the insurer. But a risk cannot be priced if it cannot be measured, and today the governance of an autonomous system is not independently measurable. Cyber and professional-liability underwriters already struggle to price AI exposure; an autonomous-AI loss with no independent record of whether the system stayed inside its authority is a risk an underwriter cannot bound. This is the capital-allocator dynamic described in §2, now with a name. The independent score that became inevitable for FICO, for Moody's, and for UL became inevitable the moment a party that prices risk began allocating capital on it; for autonomous-AI governance, that party is the underwriter. Insurers would not underwrite what UL had not listed, and listing became the de facto requirement. An independent conformance measure is what makes autonomous-AI risk underwritable at all, and once underwriting prices on it, the measure stops being optional. The strongest forcing function for a standard is not a regulation. It is a premium.

The point is not that this framework is a compliance product. It is the opposite: most of what is sold as autonomous-system governance is advisory by design, and advice does not survive a deposition. The measurement layer is the thing that does, because it is the only layer that produces evidence the governed system could not have produced about itself. For the professions whose work is reasoning about attributable claims, liability, and the consequences of relying on them, that is not a feature. It is the whole question.

10. The Invitation

If the gap you recognize is the one between what your organization deploys autonomously and what it can govern at the moment of deployment, the architecture is in the public record. There are three paths, depending on where you sit.

For credentialed economic buyers (CISOs, Chief AI Governance Officers, Critical Infrastructure Directors, Chief Compliance Officers, VPs of AI Governance), the institutional brief is available on signed request, covering the architecture, the four-property test, and the operational evidence at the disclosure depth appropriate to a buyer evaluating against procurement criteria.

For analysts, regulators, journalists, and researchers, the published material at the USPTO, the federal references in the NIST OLIR catalog, and the framework crosswalks are open for review without gate. Citation is welcome. Disagreement is welcome.

For the standards bodies, certifying authorities, federal agencies, and trade associations who will eventually operate the federated-authority structure, the conversation is open. The architecture was built to be extended, contested, and adopted. It was not built to be owned.

This paper is not asking for anything. It names what already exists, identifies what the field has not yet built, and leaves the door open for the actors who will build it together.


Frequently Asked Questions

What are the Five Laws of AI Governance? The Five Laws state what a governed autonomous system may never do to itself: it may not be the measure of its own conduct (Independent Measure), may not be governed only in retrospect (Binding at Dispatch), may not be the sole witness to its own compliance (External Witness), may not certify itself (Separated Authority), and may not outlive its own off-switch (Revocable Authority). They are the normative core of the framework and apply across both digital agents and physical autonomous systems.

Why does the field need an independent measurement standard for AI governance? Operators who build autonomous systems cannot credibly grade their own conformance, and coalitions of competitors collapse into neutral taxonomy when asked to rank their own members. That leaves the role of an independent measurement authority structurally empty, for the same reason FICO is independent of lenders and UL is independent of manufacturers. The absence is structural, not accidental, and it leaves the risk not merely unmeasured but unpriced.

Why this practitioner, and not NIST, ISO, or one of the Big Four? A standards body does not need to be operated by the practitioner who builds the work first, and over time it will not be. The work is published in the public record at the USPTO, mapped to recognized frameworks, and submitted through federal standards processes. Other actors (NIST, ISO, federal agencies, certifying bodies, trade associations) can engage with it, extend it, contest it, and adopt it. The practitioner who does the work first does not own the standard. The standard is what the field eventually agrees the work is.

Is this a proprietary scoring scheme? No. A standards body that is also the only vendor measured against itself is not a standards body. The framework is offered as a federated standard: open criteria, separable measurement, federated authority, and verifiable evidence chains the governed system could not have produced about itself. Those properties are non-optional.

How is a standard different from an AI governance framework? A framework tells an organization what to do. This standard measures whether an autonomous system actually stayed inside its authorization at the moment of action, judged by something that did not also produce the action. The framework layer produces documents and process; the measurement layer produces an independent, comparable score of conformance.

Does the framework cover physical autonomous systems, or only software agents? Both. The Five Laws and the measurement layer are domain-independent. They apply to a software agent that invokes tools, moves data, and changes state, and to a physical system that holds airspace, drives a vehicle, or acts on a sensor field. The structural problem is the same: bind what the autonomous actor is allowed to do, at the moment it acts, with evidence the actor could not have produced about itself.

Does a high score mean the system is safe? No. The framework governs conformance to an authorization, not the content of the authorization. A system can be perfectly governed into doing something reckless if the grant itself was reckless. "Compliant" is not "safe." It is the precondition for reasoning about safety, because you cannot reason about the safety of a system you cannot bind.


Definitions

The terms this paper relies on, stated for reuse. Each defines what the term means, not how any instrument computes it.

Runtime governance. The enforcement that binds what an autonomous system is allowed to do at the moment of action, by infrastructure architecturally separated from the system itself; distinct from process governance (how an organization adopts autonomy) and accountability governance (who is responsible after the fact).

Conformance. Whether a specific autonomous action stayed inside its specific authorization: the relationship between what a system was authorized to do and what it actually did.

Conformance score. An independent measure of that relationship, produced by a party that does not also build or sell the system being measured.

The third layer. Runtime governance, as the layer beneath process and accountability governance; the only layer that produces evidence of an action that the governed system could not have produced about itself.

Independent measurement. Measurement of a system's governance produced by a party structurally separate from both the operators that build the system and the bodies that write the rules; that separation is the property that makes the measurement trustworthy.

Federated authority. A standards structure in which no single vendor, including the practitioner who built the work first, is the sole measurement authority; criteria are published and certifying parties are separable.

The empty seat. The structurally vacant role of an independent measurement authority for autonomous-system governance, vacant because operators cannot credibly grade themselves and coalitions of competitors retreat to neutral taxonomy rather than rank their own members.
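
As a reading aid for the definitions above, the sketch below expresses conformance as a yes/no check on one action against one authorization, using a record kept by a party other than the actor. It is a minimal illustration under stated assumptions, not the AQ Score computation, which this paper does not disclose. Every name in it (Authorization, ObservedAction, conforms, "agent-7", "gateway-log") is hypothetical and introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    """Hypothetical grant: the (action, resource) pairs an actor may perform."""
    actor: str
    allowed: frozenset


@dataclass(frozen=True)
class ObservedAction:
    """One action, as recorded by a named witness."""
    actor: str
    action: str
    resource: str
    witness: str


def conforms(grant: Authorization, observed: ObservedAction) -> bool:
    """Return True only if the action is inside the grant AND the record
    was produced by someone other than the actor (assumption: a record
    the actor wrote about itself is not accepted as evidence)."""
    externally_witnessed = observed.witness != observed.actor
    inside_grant = (
        observed.actor == grant.actor
        and (observed.action, observed.resource) in grant.allowed
    )
    return externally_witnessed and inside_grant


# Illustrative data only; none of it comes from the paper.
grant = Authorization(
    actor="agent-7",
    allowed=frozenset({("read", "ledger"), ("approve", "invoice")}),
)
inside = ObservedAction("agent-7", "read", "ledger", witness="gateway-log")
outside = ObservedAction("agent-7", "delete", "ledger", witness="gateway-log")
self_reported = ObservedAction("agent-7", "read", "ledger", witness="agent-7")

results = [conforms(grant, a) for a in (inside, outside, self_reported)]
```

The check says nothing about whether the grant itself was prudent: an authorization that permitted ("delete", "ledger") would make the second action conformant, which is the sense in which "compliant" is not "safe."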


Pharns Genece, AAM Cyber · linkedin.com/in/pharnsgenece


Version 1.0 · June 2026

Version history. v1.0 (2026-06-02): Initial publication.

Recommended citation: Pharns Genece, AQ Score: A Foundational Framework for Independent Measurement of Autonomous System Governance, Version 1.0 (AAM Cyber, June 2026), https://aamcyber.com/standard/aq-score-foundational-framework.

Rights. © 2026 Pharns Genece / AAM Cyber. This paper may be quoted, cited, and shared with attribution. AQ Score™ is a trademark of its owner. The criteria described are offered as an open framework for the field to engage with, extend, contest, and adopt; the document text remains the work of its author.


References



  1. Max McGuinness, Mikaela Grace, Jiri De Jonghe, Jake Eaton, and Abel Ribbink, "How we contain Claude across products," Anthropic engineering blog, May 25, 2026. https://www.anthropic.com/engineering/how-we-contain-claude 

  2. RACI is an organizational accountability matrix (Responsible, Accountable, Consulted, Informed) used to assign roles when multiple parties touch a decision or process. 

  3. Public incident reports of AI agent misexecution events 2024-2026, including the widely reported July 2025 production-database deletion incident (agent given write access and an open instruction set deleted live production data despite explicit instructions to the contrary), and multiple subsequent reports of agents executing unauthorized credential provisioning, file deletions, and infrastructure changes. 

  4. Runtime governance binds what an autonomous system is allowed to do at the moment of action, by infrastructure architecturally separated from the system being governed. It is distinct from process governance (how an organization adopts autonomous systems) and accountability governance (who is responsible after autonomous action). The Three-Layer Diagnosis elaborates the distinction. 

  5. As of publication: five provisional patent applications filed at the USPTO covering the dual-domain governed-execution-and-sensing architecture (334 total claims). The architecture is mapped against 17 published regulatory, standards, and framework instruments through 95 documented compliance control mappings. Three SDOS Concept Crosswalks are cataloged with the NIST OLIR Program as Draft Informative References in their public review windows: Reference ID 212 (AI RMF 1.0), Reference ID 215 (CSF 2.0), and Reference ID 217 (SP 800-53 Rev 5.2.0). Catalog inclusion and framework mapping are inspectable public-record facts; neither constitutes certification by, endorsement from, or affiliation with NIST or any organization that maintains the referenced frameworks. Organizations should conduct independent assessment to determine whether the controls satisfy their specific requirements. Full disclosure of specific patent claim language is deferred until the non-provisional filing window; this footnote enumerates only what is verifiable from public USPTO records and the NIST OLIR catalog. 

  6. CAIGO (Chief AI Governance Officer) is a coined role term naming an institutional position that does not yet formally exist at most enterprises. The Chief AI Governance Officer is the executive responsible for the governance of autonomous AI agent action across an organization, distinct from the Chief AI Officer (CAIO), who manages AI capability and adoption. The term has been asserted by the author under common law since April 2026; USPTO trademark intent-to-use sequencing is pending. 

  7. On evaluation-aware deception in frontier models, see Meinke et al., "Frontier Models are Capable of In-context Scheming," Apollo Research, December 2024, arXiv:2412.04984. For the developer-side acknowledgment of the same phenomenon, see Anthropic, System Card: Claude Sonnet 4.5, September 2025, §§7.6.4-7.6.5 (evaluation awareness), available at https://www.anthropic.com/claude-sonnet-4-5-system-card. Anthropic's evaluators reported that the model "was able to recognize many of our alignment evaluation environments as being tests of some kind, and would generally behave unusually well after making this observation": a finding that complicates the interpretation of all pre-deployment safety scores.