AI Governance With No Governance Team
Most AI governance writing is sized for a 5,000-person bank with a Chief AI Officer, a model risk team, and a board committee that meets quarterly. If your company is small enough that nobody has those titles yet, and you shipped your first AI feature last quarter, the recommended posture is unreachable. The gap between what’s described and what’s achievable discourages anyone from starting, so most teams don’t.
The smaller version needs four artifacts and named owners across functions you already have. The point isn’t to look governed. The point is to make explicit decisions before implicit decisions become the only ones you’ve got and the governance conversation becomes a post-incident review.
[Diagram: four artifacts feed into explicit decisions with named owners. The use-case artifact (which AI is allowed to do what) is owned by product. Data classification (what flows where) is owned by security. The evaluation rubric (what proves a model is fit to ship) is owned by eng + function. The incident playbook (what happens when something goes wrong) is owned by eng + legal.]
Figure 1. The four lightweight artifacts and where the ownership lands. Each one is two pages or less. Together they cover the decisions that, left implicit, become the questions you can’t answer in the first incident review.
Use-case decisions: which AI is allowed to do what
Before you decide on tools, you need a written answer to which use cases are allowed without review, which require review, and which are off-limits. Without one, every product manager makes the call individually, and the calls don’t agree.
The lightweight version is a one-to-two page policy with three categories. Pre-approved covers low-risk uses: internal productivity, drafting, summarization of internal content. Conditional covers uses that need a lightweight review: customer-facing content, anything touching regulated data, anything making consequential decisions. Prohibited covers uses that are off-strategy or high-risk.
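One way to keep the three categories from drifting is to encode them as a lookup that gives the first answer before any individual PM does. This is a minimal sketch, not a prescribed implementation: the tag names and the `categorize` helper are hypothetical, and sending unrecognized uses to review by default is an assumption rather than something the policy above specifies.

```python
from enum import Enum


class Category(Enum):
    PRE_APPROVED = "pre-approved"
    CONDITIONAL = "conditional"
    PROHIBITED = "prohibited"


# Hypothetical tags; a real policy would list its own.
PROHIBITED_TAGS = {"off_strategy", "high_risk"}
CONDITIONAL_TAGS = {"customer_facing", "regulated_data", "consequential_decision"}
PRE_APPROVED_TAGS = {"internal_productivity", "drafting", "internal_summarization"}


def categorize(tags: set) -> Category:
    """Return the most restrictive category that any tag triggers."""
    if tags & PROHIBITED_TAGS:
        return Category.PROHIBITED
    if tags & CONDITIONAL_TAGS:
        return Category.CONDITIONAL
    if tags and tags <= PRE_APPROVED_TAGS:
        return Category.PRE_APPROVED
    # Assumption: anything unrecognized (or untagged) goes to review.
    return Category.CONDITIONAL
```

Checking the most restrictive category first means a use case tagged both `drafting` and `customer_facing` lands in conditional, which matches the intent that one risky property is enough to trigger review.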
A mid-size edtech I worked with was moving quickly on AI and understood the stakes of getting it wrong with a student-facing product. It categorized student-facing content generation as conditional and internal productivity tools as pre-approved. The policy fit on a single page. The hardest part wasn’t writing it. The hardest part was getting product leadership to commit to the categories, because every category boundary was a future “no” they’d have to deliver to a PM with a roadmap. The org only reached a stable policy once the executive sponsor had agreed to back those calls publicly.
The use-case policy is the artifact that prevents your AI portfolio from being shaped by whichever PM had the best demo this quarter. Without it, you’re not deciding what your AI strategy is. You’re noticing what it became.
Data decisions: what flows where
The question here is which data can be used by which AI systems, with what protections, under what consent. Without a written answer, your engineers will use whichever data is convenient, your vendors will see whichever data the API call sends, and you’ll learn what your actual data posture was during the first incident.
The lightweight version is a data classification table mapped to AI usage. Public data: unrestricted. Internal data: a contracted vendor with a data processing agreement. Confidential data: never crosses to external services. Personally identifiable: explicit consent required, or it doesn’t get used at all.
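The four rows of that table are simple enough to express as a single check. The sketch below is illustrative only: `Destination` and `may_flow` are hypothetical names, treating internal systems as always acceptable for internal data is an assumption, and the PII rule checks consent alone because that is the only condition the table states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Destination:
    external: bool          # data leaves your own infrastructure
    has_dpa: bool = False   # vendor is under a data processing agreement


def may_flow(classification: str, dest: Destination, consent: bool = False) -> bool:
    """Apply the classification table to one proposed data flow."""
    if classification == "public":
        return True
    if classification == "internal":
        # Assumption: internal systems are fine; external ones need a DPA.
        return (not dest.external) or dest.has_dpa
    if classification == "confidential":
        return not dest.external
    if classification == "pii":
        return consent
    raise ValueError(f"unknown classification: {classification}")
```

Raising on an unknown classification, rather than returning `False`, is a deliberate choice in this sketch: an unlabeled dataset is a gap in the table, not a routine denial.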
A SaaS company I worked with had a written rule that sensitive customer data never crossed to third-party AI services. The policy was a few pages, written by a security director and a legal lead in a couple of weeks. It saved many months of back-and-forth on every new vendor evaluation, because the answer to “can we send X to Y” was already in writing.
The classification only works if your data infrastructure can enforce it. If your warehouse can’t distinguish internal from confidential at query time, the policy is aspirational. This ties back to what AI adoption requires from your data infrastructure: a classification table is only as useful as the lineage and access controls that make it real.
Evaluation decisions: what proves a model is fit to ship
This is the decision nobody wants to make explicit, because making it explicit constrains shipping. Without it, the pattern that emerges is shipping based on demos. The demo works. The production failure mode you didn’t test for arrives three weeks later.
The lightweight version is a minimum bar: test against an evaluation set that includes adversarial cases, run a bias check on the populations the system will serve, document observed failure modes, and get sign-off from someone empowered to say no who isn’t the person who built the feature.
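The minimum bar can be checked mechanically, even though the judgment behind each item can't. A rough sketch, assuming a hypothetical `EvalRecord` shape with `built_by` and `signed_off_by` fields that stand in for however you track authorship and approval:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalRecord:
    adversarial_test_passed: bool
    bias_check_completed: bool
    failure_modes_documented: bool
    built_by: str                        # hypothetical field
    signed_off_by: Optional[str] = None  # hypothetical field


def unmet_requirements(rec: EvalRecord) -> List[str]:
    """List every part of the minimum bar the record does not meet."""
    problems = []
    if not rec.adversarial_test_passed:
        problems.append("adversarial evaluation not passed")
    if not rec.bias_check_completed:
        problems.append("bias check not completed")
    if not rec.failure_modes_documented:
        problems.append("failure modes not documented")
    if rec.signed_off_by is None:
        problems.append("no sign-off")
    elif rec.signed_off_by == rec.built_by:
        problems.append("sign-off must come from someone other than the builder")
    return problems
```

Returning the full list instead of stopping at the first failure gives the reviewer the whole picture in one pass. The check only confirms that a sign-off exists and is independent; whether the signer is actually empowered to say no is an organizational question.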
A pharmaceutical research team I worked with required accuracy testing on representative demographic subgroups before any model went into production, with a documented failure mode log that the lead researcher signed off on. The accuracy bars weren’t research-grade. They were just higher than zero and the same bars every time, which made the comparison from one model to the next honest. The researcher’s sign-off was the enforcement mechanism. Without it, the bar would have moved to whatever this quarter’s roadmap pressure required.
The hardest part is the sign-off step. The person empowered to say no has to be willing to say no under roadmap pressure, and to be supported by leadership when they do. Getting that backing in writing, before the first contested deployment, is the governance work that can’t be captured in a YAML file.
Incident decisions: what happens when something goes wrong
Most organizations only make this decision after the first public incident, when the response is shaped by panic and the next-quarter PR strategy.
The lightweight version is a two-page playbook: who decides to roll back, who communicates with affected users, who handles the regulatory or contractual fallout, what gets logged for the post-incident review.
A SaaS company I worked with had a two-page playbook where the on-call engineer had authority to disable any AI feature without escalation. A communications template was pre-drafted for customer notification. A short list of who-calls-whom was pinned in the incident channel. None of it was elegant. All of it existed before they needed it, which is the only thing that mattered the first time they needed it.
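The two properties that mattered in that playbook, disabling without escalation and logging for the later review, can be sketched in a few lines. Everything here is hypothetical: the in-memory `flags` dict stands in for whatever feature-flag system you use, and `disable_feature` is an illustrative name, not a real library call.

```python
from datetime import datetime, timezone


def disable_feature(flags: dict, flag: str, actor: str, reason: str, log: list) -> None:
    """Turn an AI feature off immediately and record who did it and why."""
    if flag not in flags:
        raise KeyError(f"unknown feature flag: {flag}")
    flags[flag] = False  # no approval step: the on-call engineer has authority
    log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "flag": flag,
        "actor": actor,
        "reason": reason,
    })


# Usage with a hypothetical flag name.
flags = {"feature.ai_summaries": True}
incident_log = []
disable_feature(flags, "feature.ai_summaries", "on-call-engineer",
                "summaries leaking internal ticket text", incident_log)
```

The log entry is written in the same call that flips the flag, so the post-incident review never depends on someone remembering to write it down afterward.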
The four artifacts side by side
| Artifact | Core question | Owner | Reviewers | Update cadence |
|---|---|---|---|---|
| Use-case policy | What is AI allowed to do? | Product leadership | Security, legal | Quarterly |
| Data classification | What data flows where? | Security | Legal, engineering | When data posture changes |
| Evaluation rubric | What proves fit to ship? | Engineering | Function owner (product, clinical, legal) | Per deployment type |
| Incident playbook | What happens when it fails? | Engineering | Legal, comms | After every AI incident |
A governance artifact template
Every AI feature in production has a corresponding record in this format, checked into the same repository as the code.
```yaml
# governance/models/checkout-intent-classifier.yaml
# Model governance record for the checkout intent classifier.
# Required before promotion to production.
# Owner: platform-ml-team
apiVersion: governance/v1
kind: ModelRecord
metadata:
  name: checkout-intent-classifier
  version: "2.1.0"
  owner: platform-ml-team
  use_case_category: conditional # pre-approved | conditional | prohibited
  last_reviewed: "2025-11-15"
  next_review: "2026-02-15"
model:
  type: fine-tuned-classifier
  base_model: internal-bert-v3
  training_data_classification: internal # public | internal | confidential | pii
  data_flows_to_external: false
evaluation:
  accuracy_overall: 0.94
  accuracy_subgroups:
    - group: mobile_users
      accuracy: 0.93
    - group: new_accounts
      accuracy: 0.91
    - group: international_users
      accuracy: 0.89
  adversarial_test_passed: true
  bias_check_completed: true
  failure_modes_documented: true
  fitness_signoff:
    name: "Jane Smith"
    role: "Head of Product, Checkout"
    date: "2025-11-14"
risk_register:
  - id: RISK-001
    description: >
      Model may misclassify high-value orders as low-intent,
      causing friction in the checkout flow for edge cases.
    likelihood: low
    impact: medium
    mitigation: >
      Confidence threshold set at 0.85; low-confidence predictions
      fall back to rule-based path. Monitored via checkout_intent_fallback_rate.
    owner: platform-ml-team
incident:
  rollback_authority: on-call-engineer
  disable_flag: "feature.checkout_intent_classifier"
  customer_notify_template: "templates/ai-incident-checkout.md"
  post_incident_log: "incidents/ai/"
```
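A record like this earns its place in the repository when something reads it before promotion. The sketch below assumes the record has already been parsed into a Python dict (the parsing step is left out because the standard library has no YAML parser) and that `metadata`, `model`, and `evaluation` are top-level sections. `promotion_blockers` is a hypothetical helper, and treating confidential training data combined with an external flow as a blocker is an assumption drawn from the classification table, not a rule the record itself states.

```python
from datetime import date
from typing import List


def promotion_blockers(record: dict, today: date) -> List[str]:
    """Return the reasons this record should block promotion, if any."""
    meta = record["metadata"]
    model = record["model"]
    evaluation = record["evaluation"]
    blockers = []
    if meta["use_case_category"] == "prohibited":
        blockers.append("use case is prohibited")
    if date.fromisoformat(meta["next_review"]) < today:
        blockers.append("review is overdue")
    if (model["data_flows_to_external"]
            and model["training_data_classification"] == "confidential"):
        blockers.append("confidential data flows to an external service")
    for flag in ("adversarial_test_passed", "bias_check_completed",
                 "failure_modes_documented"):
        if not evaluation.get(flag, False):
            blockers.append(f"{flag} is not true")
    return blockers


# A subset of the fields from the record above, as a parsed dict.
record = {
    "metadata": {"use_case_category": "conditional", "next_review": "2026-02-15"},
    "model": {"training_data_classification": "internal",
              "data_flows_to_external": False},
    "evaluation": {"adversarial_test_passed": True,
                   "bias_check_completed": True,
                   "failure_modes_documented": True},
}
```

Passing `today` in as an argument, rather than reading the clock inside the function, keeps the check deterministic when it runs in CI.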
When the lightweight version stops being enough
Five signals tell you it’s time to formalize. Review volume exceeds what existing functions can absorb without becoming a bottleneck. A regulatory or customer requirement specifies governance with named, dedicated roles. An incident demonstrates the cost of the current approach. Your risk profile changes: new market, new data sensitivity, a new kind of decision being automated. Cross-functional review starts producing inconsistent calls because nobody is accountable for consistency.
Hit two or more and the right move is to consolidate ownership in a dedicated function. The lightweight version was always temporary. Its job was to get you safely to the moment when the formal version pays off.
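The two-or-more rule is easy to make concrete, which helps when the quarterly review has to answer it honestly. A small sketch, with hypothetical field names for the five signals:

```python
from dataclasses import dataclass, fields


@dataclass
class Signals:
    review_volume_bottleneck: bool = False
    dedicated_roles_required: bool = False   # regulatory or customer requirement
    incident_exposed_cost: bool = False
    risk_profile_changed: bool = False
    inconsistent_review_calls: bool = False


def should_formalize(signals: Signals, threshold: int = 2) -> bool:
    """True when at least `threshold` of the five signals are present."""
    present = sum(bool(getattr(signals, f.name)) for f in fields(signals))
    return present >= threshold
```

The threshold is a parameter because two is this article's rule of thumb, not a constant; an organization in a regulated market might reasonably act on one.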
[Diagram: starting from the lightweight version in place, four questions are asked in order. Is review volume sustainable? If not, because review has become a bottleneck, consolidate into a dedicated function. Is there a regulatory requirement? Did an incident reveal a coverage gap? Has the risk profile changed? A yes to any of these three also leads to consolidating into a dedicated function. If every answer points the other way, continue with the lightweight version and review quarterly.]
Figure 2. The decision to formalize governance is an organizational threshold, not a calendar event. Two or more signals arriving together is the threshold most organizations can act on before the cost becomes visible.
The organizations that do this well end up with two pages on use cases, a data classification table, an evaluation rubric, and an incident playbook. None of those documents have the words “AI governance” in the title. What they have is clarity about who decides what, written down somewhere colleagues can find it, with owners who are held to it when the first roadmap pressure arrives.
That’s how implicit governance becomes explicit. Not through a big initiative, but through four short documents and the habit of keeping them current. The governance habit, once established, tends to outlast the people who started it. That durability is the thing worth building toward, even if the first version of each document is two pages and imperfect.