THE 100% RELIABILITY TARGETS FALLACY
A new CTO at a SaaS company, one that was building toward a Series B and trying to demonstrate enterprise-grade reliability, put it to me directly in a standup. “Our customers expect the system to never go down. Our SLO has to reflect that. We’re targeting 100%.” I could see the faces of a couple of engineers on the call change. None of them spoke. Within six months, their SRE program had quietly stopped using SLOs at all. The target everyone knew was meaningless had poisoned the targets that weren’t.
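The arithmetic behind that reaction is short. Below is a minimal sketch, assuming a time-based availability SLO measured over a 30-day window; the function name `allowed_downtime` and the window length are my own illustrative choices, not anything that team used.

```python
from datetime import timedelta


def allowed_downtime(slo_target: float,
                     window: timedelta = timedelta(days=30)) -> timedelta:
    """Return how much downtime an availability target permits over a window.

    slo_target is a fraction, e.g. 0.999 for "three nines".
    The 30-day default window is an assumption for illustration.
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    # The error budget is simply the complement of the target.
    return window * (1.0 - slo_target)


if __name__ == "__main__":
    for target in (0.99, 0.999, 1.0):
        print(f"{target:.3%} target permits {allowed_downtime(target)} of downtime")
```

At a 100% target the budget is zero, so the first failed health check breaches it and every later measurement carries no new information.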
THE AI BUILD VS BUY DECISION
Most organizations answer the AI build-vs-buy decision backward. They choose the technology first and ask about organizational position second, if they ask at all. A mid-size edtech I worked with, a capable team building adaptive learning tools for university programs, had already approved budget for a custom model before anyone had asked whether they could evaluate the model’s output well enough to know if it was working. The build wasn’t the wrong answer. It was an answer to a question they hadn’t asked.
WHEN AI MAKES DECISIONS ABOUT LEARNERS
Consider an edtech company building adaptive learning tools for university programs, rolling them out across partner institutions as a personalization initiative. A few months in, a faculty coordinator notices something off in the placement data. International students and non-native English speakers are being routed into remedial tracks at roughly 2.4 times the rate of domestic students. The model wasn’t designed to discriminate. It had learned to treat reading speed and vocabulary range as a proxy for content mastery, and that proxy collapsed for students whose prior education had been in a different language. The “personalization” was, in practice, a slow funnel away from standard coursework.
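A disparity like that only surfaces if someone computes placement rates per group and compares them. Here is a minimal sketch of that check, assuming placement records are available as (group, track) pairs; the group labels, the `"remedial"` track name, and the sample counts are synthetic values invented to illustrate the calculation, not data from any real rollout.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def remedial_rates(placements: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return the share of each group that was placed in the remedial track."""
    totals: Counter = Counter()
    remedial: Counter = Counter()
    for group, track in placements:
        totals[group] += 1
        if track == "remedial":
            remedial[group] += 1
    return {group: remedial[group] / totals[group] for group in totals}


def rate_ratio(rates: Dict[str, float], group: str, reference: str) -> float:
    """Return how many times more often `group` is routed than `reference`."""
    if rates[reference] == 0:
        raise ValueError("reference group has a zero rate; ratio is undefined")
    return rates[group] / rates[reference]


# Synthetic records, chosen only to make the ratio easy to follow.
sample = (
    [("domestic", "remedial")] * 10 + [("domestic", "standard")] * 90
    + [("international", "remedial")] * 12 + [("international", "standard")] * 38
)

if __name__ == "__main__":
    rates = remedial_rates(sample)
    print(rates)
    print(round(rate_ratio(rates, "international", "domestic"), 2))
```

A ratio well above 1 does not prove the model is unfair, but it is the kind of signal that should trigger a review of which input features are standing in for content mastery.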
AI GOVERNANCE WITH NO GOVERNANCE TEAM
Most AI governance writing is sized for a 5,000-person bank with a Chief AI Officer, a model risk team, and a board committee that meets quarterly. If your company is small enough that nobody has those titles yet, and you shipped your first AI feature last quarter, the recommended posture is unreachable. The gap between what’s described and what’s achievable discourages anyone from starting, so most teams don’t.
MONITORING AT SCALE
Metrics costs don’t scale with capacity. They scale with cardinality. A platform that grows from 50 to 500 services can see its bill grow by an order of magnitude while the underlying infrastructure has only tripled. Nobody decides this. Every team makes individually reasonable choices: a label for a customer ID here, a dimension for a feature flag there, a new service instrumented with the same conventions as the old one. The individual decisions look fine. The product of them looks like a billing shock.
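The multiplication is easier to see with numbers. The sketch below assumes a label-based metrics system in which each unique combination of label values is stored as its own time series. The label names and value counts are hypothetical, and the result is an upper bound, since not every combination occurs in practice.

```python
from math import prod
from typing import Mapping


def max_series(label_cardinalities: Mapping[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each label multiplies the series count by its number of distinct values,
    so the bound is the product across all labels.
    """
    return prod(label_cardinalities.values())


# Hypothetical label sets for a single request-count metric.
before = {"service": 50, "endpoint": 20, "status_code": 5}
after = {
    "service": 500,        # ten times as many services
    "endpoint": 20,
    "status_code": 5,
    "customer_id": 1000,   # one team adds a customer label
    "feature_flag": 2,     # another adds an on/off flag dimension
}

if __name__ == "__main__":
    print(f"before: {max_series(before):,} series")
    print(f"after:  {max_series(after):,} series")
    print(f"growth: {max_series(after) // max_series(before):,}x")
```

Each added label looks harmless in isolation. The growth comes from the fact that the factors multiply rather than add, which is why per-label review catches less than a budget on total series per metric.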
THE ERROR BUDGET MINDSET
“I don’t think it’s ever caused us to slow shipping. We’ve raised the SLO several times when we couldn’t hit it.” That’s how a platform director at a large SaaS company answered when I asked when a budget breach had last changed what her team did. The dashboards were tidy. The alerts were tuned. On paper, a textbook implementation.
SHAPING ARCHITECTURE WITH SLOS
When did an SLO last shape a roadmap decision at your company? Not appear in a quarterly review, not show up on a board deck, not get cited in an all-hands. Shape what got built, what got deferred, or what got rebuilt. If the answer takes a long pause, the SLO program is probably a dashboard.