The AI Build vs Buy Decision

The AI build-vs-buy decision gets answered backward by most organizations. Technology choice first, organizational position second, if it gets asked at all. A mid-size edtech I worked with, a capable team building adaptive learning tools for university programs, had already approved budget for a custom model before anyone had asked whether they could evaluate the model’s output well enough to know if it was working. The build wasn’t the wrong answer. It was an answer to a question they hadn’t asked.

What’s your actual position relative to this capability? Build vs buy is the answer that emerges when that question is answered honestly. Reverse the order and you’ll get the technology you can afford and a position that doesn’t fit it.

flowchart TB
    Start[AI capability needed] --> Pos["Position questions:<br/>data, risk,<br/>horizon, capability"]
    Pos --> Frame["Decision frame<br/>becomes legible"]
    Frame --> Build[Build path]
    Frame --> Buy[Buy path]
    Frame --> Hybrid["Hybrid path<br/>most common"]
    style Pos fill:#fff5e0

Figure 1. The position questions sit between the need and the decision. Most organizations skip the middle box and go straight from need to vendor list, which is why the decision they make rarely fits the position they’re in.

The vocabulary problem

“Build” and “buy” both mean less than they used to. Most “building” in AI is fine-tuning a foundation model or orchestrating an evaluation pipeline. Few teams are training models from scratch, and the ones doing it usually shouldn’t be. Most “buying” includes deep integration, custom prompts, evaluation infrastructure, and ongoing operating costs that look a lot like building.

The total cost of ownership (TCO) math differs from traditional software because three cost surfaces are new. Compute is the obvious one. Data is the second: cleaning, labeling, version control, lineage. Evaluation is the third, and the one teams undercount most. A vendor model that hallucinates 4% of the time on your specific data is fine until the day it isn’t, and you can’t tell which day that is without infrastructure to measure it.
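To make the evaluation cost surface concrete, here is a minimal sketch of the smallest measurement that turns "4% of the time" into a number you own rather than one you were told. It assumes you have a hand-labeled sample of model outputs from your own data. The function name, the sample size, and the counts are hypothetical. It uses the Wilson score interval, a standard way to bound a proportion estimated from a sample.

```python
import math

def wilson_interval(failures: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a failure rate (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = failures / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical sample: 8 hallucinations found in 200 labeled outputs (4% observed)
low, high = wilson_interval(8, 200)
print(f"observed 4.0%, plausible range {low:.1%} to {high:.1%}")
```

On a sample that small, the plausible range runs from roughly 2% to 8%. Whether the top of that range is tolerable is a position question, and you only get to ask it if someone paid for the labeling.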

When the words don’t carry their old meaning, the framing is blurry. Teams sign up for “build” and discover they bought a vendor’s API plus six months of custom orchestration. They sign up for “buy” and discover they own an integration no vendor will support. The framing distorts the decision before the decision is made.

Before the technology question

Before build vs buy becomes legible, there are four questions worth answering. None of them are about technology.

Data. What do you have that’s genuinely distinctive? Most organizations overestimate this. The fact that your transactional data is yours doesn’t make it strategic. That edtech client had genuinely distinctive data: thousands of annotated student learning paths no foundation model had seen. That’s the rare case. Most companies have data structurally similar to what the vendors already trained on, and the distinctiveness of “our data is ours” is mostly a story leadership tells itself.

Risk. What can you tolerate around accuracy, hallucination, drift, and explainability? A B2B internal tool can tolerate a lot. A clinical decision-support system can tolerate almost none. The honest answer is what happens when the model is wrong, and who pays the cost. Not a score.

Horizon. What time scale are you committing to? A two-quarter strategic bet looks different from an indefinite operating commitment. Vendors discount for multi-year contracts. Custom builds make economic sense at indefinite horizons and rarely at shorter ones. The horizon question is the one most often postponed until the postponement itself becomes the risk, because committing to it forces budget conversations leadership prefers to defer.

Evaluation capability. Can you evaluate the output of the chosen path well enough to know if it’s working? If not, you’re trusting the vendor in both cases, because a build you can’t evaluate is a vendor relationship with extra steps. The edtech team’s build plan failed this question. They had the data and the horizon. They didn’t have the evaluation capability, and the build path required it more than the buy path did.

flowchart TD
    D{"Data distinctive?"}
    D -->|Yes| R{"Risk requires<br/>opacity control?"}
    D -->|No| Buy[Buy or hybrid path]
    R -->|Yes| H{"Long-term<br/>horizon?"}
    R -->|No| Buy
    H -->|Yes| E{"Evaluation<br/>capability exists?"}
    H -->|No| Buy
    E -->|Yes| Build[Build path]
    E -->|No| Hybrid["Hybrid: buy foundation,<br/>build evaluation layer"]
    style Build fill:#eaf2fa
    style Buy fill:#fff5e0
    style Hybrid fill:#fff5e0

Figure 2. The four position questions as a decision flow. Most organizations that end up on the build path fail one of these gates and don’t know it until six months in. The evaluation capability gate is the one that breaks the most build cases.
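The same gates can be written down as a few lines of code, which is a useful exercise mostly because it forces a yes or no on each one. This is a sketch of the flow in Figure 2, not a scoring tool. The class, field, and function names are hypothetical, and the edtech encoding at the bottom assumes their risk profile required opacity control.

```python
from dataclasses import dataclass

@dataclass
class Position:
    data_distinctive: bool
    needs_opacity_control: bool  # risk rules out an opaque vendor model
    long_horizon: bool
    can_evaluate: bool

def recommend_path(p: Position) -> str:
    # Gates run in the same order as Figure 2; the first "no" exits early.
    if not p.data_distinctive:
        return "buy or hybrid"
    if not p.needs_opacity_control:
        return "buy or hybrid"
    if not p.long_horizon:
        return "buy or hybrid"
    if not p.can_evaluate:
        return "hybrid: buy foundation, build evaluation layer"
    return "build"

# Hypothetical encoding of the edtech team: data and horizon, no evaluation capability
edtech = Position(True, True, True, False)
print(recommend_path(edtech))  # prints: hybrid: buy foundation, build evaluation layer
```

The booleans are the hard part. The function is trivial on purpose: if the inputs are honest, the output is rarely a surprise.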

The cases that hold up

Builds that work follow a consistent pattern: they clear all four dimensions. The cases that don’t hold up fail one of them in predictable ways.

| Dimension | Build case | Buy/hybrid case | Common failure |
| --- | --- | --- | --- |
| Data | Genuinely proprietary, annotated at scale, not replicated by vendors | Structurally similar to vendor training data | Confusing “our data is ours” with “our data is distinctive” |
| Risk | Vendor opacity unacceptable; explainability required by contract or regulation | Vendor accountability contractually sufficient for the workload | Assuming internal build is automatically safer |
| Horizon | Indefinite commitment; capability is core to the product | Under two years, or capability is non-strategic | Treating a 12-month experiment as a long-term build decision |
| Evaluation | Team can operate and evaluate a model in production | Evaluation infrastructure needs building regardless of path | Assuming “buy” means you don’t need evaluation capability |

Distinctiveness is the most overclaimed of the four dimensions, and also the hardest to call honestly from the inside. A team that’s built its strategy around a capability can’t easily step back from “our data is distinctive.” The team that says “our data is unique” usually means “our data is ours,” which is a different statement and rarely sufficient to justify the build cost.

Operating capability, the production half of the evaluation question, breaks build cases that pass the other three. A client who called me had distinctive data, a clear risk profile, and a long horizon. They started a custom build and discovered six months in that nobody on the team had operated a model in production. The build became a vendor relationship with extra steps, plus six months of internal cost they didn’t recover.

The hybrid most organizations end up with

Buy the foundation. Customize the surface. This is what “build vs buy” means in practice for most organizations, and the framing erases it.

The integration cost is underestimated in both directions. Consider a startup that had been building anomaly detection tooling for years and thought they understood the cost. They bought a vendor model for a new detection use case. The integration and customization work consumed well over a year of engineering time. The post-integration evaluation infrastructure (drift detection, A/B harness, alerting on score distribution shift) cost more than the vendor license. Two years in, the team described what they had as “their model,” and they were right, even though they hadn’t trained any of the underlying weights.

Evaluation infrastructure is the cost that doesn’t go away regardless of which path is chosen. If the model isn’t being evaluated continuously, the org doesn’t know if it’s working. The evaluation program is itself a build commitment, and most “we bought it” decisions assume it away.
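For a sense of what the cheapest piece of that infrastructure looks like, here is a minimal sketch of a score distribution shift check using the Population Stability Index (PSI). PSI is one common choice among several and not necessarily what the startup above used. The function name, the bin count, and the assumption that scores are non-empty and fall in [0, 1] are all mine.

```python
import math

def psi(baseline: list[float], current: list[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of scores in [0, 1]."""
    def shares(scores: list[float]) -> list[float]:
        counts = [0] * bins
        for s in scores:
            counts[min(int(s * bins), bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(scores), 1e-6) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

# Hypothetical data: last quarter's scores vs. this week's, shifted upward
baseline = [i / 100 for i in range(100)]
shifted = [min(s + 0.3, 0.99) for s in baseline]
print(psi(baseline, baseline))        # identical samples give 0.0
print(psi(baseline, shifted) > 0.2)   # True: well past a common rule-of-thumb alert level
```

The 0.2 alert level is a widely quoted rule of thumb, not a standard. The real work is deciding who gets paged when it trips and what they're expected to do about it.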

The governance question stays open in both cases. A YAML model record helps: something the team can maintain and audit across reviews.

# Model governance record — updated each quarterly review
model: adaptive-learning-recommendations
path: hybrid               # vendor foundation + proprietary fine-tune
owner: ml-platform-team
last_reviewed: 2026-Q1

data:
  distinctive: true
  basis: "annotated student learning paths, proprietary curriculum"
  cold_start_gap: "new schools with <30 sessions have reduced accuracy"

risk:
  hallucination_tolerance: low
  explainability_required: true   # educators must understand recommendations
  known_failure_mode: "overconfident on underrepresented demographics"

evaluation:
  accuracy_reviewed: quarterly
  subgroup_testing: [low_income_schools, ELL_students, rural_districts]
  drift_detection: enabled
  last_incident: none

vendor:
  contract_expires: 2027-Q2
  exit_cost_estimate: 6 months engineering
  accountability_gap: "SLA covers uptime; accuracy not contractually covered"

Bought capabilities still need owners, still need policy, still need an escalation path when they misbehave. The fact that the model came from a vendor doesn’t tell you who’s accountable when it produces a wrong answer that affects a customer.
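One way to keep those questions from going unanswered is to check the record for them at each review. The sketch below assumes the governance record has already been loaded into a plain dict by whatever YAML parser you use. The function name and the required-field list are hypothetical, and `escalation.contact` is a field I'm adding for illustration; it does not appear in the record above.

```python
def missing_fields(record: dict, required: list[str]) -> list[str]:
    """Return the dotted paths in `required` that are absent or empty in `record`."""
    missing = []
    for path in required:
        node = record
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break
        if node in (None, "", [], {}):
            missing.append(path)
    return missing

# Hypothetical required set for a quarterly review
REQUIRED = [
    "owner",
    "evaluation.drift_detection",
    "vendor.accountability_gap",
    "escalation.contact",
]

# Subset of the record above, as a dict
record = {
    "owner": "ml-platform-team",
    "evaluation": {"drift_detection": "enabled"},
    "vendor": {"accountability_gap": "SLA covers uptime; accuracy not contractually covered"},
}
print(missing_fields(record, REQUIRED))  # prints: ['escalation.contact']
```

A check like this only confirms that someone wrote an answer down. Whether the named owner can actually act on a wrong answer is still a conversation, not a field.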

Reading the decision in the right order

That edtech client reversed their build decision after a few meetings. They went with a vendor-managed evaluation harness, a smaller fine-tuning effort on their proprietary data, and a six-month evaluation period before deciding whether to expand. They saved a year of build time and ended up with an evaluation capability they hadn’t had before. That capability turned out to be the asset, not the model.

The decision I see most often communicated to leadership is the technology choice, not the position. That’s a framing problem upward more than a technology problem. Leadership wants to hear “build or buy.” Practitioners owe them the position questions first, because the technology answer is what falls out of those questions.

Start with your position. The questions don’t always produce a clean answer, and that’s worth sitting with. Sometimes the honest read is that the organization isn’t ready for the build the position seems to require, or the evaluation capability doesn’t exist yet. Neither of those is a failure to decide. Both are more useful than answering the technology question first.
