What a Cloud Architecture Review Should Produce

Thu, Oct 26, 2023

Most cloud architecture reviews produce a deck that gets filed. The deck is thorough. Findings, recommendations, a maturity heatmap, a risk matrix, an executive summary the sponsor reads once. None of the recommendations have owners. None have decisions attached. The roadmap gets quietly shelved when the engineering team can’t agree on which third of it to start.

The deck isn’t the failure. The framing is. Findings are not the same shape as decisions. A useful review produces something else entirely: a small number of decisions framed for the people who can make them, with enough specificity that next steps are obvious and enough honesty that disagreement is possible. The gap between the two is the gap between a review that changes things and a review that gets filed.

flowchart LR
    subgraph Filed["Filed-and-forgotten review"]
        F1[Findings]
        F2[Recommendations]
        F3[Heatmap]
        F4[No named decisions]
        F5[No named owners]
    end
    subgraph Useful["Useful review"]
        U1["Specific decisions<br/>for specific people"]
        U2["Options and<br/>trade-offs"]
        U3[Sequenced priorities]
        U4["Named owners,<br/>rough timeframes"]
    end
    style F4 fill:#fdd
    style F5 fill:#fdd
    style U1 fill:#eaf2fa
    style U3 fill:#eaf2fa

Figure 1. The two artifacts a review can produce. Both contain the same information. Only one of them is shaped to drive action, because the audience for the right column is the person whose calendar can move next quarter.

What most reviews produce

The default deck has four predictable parts, and they build toward the same outcome.

A maturity assessment against a framework opens it. The framework is usually generic, the score is usually middle-of-the-road, and the team comes away knowing they’re a 2.4 on a 5-point scale across a dozen dimensions. The score is a measurement, not a decision, and nobody has been asked to make one.

A heatmap of risks follows, mostly red. Striking and decisive-looking. The decisions implied by it are not specified. Red doesn’t tell you what to do. Red tells you to be uncomfortable.

A long list of recommendations, mostly generic. “Implement defense in depth.” “Adopt zero trust.” “Improve observability.” Each recommendation is correct. Each one would be correct in any architecture review of any cloud environment. None of them are specific to your organization, your team, your tradeoffs, or your next quarter.

Then the deck gets filed with the executive sponsor and forgotten within the quarter. The sponsor reads the executive summary. The deck circulates to two or three people who skim it. The recommendations get folded into a vague “platform investment” line item that competes with everything else for the next budget cycle. By the next review, most of the recommendations are still open, and a new firm starts again from the same findings.

This is the artifact most teams pay for, and not because the reviewers are bad. The deck is a documentation artifact. The team needs a decisions artifact. Different shape, different audience.

What useful reviews produce

A useful review delivers a small number of decisions framed for specific decision-makers, each with clear options and trade-offs. Three to seven decisions, not seventy findings. Each decision has a name (the person who can make it), a frame (what they’re choosing between), and a stake (what changes if they choose differently).

A prioritized sequence comes next: not “everything is urgent,” but “this first, here’s why, here’s what becomes possible after.” The order matters because most architectural work has dependencies, and the team that tries to do all of it at once does none of it well. Sequence is where the experience of the reviewer earns its fee. Anyone can list problems. Sequencing them is the work.
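
The dependency half of sequencing can be sketched mechanically. The following is a minimal illustration in Python, assuming a hypothetical set of work items and dependencies (none of these names come from a real review). It only produces an order that respects the dependencies; choosing among items that are independent of each other is still the reviewer's judgment.

```python
from graphlib import TopologicalSorter

# Hypothetical work items, each mapped to the items it depends on.
# The names are illustrative assumptions, not findings from a real review.
depends_on = {
    "centralize-logging": set(),
    "define-slos": {"centralize-logging"},
    "consolidate-gateways": set(),
    "automate-incident-response": {"define-slos", "consolidate-gateways"},
}


def sequence(deps):
    """Return work items in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


order = sequence(depends_on)
for position, item in enumerate(order, start=1):
    print(position, item)
```

An order like this is the "this first, here's what becomes possible after" skeleton. The "here's why" still has to be written by someone who understands the tradeoffs.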

An honest read of what your organization can actually do, given current capacity. Most reviews assume infinite engineering attention. Useful reviews assume the team you have, with the priorities they have, and they tell you what’s deliverable inside that constraint. “You cannot do this in the next two quarters” is sometimes the most useful sentence in the report.

Specific next actions, with named owners and rough timeframes, close the loop. The deliverable for each decision isn’t “the team will adopt…” It’s “by Q1 end, the platform team will have shipped X, the security team will have signed off on Y, and the architecture board will have decided whether to fund Z.”

A research computing lead I worked with, one trying to get a stalled infrastructure modernization back into motion, commissioned a review with that explicit ask: “give me three decisions I need to make in the next 90 days.” The reviewer pushed back on the framing once, then accepted it. The review produced exactly that. Two of the three decisions were made within 60 days. The third was deferred with a written reason and a date for revisiting. The engagement led to follow-on work because the output was actionable, not because the firm was good at selling more.

The framing that makes the difference

“Recommendation” versus “decision to be made.” The second is harder to write and more useful to receive. A recommendation says what should happen. A decision says who has to choose, between what options, with what consequence. “We recommend implementing centralized logging” is a recommendation. “The platform team needs to choose between extending the existing logging stack (faster, more debt) and migrating to the new vendor (slower, less debt) by end of Q2; the cost differential is several hundred thousand dollars annually and the migration delays the SLO program by a quarter” is a decision. The second sentence is much harder to write because the reviewer has to understand the tradeoff and stake a position. That’s the work.

Tailoring each output to its audience matters too. Technical leads want the architectural reasoning. Finance wants the cost frame. Leadership wants the strategic narrative. The same set of decisions can be framed three different ways without losing fidelity, and a review that produces all three framings is much more likely to drive action than one that produces a single document and asks each audience to translate.

Distinguishing an architectural problem from an organizational one is where most reviews fail. The two have different fixes, and conflating them is one of the most common errors. “The team has too much technical debt” might be an architectural finding, but the fix often isn’t architectural: if nobody has been given time to retire the debt, that’s a leadership decision about capacity. Calling out the difference is more useful than producing a recommendation that pretends the architectural fix is sufficient.

The fourth framing is naming the decisions your organization is implicitly making by deferring. The review’s job is to surface the decisions you’ve been postponing and make them legible. “By choosing not to choose, you’re committing to the current path with these specific costs” is a sentence that often unblocks the conversation. The team that’s been deferring isn’t deferring on purpose. They’re deferring because no one has framed the deferral as a choice with consequences.

Review element	Filed-deck version	Useful version
Primary output	Findings and heatmaps	Named decisions with options
Audience	Whoever reads the deck	Specific decision-makers
Recommendations	Generic (“adopt zero trust”)	Organization-specific, sequenced
Owner assignment	“The platform team”	A named person with standing
Follow-up mechanism	Roadmap line item	Named check-in with a date
Measure of success	Deck delivered	Decision made or explicitly deferred

The conversations a review should enable

Between technical leadership and engineering: the review’s value is giving the conversation a shared external reference, so it doesn’t degenerate into the same internal debate that’s been running for two years. Without that reference, each side continues to argue from inside their own assumptions.

Between technical leadership and finance: the review can put numbers on tradeoffs that have been argued in vibes for months. “We can keep deferring this for $X per year in operational drag, or we can fix it for $Y this year.” Finance can hear that conversation. They cannot hear “we have technical debt.”
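
The "$X per year versus $Y this year" framing reduces to a break-even calculation. Here is a minimal sketch in Python, assuming a simple model with a one-time fix cost and flat annual costs. The function name and all figures are hypothetical, and a real cost model would likely account for ramp-up, risk, and discounting.

```python
def months_to_break_even(one_time_fix_cost, annual_drag_cost, annual_cost_after_fix=0.0):
    """Months until a one-time fix pays for itself through lower running costs.

    Returns None when the fix does not reduce the annual cost at all.
    """
    annual_saving = annual_drag_cost - annual_cost_after_fix
    if annual_saving <= 0:
        return None
    return 12 * one_time_fix_cost / annual_saving


# Hypothetical figures: $120k to fix, $150k per year of drag if deferred,
# $30k per year to run the fixed system.
months = months_to_break_even(120_000, 150_000, 30_000)
print(f"Break-even after {months:.1f} months")
```

A single number of months is something finance can weigh against other uses of the same budget, which "we have technical debt" is not.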

Between technical leadership and the board: the board wants to know about position, not posture. A review that surfaces strategic questions (regional expansion, vendor concentration, regulatory exposure) is much more useful at the board level than one that reports a maturity score.

Within engineering: some of what the team is doing is necessary, and some of it is muscle memory from the previous architecture, the previous CTO, or the previous incident. The review’s outsider perspective is most useful when it can call out which is which, gently and specifically. The same outsider perspective surfaces the architecture debt that’s been accumulating in different shapes without the team having to argue itself into seeing it.

flowchart TD
    Start([Review delivered]) --> D1{Decisions named?}
    D1 -->|No| Fail1["Deck gets filed<br/>nothing moves"]
    D1 -->|Yes| D2{Owners assigned?}
    D2 -->|No| Fail2["Items drift<br/>no accountability"]
    D2 -->|Yes| D3{"Check-in<br/>scheduled?"}
    D3 -->|No| Fail3["Momentum fades<br/>within 60 days"]
    D3 -->|Yes| D4{"Items moving<br/>within 30 days?"}
    D4 -->|No| Retry["Re-decide or<br/>explicitly park"]
    D4 -->|Yes| Win[Review succeeds]
    style Fail1 fill:#fdd
    style Fail2 fill:#fdd
    style Fail3 fill:#fdd
    style Retry fill:#fff5e0
    style Win fill:#eaf2fa

Figure 2. The decision path that separates reviews that change things from reviews that get filed. Each fork is a place where most engagements lose momentum, because the output wasn’t shaped to survive the transition from delivery to action.
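
The forks in Figure 2 can also be read as an ordered series of checks, where the first one that fails determines the outcome. Here is a minimal sketch in Python. The function name and the four boolean inputs are assumptions made for illustration, and the returned strings simply restate the labels in the figure.

```python
def review_outcome(decisions_named, owners_assigned, checkin_scheduled, moving_within_30_days):
    """Walk the forks of the decision path in order and return the first outcome reached."""
    if not decisions_named:
        return "deck gets filed, nothing moves"
    if not owners_assigned:
        return "items drift, no accountability"
    if not checkin_scheduled:
        return "momentum fades within 60 days"
    if not moving_within_30_days:
        return "re-decide or explicitly park"
    return "review succeeds"


# A review with named decisions and owners but no scheduled check-in.
print(review_outcome(True, True, False, False))
```

The order of the checks matters: a scheduled check-in cannot rescue a review that never named its decisions.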

What good follow-through looks like

Decisions should be moving within weeks, not quarters. If nothing has moved within a month of delivery, the review failed regardless of how good the deck is. The deliverable for the review is the decision, not the deck.

Items need named owners. “The platform team” is not an owner. A specific person is. The named owner has the standing and the calendar capacity to move the item, and can be asked at the next checkpoint whether the item moved. “The platform team” cannot be asked that question.

Quarterly check-ins, with consequences for items that haven’t moved, keep the work alive. The check-in is a forum where stalled items get re-decided or explicitly parked. Without it, items quietly fall off, and the next review starts from the same place.

The next review is an update against the decisions made, not a redo. If the second engagement reproduces the findings of the first, neither engagement was useful. The reviewer’s job in the follow-up is to assess what moved, what didn’t, and why.

A SaaS company that had paid for a thick deck-style review months earlier, with no recommendations moving, restructured its second engagement around this approach. The brief was three decisions, named owners, 90-day horizon. The new firm had to push back on their own template and produce a different artifact than they were used to producing. They did. Two of the three decisions moved. The third produced a follow-on conversation about organizational capacity that the previous review had failed to surface, and that conversation was the more important outcome.

Here is the review template that anchored that second engagement:

# architecture-review-output.yaml
# One entry per decision. Three to seven decisions per review.

decisions:
  - id: decision-001
    title: "Centralized logging stack: extend or migrate"
    owner: "Sarah Chen, Platform Engineering Lead"
    due_date: "2023-12-31"
    options:
      - label: Extend existing stack
        cost_estimate: "$40k one-time"
        annual_ops_cost: "$180k"
        tradeoff: "Faster. Adds 18 months of technical debt."
      - label: Migrate to new vendor
        cost_estimate: "$120k one-time"
        annual_ops_cost: "$90k"
        tradeoff: "Delays SLO program by one quarter."
    recommended_option: "Migrate"
    rationale: >
      Operational drag from extending now exceeds migration cost
      within 24 months. SLO program delay is recoverable.
    status: "pending"
    deferred_reason: null
    deferred_revisit_date: null

  - id: decision-002
    title: "API gateway consolidation"
    owner: "Marcus Okafor, Architecture Board"
    due_date: "2024-01-15"
    options:
      - label: Consolidate to single gateway
        cost_estimate: "$60k migration"
        annual_ops_cost: "$30k"
        tradeoff: "Simpler. Requires 6-week migration sprint."
      - label: Keep parallel gateways
        cost_estimate: "$0"
        annual_ops_cost: "$95k"
        tradeoff: "Ongoing split ownership. Two runbooks."
    recommended_option: "Consolidate"
    rationale: >
      Split ownership has already caused two incidents this year.
      Migration cost is recovered in about 11 months.
    status: "pending"
    deferred_reason: null
    deferred_revisit_date: null

# Review metadata
review:
  delivered: "2023-10-26"
  sponsor: "CTO"
  first_checkin: "2023-11-26"
  quarterly_review: "2024-01-15"

A review that produces a deck and no decisions has failed, regardless of how good the deck is. The product of a review is the decisions it enables and the actions that follow. Everything else is artifact.

The reviews that change things tend to be commissioned differently. The sponsor specified the decisions they needed to make, and the audiences they needed to reach, before the reviewer touched a slide. A review shaped to that brief from the start produces a different artifact than one shaped to “tell us what’s wrong with our architecture,” and the difference almost always shows up in whether anything changed after the deck was delivered. What I still don’t have a clean answer to is this: when the sponsor doesn’t know which decisions to name upfront, who does the work of naming them, and how do you price that honestly?
