Infrastructure Security Architecture Patterns

Mon, Aug 19, 2024

Security patterns without a threat model are a checklist. The auditor goes home happy. The control has no clear job, and the day a real incident asks the control to do something specific, the team finds out which posture is paper and which is real.

The patterns are mature and well-named. What’s missing is the threat model that says what each pattern is for. “Encrypt everything” passes the audit but tells you nothing about who can decrypt when the data is needed, which is the version of the question an actual incident asks.

flowchart LR
    Threat["Threat model"] --> Pattern["Pattern selected"]
    Pattern --> Capability["Operating<br/>capability check"]
    Capability --> Posture["Real security<br/>posture"]
    NoThreat["No threat model"] --> Checklist["Pattern as<br/>checklist item"]
    Checklist --> Paper["Paper posture"]
    style Posture fill:#eaf2fa
    style Paper fill:#fdd
    style NoThreat fill:#fdd

Figure 1. The same pattern can produce real posture or paper posture depending on whether a threat model and an operating capability sit behind it. Most “we do defense in depth” programs are running the lower path without noticing.

The patterns worth knowing

Least privilege at every layer is the pattern that requires constant maintenance, because every new service, role, and integration tries to grant itself more than it needs. The question it answers: what does a compromised credential give an attacker? If you can’t answer that crisply, the pattern isn’t operating.
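The question above can be asked mechanically. What follows is a minimal sketch, assuming you can export which roles each credential can assume and which actions each role allows. The credential names, role names, and both mappings are hypothetical; real data would come from your own IAM inventory.

```python
from typing import Dict, Set

# Hypothetical inventory (assumption): roles reachable from each credential,
# and actions allowed by each role.
CREDENTIAL_ROLES: Dict[str, Set[str]] = {
    "ci-deploy-key": {"deployer"},
    "support-laptop": {"support-readonly", "billing-admin"},
}
ROLE_ACTIONS: Dict[str, Set[str]] = {
    "deployer": {"ecs:UpdateService", "ecr:PutImage"},
    "support-readonly": {"logs:GetLogEvents"},
    "billing-admin": {"s3:GetObject", "kms:Decrypt"},
}


def blast_radius(credential: str) -> Set[str]:
    """Return every action reachable if this credential is compromised."""
    actions: Set[str] = set()
    for role in CREDENTIAL_ROLES.get(credential, set()):
        actions |= ROLE_ACTIONS.get(role, set())
    return actions
```

If the answer for a credential is longer than anyone expected, or nobody can produce the two mappings at all, that is the signal that the pattern is listed but not operating.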

Segmentation by failure and trust domain is the second, and it’s different from network segmentation. The question isn’t “how do we divide the VPC?” It’s “which workloads should fail or be compromised independently?” The team that segments by VPC because that’s the unit the cloud provides is doing infrastructure layout, not security architecture.

The third pattern, defense in depth, is the one most commonly implemented wrong. The mistake is layering controls that fail the same way. Three controls that all depend on the same identity provider aren’t three layers; they’re one layer with three components.
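One way to test whether layers are really independent is to group controls by the dependencies whose failure would disable them. The sketch below assumes a hand-maintained map from each control to its failure dependencies; the control and dependency names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical controls (assumption) and the dependencies whose failure disables them.
CONTROLS: Dict[str, Set[str]] = {
    "sso-login": {"idp"},
    "vpn-access": {"idp"},
    "admin-console-mfa": {"idp"},
    "egress-firewall": {"network-policy"},
}


def independent_layers(controls: Dict[str, Set[str]]) -> List[Set[str]]:
    """Merge controls that share any failure dependency into a single layer."""
    layers: List[Tuple[Set[str], Set[str]]] = []  # (dependencies, control names)
    for name, deps in controls.items():
        merged_deps, merged_names = set(deps), {name}
        remaining = []
        for layer_deps, layer_names in layers:
            if layer_deps & merged_deps:
                # Shared dependency: these controls fail together.
                merged_deps |= layer_deps
                merged_names |= layer_names
            else:
                remaining.append((layer_deps, layer_names))
        layers = remaining + [(merged_deps, merged_names)]
    return [names for _, names in layers]
```

With the sample data, four controls collapse into two layers: the three that depend on the identity provider count as one.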

Encryption everywhere with explicit key ownership reorients the question from “is the data encrypted?” to “who can decrypt?” Encryption with shared keys at the cloud-account level protects you against the cloud provider’s auditors and almost no one else.

Logging and detection close the list, but only as preconditions for actual response: logging without retention is paperwork, logging without detection rules is a search corpus nobody searches, and logging without an on-call who reads alerts is a recording nobody listens to.

A SaaS company I audited, one that had passed three consecutive compliance reviews and was proud of it, had the encryption pattern in name only: managed keys for the bulk of services, no rotation policy, no key access audit, no documented owner for the master keys. The compliance audit passed. The keys were effectively shared across the entire production estate. The pattern was present. The capability behind it wasn’t.

The threat model that informs the pattern

The pattern that fits depends on which threat your organization is defending against. Most face several at once, and most programs prioritize as if they faced one.

Threat world	What’s load-bearing	What’s secondary	Common mistake
Credential stuffing	MFA, session lifecycle, JIT access, credential rotation	Network controls	Prioritizing segmentation here is solving last decade’s problem
Supply chain	Signed builds, SBOM, image attestation, provenance verification	Network and IAM	Network controls don’t catch a malicious package running with legitimate credentials
Insider	Segmentation by trust domain, separation of duties, just-enough-access	Authentication strength	Strong auth doesn’t help against an attacker who has legitimate auth
Data exfiltration	DLP, egress controls, data classification, access logging	Perimeter controls	Perimeter-first programs miss the lateral movement that precedes exfiltration
Ransomware / destructive	Immutable backups, blast-radius segmentation, recovery runbooks	Detection latency	Detection often fires after encryption has already started

The discipline is naming which threat each pattern is mitigating, because the same pattern serves different threats differently. Encryption everywhere defends against the supply-chain world weakly and the insider world barely. Segmentation defends against the insider world well and the credential-stuffing world poorly. The pattern without the threat tag is the control with no clear job.
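The threat tag can be kept as data and checked, not remembered. This is a minimal sketch, assuming a pattern register in which each adopted pattern lists the threats it is meant to mitigate; the pattern names, threat names, and the register itself are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical pattern register (assumption): pattern -> threats it is meant to mitigate.
PATTERN_THREATS: Dict[str, List[str]] = {
    "mfa-and-session-lifecycle": ["credential-stuffing"],
    "segmentation-by-trust-domain": ["insider"],
    "encryption-everywhere": [],  # adopted, but nobody wrote down why
}

# Threats the organization has decided it faces (also an assumption).
THREATS_IN_SCOPE: Set[str] = {"credential-stuffing", "insider", "supply-chain"}


def untagged_patterns(register: Dict[str, List[str]]) -> List[str]:
    """Patterns with no threat tag: controls with no clear job."""
    return sorted(name for name, threats in register.items() if not threats)


def uncovered_threats(register: Dict[str, List[str]], in_scope: Set[str]) -> List[str]:
    """In-scope threats that no pattern claims to mitigate."""
    covered = {threat for threats in register.values() for threat in threats}
    return sorted(in_scope - covered)
```

Both lists are worth reviewing: the first shows patterns adopted without intent, the second shows threats the program has named but not addressed.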

The patterns adopted without intent

Encryption is only as strong as the keys you can rotate on demand. An audit can pass because the auditor didn’t ask the rotation question. A real adversary doesn’t have to ask it. They just have to find one credential with kms:Decrypt permission against the shared key. This is one of the places where the security posture question gets asked too late, usually after an incident has surfaced what the audit didn’t.
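The “who can decrypt?” question can be approximated from exported identity policies. The sketch below is deliberately simplified and rests on several assumptions: the key ARN, role names, and policy statements are hypothetical; wildcard matching uses Python’s `fnmatch` as a stand-in for IAM’s matching rules; and it ignores explicit denies, conditions, key policies, and grants, all of which a real assessment has to include.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical shared key and identity policies (assumptions), simplified to plain dicts.
SHARED_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/shared-prod-key"
PRINCIPAL_STATEMENTS: Dict[str, List[dict]] = {
    "role/payments-api": [
        {"Effect": "Allow", "Action": "kms:Decrypt", "Resource": SHARED_KEY_ARN},
    ],
    "role/report-exporter": [
        {"Effect": "Allow", "Action": ["kms:*"], "Resource": "*"},
    ],
    "role/static-site": [
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "*"},
    ],
}


def _as_list(value):
    # Action and Resource may be a single string or a list of strings.
    return value if isinstance(value, list) else [value]


def statement_allows(statement: dict, action: str, resource: str) -> bool:
    """Approximate check: does an Allow statement match this action and resource?"""
    if statement.get("Effect") != "Allow":
        return False
    actions = _as_list(statement.get("Action", []))
    resources = _as_list(statement.get("Resource", []))
    action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in actions)
    resource_ok = any(fnmatchcase(resource, r) for r in resources)
    return action_ok and resource_ok


def who_can_decrypt(policies: Dict[str, List[dict]], key_arn: str) -> List[str]:
    """List principals with at least one statement allowing kms:Decrypt on the key."""
    return sorted(
        principal
        for principal, statements in policies.items()
        if any(statement_allows(s, "kms:Decrypt", key_arn) for s in statements)
    )
```

The length of that list, compared with the number of workloads that actually need the plaintext, is a rough measure of how shared the key really is.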

A SaaS company I worked with had years of CloudTrail logs but only a roughly week-long query window, because a storage tier change had been applied quietly and nobody noticed. The first time their incident response team needed to look back many months, they couldn’t. The pattern was present. The capability behind it had been quietly retired. The events were generated, never reviewed, and unsearchable when they were needed.
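A check this small would have caught the drift. It is a sketch under stated assumptions: `oldest_searchable` is whatever date your log platform reports as the oldest event you can actually query (how you obtain it depends on the platform), and the 180-day requirement is a hypothetical figure, not a recommendation.

```python
from datetime import date


def lookback_gap_days(oldest_searchable: date, today: date, required_days: int) -> int:
    """Days of required lookback that cannot be queried (0 means fully covered)."""
    searchable_days = (today - oldest_searchable).days
    return max(0, required_days - searchable_days)


# Example: a one-week query window against a hypothetical 180-day requirement.
gap = lookback_gap_days(date(2024, 8, 12), date(2024, 8, 19), required_days=180)
```

Run on a schedule and alerted on when the result is nonzero, the check turns “we retain logs” into a claim that is tested before the incident, not during it.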

Next comes segmentation that prevents legitimate work and gets bypassed informally: the “we have a Slack channel for that” anti-pattern. Every legitimate cross-segment workflow that doesn’t have a clean path becomes an informal path. The informal path is the one used during incidents, and it’s the one that compromises the segmentation story. A pattern that doesn’t fit the operational reality is a pattern that gets routed around.

A policy-as-code example

The gap between documented patterns and operating capability is often invisible until an incident reveals it. Policy-as-code makes the gap explicit. The OPA/Rego rule below encodes the least-privilege principle as a machine-checkable constraint on IAM role creation, so the pattern is enforced at plan time, not discovered during a post-incident review.

# policies/iam-least-privilege.rego
# Deny IAM roles that attach AWS-managed AdministratorAccess
# or that include wildcard resource grants on sensitive actions.
# Evaluate at plan time against `terraform show -json` output, for example
# with Conftest (pass --namespace iam.least_privilege, since the package
# is not the default `main`).

package iam.least_privilege

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# Actions for which a wildcard resource grant is treated as a violation.
# Exact matches only: wildcard actions such as "iam:*" are not expanded.
sensitive_actions := {
    "iam:CreateUser", "iam:AttachRolePolicy",
    "kms:Decrypt", "kms:GenerateDataKey",
    "s3:GetObject", "s3:PutBucketPolicy",
    "secretsmanager:GetSecretValue"
}

deny contains msg if {
    some resource in input.resource_changes
    resource.type == "aws_iam_role_policy_attachment"
    resource.change.after.policy_arn == "arn:aws:iam::aws:policy/AdministratorAccess"
    msg := sprintf(
        "Role '%v' attaches AdministratorAccess. Use a scoped policy instead.",
        [resource.change.after.role]
    )
}

deny contains msg if {
    some resource in input.resource_changes
    resource.type == "aws_iam_policy"
    # Terraform plan JSON stores the policy document as a JSON-encoded string.
    document := json.unmarshal(resource.change.after.policy)
    some statement in as_array(document.Statement)
    statement.Effect == "Allow"
    "*" in as_array(statement.Resource)
    some action in as_array(statement.Action)
    action in sensitive_actions
    msg := sprintf(
        "Policy '%v' grants wildcard resource access to sensitive action: %v",
        [resource.change.after.name, action]
    )
}

# IAM allows Statement, Action, and Resource to be a single value or a list.
as_array(x) := x if is_array(x)

as_array(x) := [x] if not is_array(x)

The architectural commitments behind each pattern

Identity governance is required to operate least privilege, and that means the program, not the technology. IAM tools don’t operate least privilege; people do. A mid-size SaaS company I advised listed “least privilege” in its security program. IAM access reviews were scheduled quarterly but performed annually in practice, because the team didn’t have the staff for a quarterly cadence. The gap was visible in the audit logs to anyone who looked. The pattern existed in the document. The capability didn’t.
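The difference between the scheduled cadence and the performed cadence is easy to measure once review completion dates are recorded somewhere. This is a minimal sketch, assuming such a record exists; the dates, the 92-day quarter, and the 14-day tolerance are all hypothetical values.

```python
from datetime import date
from typing import List


def longest_review_gap_days(completed: List[date], as_of: date) -> int:
    """Longest stretch without a completed review, including time since the last one."""
    ordered = sorted(completed) + [as_of]
    return max(((b - a).days for a, b in zip(ordered, ordered[1:])), default=0)


def cadence_holds(completed: List[date], as_of: date,
                  scheduled_days: int = 92, tolerance_days: int = 14) -> bool:
    """True if no gap exceeds the scheduled cadence plus a tolerance."""
    return longest_review_gap_days(completed, as_of) <= scheduled_days + tolerance_days


# Hypothetical record: reviews scheduled quarterly, performed roughly annually.
annual_in_practice = [date(2023, 1, 15), date(2024, 1, 20)]
```

With the sample record the longest gap is 370 days, so a program that claims quarterly reviews fails its own check.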

Network and IAM coordination are required to operate segmentation: two teams, one outcome. Most segmentation failures I see are organizational, not technical. The network team owns the segments. The IAM team owns the identities. The trust-domain decisions sit between them, and a program without explicit ownership of those decisions ends up with segments that don’t match the trust model and identities that span segments they shouldn’t.
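Identities that span segments they shouldn’t can be found by joining the two teams’ data. The sketch below assumes the network side can supply a workload-to-trust-domain map and the IAM side an identity-to-workload access map; every name in both maps is hypothetical.

```python
from typing import Dict, Set

# Hypothetical inputs (assumptions): one map from each team.
WORKLOAD_DOMAIN: Dict[str, str] = {
    "payments-db": "cardholder",
    "payments-api": "cardholder",
    "analytics-warehouse": "internal",
    "marketing-site": "public",
}
IDENTITY_ACCESS: Dict[str, Set[str]] = {
    "svc-payments": {"payments-db", "payments-api"},
    "svc-etl": {"payments-db", "analytics-warehouse"},
    "svc-web": {"marketing-site"},
}


def identities_spanning_domains(
    access: Dict[str, Set[str]], domains: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Return identities whose access touches more than one trust domain."""
    spanning: Dict[str, Set[str]] = {}
    for identity, workloads in access.items():
        touched = {domains[w] for w in workloads if w in domains}
        if len(touched) > 1:
            spanning[identity] = touched
    return spanning
```

Some spanning identities will be legitimate. The point is that each one should be an explicit, owned decision, not an accident of two teams working from separate maps.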

Detection engineering is required to operate logging. Someone owns what gets alerted on. Someone tunes the rules. Someone retires the alerts that don’t fire usefully. The detection engineer is a role most programs don’t have, and the absence of the role is why most logging programs degrade into recording that nobody listens to.

The honest question: which capabilities does your organization have, and which patterns require capabilities you don’t? The pattern that requires a capability you don’t have is the pattern that becomes paperwork, present in the architecture and absent from the actual posture.
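The question reduces to a set difference between what each pattern requires and what the organization can actually staff. A minimal sketch, assuming a hypothetical mapping from patterns to required capabilities drawn loosely from the commitments above:

```python
from typing import Dict, Set

# Hypothetical mapping (assumption): capabilities each pattern needs in order to operate.
PATTERN_REQUIRES: Dict[str, Set[str]] = {
    "least-privilege": {"identity-governance"},
    "segmentation": {"network-iam-coordination"},
    "logging-and-detection": {"detection-engineering"},
    "encryption-with-key-ownership": {"key-ownership", "key-rotation"},
}


def posture_report(requires: Dict[str, Set[str]], capabilities: Set[str]) -> Dict[str, str]:
    """Label each pattern 'real' or 'paper', naming the missing capabilities."""
    report: Dict[str, str] = {}
    for pattern, required in requires.items():
        missing = required - capabilities
        if missing:
            report[pattern] = "paper (missing: " + ", ".join(sorted(missing)) + ")"
        else:
            report[pattern] = "real"
    return report


# Example: an organization with identity governance and key ownership, and nothing else.
report = posture_report(PATTERN_REQUIRES, {"identity-governance", "key-ownership"})
```

The hard part is not the code but filling in the capability set honestly.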

The organizational dimension

Security functions best as a co-owner of architecture, not a downstream reviewer who rubber-stamps or blocks. The security team that joins at the architecture-review stage is responding to decisions that have already been made. The security team that joins at the design stage is shaping the decision space. The first model produces patterns adopted without intent. The second produces patterns chosen for specific threats.

Saying “we don’t have detection engineering, so we’re not going to claim we have a logging-and-detection pattern” is harder politically than saying “we have logging and detection.” I’ll admit this is one I’ve struggled to land diplomatically in advisory engagements. The honest answer protects the program’s credibility long-term. The comfortable answer protects the next quarterly review.

flowchart TD
    TM["Threat model<br/>defined"] --> PL{"Least<br/>privilege<br/>operable?"}
    TM --> PS{"Segmentation<br/>matches trust<br/>domain?"}
    TM --> PD{"Detection<br/>engineering<br/>staffed?"}
    TM --> PE{"Encryption<br/>keys owned<br/>and rotatable?"}
    PL -->|Yes| LP["Least priv<br/>real posture"]
    PL -->|No| LPP["Least priv<br/>paper posture"]
    PS -->|Yes| SP["Segmentation<br/>real posture"]
    PS -->|No| SPP["Segmentation<br/>paper posture"]
    PD -->|Yes| DP["Detection<br/>real posture"]
    PD -->|No| DPP["Detection<br/>paper posture"]
    PE -->|Yes| EP["Encryption<br/>real posture"]
    PE -->|No| EPP["Encryption<br/>paper posture"]
    style LP fill:#eaf2fa
    style SP fill:#eaf2fa
    style DP fill:#eaf2fa
    style EP fill:#eaf2fa
    style LPP fill:#fdd
    style SPP fill:#fdd
    style DPP fill:#fdd
    style EPP fill:#fdd

Figure 2. Each pattern’s outcome is determined not by its presence in the architecture document but by whether the operating capability sits behind it. A single “No” in this flow produces paper posture for that pattern, regardless of what the compliance report says.

The compliance theater problem: patterns that satisfy auditors and miss threats. The auditor’s checklist and the threat model overlap less than the program documents suggest. A program that optimizes for the checklist will satisfy the auditor and leave threats unmitigated. A program that optimizes for the threat model has to do extra work to translate it into the auditor’s vocabulary. The translation work is where the program either keeps its integrity or sells it. The zero-trust pattern is the one I see most often shipped at the audit-narrative level, not the operating-model level.

Auditing your patterns

The SaaS company with the shared keys eventually rebuilt its key management program. They moved sensitive workloads to dedicated keys. They wrote a rotation policy. They added an access audit on the master keys. The compliance audit still passed. The difference was that a real incident would have produced a coherent answer, not a panicked one.

Audit your patterns against the capability they require, not against whether they appear in the architecture document. The pattern you can operate well beats the pattern you list and operate poorly, every time. The pattern that gets tested is rarely the one you scheduled the audit around. It’s the one your adversary picked, and the gap between those two is the gap between paper posture and real posture.
