Why Most Threat Models Are Wrong (and the One That Wasn't)
The threat model document on every team's wiki has the same structure, the same diagrams, and the same omissions. It captures the threats that fit on the page and misses the ones that don't. There is a different way to do this - uglier on paper, more honest under pressure.
Open the threat model document on any reasonably mature engineering team's wiki. You'll find the same shape: a system diagram with rectangles for components and arrows for the trust relationships between them, a STRIDE table per component, a list of identified threats with severity ratings, and a column of mitigations marked as 'planned', 'in progress', or 'complete'. The diagram is clean. The threats are categorised. The document looks like a threat model.
It is rarely a useful one. After enough years of writing them, reviewing them, and watching them fail to predict anything that actually happens during incidents, I have come to believe that the conventional threat-modelling output captures threats that fit on the page and systematically misses threats that don't. The format is the bug.
What conventional threat models capture well
To be fair: the STRIDE-and-rectangles approach does some things well. It catches the obvious within-component bugs - auth bypasses, injection on input boundaries, missing access controls on individual data classes. These are real and worth having a structured pass over. The format encourages teams to enumerate components and trust boundaries, which is a useful exercise on its own merits even if the threat-modelling output disappears into a wiki.
But the format constrains what kinds of threats can be expressed. Anything that doesn't fit cleanly into a {component × STRIDE-category} cell becomes invisible in the output, even if a thoughtful reviewer noticed it during the meeting.
What conventional threat models miss
Three classes, in approximate order of frequency in real incidents:
Threats that span the seams
The diagram has rectangles for components A, B, and C, with arrows for the trusted edges between them. The actual incident comes from an interaction nobody drew an arrow for: A makes a call with side effects to a fourth-party system that B happens to depend on, and a compromise of that fourth party changes how B interprets data from A. The threat is not in any of A, B, or C - it lives in the implicit trust pattern across them. STRIDE tables don't have a column for 'transitive trust'.
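One way to make this kind of implicit trust visible is to write down each party's direct dependencies, vendors included, and compare the reachable set with the arrows actually drawn. The following is a minimal sketch, not a tool: the dependency map, the name `vendor-x`, and both function names are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical direct dependencies: who each party calls or relies on.
# Every name here is an illustrative assumption, not a real system.
DIRECT_DEPS = {
    "A": {"B", "vendor-x"},
    "B": {"C", "vendor-x"},
    "C": set(),
    "vendor-x": set(),
}

# The arrows that actually appear on the architecture diagram.
DRAWN_EDGES = {("A", "B"), ("B", "C")}


def transitive_deps(start, deps):
    """Return everything reachable from `start` by following direct dependencies."""
    seen = set()
    queue = deque(deps.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(deps.get(node, ()))
    return seen


def undrawn_trust(deps, drawn):
    """List (component, dependency) pairs that are relied on but have no arrow."""
    missing = set()
    for component in deps:
        for dep in transitive_deps(component, deps):
            if (component, dep) not in drawn:
                missing.add((component, dep))
    return sorted(missing)
```

With the sample data, `undrawn_trust(DIRECT_DEPS, DRAWN_EDGES)` reports that both A and B rely on `vendor-x` without any arrow saying so. The hard part in practice is filling in the dependency map honestly; the traversal itself is trivial.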
Threats that are about time
Most threat models are static-snapshot models. They reason about the system as it is. Real systems change continuously: deployments, dependency updates, configuration drift, key rotation, organisational changes. A threat that is dormant in the snapshot becomes live three deployments later when an unrelated change shifts the trust topology. Conventional threat models have nothing to say about this; the document is correct on the day it was written and steadily wrong thereafter.
Threats that are operational, not architectural
The architecture is fine. The deployment is fine. The runtime configuration was set by hand on a Tuesday by someone who is now on holiday, and it was never written down anywhere. The compromise comes via a misconfigured environment variable, an untracked feature flag, or a forgotten test endpoint that survived into production. None of these are visible in any architectural document because they were never architectural; they were the residue of operational reality. Threat models written from architecture diagrams cannot see them.
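Some of this residue can be surfaced mechanically by comparing what is declared with what is actually set at runtime. The sketch below assumes both sides are available as plain dictionaries (for example, a checked-in config file and a copy of the process environment); the function names and the sample settings are hypothetical.

```python
def undeclared_settings(declared, actual, ignore_prefixes=()):
    """Return names set at runtime that the declared config never mentions."""
    prefixes = tuple(ignore_prefixes)
    return sorted(
        name
        for name in actual
        if name not in declared and not name.startswith(prefixes)
    )


def drifted_settings(declared, actual):
    """Return names whose runtime value differs from the declared value."""
    return sorted(
        name
        for name, value in declared.items()
        if name in actual and actual[name] != value
    )


# Illustrative data only; none of these names come from a real deployment.
DECLARED = {"PAYMENTS_DB_HOST": "db.internal", "LOG_LEVEL": "info"}
RUNTIME = {
    "PAYMENTS_DB_HOST": "db.internal",
    "LOG_LEVEL": "debug",            # changed by hand, never recorded
    "ENABLE_TEST_ENDPOINT": "1",     # never declared anywhere
    "PATH": "/usr/bin",              # ordinary OS noise, ignored below
}

print(undeclared_settings(DECLARED, RUNTIME, ignore_prefixes=("PATH",)))
print(drifted_settings(DECLARED, RUNTIME))
```

A check like this only finds what is observable from the running process. It will not find the forgotten endpoint itself, only the flag that enables it, so it complements the narrative work rather than replacing it.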
The one that worked
The most useful threat model I have ever been part of producing was for a payments backend. It had no rectangles. It had a forty-page document organised entirely around plausible incident narratives - 'what would have to be true for a successful attack against [specific outcome] to happen, and what does the chain of compromise look like?'.
Each narrative was three to five pages. It started with the attacker's goal, walked through every step they would have to take, named every assumption that would have to hold for that step to succeed, and identified what controls existed at each assumption. The format forced narration: 'in step 4, the attacker needs to have valid IAM credentials for the production AWS account; this requires either compromise of an engineer's laptop with AWS keys, or compromise of the build pipeline's role, or - and this is where we got nervous - the SaaS observability vendor we forward CloudTrail to'.
The narrative format made the third-party observability vendor visible as a critical trust dependency. STRIDE on the payments service would not have. The diagram had no arrow for it.
How we run them now
The format that has held up across a dozen reviews:
- Start with outcomes, not components. Pick three or four worst-case business outcomes ('attacker reads any customer's payment history', 'attacker mints unauthorised API tokens that survive password reset', 'attacker exfiltrates the unit-economics dataset'). These are the things that matter; everything else is secondary.
- For each outcome, write a narrative of how an attacker gets there. Don't optimise for a clean attack path; enumerate plausible paths even if some seem implausible. Implausible paths become plausible during specific operational conditions; capture them.
- At every step of every narrative, identify the assumption being relied on for that step to fail or succeed. The assumptions are the actual surface area of the threat model. Some will be technical ('the IAM role does not have s3:GetObject on the customer-data bucket'). Some will be operational ('the on-call engineer reviews PagerDuty alerts within 15 minutes'). Don't filter; both categories matter.
- Identify the control responsible for each assumption holding. If there is no control - only convention, only 'we trust the vendor', only 'nobody has ever done that' - write that explicitly. Uncontrolled assumptions are the work output of the threat-modelling exercise.
- Re-run the exercise quarterly with the previous output in hand. The interesting question is not 'what threats exist?' - it is 'what assumptions changed since last quarter?'. The diff is the operating layer.
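The steps above can be captured in a very small data model: outcomes own narratives, narratives own steps, steps own assumptions, and each assumption either names a control or does not. This is a sketch under my own assumptions about field names; the class names, the `kind` values, and the sample control name are hypothetical, and a shared document with the same columns would do the same job.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Assumption:
    statement: str
    kind: str                      # "technical" or "operational"
    control: Optional[str] = None  # None: only convention or trust holds this up


@dataclass
class Step:
    description: str
    assumptions: List[Assumption] = field(default_factory=list)


@dataclass
class Narrative:
    outcome: str
    steps: List[Step] = field(default_factory=list)


def uncontrolled(narratives):
    """Return (outcome, step, assumption statement) for assumptions with no control."""
    return [
        (n.outcome, s.description, a.statement)
        for n in narratives
        for s in n.steps
        for a in s.assumptions
        if a.control is None
    ]


# Illustrative data only; the control name is a made-up placeholder.
EXAMPLE = [
    Narrative(
        outcome="attacker reads any customer's payment history",
        steps=[
            Step(
                description="obtain valid IAM credentials for the production account",
                assumptions=[
                    Assumption("build pipeline role cannot be assumed from outside CI",
                               "technical", control="role trust policy"),
                    Assumption("the observability vendor receiving our audit logs is not compromised",
                               "operational"),
                ],
            )
        ],
    )
]

for outcome, step, statement in uncontrolled(EXAMPLE):
    print(f"{outcome} / {step}: {statement}")
```

The output of `uncontrolled` is the list the exercise exists to produce. Keeping `control` as free text is deliberate in this sketch: forcing it into an enum tends to hide the 'we trust the vendor' answers that most need to be written down.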
What this gets you
Narrative threat models do worse on the conventional thing threat models are supposed to do: they don't produce a clean STRIDE table. They are harder to summarise on a slide. They take longer to write. The reviewer asking 'where is the threat list?' will not be satisfied.
They do better on the actual thing: predicting what will happen during an incident. The narratives become the runbook. The named assumptions become the monitoring targets. The uncontrolled assumptions become the engineering backlog. The diff between quarters becomes the indicator of whether the architecture is drifting toward or away from the original posture.
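The quarter-over-quarter comparison is simple enough to sketch. The example assumes each quarter's output has been reduced to a mapping from assumption statement to its control (or `None` when there is no control); the function name, the snapshot format, and the sample entries are all hypothetical.

```python
def diff_assumptions(previous, current):
    """Compare two quarters of {assumption statement: control or None}."""
    prev_keys, curr_keys = set(previous), set(current)
    return {
        "added": sorted(curr_keys - prev_keys),
        "removed": sorted(prev_keys - curr_keys),
        "control_changed": sorted(
            s for s in prev_keys & curr_keys if previous[s] != current[s]
        ),
    }


# Illustrative snapshots; statements and control names are placeholders.
Q1 = {
    "on-call reviews alerts within 15 minutes": "paging policy",
    "test endpoints are disabled in production": None,
}
Q2 = {
    "on-call reviews alerts within 15 minutes": None,  # control was dropped
    "test endpoints are disabled in production": None,
    "audit log vendor is not compromised": None,        # new dependency
}

print(diff_assumptions(Q1, Q2))
```

Matching on the exact statement text is a simplification: a reworded assumption shows up as one removal plus one addition. Stable identifiers per assumption would avoid that, at the cost of more bookkeeping.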
If your threat model can be replaced by a STRIDE table, it can probably be replaced by a checklist; and if it can be replaced by a checklist, the work the threat model was supposed to do is being done somewhere else, or not at all.