Available · SteelCon 2025 · 2025-07 · 30 min

Detection Engineering at Scale

Writing Sigma rules that survive red-team validation - architectural patterns, review loops, and KPI-driven rule design.

Slides (forthcoming) · Recording (forthcoming)

This talk started from a year of watching detection rules ship to production without ever being tested against production telemetry. The core problem is not writing bad rules; it is shipping them without any mechanism to discover they are broken until an attacker walks through the gap.

The architectural pattern

Mature detection stacks treat rules as code: they live in version control, they go through a review process, and they have a defined lifecycle from research to production. The talk introduced a four-stage pipeline: red team exercise, rule hypothesis, staging validation, production deployment.

```yaml
# Rule lifecycle metadata
stage: staging          # research | staging | production
coverage:
  log_source: sysmon
  event_ids: [1, 10]
  fleet_pct: 78        # validated against production telemetry
kpi:
  fp_rate: 2.3         # false positives per 100 alerts (30-day trailing)
  tn_rate: 97.8
```
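One way to make the lifecycle field enforceable is to validate it in code before a rule change merges. The following is a minimal Python sketch under stated assumptions, not the pipeline shown in the talk: `RuleMetadata` and `promote` are hypothetical names, the fields mirror the metadata above, and the one-stage-at-a-time promotion rule is an assumption.

```python
from dataclasses import dataclass, replace

# Lifecycle stages in promotion order, taken from the metadata comment above.
STAGES = ("research", "staging", "production")


@dataclass(frozen=True)
class RuleMetadata:
    """Hypothetical in-memory form of the rule lifecycle metadata."""
    stage: str
    log_source: str
    event_ids: list[int]
    fleet_pct: float   # percentage of fleet with the log source flowing
    fp_rate: float     # false positives per 100 alerts (30-day trailing)

    def __post_init__(self) -> None:
        # Reject metadata that could not have come from a valid rule file.
        if self.stage not in STAGES:
            raise ValueError(f"unknown stage: {self.stage!r}")
        if not 0 <= self.fleet_pct <= 100:
            raise ValueError("fleet_pct must be between 0 and 100")


def promote(rule: RuleMetadata) -> RuleMetadata:
    """Return a copy moved one stage forward (assumption: stages are never skipped)."""
    idx = STAGES.index(rule.stage)
    if idx == len(STAGES) - 1:
        raise ValueError("rule is already in production")
    return replace(rule, stage=STAGES[idx + 1])
```

A check like this could run in CI, so that a rule file cannot declare a stage outside the three named ones or jump straight from research to production.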

KPI-driven review

The KPI framework was the centrepiece of the talk. Every rule in production carries two metrics: false positive rate (FPs per 100 alerts over the trailing 30 days) and telemetry coverage (percentage of fleet where the required log source is confirmed flowing). A rule with excellent logic and 30% fleet coverage is a 30% detection at best. That number belongs on the rule.
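As a rough illustration of how the two metrics could be computed, here is a small Python sketch. The function names and input counts are hypothetical; the talk does not specify how the counts are gathered. Treating a rule with zero alerts as having a rate of 0.0 is also an assumption.

```python
def fp_rate(false_positives: int, total_alerts: int) -> float:
    """False positives per 100 alerts over the review window (e.g. trailing 30 days)."""
    if total_alerts == 0:
        # Assumption: a rule that never fired is reported as 0.0 rather than undefined.
        return 0.0
    return 100.0 * false_positives / total_alerts


def telemetry_coverage(hosts_reporting: int, fleet_size: int) -> float:
    """Percentage of the fleet where the required log source is confirmed flowing."""
    if fleet_size <= 0:
        raise ValueError("fleet_size must be positive")
    return 100.0 * hosts_reporting / fleet_size


# Hypothetical counts: 23 false positives in 1000 alerts, 300 of 1000 hosts reporting.
rule_fp_rate = fp_rate(23, 1000)
rule_coverage = telemetry_coverage(300, 1000)
```

With these example counts the coverage comes out at 30%, which is the ceiling on what the rule can detect no matter how good its logic is.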

TIP

Any rule with a false positive rate above 10% does not belong in the alerting tier. Move it to a hunting library. Run it on demand against historical data. Do not let it drain analyst attention on a continuous basis - that drain destroys morale faster than any individual bad rule.
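The threshold in the tip can be expressed as a one-line routing decision. This is a sketch only: `assign_tier` and the tier labels are hypothetical names, and reading "above 10%" as strictly greater than 10 false positives per 100 alerts is an assumption.

```python
# Ceiling from the tip: false positives per 100 alerts.
ALERTING_FP_CEILING = 10.0


def assign_tier(fp_rate: float) -> str:
    """Route a rule to the alerting tier or the hunting library by its FP rate."""
    if fp_rate < 0 or fp_rate > 100:
        raise ValueError("fp_rate must be between 0 and 100")
    # Strictly above the ceiling goes to hunting; exactly 10.0 stays in alerting.
    return "hunting" if fp_rate > ALERTING_FP_CEILING else "alerting"
```

Under this reading, a rule at 2.3 stays in the alerting tier, while one at 12.0 moves to the hunting library and runs on demand against historical data.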

What the audience pushed back on

The sharpest question from the floor: 'This sounds like engineering process. Our team does not have time for process.' The answer: detection without process has a failure mode too. You find out about it during an incident. Engineering process exists to avoid the more expensive recovery process. The question is not whether you can afford the process; it is whether you can afford to skip it.
