OSINT · Available · 11 min read · Apr 8, 2026

The Half-Life of a Phishing Kit

Across 217 phishing kits collected and reverse-engineered from live infrastructure over an eight-month window, three distinct populations emerged - distinguished not by language or target, but by the half-life of the underlying operation. The implications for both detection and attribution are non-obvious.

OSINT · Phishing · Threat Intelligence

Phishing kits are a curious artefact. Each one is the codified workflow of an operation: a finished, deployable attack stack that captures credentials, exfiltrates them somewhere, and serves a victim experience believable enough to keep the operation viable for as long as possible. Because they get redeployed (often by people other than their original authors), they leave fingerprints all over the public internet, and because most of them are hand-written by authors who make little effort to hide anything, they are remarkably easy to collect at scale.

Over an eight-month window in 2025-26 we pulled 217 distinct kits from live infrastructure, reverse-engineered each one, and tracked them across redeployment cycles. The data is too messy to claim as authoritative - collection bias, the difficulty of canonicalising a kit across modifications, the noise inherent in volunteer-grade infrastructure - but a clear pattern emerged that we have not seen described elsewhere. Three distinct populations, distinguished not by language or target sector, but by something more interesting: the operational half-life of the kit's deployment.

Population A: Short-lived kits

Roughly 40% of collected kits had a median deployment lifetime under 36 hours. These were almost universally low-effort: single-page credential harvesters, basic templates, and exfiltration channels (a Telegram bot, an email-to-form gateway) that were often already dead. The kit author and the kit operator were typically the same person, often using a free tier of a hosting service that would terminate the operation within a day or two. We saw the same kit redeployed across dozens of throwaway domains in succession.

These kits do not need sophisticated detection. The infrastructure is short-lived enough that domain reputation and certificate-transparency log monitoring catch them before any meaningful victim engagement, and even basic mailflow blocklists refresh quickly enough to be effective against them. The attacker model here is quantity-over-quality; defence at this layer is also quantity-over-quality.
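As a rough illustration of the certificate-transparency side of this, the sketch below screens hostnames taken from newly logged certificates for brand keywords on domains the brand does not own. It assumes the hostnames have already been pulled from a CT feed by some other means; the watchlist, the brand names, and the function names are all hypothetical and not part of our tooling.

```python
from __future__ import annotations

# Hypothetical watchlist: brand keyword -> domains that brand legitimately owns.
WATCHLIST: dict[str, set[str]] = {
    "examplebank": {"examplebank.com"},
    "examplemail": {"examplemail.com"},
}


def is_owned(hostname: str, owned: set[str]) -> bool:
    """True if hostname is one of the owned domains or a subdomain of one."""
    return any(hostname == d or hostname.endswith("." + d) for d in owned)


def suspicious_hostnames(hostnames: list[str]) -> list[tuple[str, str]]:
    """Return (hostname, brand) pairs where a brand keyword appears in a
    certificate hostname that the brand does not own."""
    hits: list[tuple[str, str]] = []
    for raw in hostnames:
        # Normalise case and drop a leading wildcard label such as "*.".
        host = raw.lower().removeprefix("*.").rstrip(".")
        # Ignore hyphens so "example-bank" still matches "examplebank".
        squashed = host.replace("-", "")
        for brand, owned in WATCHLIST.items():
            if brand in squashed and not is_owned(host, owned):
                hits.append((host, brand))
    return hits


if __name__ == "__main__":
    seen = [
        "login.examplebank.com",
        "examplebank-secure-login.example.net",
        "*.example-bank.verify.example.org",
        "unrelated.example.com",
    ]
    for host, brand in suspicious_hostnames(seen):
        print(f"{host} looks like {brand}")
```

Keyword matching of this kind is deliberately crude. That suits a population where volume matters more than precision, but it would miss homoglyph or misspelled look-alikes.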

Population B: Medium-persistence kits

About 35% of kits sustained operations for 3 to 14 days before infrastructure rotation. These were qualitatively different: multi-step authentication flows, working session-token handling, more careful brand fidelity. Many included basic anti-analysis: User-Agent inspection, geofencing on the credential-harvest URL, conditional content based on referrer. The exfiltration channels were closed-loop - credentials posted to a self-hosted endpoint behind a CDN, cached and served from a separate domain so that losing the front-end did not mean losing the captured data.
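From the analyst's side, these anti-analysis checks show up as the same URL answering differently depending on who asks. The sketch below compares responses that were already collected under different client profiles and flags the ones that diverge from a victim-like baseline. The profile names, the `Response` type, and the 0.8 similarity threshold are assumptions made for the example; it performs no network requests itself.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import NamedTuple


class Response(NamedTuple):
    status: int
    body: str


def cloaking_signals(
    responses: dict[str, Response],
    baseline: str = "victim_like",
    min_similarity: float = 0.8,
) -> dict[str, str]:
    """Map each profile that diverges from the baseline to a short reason."""
    base = responses[baseline]
    signals: dict[str, str] = {}
    for profile, resp in responses.items():
        if profile == baseline:
            continue
        if resp.status != base.status:
            signals[profile] = f"status {resp.status} vs {base.status}"
            continue
        # Fuzzy comparison, so per-request tokens do not count as divergence.
        similarity = SequenceMatcher(None, base.body, resp.body).ratio()
        if similarity < min_similarity:
            signals[profile] = f"body similarity {similarity:.2f}"
    return signals


if __name__ == "__main__":
    form = "<form action='/login'><input name='user'><input name='pass'></form>"
    observed = {
        "victim_like": Response(200, form),
        "scanner_ua": Response(404, "Not Found"),
        "no_referrer": Response(200, form),
        "other_geo": Response(200, "<html><body>Welcome to nginx</body></html>"),
    }
    for profile, reason in cloaking_signals(observed).items():
        print(profile, "->", reason)
```

A divergence is only a hint. Legitimate sites also vary content by region or client, so this works as a triage signal rather than a verdict.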

The economics are different here. Population B kits earn enough per deployment to justify the operational overhead. The author is more likely separate from the operator (kit-as-a-service), and the kit itself often shows signs of intentional modification - language localisation, target-specific branding - that we don't see in Population A.

NOTE

We were able to identify several Population B kits as derivative of three or four shared codebases. The 'family tree' of phishing kits is much shallower than people assume - most of what looks like distinct operations is variation on a small set of underlying codebases. This is why kit fingerprinting works so well as a detection primitive, and why the collection effort pays off even when individual kit instances are short-lived.
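One simple way to surface such families, shown here as a sketch and not as the method we used, is to represent each kit as the set of content hashes of its files and to link any two kits whose sets overlap enough. The kit identifiers, the hash placeholders, and the 0.5 threshold below are illustrative assumptions.

```python
from __future__ import annotations

from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_families(
    kits: dict[str, set[str]], threshold: float = 0.5
) -> list[set[str]]:
    """Single-linkage clustering: kits joined by any chain of similar pairs
    end up in the same family."""
    parent = {kit: kit for kit in kits}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(kits, 2):
        if jaccard(kits[a], kits[b]) >= threshold:
            parent[find(a)] = find(b)

    families: dict[str, set[str]] = {}
    for kit in kits:
        families.setdefault(find(kit), set()).add(kit)
    return list(families.values())


if __name__ == "__main__":
    # Placeholder file hashes; a real run would use e.g. SHA-256 digests.
    kits = {
        "kit1": {"h1", "h2", "h3", "h4"},
        "kit2": {"h1", "h2", "h3", "h9"},
        "kit3": {"h1", "h2", "h8", "h9"},
        "kit4": {"x1", "x2"},
    }
    for family in cluster_families(kits):
        print(sorted(family))
```

In this example kit1 and kit3 share too little to be linked directly, but both resemble kit2, so all three land in one family. That is the derivative-of-a-derivative pattern the note describes. Exact file hashes break on trivial edits, so a production version would likely need fuzzier features.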

Population C: Long-running operations

The remaining ~25% sustained operations for 30 days or more, sometimes much longer. These are different in kind, not degree. Population C kits include their own customer support flows, dedicated infrastructure with proper TLS rotation, sophisticated victim-side telemetry to detect security-research traffic, and several layers of operational resilience.

The most striking observation: every Population C kit we examined had built-in features for the operator to monitor their own kit's detection status - automated checks against URLhaus, OpenPhish, VirusTotal-like services - with a kill-switch behaviour that pulled the kit from the live URL when detection appeared. Several included a 'rebuild from backup' workflow that redeployed an identical kit on fresh infrastructure within minutes.

This is the layer where attribution starts to mean something. Population C operators are running businesses - small businesses, but businesses. They have customers (other phishers), they have product roadmaps, they have feature releases, they have operational metrics. Treating them analytically the same as Population A is a category error.
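The three populations can be expressed as a bucketing rule over observed deployment lifetimes. The sketch below takes a kit's deployments as hypothetical (first seen, last seen) pairs, computes the median lifetime, and maps it to a population using the thresholds quoted above. Lifetimes that fall between the stated ranges (36 hours to 3 days, 14 to 30 days) are left unclassified, because the text does not assign them, and applying the median to Populations B and C is a simplification made for this example.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from statistics import median


def median_lifetime(deployments: list[tuple[datetime, datetime]]) -> timedelta:
    """Median of (last_seen - first_seen) across a kit's deployments."""
    seconds = [(last - first).total_seconds() for first, last in deployments]
    return timedelta(seconds=median(seconds))


def population(lifetime: timedelta) -> str:
    """Bucket a lifetime using the thresholds quoted in the text."""
    if lifetime < timedelta(hours=36):
        return "A"
    if timedelta(days=3) <= lifetime <= timedelta(days=14):
        return "B"
    if lifetime >= timedelta(days=30):
        return "C"
    # The text leaves these gaps unassigned, so the sketch does too.
    return "unclassified"


if __name__ == "__main__":
    t0 = datetime(2026, 1, 1)
    sightings = [
        (t0, t0 + timedelta(hours=10)),
        (t0, t0 + timedelta(hours=20)),
        (t0, t0 + timedelta(hours=50)),
    ]
    life = median_lifetime(sightings)
    print(life, population(life))
```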

Implications for defence

The standard advice on phishing defence is uniform across the threat: filter inbound mail, train users, monitor for credential reuse. Useful, but not refined enough. Each population responds to a different defensive posture:

Population A is a numbers game and is best handled with infrastructure-side reputation systems (URLhaus, certificate transparency, mailflow blocklists). Per-instance triage is wasted effort; the kit will be dead before triage finishes.
Population B is where mail-side filtering matters most. Sophisticated content analysis, brand-impersonation detection, and rapid take-down workflows pay off. The kit will live long enough to capture credentials if it survives the first few hours of deployment.
Population C requires investigation. Mail filters and reputation services are necessary but insufficient - the operator is actively defending their infrastructure. Take-downs that are not coordinated across multiple providers fail. Kit fingerprints tracked across deployments give higher-fidelity attribution than IP addresses or domains, because the kit travels with the operator.

Implications for threat intel

The other surprising finding: kit family identity is far more stable than infrastructure identity. The same Population C kit shows up across dozens of operator handles, hundreds of domain registrations, and several different upstream hosting providers - but the kit's distinctive code patterns (specific function names, encoding choices, exfiltration shapes) remain remarkably constant.

If you are running a threat-intel program and you are tracking phishing operations by domain, you are tracking the wrong artefact. Track the kit. Build a corpus of known kit families. Run new captures against the corpus. The vast majority of new operations will be derivatives of something you have already seen, and the few that are genuinely novel are the ones worth spending real attention on.
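A minimal sketch of running a new capture against a corpus, assuming kit source in which functions are declared with a `function name(` syntax (as in PHP or JavaScript) and using function names as the only feature. The family names, function names, and the 0.6 cut-off are invented for illustration. A real corpus would also draw on the encoding choices and exfiltration patterns mentioned above.

```python
from __future__ import annotations

import re

# Matches declarations such as "function send_tg(" in PHP or JavaScript.
FUNC_DEF = re.compile(r"\bfunction\s+([A-Za-z_]\w*)\s*\(")


def fingerprint(source: str) -> frozenset[str]:
    """Lower-cased set of function names declared in the source text."""
    return frozenset(name.lower() for name in FUNC_DEF.findall(source))


def best_family(
    capture: str,
    corpus: dict[str, frozenset[str]],
    min_overlap: float = 0.6,
) -> tuple[str, float] | None:
    """Return (family, score) for the closest known family, or None if the
    capture looks novel. Score is the share of the capture's function
    names that already appear in the family's fingerprint."""
    fp = fingerprint(capture)
    if not fp:
        return None
    best_name, best_score = None, 0.0
    for family, known in corpus.items():
        score = len(fp & known) / len(fp)
        if score > best_score:
            best_name, best_score = family, score
    if best_name is None or best_score < min_overlap:
        return None
    return best_name, best_score


if __name__ == "__main__":
    corpus = {
        "family_x": frozenset({"send_tg", "get_ip", "block_bots", "save_log"}),
        "family_y": frozenset({"mailer", "render"}),
    }
    capture = (
        "<?php function send_tg($m){} function get_ip(){} "
        "function block_bots(){} function new_thing(){} ?>"
    )
    print(best_family(capture, corpus))
    print(best_family("function alpha(){} function beta(){}", corpus))
```

Captures that return `None` are the candidates for the "genuinely novel" pile; everything else can be filed under an existing family with its score attached.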

Caveats

The 217-kit corpus is not a random sample. It overweights kits that ran on infrastructure we could observe, that targeted services we monitor, and that ran in language environments our analysts could read. The Population A:B:C ratios in your environment may differ. The qualitative observations - that the populations are real, that they respond to different defensive postures, that kit family identity outlives infrastructure identity - we believe to be robust across the range of operations we have visibility into. Treat the numbers as illustrative; treat the categories as load-bearing.

The corpus, redacted of victim identifiers, is published in our research notes. The attribution to specific kit families is in our case studies. Both will keep updating as the underlying populations shift.
