About

Extraction is a bubble visualization of every notable crypto exploit, hack, and major collapse since 2014. Each bubble is one incident: size = USD lost, color = attack vector, position = year (or month when you drill into a year via the scrubber).

As of today: 1,189 incidents, 2014–2026, 34 Lazarus / DPRK (23 confirmed · 10 forensics · 1 rumored), 68 bridge hacks, 110 repeat exploits, 11 collapses.

Data sources

One spine plus five enrichment layers, listed below. Each record is tagged with every source that contributed to it.

DefiLlama Hacks API (api.llama.fi/hacks) — the canonical spine. ~510 incidents, daily refresh. Provides project name, date, USD, chains, classification, technique, language, parentProtocolId, defillamaId.
SlowMist Hacked (hacked.slowmist.io) — HTML-scraped (~1350 records pre-dedup, 666 added after fuzzy-match against spine). Fills long-tail incidents and especially CEX / Wallet / Bridge categories. Filtered to ≥$100K loss.
Hand-curated manual lists (21 entries DefiLlama doesn't track): pre-2016 (Mt. Gox, Bitfinex 2016, Bitstamp), supply-chain (Ledger Connect Kit, npm @solana/web3.js, CoW Swap domain hijack), and major collapses (Terra LUNA / UST May 2022, Celsius, BlockFi, FTX customer shortfall, 3AC, Voyager, Genesis, Iron Finance, Babel, Hodlnaut, Prime Trust). Each row has primary-source URLs (DOJ / SEC / Reuters / Chainalysis).
Lazarus / DPRK attribution — manual cross-reference list of incidents attributed to North Korean actors. Each entry tagged with confidence:
- high (23): FBI / US Treasury / OFAC / DOJ formal attribution
- medium (10): Chainalysis / TRM Labs / Elliptic published report, no government sanctions yet
- rumor (1): community speculation without primary source — bubble shows dashed stroke + “?” badge
Tavily news search: for 589 of 1,189 records, we pulled the original news article / postmortem / forensics report from credible domains (Rekt News, Coindesk, Cointelegraph, TheBlock, Decrypt, Halborn, Certik, Chainalysis, etc.) and attached it as “Sources” in each modal.
eth-labels API (eth-labels.com) — attacker / exploiter address tags from Etherscan and other EVM block explorers. 109 records have associated on-chain attacker addresses with links to the appropriate chain explorer.

Vector taxonomy

We classify incidents by attack mechanism, not target type. Ronin Bridge and Harmony are tagged key-leak (validator key compromise), not bridge, because that's what actually happened. The bridge label captures contract-level cross-chain exploits like Wormhole and Nomad.

smart-contract: re-entrancy, logic bugs, missing access control
bridge: cross-chain interop / message-verifier exploits
oracle: price-feed manipulation, flash-loan-driven oracle / donation attacks
key-leak: private key compromise, multisig signer compromise, insider
phishing: wallet drainers, domain hijack, signature phishing, supply-chain (npm) attacks
rugpull: exit scams, Ponzi schemes, malicious deploy, honeypot
collapse: algorithmic stablecoin death spirals, CeFi bankruptcies, internal design failures (Terra, Celsius, FTX)
other: unknown, governance attack, mixed

Methodology

Fetch + normalize. npm run sync-data pulls DefiLlama → merges SlowMist + manual lists via fuzzy dedup (Levenshtein name + ±1 month + ±15% amount tolerance, with substring match preferred and generic-word filtering to prevent false positives like “X Bridge” ↔ “Y Bridge”).
Vector inference. Priority chain: DefiLlama bridgeHack flag → Rugpull classification → technique-regex match → classification fallback. Donation Attack on lending markets maps to oracle (it's share-to-asset ratio manipulation); Cloudflare Key Compromised maps to key-leak; Frontend / Domain Hijack maps to phishing.
Repeat-exploit detection. Records are grouped by parentProtocolId (or by normalized name as a fallback) and sorted by date. The first occurrence is isRepeat: false; later ones are flagged as repeats and list the prior incident IDs in the modal. Catches Compound forks (Sonne, Hundred), Curve pools, Balancer pools, Venus Core Pool repeats, etc.
Data-quality sanity. Implausible-amount auto-drop (anything >$10B is almost certainly a token-count vs USD bug from SlowMist). Specific corrections verified via Tavily-driven cross-check: TokenStore Jun 2019 was $1B in SlowMist (token count) → $160M per Quadriga Initiative report; OneCoin $440M → $4.5B per SEC + EU prosecutors; Finiko, Solar Techno Alliance, ArbiStar, Wirecard, etc. See data/discovery-corrections.json.
Schema invariants asserted by scripts/verify-data.ts: required fields, unique ids, ≥25 Lazarus matches, ≥85% classification coverage in DefiLlama spine, ≥5 repeat-flagged. Top-15 by amount printed for human review.
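The fuzzy dedup in the Fetch + normalize step can be sketched roughly as follows. This is a minimal illustration, not the code behind npm run sync-data: the Incident shape, the generic-word list, the 20% edit-distance threshold, and the 31-day window are all assumptions chosen for the example.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface Incident {
  name: string;
  date: string; // ISO date, e.g. "2022-03-23"
  amountUsd: number;
}

// Generic words ignored when comparing names (illustrative list only).
const GENERIC_WORDS = new Set([
  "bridge", "finance", "protocol", "swap", "exchange", "network", "dao",
]);

// Lowercase, strip punctuation, and drop generic words.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9 ]+/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0 && !GENERIC_WORDS.has(w))
    .join(" ");
}

// Classic dynamic-programming Levenshtein edit distance (two-row variant).
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

function namesMatch(a: string, b: string): boolean {
  const na = normalizeName(a);
  const nb = normalizeName(b);
  if (na === "" || nb === "") return false; // only generic words remained
  if (na.indexOf(nb) !== -1 || nb.indexOf(na) !== -1) return true; // substring match preferred
  // Assumed threshold: allow edits up to 20% of the longer name.
  const maxLen = Math.max(na.length, nb.length);
  return levenshtein(na, nb) <= Math.floor(maxLen * 0.2);
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Two records are treated as the same incident when the dates are within
// about one month, the amounts within 15% of the larger one, and the names match.
function isDuplicate(a: Incident, b: Incident): boolean {
  const daysApart = Math.abs(Date.parse(a.date) - Date.parse(b.date)) / DAY_MS;
  if (daysApart > 31) return false;
  const larger = Math.max(a.amountUsd, b.amountUsd);
  if (larger > 0 && Math.abs(a.amountUsd - b.amountUsd) / larger > 0.15) return false;
  return namesMatch(a.name, b.name);
}
```

Dropping generic words before comparison is what keeps “X Bridge” and “Y Bridge” apart here: once “bridge” is removed, only the distinguishing part of each name is compared.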
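The vector-inference priority chain can be illustrated with a short sketch. The RawHack shape, the regex patterns, and the classification-to-vector table below are hypothetical stand-ins; only the order of the checks and the three example mappings (Donation Attack, Cloudflare Key Compromised, Frontend / Domain Hijack) come from the description above.

```typescript
type Vector =
  | "smart-contract" | "bridge" | "oracle" | "key-leak"
  | "phishing" | "rugpull" | "collapse" | "other";

// Hypothetical subset of a spine record; field names are assumptions.
interface RawHack {
  bridgeHack?: boolean;
  classification?: string;
  technique?: string;
}

// Illustrative patterns only; checked in order, first match wins.
const TECHNIQUE_RULES: Array<[RegExp, Vector]> = [
  [/donation|oracle|price manipulation/i, "oracle"],
  [/key compromised|private key|multisig/i, "key-leak"],
  [/frontend|domain hijack|phishing|drainer/i, "phishing"],
  [/re-?entrancy|access control|logic/i, "smart-contract"],
];

// Assumed mapping from classification strings to vectors.
const CLASSIFICATION_FALLBACK: Record<string, Vector> = {
  "protocol logic": "smart-contract",
  "infrastructure": "key-leak",
};

function inferVector(hack: RawHack): Vector {
  // 1. The bridgeHack flag wins outright.
  if (hack.bridgeHack) return "bridge";
  // 2. A Rugpull classification comes next.
  const classification = (hack.classification ?? "").trim().toLowerCase();
  if (classification === "rugpull") return "rugpull";
  // 3. Technique regexes.
  const technique = hack.technique ?? "";
  for (const [pattern, vector] of TECHNIQUE_RULES) {
    if (pattern.test(technique)) return vector;
  }
  // 4. Classification fallback, then "other".
  return CLASSIFICATION_FALLBACK[classification] ?? "other";
}
```

Because the bridgeHack flag wins outright in this sketch, incidents such as Ronin that the taxonomy tags key-leak would need a separate override, which is not modeled here.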
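Repeat-exploit detection amounts to a group-and-sort pass. The sketch below assumes a hypothetical Incident shape with ISO date strings and uses a simple lowercase-and-trim as the name fallback; the actual normalization may differ.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface Incident {
  id: string;
  name: string;
  date: string; // ISO date, so string order equals chronological order
  parentProtocolId?: string;
}

interface RepeatInfo {
  isRepeat: boolean;
  priorIncidentIds: string[];
}

function detectRepeats(incidents: Incident[]): Map<string, RepeatInfo> {
  // Group by parentProtocolId, falling back to a normalized name.
  const groups = new Map<string, Incident[]>();
  for (const inc of incidents) {
    const key = inc.parentProtocolId ?? `name:${inc.name.trim().toLowerCase()}`;
    const group = groups.get(key) ?? [];
    group.push(inc);
    groups.set(key, group);
  }

  // Within each group, every incident after the earliest is a repeat
  // and carries the ids of all earlier incidents.
  const result = new Map<string, RepeatInfo>();
  groups.forEach((group) => {
    group.sort((a, b) => a.date.localeCompare(b.date));
    group.forEach((inc, i) => {
      result.set(inc.id, {
        isRepeat: i > 0,
        priorIncidentIds: group.slice(0, i).map((p) => p.id),
      });
    });
  });
  return result;
}
```

Prefixing the fallback key with name: keeps a normalized name from colliding with a real parentProtocolId.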

Known limitations

Coverage gaps are inherent to source aggregation. We rely on what DefiLlama / SlowMist / Tavily news search surface. Some incidents (especially obscure DeFi protocols, country-specific exchanges) may be missing or have stale numbers.
Attacker-address coverage is partial: eth-labels.com tracks the most recent ~400 attacker-tagged addresses across EVM chains. Famous historical exploiters like the Wormhole or Mango Markets attackers aren't in its dataset.
Audit firm + bug bounty data (was-it-audited / by-whom / had-bounty) requires Solodit and Immunefi scraping — both are SPA + tRPC backends that need full browser automation. Deferred to a later phase.
Laundered / recovered amounts are best-effort. DefiLlama exposes returnedFunds for ~30% of records; the full laundering trail (Tornado / Sinbad / cross-chain bridge hops) requires Chainalysis Reactor / TRM Forensics paid feeds, which we don't use.
Pre-2016 / off-chain incidents rely on the manual list. Mt. Gox 2014, Bitstamp 2015, Bitfinex 2016 are present; smaller pre-2016 events may be missing.
Multi-year Ponzi schemes (BitConnect, OneCoin, PlusToken) are listed with their final aggregate-loss estimate at a single date marker. The bubble represents total scheme value, not loss-per-day.

UI features

Search by name — input on the chip row, autosuggests top matches with keyboard nav (↑↓ Enter Esc).
Permalink — click the link icon in any incident modal to copy a URL that opens that specific hack on page load. Browser back/forward buttons sync the modal state.
CSV export — yearly-stats section has a download button for the full dataset (21 columns, RFC 4180 escaped).
Year drilldown — click any year in the bottom scrubber to filter that year and re-cluster bubbles by month within it.
Vector + chain filters — top chip row. Click a chip to keep only those vectors / chains visible (others dim, not removed).
Repeat-exploit indicator — incidents at protocols with multiple hacks show prior-incident links inside the modal.
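For the CSV export, RFC 4180 escaping comes down to quoting any field that contains a comma, a double quote, or a line break, and doubling embedded quotes. A minimal sketch follows; the function names and the two-column example are hypothetical, not the export code itself.

```typescript
// Quote a field if it contains a comma, double quote, CR, or LF,
// doubling any embedded double quotes (RFC 4180, section 2).
function escapeCsvField(value: unknown): string {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize a header row plus data rows, using CRLF line endings.
function toCsv(columns: string[], rows: Array<Record<string, unknown>>): string {
  const lines: unknown[][] = [
    columns,
    ...rows.map((row) => columns.map((c) => row[c])),
  ];
  return lines
    .map((line) => line.map((field) => escapeCsvField(field)).join(","))
    .join("\r\n") + "\r\n";
}

// Example with two hypothetical columns:
const csv = toCsv(
  ["name", "amountUsd"],
  [{ name: 'Terra "LUNA", UST', amountUsd: 1000 }],
);
```

Here csv holds a header line and one data line in which the name is wrapped in quotes and each inner quote is doubled.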

Credits

Built with Next.js + d3-force + Tailwind CSS. Data and analysis sourced from DefiLlama, SlowMist, Chainalysis, FBI / US Treasury / DOJ / SEC, TRM Labs, Elliptic, Halborn, Certik, Cyfrin, eth-labels.com, Tavily, and the UN Panel of Experts. Non-commercial — exists to make crypto-theft history easier to scan at a glance, not to attribute liability or replace forensic investigation.

Built by @nikolayxyz