About
Extraction is a bubble visualization of every notable crypto exploit, hack, and major collapse since 2014. Each bubble is one incident: size = USD lost, color = attack vector, position = year (or month when you drill into a year via the scrubber).
As of today: 1,189 incidents, 2014–2026, 34 Lazarus / DPRK (23 confirmed · 10 forensics · 1 rumored), 68 bridge hacks, 110 repeat exploits, 11 collapses.
Data sources
Spine + 6 enrichment layers. Each record is tagged with all sources that contributed to it.
- DefiLlama Hacks API (api.llama.fi/hacks) — the canonical spine. ~510 incidents, daily refresh. Provides project name, date, USD, chains, classification, technique, language, parentProtocolId, defillamaId.
- SlowMist Hacked (hacked.slowmist.io) — HTML-scraped (~1,350 records pre-dedup, 666 added after fuzzy-match against spine). Fills long-tail incidents and especially CEX / Wallet / Bridge categories. Filtered to ≥$100K loss.
- Hand-curated manual lists — 21 entries DefiLlama doesn't track: pre-2016 (Mt. Gox, Bitfinex 2016, Bitstamp), supply-chain (Ledger Connect Kit, npm @solana/web3.js, CoW Swap domain hijack), and major collapses (Terra LUNA / UST May 2022, Celsius, BlockFi, FTX customer shortfall, 3AC, Voyager, Genesis, Iron Finance, Babel, Hodlnaut, Prime Trust). Each row has primary-source URLs (DOJ / SEC / Reuters / Chainalysis).
- Lazarus / DPRK attribution — manual cross-reference list of incidents attributed to North Korean actors. Each entry tagged with confidence:
- high (23): FBI / US Treasury / OFAC / DOJ formal attribution
- medium (10): Chainalysis / TRM Labs / Elliptic published report, no government sanctions yet
- rumor (1): community speculation without primary source — bubble shows dashed stroke + “?” badge
- Tavily news search — for 589 of 1,189 records, pulled the original news article / postmortem / forensics report from credible domains (Rekt News, Coindesk, Cointelegraph, TheBlock, Decrypt, Halborn, Certik, Chainalysis, etc.) and attached as “Sources” in each modal.
- eth-labels API (eth-labels.com) — attacker / exploiter address tags from Etherscan and other EVM block explorers. 109 records have associated on-chain attacker addresses with links to the appropriate chain explorer.
Vector taxonomy
We classify incidents by attack mechanism, not target type. Ronin Bridge and Harmony are tagged key-leak (validator key compromise) — not bridge — because that's what actually happened. The bridge label captures contract-level cross-chain exploits like Wormhole and Nomad.
- smart-contract: re-entrancy, logic bugs, missing access control
- bridge: cross-chain interop / message-verifier exploits
- oracle: price-feed manipulation, flash-loan-driven oracle / donation attacks
- key-leak: private key compromise, multisig signer compromise, insider
- phishing: wallet drainers, domain hijack, signature phishing, supply-chain (npm) attacks
- rugpull: exit scams, Ponzi schemes, malicious deploy, honeypot
- collapse: algorithmic stablecoin death spirals, CeFi bankruptcies, internal design failures (Terra, Celsius, FTX)
- other: unknown, governance attack, mixed
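The taxonomy above, combined with the inference priority chain described under Methodology (bridgeHack flag, then Rugpull classification, then technique regex, then classification fallback), can be sketched roughly as follows. This is a minimal illustration, not the project's code: the RawIncident shape, the regex table, and the fallback map are all assumptions made for the example.

```typescript
type Vector =
  | "smart-contract" | "bridge" | "oracle" | "key-leak"
  | "phishing" | "rugpull" | "collapse" | "other";

// Hypothetical input shape: a subset of the spine fields named in this document.
interface RawIncident {
  bridgeHack?: boolean;
  classification?: string;
  technique?: string;
}

// Illustrative patterns only; the project's real regex table may differ.
const TECHNIQUE_RULES: Array<[RegExp, Vector]> = [
  [/donation attack|price (feed|oracle)|oracle manipulation/i, "oracle"],
  [/key compromised|private key|signer/i, "key-leak"],
  [/frontend|domain hijack|phishing|drainer/i, "phishing"],
  [/re-?entrancy|access control|logic/i, "smart-contract"],
];

// Assumed fallback, used only when no technique pattern matches.
const CLASSIFICATION_FALLBACK: Record<string, Vector> = {
  "protocol logic": "smart-contract",
};

function inferVector(r: RawIncident): Vector {
  // 1. The bridgeHack flag takes priority over everything else.
  if (r.bridgeHack === true) return "bridge";

  // 2. An explicit Rugpull classification comes next.
  const classification = (r.classification ?? "").trim().toLowerCase();
  if (classification === "rugpull") return "rugpull";

  // 3. First technique pattern that matches, in table order.
  const technique = r.technique ?? "";
  for (const [pattern, vector] of TECHNIQUE_RULES) {
    if (pattern.test(technique)) return vector;
  }

  // 4. Classification fallback, then "other".
  return CLASSIFICATION_FALLBACK[classification] ?? "other";
}
```

With these assumed patterns, a technique of Donation Attack resolves to oracle, Cloudflare Key Compromised to key-leak, and Frontend / Domain Hijack to phishing, matching the mappings listed under Methodology. Any per-incident overrides are outside this sketch.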
Methodology
- Fetch + normalize. npm run sync-data pulls DefiLlama → merges SlowMist + manual lists via fuzzy dedup (Levenshtein name + ±1 month + ±15% amount tolerance, with substring match preferred and generic-word filtering to prevent false positives like “X Bridge” ↔ “Y Bridge”).
- Vector inference. Priority chain: DefiLlama bridgeHack flag → Rugpull classification → technique-regex match → classification fallback. Donation Attack on lending markets correctly maps to oracle (it's share-to-asset ratio manipulation); Cloudflare Key Compromised maps to key-leak; Frontend / Domain Hijack maps to phishing.
- Repeat-exploit detection. Records grouped by parentProtocolId (or normalized name fallback) and sorted by date — first occurrence is isRepeat: false, subsequent get the prior incident IDs listed in the modal. Catches Compound forks (Sonne, Hundred), Curve pools, Balancer pools, Venus Core Pool repeats, etc.
- Data-quality sanity. Implausible-amount auto-drop (anything >$10B is almost certainly a token-count vs USD bug from SlowMist). Verified specific corrections via Tavily-driven cross-check: TokenStore Jun 2019 was $1B in SlowMist (token count) → $160M per Quadriga Initiative report; OneCoin $440M → $4.5B per SEC + EU prosecutors; Finiko, Solar Techno Alliance, ArbiStar, Wirecard etc. — see data/discovery-corrections.json.
- Schema invariants asserted by scripts/verify-data.ts: required fields, unique ids, ≥25 Lazarus matches, ≥85% classification coverage in DefiLlama spine, ≥5 repeat-flagged. Top-15 by amount printed for human review.
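The fuzzy dedup rule from the first step (name similarity, ±1 month, ±15% amount, substring match preferred, generic words filtered) can be sketched as below. This is an illustration under stated assumptions rather than the actual sync script: the Incident shape, the generic-word list, the 0.2 name-distance threshold, and measuring the 15% tolerance against the larger of the two amounts are all guesses made for the example.

```typescript
// Hypothetical minimal record shape used only for matching.
interface Incident {
  name: string;
  date: string; // ISO date, e.g. "2022-03-23"
  amountUsd: number;
}

// Assumed generic-word list; the project's actual list is not shown in this document.
const GENERIC_WORDS = new Set(["bridge", "finance", "protocol", "swap", "exchange", "network"]);

// Lowercase, strip punctuation, and drop generic words so "X Bridge" becomes "x".
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, " ")
    .split(" ")
    .filter((w) => w.length > 0 && !GENERIC_WORDS.has(w))
    .join(" ");
}

// Classic edit distance, computed with a single rolling row.
function levenshtein(a: string, b: string): number {
  const row: number[] = [];
  for (let j = 0; j <= b.length; j++) row.push(j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Substring match is preferred; otherwise fall back to a relative edit distance.
function namesMatch(a: string, b: string): boolean {
  const x = normalizeName(a);
  const y = normalizeName(b);
  if (x.length === 0 || y.length === 0) return false;
  if (x.includes(y) || y.includes(x)) return true;
  return levenshtein(x, y) / Math.max(x.length, y.length) <= 0.2; // assumed threshold
}

// Months since year 0, so adjacent calendar months differ by exactly 1.
function monthIndex(isoDate: string): number {
  const [year, month] = isoDate.split("-").map(Number);
  return year * 12 + month;
}

function isDuplicate(a: Incident, b: Incident): boolean {
  const withinOneMonth = Math.abs(monthIndex(a.date) - monthIndex(b.date)) <= 1;
  const larger = Math.max(a.amountUsd, b.amountUsd);
  const withinAmount = Math.abs(a.amountUsd - b.amountUsd) <= 0.15 * larger;
  return namesMatch(a.name, b.name) && withinOneMonth && withinAmount;
}
```

Because generic words are removed before comparison, “X Bridge” and “Y Bridge” reduce to “x” and “y” and do not match, even when their dates and amounts are identical.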
Known limitations
- Coverage gaps are inherent to source aggregation. We rely on what DefiLlama / SlowMist / Tavily news search surface. Some incidents (especially obscure DeFi protocols, country-specific exchanges) may be missing or have stale numbers.
- Attacker-address coverage is partial — eth-labels.com tracks the most recent ~400 attacker-tagged addresses across EVM chains. Famous historical exploiters like the Wormhole or Mango Markets attackers aren't in their dataset.
- Audit firm + bug bounty data (was-it-audited / by-whom / had-bounty) requires Solodit and Immunefi scraping — both are SPA + tRPC backends that need full browser automation. Deferred to a later phase.
- Laundered / recovered amounts are best-effort. DefiLlama exposes returnedFunds for ~30% of records; full laundering trail (Tornado / Sinbad / cross-chain bridge hops) requires Chainalysis Reactor / TRM Forensics paid feeds, which we don't use.
- Pre-2016 / off-chain incidents rely on the manual list. Mt. Gox 2014, Bitstamp 2015, Bitfinex 2016 are present; smaller pre-2016 events may be missing.
- Multi-year Ponzi schemes (BitConnect, OneCoin, PlusToken) are listed with their final aggregate-loss estimate at a single date marker. The bubble represents total scheme value, not loss-per-day.
UI features
- Search by name — input on the chip row, autosuggests top matches with keyboard nav (↑↓ Enter Esc).
- Permalink — click the link icon in any incident modal to copy a URL that opens that specific hack on page load. Browser back/forward buttons sync the modal state.
- CSV export — yearly-stats section has a download button for the full dataset (21 columns, RFC 4180 escaped).
- Year drilldown — click any year in the bottom scrubber to filter that year and re-cluster bubbles by month within it.
- Vector + chain filters — top chip row. Click a chip to keep only those vectors / chains visible (others dim, not removed).
- Repeat-exploit indicator — incidents at protocols with multiple hacks show prior-incident links inside the modal.
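The prior-incident links rely on the grouping described under Methodology: group by parentProtocolId (or a normalized name), sort by date, and give every later incident the IDs that came before it. A minimal sketch of that idea follows, assuming a hypothetical Incident shape and a simple trim-and-lowercase name normalization; the project's real normalization may be more involved.

```typescript
// Hypothetical record shape; only the fields needed for grouping.
interface Incident {
  id: string;
  name: string;
  date: string; // ISO date, so string order equals chronological order
  parentProtocolId?: string;
}

interface RepeatInfo {
  isRepeat: boolean;
  priorIds: string[]; // earlier incidents at the same protocol, oldest first
}

function detectRepeats(incidents: Incident[]): Map<string, RepeatInfo> {
  // Group by parentProtocolId, falling back to a normalized name (assumed normalization).
  const groups = new Map<string, Incident[]>();
  for (const inc of incidents) {
    const key = inc.parentProtocolId ?? `name:${inc.name.trim().toLowerCase()}`;
    const group = groups.get(key);
    if (group) {
      group.push(inc);
    } else {
      groups.set(key, [inc]);
    }
  }

  // Within each group, sort by date and attach the IDs of all earlier incidents.
  const result = new Map<string, RepeatInfo>();
  groups.forEach((group) => {
    const sorted = [...group].sort((a, b) => a.date.localeCompare(b.date));
    sorted.forEach((inc, i) => {
      result.set(inc.id, {
        isRepeat: i > 0,
        priorIds: sorted.slice(0, i).map((p) => p.id),
      });
    });
  });
  return result;
}
```

The first incident in each group gets isRepeat: false and an empty priorIds list; the modal can then render priorIds as links for every later incident.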
Credits
Built with Next.js + d3-force + Tailwind CSS. Data and analysis sourced from DefiLlama, SlowMist, Chainalysis, FBI / US Treasury / DOJ / SEC, TRM Labs, Elliptic, Halborn, Certik, Cyfrin, eth-labels.com, Tavily, and the UN Panel of Experts. Non-commercial — exists to make crypto-theft history easier to scan at a glance, not to attribute liability or replace forensic investigation.
Built by @nikolayxyz