Methodology

How Bitcoin Weigh-In sources, validates, versions, and corrects its commodity price dataset. Companion to the dataset.

What this is

The Bitcoin Weigh-In dataset records daily closing prices in US dollars for a curated set of fungible commodities from 2013-01-02 to the most recent completed UTC day. From those closes it derives per-BTC equivalents (how many troy ounces of gold, pounds of copper, or barrels of crude one bitcoin could have purchased on each day) and pairs them with a deterministically computed BTC circulating supply. The artifact is a single small file — around 800 KB as CSV, 700 KB as Parquet — that any analyst, journalist, or hobbyist can download once and analyse offline without an API key.

This document describes how the data is collected, what the published flags mean, how cross-validation works, how versions are cut, and how to report corrections. The companion dataset page ships the artifacts; this page describes the rules behind them.

Data sources

Three providers between them cover every live series. Each commodity is pinned to a single primary endpoint so the dataset has one parser, one rate-limit regime, and one place to look when something disagrees with the rest of the financial press.

CoinGecko

The primary source for BTC-USD (coin id bitcoin) and for gold, priced via Pax Gold (pax-gold) — a token redeemable for one fine troy ounce of LBMA gold that tracks spot within a small premium. Both come from CoinGecko's keyless public API (market_chart); the daily job records the last price of each UTC day. No API key is required, so the shared pool is IP-throttled and the job backs off on HTTP 429.
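
The "last price of each UTC day" rule can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes the market_chart payload exposes a prices array of [millisecond timestamp, price] pairs, and the function name lastPricePerUtcDay is hypothetical.

```typescript
// Hypothetical helper: reduce [timestampMs, price] pairs to the last price seen on each UTC day.
type PricePoint = [number, number];

function lastPricePerUtcDay(points: PricePoint[]): Map<string, number> {
  const latest = new Map<string, { ts: number; price: number }>();
  for (const [ts, price] of points) {
    const day = new Date(ts).toISOString().slice(0, 10); // YYYY-MM-DD, always in UTC
    const seen = latest.get(day);
    // Keep the point with the greatest timestamp for each day, whatever the input order.
    if (seen === undefined || ts >= seen.ts) {
      latest.set(day, { ts, price });
    }
  }
  const closes = new Map<string, number>();
  latest.forEach((value, day) => closes.set(day, value.price));
  return closes;
}
```

Keying on the UTC date string rather than on local time keeps the daily boundary identical wherever the job happens to run.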

GoldAPI.io

The primary source for silver spot (XAG/USD, USD per troy ounce). The daily job sends the key in the x-access-token header, and a redacted form of every fetched URL is recorded in /health.json so an authentication failure surfaces clearly rather than presenting as silent forward-fill.

FRED (St. Louis Fed)

The primary source for Brent crude (DCOILBRENTEU). FRED redistributes the EIA spot price daily, typically with a one business-day lag. The daily job retries transient HTTP errors on a backoff and forward-fills if the value never arrives.

Stooq (retired)

Stooq was the original source for BTC, gold, silver, and several deferred commodities (platinum, copper, CBOT wheat, ICE coffee). It was dropped on 2026-06-13 after it began blocking automated access. BTC, gold, and silver moved to the providers above; the deferred commodities are not rendered in the interface and their historical values remain frozen in the dataset.

Derived (no API)

BTC circulating supply is computed in scripts/sources.ts as a pure function of days-since-genesis. Genesis is 2009-01-03; the protocol targets 144 blocks per day, the initial block reward is 50 BTC, and the reward halves every 210,000 blocks. The implementation walks halving eras and accumulates supply era-by-era. Because every input is a constant of the protocol, the column has no API dependency and is unit-tested against known halving block dates.

Forward-fill logic

Markets close on weekends and public holidays, and source endpoints occasionally drop a single day's row even on a normal trading session. In both cases the daily job carries the previous known value forward so that every calendar date from coverage start to last update has a row in the dataset. Forward-filling rather than emitting nulls is a deliberate choice: analyses that join across commodities need a value for every date, and the alternative (per-commodity NaN) silently propagates into derived calculations.

v1.0 of the dataset ships a forward_filled column populated as empty string for every row, because per-row fill provenance is not reconstructable from the historical NDJSON that was bootstrapped before this column existed. Prospective per-row tracking begins in v1.1, at which point the column will hold a pipe-delimited list of the column names that were forward-filled on that date — for example brent_usd on a day where most feeds returned values but the FRED Brent series had not yet published.
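
A minimal sketch of forward-filling with per-row provenance, in the spirit of the planned v1.1 column. The types, the function name forwardFill, and every column name other than brent_usd are assumptions made for illustration; the real job may be structured quite differently.

```typescript
// Hypothetical shapes: null means the source returned nothing for that column on that date.
interface DayInput {
  date: string;
  prices: Record<string, number | null>;
}

interface DayRow {
  date: string;
  prices: Record<string, number | null>;
  forward_filled: string; // pipe-delimited column names, "" when nothing was filled
}

function forwardFill(days: DayInput[]): DayRow[] {
  const lastKnown = new Map<string, number>();
  return days.map((day) => {
    const prices: Record<string, number | null> = {};
    const filled: string[] = [];
    for (const column of Object.keys(day.prices)) {
      const value = day.prices[column];
      if (typeof value === "number") {
        lastKnown.set(column, value); // fresh value: remember it for later gaps
        prices[column] = value;
        continue;
      }
      const previous = lastKnown.get(column);
      prices[column] = previous === undefined ? null : previous; // nothing to carry yet
      if (previous !== undefined) filled.push(column);
    }
    return { date: day.date, prices, forward_filled: filled.join("|") };
  });
}
```

Recording the filled column names on the row itself lets a downstream analysis drop or down-weight carried values without re-deriving them from the source calendars.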

The daily cron writes a separate fill record per source into /health.json on every run, so the present day's fill state is always visible even before per-row tracking lands. The cron also exits non-zero if every source returns zero rows on a UTC weekday — a signal that authentication, rate limits, or upstream infrastructure has changed, rather than silent fill propagating an undetected outage.
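
The non-zero exit rule might look roughly like this. It is a sketch under assumptions: the function name shouldFailRun and the source keys are hypothetical, and it treats "UTC weekday" as the weekday of the date passed in, which may or may not match how the cron defines it.

```typescript
// Hypothetical guard: true when every source came back empty on a Monday-to-Friday UTC date.
function shouldFailRun(utcDate: Date, rowsPerSource: Record<string, number>): boolean {
  const dayOfWeek = utcDate.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  const isWeekday = dayOfWeek >= 1 && dayOfWeek <= 5;
  const sources = Object.keys(rowsPerSource);
  const allEmpty = sources.length > 0 && sources.every((name) => rowsPerSource[name] === 0);
  return isWeekday && allEmpty;
}
```

A caller would map a true result to a non-zero process exit code so the scheduled run is marked as failed.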

BTC supply derivation

The btc_supply column is deterministic. For a date D:

  1. Compute days since the genesis block at 2009-01-03.
  2. Multiply by 144 (the protocol's target blocks per day) to get an approximate cumulative block count.
  3. Walk halving eras of 210,000 blocks: era 1 pays 50 BTC per block, era 2 pays 25, era 3 pays 12.5, era 4 pays 6.25, era 5 pays 3.125, and so on. For each era, add (min(era_end, total_blocks) − blocks_so_far) × the era's reward, stopping once blocks_so_far reaches total_blocks.
  4. Round to an integer count of BTC.
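
The four steps can be sketched in TypeScript as below. This is an illustration of the stated schedule, not a copy of scripts/sources.ts: the function name btcSupplyOn is hypothetical, and counting the genesis date itself as day 0 is an assumption.

```typescript
const GENESIS_UTC_MS = Date.UTC(2009, 0, 3); // 2009-01-03
const MS_PER_DAY = 86400000;
const BLOCKS_PER_DAY = 144;
const HALVING_INTERVAL = 210000;
const INITIAL_REWARD_BTC = 50;

// Scheduled (not block-exact) circulating supply for an ISO date such as "2013-01-02".
function btcSupplyOn(isoDate: string): number {
  const dateMs = Date.parse(`${isoDate}T00:00:00Z`);
  const days = Math.max(0, Math.floor((dateMs - GENESIS_UTC_MS) / MS_PER_DAY)); // step 1
  const totalBlocks = days * BLOCKS_PER_DAY; // step 2

  let supply = 0;
  let blocksSoFar = 0;
  let reward = INITIAL_REWARD_BTC;
  while (blocksSoFar < totalBlocks) { // step 3: walk the halving eras
    const eraEnd = blocksSoFar + HALVING_INTERVAL;
    const blocksInEra = Math.min(eraEnd, totalBlocks) - blocksSoFar;
    supply += blocksInEra * reward;
    blocksSoFar += blocksInEra;
    reward /= 2;
  }
  return Math.round(supply); // step 4
}
```

For 2013-01-02 the schedule gives 1,460 days and 210,240 blocks: one full era at 50 BTC plus 240 blocks at 25 BTC, or 10,506,000 BTC.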

The approximation drifts a few thousand BTC from reality (real interblock times vary around the 10-minute target, and mining hashrate growth nudges blocks slightly faster than schedule), but the error is small enough — under 0.1% across the full coverage range — that the column is fit for the visualisation's purpose: showing where on the supply curve any given date sits. Analyses that need block-exact supply should pull from a node or a block explorer; this dataset's column is a clean closed-form schedule.

Illustrative pricing

Two of the four commodities rendered in the visualisation — Plutonium-238 and cocaine — do not have public spot markets. Their prices on the site are illustrative composites constructed from named sources, with the as-of date carried alongside. They appear on the main visualisation but they are not in the live dataset published under /data, which holds only live market closes. A third commodity, the LEU uranium fuel pellet, follows the same pattern but is currently deferred from the visualisation; its illustrative price record persists in the repository for later re-enable.

Plutonium-238

Composite material-cost estimate of ~$5,000/g (a central figure within a $4,000–$8,000 range) derived from the DOE Office of Nuclear Energy, NASA Planetary Science Division publications on the Pu-238 production program (~$150M/year for ~1.5 kg/year), the Cassini OIG report from 1997 ($1,968/g escalated to 2024 dollars), and Atomic Insights' analysis of RTG heat sources. A separately cited fully-loaded program cost (~$100,000/g) reflects the facility maintenance and regulatory infrastructure required for production but is less directly comparable to other commodities' market prices, so the material-cost figure drives the BTC equivalence on the visualisation. Uncertainty bounds: roughly ±60% around the central figure at the material-cost layer. As-of date: 2024-12-31.

Density and the cube. The visualiser sizes the plutonium cube from the oxide fuel, plutonium-238 dioxide (PuO₂), the glowing ceramic form that radioisotope thermoelectric generators actually use, rather than from the pure metal. The cube edge is computed at PuO₂'s theoretical density of 11.46 g/cm³. Real sintered fuel pellets are deliberately pressed to roughly 80–90% of theoretical density (a controlled porosity that accommodates helium from alpha decay without cracking), so an actual pellet of the same mass occupies roughly 11–25% more volume than the one drawn here, which corresponds to a cube edge about 4–8% longer. We render the theoretical-density cube because it is the single unambiguous figure; the caveat is that real fuel is a little less dense, and therefore a little bulkier, than the idealised block.
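
To make the density caveat concrete, here is a small sketch of the edge calculation. The BTC price is a made-up input, the helper name cubeEdgeCm is hypothetical, and the sketch ignores the difference between plutonium mass and oxide mass; only the 11.46 g/cm³ density and the ~$5,000/g figure come from this page.

```typescript
// Edge length (cm) of a solid cube with the given mass (g) and density (g/cm^3).
function cubeEdgeCm(massG: number, densityGPerCm3: number): number {
  return Math.cbrt(massG / densityGPerCm3);
}

const PUO2_THEORETICAL_DENSITY = 11.46; // g/cm^3, from the text

// Hypothetical inputs for illustration only.
const btcUsd = 100000;
const usdPerGram = 5000;
const massG = btcUsd / usdPerGram; // 20 g

const idealEdgeCm = cubeEdgeCm(massG, PUO2_THEORETICAL_DENSITY);
const sinteredEdgeCm = cubeEdgeCm(massG, PUO2_THEORETICAL_DENSITY * 0.85); // 85% of theoretical
const edgeGrowth = sinteredEdgeCm / idealEdgeCm - 1; // depends only on the density ratio
```

Because the edge scales with the cube root of volume, a pellet at 85% of theoretical density has an edge only about 5–6% longer even though its volume is roughly 18% larger.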

LEU uranium fuel pellet

Composite cost of ~$20 per 7 g pellet from the World Nuclear Association "Economics of Nuclear Power" methodology, cross-checked against the IAEA/OECD-NEA Red Book 2024. Decomposes as: U₃O₈ feed at ~$100/lb, conversion to UF₆ at ~$20/kgU, enrichment at ~$150/SWU, fabrication at ~$300/kgU, yielding ~$3,000/kgU of finished fuel; at 7 g per pellet (0.007 kg × $3,000/kg) that is ≈ $21, rounded to ~$20/pellet. Uncertainty bounds: ±30%, depending on contract terms, enrichment level, and market conditions. As-of date: 2025-01-01.

Cocaine (three-tier)

There is no spot market for cocaine. The composite presents three tiers reflecting the market's actual structure: producer (~$2,500/kg, range $1,500–$3,500, raw refined base, UNODC World Drug Report 2024); wholesale (~$30,000/kg, range $25,000–$35,000, ≥80% pure US wholesale standard, UNODC 2024 / DEA NDTA 2024); and retail purity-adjusted (~$120,000/kg, range $80,000–$250,000, normalised to 100% for cross-tier comparison, DEA / EMCDDA). Wholesale is the primary tier for BTC equivalence because it is the most directly comparable to how other commodities are priced (standardised purity, kilogram-scale transactions). As-of date: 2024-12-31.

The live visualiser: camera and staging

The home page renders each commodity as a real-time 3D cube at true physical scale, beside a rigged Shiba Inu that acts as the constant scale anchor (40 cm at the shoulder). Because the cube spans roughly six orders of magnitude — from a sub-millimetre fleck of gold to a silver monolith tens of metres on a side — a single fixed shot cannot stay convincing across the whole range. The scene solves scale the way a photographer does: with the camera. The honesty rule is that every apparent size on screen derives from one declared camera geometry per frame, never from an artistic fudge.

Camera model

A single perspective camera (35° field of view) frames the scene, and exactly one geometry is in force at any instant. As the cube grows the camera dollies along a banded path — macro framing for the speck (the cube held at a fixed fraction of the frame so it never vanishes, with a 5 cm floor), the familiar two-shot when cube and dog are comparable, and a wide shot when the cube towers. The transitions are continuous at the band crossovers by construction, and the camera's height is capped at one metre so the largest cubes are genuinely looked up at rather than viewed from above. The damped easing between framings is itself the scale cue: the longer the camera travels, the bigger the change in size you are being shown.

Staging honesty

Once the cube grows past about 1.2 m on a side, the Shiba walks to a fixed mark in the near foreground while the cube recedes into the distance — the standard photographer's trick for conveying the size of something enormous. This means the dog and the cube are no longer the same distance from the camera, so their on-screen sizes are governed by real perspective (an object twice as far away appears half as large) rather than by a shared scale factor. That is a true depiction, not a trick of the eye, but because it differs from the simple side-by-side comparison the readout says so explicitly — "Shiba standing nearer the camera" — whenever the dog is staged in the foreground. Nothing on screen is ever resized by feel; the apparent sizes always follow from the one declared camera geometry.
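
The perspective rule behind the staging can be checked with a pinhole-camera sketch. It assumes the 35° figure is the vertical field of view, and the distances are hypothetical; frameFraction is an illustrative name, not a function from the visualiser.

```typescript
// Fraction of the frame height an object fills when viewed head-on by a pinhole camera.
function frameFraction(heightM: number, distanceM: number, fovDeg: number): number {
  const halfFovRad = (fovDeg * Math.PI) / 360;
  const visibleHeightM = 2 * distanceM * Math.tan(halfFovRad); // scene height covered at that distance
  return heightM / visibleHeightM;
}

const FOV_DEG = 35; // from the text, assumed to be the vertical field of view
const SHIBA_HEIGHT_M = 0.4; // 40 cm at the shoulder

const shibaNear = frameFraction(SHIBA_HEIGHT_M, 2, FOV_DEG); // hypothetical 2 m from the camera
const shibaFar = frameFraction(SHIBA_HEIGHT_M, 4, FOV_DEG); // hypothetical 4 m from the camera
const ratio = shibaNear / shibaFar; // doubling the distance halves the apparent size
```

The same function applied to the cube at its own, larger distance yields the on-screen size the readout is describing when the dog is staged in the foreground.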

Hashweight: network physical mass

The Hashweight panel estimates the total physical mass of the hardware that secures the Bitcoin network. It is an order-of-magnitude estimate — treat all figures as having roughly ±30% uncertainty — derived from three independently sourced inputs: live network hashrate, published ASIC specifications, and publicly disclosed node counts.

Live hashrate

Current network hashrate is fetched at page load from the mempool.space mining API (/api/v1/mining/hashrate/1w), which returns a 7-day rolling average in H/s. The historical sparkline uses /api/v1/mining/hashrate/all, which provides weekly averages back to Bitcoin's origin. If the API is unreachable, the panel falls back to a recent known-good value (800 EH/s).
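
A hedged sketch of the fallback behaviour, written as a pure function so it can be exercised without a network call. It assumes the response body carries a numeric currentHashrate field in H/s; that field name reflects my reading of the mempool.space API and should be checked against its documentation. The function name hashrateFromPayload is hypothetical.

```typescript
const FALLBACK_HASHRATE_HS = 800e18; // 800 EH/s, the known-good fallback named in the text

// Assumption: the decoded JSON body has a numeric `currentHashrate` field in H/s.
function hashrateFromPayload(payload: unknown): number {
  if (typeof payload === "object" && payload !== null) {
    const value = (payload as { currentHashrate?: unknown }).currentHashrate;
    if (typeof value === "number" && Number.isFinite(value) && value > 0) {
      return value;
    }
  }
  // Anything missing, malformed, or non-positive falls back to the constant.
  return FALLBACK_HASHRATE_HS;
}
```

A caller would pass the decoded JSON body on success and null on any network or parse failure, so both failure modes resolve to the same fallback.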

ASIC fleet model

The installed ASIC fleet is modelled with two blended constants:

  • 150 TH/s per machine: a blend of S19-era hardware (Antminer S19 Pro: 110 TH/s, S19 XP: 140 TH/s) and S21-era hardware (Antminer S21: 200 TH/s, S21 Pro: 234 TH/s). Older S9-class machines (~14 TH/s) and early retirements pull the average down; cutting-edge deployments push it up.
  • 13.5 kg per machine: S19-class units average ~13.2–14.3 kg; S21-class units average ~14.2–14.9 kg; older hardware is lighter (~4.3 kg for S9). The blended fleet average lies between those bounds.

ASIC count = hashrate (TH/s) ÷ 150. ASIC mass = ASIC count × 13.5 kg. At ~950 EH/s this yields ~6.3 million machines weighing ~85,000 metric tonnes. The model over-counts recently retired machines still in transit and under-counts very new hardware not yet fully deployed; ±30% is a reasonable uncertainty band.
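
The fleet arithmetic, written out as a sketch. The two constants are the blended figures above; the function name asicFleet is hypothetical.

```typescript
const TH_PER_MACHINE = 150; // blended hashrate per machine, from the text
const KG_PER_MACHINE = 13.5; // blended mass per machine, from the text

// Estimated machine count and total mass (metric tonnes) for a hashrate given in EH/s.
function asicFleet(hashrateEh: number): { machines: number; tonnes: number } {
  const hashrateTh = hashrateEh * 1e6; // 1 EH/s = 1,000,000 TH/s
  const machines = hashrateTh / TH_PER_MACHINE;
  const tonnes = (machines * KG_PER_MACHINE) / 1000; // kg to metric tonnes
  return { machines, tonnes };
}

const fleetAt950 = asicFleet(950);
```

At 950 EH/s this gives about 6.33 million machines and about 85,500 tonnes, matching the rounded figures quoted above.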

Node mass

Full nodes contribute negligibly to the total: approximately 20,000 reachable nodes (source: bitnodes.io) at a blended average of 0.5 kg each (Raspberry Pi at 45 g through NUC/small server at ~1.2 kg) ≈ 10 metric tonnes, under 0.02% of total network mass. The true node count including behind-NAT nodes is likely 50,000–100,000+, but even at 100,000 nodes the contribution is only about 50 tonnes.

Titanic comparison

The comparison reference is the RMS Titanic's loaded displacement: 52,310 long tons = 53,150 metric tonnes. This is the actual physical mass of the ship, passengers, cargo, and fuel when she sailed. Note: the commonly cited figure of 46,328 is the ship's gross register tonnage — a volumetric measure (100 cubic feet = 1 gross ton), not a mass. Comparing a mass to a volume figure would be dimensionally incorrect, so the displacement figure is used here.

Solo miner estimate

Solo miners — predominantly Bitaxe open-source boards, home Antminers, and Nerdminers — are estimated at ~40 PH/s total hashrate and ~60,000 devices. CKPool Solo routinely reports 10–20 PH/s; allowing for other solo pools and direct-connected miners, 30–50 PH/s is a plausible range. At ~667 GH/s average per device (Bitaxe Ultra/Gamma range: 400–1,200 GH/s), 40 PH/s implies ~60,000 units. Average device weight of 0.18 kg blends bare Bitaxe boards (~0.12 kg) with heavier home ASICs. Total solo mass ≈ 11 metric tonnes, representing roughly 0.012% of total network mass.
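
Pulling the section's figures together gives a rough end-to-end check. The sketch below restates numbers from this page (ASIC mass at ~950 EH/s, node count and mass, solo hashrate and device mass, Titanic displacement); the variable names are illustrative and the result inherits the same ±30% uncertainty.

```typescript
const TITANIC_TONNES = 53150; // loaded displacement in metric tonnes
const ASIC_TONNES = 85500; // ASIC fleet at ~950 EH/s, 150 TH/s and 13.5 kg per machine

const nodeTonnes = (20000 * 0.5) / 1000; // 20,000 reachable nodes at 0.5 kg each

const soloGhs = 40e6; // 40 PH/s expressed in GH/s
const soloDevices = soloGhs / 667; // about 60,000 devices at ~667 GH/s each
const soloTonnes = (soloDevices * 0.18) / 1000; // 0.18 kg per device

const totalTonnes = ASIC_TONNES + nodeTonnes + soloTonnes;
const titanics = totalTonnes / TITANIC_TONNES;
const soloShare = soloTonnes / totalTonnes; // fraction of total network mass
```

On these inputs the network comes out at roughly 1.6 Titanics, with nodes and solo miners together contributing well under a tenth of a percent.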

Cross-validation

After the primary CoinGecko, GoldAPI.io, and FRED fetches complete, the daily job queries a secondary source — Massive — for the same day's close on BTC-USD, XAU-USD, XAG-USD, and (where available) XPT-USD. For each ticker where both providers return a value, the job computes the absolute percent difference. When the difference exceeds 0.5%, an entry is appended to a cross_validation_flags array in /health.json recording the date, ticker, both values, and the percent diff.
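
A minimal sketch of the comparison for one ticker. The text does not say which value the percentage is taken against, so measuring it relative to the primary value is an assumption here, as are the type and function names; the flag's field names are illustrative rather than the actual /health.json schema.

```typescript
interface CrossValidationFlag {
  date: string;
  ticker: string;
  primary: number;
  secondary: number;
  percent_diff: number;
}

type Outcome =
  | { status: "skipped" }
  | { status: "ok" }
  | { status: "flagged"; flag: CrossValidationFlag };

const THRESHOLD_PERCENT = 0.5; // from the text

// null stands for "this provider returned no value" (missing key, HTTP error, unknown ticker).
function crossValidate(date: string, ticker: string, primary: number | null, secondary: number | null): Outcome {
  if (primary === null || secondary === null || primary === 0) {
    return { status: "skipped" };
  }
  // Assumption: the difference is expressed relative to the primary value.
  const percentDiff = (Math.abs(primary - secondary) / Math.abs(primary)) * 100;
  if (percentDiff <= THRESHOLD_PERCENT) {
    return { status: "ok" };
  }
  return { status: "flagged", flag: { date, ticker, primary, secondary, percent_diff: percentDiff } };
}
```

Returning a status instead of throwing mirrors the intent that a disagreement is recorded for analysts and never blocks the run.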

The cross-validation step is a quality signal, not a build gate. It does not fail the daily cron — a missing API key, an HTTP error, a parse failure, or a Massive ticker that doesn't exist all produce a "skipped" status without emitting a flag. This is deliberate: a secondary-source disagreement is information for an analyst, not an infrastructure outage that should block publication of the primary feed. Tickers Massive doesn't cover (continuous futures, FRED-only series like Brent) are skipped silently.

Versioning and updates

The dataset uses semantic versioning for schema changes: a major bump for removed or renamed columns, a minor bump for added columns or sources, and a patch for fixes that preserve the schema. The current version is pinned in dataset-config.json at the repository root; the artifact builder uses that value to decide which static/data/v{X.Y}/ directory to write to. Bumping the version is a manual one-line edit committed by the maintainer.
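
The bump rules can be expressed as a small lookup. This sketch assumes a three-part "major.minor.patch" string and hypothetical change labels; the actual layout of dataset-config.json is not shown on this page, and in practice the bump is a manual edit rather than a function call.

```typescript
type SchemaChange = "column_removed" | "column_renamed" | "column_added" | "source_added" | "fix";

// Apply the bump rule described above to a "major.minor.patch" version string.
function bumpVersion(version: string, change: SchemaChange): string {
  const [major = 0, minor = 0, patch = 0] = version.split(".").map(Number);
  switch (change) {
    case "column_removed":
    case "column_renamed":
      return `${major + 1}.0.0`; // breaking: major
    case "column_added":
    case "source_added":
      return `${major}.${minor + 1}.0`; // additive: minor
    case "fix":
      return `${major}.${minor}.${patch + 1}`; // schema-preserving: patch
  }
}

// Versioned output directory, keyed on major.minor only as in static/data/v{X.Y}/.
function artifactDir(version: string): string {
  const [major = "0", minor = "0"] = version.split(".");
  return `static/data/v${major}.${minor}/`;
}
```

Keying the directory on major.minor means a patch release overwrites the artifacts in place, while any schema change lands in a new directory.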

Daily updates happen at 02:00 UTC. A GitHub Actions cron fetches the previous UTC day's close from every source, appends a row to data/prices.ndjson, rebuilds static/prices.json, regenerates every artifact under static/data/v{X.Y}/, and commits the result to main. Cloudflare Pages redeploys automatically from the commit. The latest aliases at /data/prices.csv, /data/prices.json, and similar always point to the current version's artifacts; the versioned directory at /data/v{X.Y}/ persists indefinitely so prior versions remain downloadable.

Archival to Zenodo is triggered manually by cutting a GitHub release tag, at which point Zenodo's GitHub integration mints a DOI and archives the source tarball. The DOI is copied back into dataset-config.json and the next build surfaces it on the dataset page. Release cadence is keyed to schema-meaningful changes rather than the daily content updates, which keeps DOIs sparse and citable.

Corrections

To report a suspected error, email info@sortathing.com with the affected date(s) and column(s), the value the dataset shows, and where the corrected value should come from with a link. Corrections that affect a single row land in the next daily commit; corrections that affect the schema or a historical methodology trigger a minor version bump and a CHANGELOG entry. Either way, the original row stays in git history — the dataset is the current best truth, but the prior shape remains inspectable in the commit log.

Credits and licences

The dataset itself is published under Creative Commons CC-BY-4.0; see the dataset page for citation details.

Shiba Inu 3D model

The Shiba Inu used as the live visualiser's scale reference is a third-party model licensed under CC-BY-4.0, which requires visible attribution. Per the licence:

This work is based on "Animated Dog, Shiba Inu" (https://sketchfab.com/3d-models/animated-dog-shiba-inu-9abfce885a834399b2c3ccaed51cd474) by quander (https://sketchfab.com/quander) licensed under CC-BY-4.0 (http://creativecommons.org/licenses/by/4.0/)