Experimental

Lab

Three editorial chart prototypes built on top of the same regulations.gov data feed.

Regulatory output by agency

The midnight regulation surge

Outgoing presidents tend to push a backlog of finalized rules out the door in their final weeks. It’s known as the “midnight regulations” surge. Each panel below tracks Final Rule documents per month at six agencies whose late-Biden output ran furthest above their prior-year baseline, drawn from the 20 highest-volume agencies of the past decade. Blue marks Democratic administrations, red Republican; dashed lines mark each January 20 handoff.
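As a rough illustration of how a "furthest above baseline" ranking could be computed, the sketch below compares each agency's average monthly Final Rule count in a late-term window against its average over the preceding twelve months. The three-month window, the twelve-month baseline, the function names, and the sample counts are all assumptions made for this example; the page does not spell out its exact definition.

```python
from statistics import mean

def surge_ratio(monthly_counts, window=3, baseline=12):
    """Ratio of late-term output to the prior-year baseline.

    monthly_counts: Final Rule counts per month, oldest first, ending at
    the handoff month. window and baseline are assumed lengths in months.
    """
    if len(monthly_counts) < window + baseline:
        raise ValueError("need at least window + baseline months of data")
    late = monthly_counts[-window:]
    prior = monthly_counts[-(window + baseline):-window]
    base = mean(prior)
    if base == 0:
        # No baseline activity: any late output is an unbounded surge.
        return float("inf") if mean(late) > 0 else 0.0
    return mean(late) / base

def top_surging(agencies, n=6):
    """Return the n agency names with the highest surge ratio."""
    return sorted(agencies, key=lambda a: surge_ratio(agencies[a]), reverse=True)[:n]

# Made-up counts for three hypothetical agencies (12 baseline months + 3 late months).
sample = {
    "AGENCY_A": [10] * 12 + [20, 20, 20],
    "AGENCY_B": [10] * 15,
    "AGENCY_C": [4] * 12 + [3, 3, 3],
}
ratios = {name: surge_ratio(counts) for name, counts in sample.items()}
ranked = top_surging(sample, n=2)
```

A ratio rather than a raw difference keeps small agencies comparable with large ones, which matters when the candidates are drawn from agencies with very different volumes.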

Legend: red = Republican administration; blue = Democratic administration. Bar height = Final Rules posted that month. The Y-axis is shared across panels.

Comment uniqueness

Organic comments or orchestrated campaign?

Mass comment campaigns, where thousands of identical or near-identical letters arrive through advocacy platforms, are now routine in federal rulemaking. For each docket below, we group near-identical comments (looking past formatting, names, numbers, and word order) to separate form letters from genuinely unique submissions.

Fidelity ladder · pathway to production

What each matching step recovers

The panel above ships the top of this ladder (the near-duplicate tier) by default. This guide steps through the rungs so you can see what each normalization step buys: start at exact match (where a single form letter signed by thousands counts as thousands of “unique” submissions), then loosen to template and near-duplicate and watch the uniqueness collapse. All three run live in the browser; the looser ones are cheap, transparent stand-ins for the offline/server methods outlined below.
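The three rungs can be pictured as three different grouping keys applied to the same comments. The sketch below is one plausible reading, not the lab's actual code: it assumes the template tier ignores case, punctuation, whitespace, digits, and a trailing signature introduced by "Sincerely", and that the near-duplicate tier additionally ignores word order. The function names, the signature heuristic, and the sample comments are invented for illustration.

```python
import re

def exact_key(text):
    # Rung 1: byte-identical text only.
    return text

def template_key(text):
    # Rung 2 (assumed): ignore case, a trailing signature, punctuation,
    # digits, and whitespace differences.
    t = text.lower()
    t = re.split(r"\bsincerely\b", t)[0]   # crude signature cut-off
    t = re.sub(r"[^\w\s]|_", " ", t)       # punctuation to spaces
    t = re.sub(r"\d+", "0", t)             # mask every number
    return " ".join(t.split())             # collapse whitespace

def near_duplicate_key(text):
    # Rung 3 (assumed): additionally ignore word order.
    return " ".join(sorted(template_key(text).split()))

def distinct_count(comments, key):
    """Number of distinct groups under the given key function."""
    return len({key(c) for c in comments})

# Invented sample: one form letter in four guises, plus one personal comment.
comments = [
    "Please protect our rivers. Sincerely, Ana Ruiz",
    "Please protect our rivers. Sincerely, Ana Ruiz",
    "PLEASE protect our rivers!\nSincerely, Bo Chen",
    "Our rivers, please protect. Sincerely, Cy Diaz",
    "I fish the Snake River every summer and oppose this rule.",
]
ladder = {
    "exact": distinct_count(comments, exact_key),
    "template": distinct_count(comments, template_key),
    "near-duplicate": distinct_count(comments, near_duplicate_key),
}
```

On this sample the count of distinct groups falls from 4 to 3 to 2 as the key loosens, which is the "uniqueness collapse" in miniature. A sorted bag of words is deliberately blunt: it will merge texts that reuse the same words with different meaning, which is one reason to treat it as a stand-in rather than a final method.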

Exact match: byte-identical text only.

Architecture: client-side aggregation, and the path to a server tier

Every chart in the lab computes on the client: DuckDB-WASM reads Parquet straight from R2 over HTTP range requests, fetching only the columns and row groups a query needs (projection and predicate pushdown), so there is no backend to run. That keeps deploys instant and queries ad hoc, but every viewer re-runs the same scan, and per-row work over millions of rows (live hashing, fuzzy matching, embeddings) is slow or infeasible in the browser. The looser tiers above are cheap stand-ins for methods that belong on an offline/server tier.

Which aggregations should move to materialized views, which fields are dropped at ETL and therefore block features, and where this can go next (SimHash/LSH, a real search index, semantic dedupe) are written up in docs/architecture.md.
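For readers unfamiliar with SimHash, here is a minimal sketch of the general technique, offered only as an illustration and not as the design in docs/architecture.md. Each token is hashed to 64 bits, the bits vote across all tokens, and the sign of each vote becomes one bit of the fingerprint; similar token sets tend to produce fingerprints with a small Hamming distance. The whitespace tokenizer, the use of BLAKE2b as the token hash, and the function names are assumptions.

```python
import hashlib

BITS = 64  # fingerprint width; matches the 8-byte digest used below

def token_hash(token):
    # Stable 64-bit hash of one token (BLAKE2b truncated to 8 bytes).
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def simhash(text):
    """64-bit SimHash fingerprint over lowercase whitespace tokens."""
    votes = [0] * BITS
    for token in text.lower().split():
        h = token_hash(token)
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, v in enumerate(votes) if v > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")

original = simhash("please protect our rivers from this rule")
reordered = simhash("from this rule please protect our rivers")
edited = simhash("please protect our rivers from this harmful rule")
```

Because the votes are summed per token, reordering words leaves the fingerprint unchanged, while a small edit usually flips only a few bits. An LSH index would then bucket fingerprints by bit bands so that candidates within a few bits of each other can be found without comparing every pair.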

Rulemaking duration

How long does it take to make a federal rule?

A federal rule typically moves through a docket as a Proposed Rule → public comment period → Final Rule. The chart below shows how long that takes for each of the top eight rulemaking agencies. Each row is one agency's distribution of completed rulemakings (in days from proposal to final), with the median marked in solid color.
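The per-agency figure can be sketched as follows: take each completed rulemaking's proposal and final dates, convert the gap to days, and take the median per agency. The tuple layout, the function name, and the sample records are assumptions for illustration; how the lab actually pairs a Proposed Rule with its Final Rule within a docket is not described here.

```python
from collections import defaultdict
from datetime import date
from statistics import median

def median_days_by_agency(rulemakings):
    """Median days from proposal to final rule, per agency.

    rulemakings: iterable of (agency, proposed_date, final_date or None).
    Rulemakings without a final rule, or with a final date earlier than
    the proposal, are skipped as incomplete or inconsistent.
    """
    durations = defaultdict(list)
    for agency, proposed, final in rulemakings:
        if final is None or final < proposed:
            continue
        durations[agency].append((final - proposed).days)
    return {agency: median(days) for agency, days in durations.items()}

# Invented records for two hypothetical agencies.
records = [
    ("AGENCY_A", date(2020, 1, 1), date(2020, 12, 31)),  # 365 days
    ("AGENCY_A", date(2021, 1, 1), date(2021, 7, 1)),    # 181 days
    ("AGENCY_A", date(2022, 1, 1), None),                # still open, skipped
    ("AGENCY_B", date(2019, 3, 1), date(2019, 3, 31)),   # 30 days
]
medians = median_days_by_agency(records)
```

Restricting the calculation to completed rulemakings matches the chart's framing, but it also means long-running dockets that never reach a Final Rule do not pull the median upward.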
