What an open EBITDA multiples database would actually need to do

Search "EBITDA multiples by industry 2026" and the first ten results split into two piles. The first pile is one dataset: Aswath Damodaran's NYU Stern aggregates, dated January 2026, republished verbatim by Equidam and Eqvista and FullRatio and half a dozen others. The second pile is advisor-blog ranges ("HVAC trades at 5x-7x in the lower middle market") with no sample, no date stamp on the underlying deals, and no way to filter by deal size or geography.

Neither pile is wrong. Both are stale by the time a deal team needs them, and neither is queryable. That gap is the article.

The two flavors of stale

Damodaran's table is the most cited number in the space, and it's a public-company EV / EBITDA ratio computed once a year against the Compustat universe. By March it's two months old. By October it's reflecting prices nobody is paying anymore. It also describes the wrong companies: for a fund underwriting an LBO target with $4M of EBITDA, "trucking trades at X" is a statement about Old Dominion at scale, not the regional carrier the team is actually looking at.

The advisor-range posts have the opposite problem. They are calibrated to private deals, which is the right universe, but the "range" is a memory of what the partners closed last year. Not a sample. There is no n. There is no date window. There is no way to ask what founder-led HVAC businesses in the $2M-$5M EBITDA band actually traded at over the trailing twelve months.

A team comparing a target against either source is back-solving for a number that already happened. Fine for a pitch deck. Not fine for a bid.

What we'd need to ship one

We build data-extraction pipelines. The raw inputs for a credible multiples table are all public (SEC EDGAR filings, deal press releases, and BD announcements), and any team that wants to assemble it can. None of the existing top-ranked pages do. The reason isn't the scraping. It's that publishing multiples without misleading anyone is a harder methodology problem than the existing pages have chosen to solve.

Six things would have to be true on the page before we'd put our name on it.

The first is a real freshness window. Refresh weekly, with every row tagged by the date the underlying deal closed, not the date the page was published. A multiple from a deal that closed in January 2024 is not the same instrument as one from April 2026, even if it lives in the same industry bucket. The reader should be able to filter to the trailing 90, 180, or 365 days and watch the row move.
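A minimal sketch of that filter, assuming each observation carries its own close date. The `Deal` record, the `trailing_median` helper, and the three HVAC rows are hypothetical and made up for illustration; they are not real deals.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from statistics import median


@dataclass(frozen=True)
class Deal:
    industry: str
    closed_on: date        # date the deal closed, not the date the page was published
    ev_to_ebitda: float


def trailing_median(deals, industry, as_of, window_days):
    """Return (median multiple, n) for deals that closed inside the trailing window."""
    cutoff = as_of - timedelta(days=window_days)
    in_window = [
        d.ev_to_ebitda
        for d in deals
        if d.industry == industry and cutoff < d.closed_on <= as_of
    ]
    return (median(in_window) if in_window else None, len(in_window))


# Made-up observations, for illustration only.
deals = [
    Deal("HVAC", date(2025, 6, 15), 5.0),
    Deal("HVAC", date(2025, 11, 3), 6.0),
    Deal("HVAC", date(2026, 3, 20), 7.0),
]
as_of = date(2026, 4, 30)

for window in (90, 180, 365):
    print(window, trailing_median(deals, "HVAC", as_of, window))
```

Returning n alongside the median keeps the row honest when a short window leaves only one or two deals in it.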

The second is a sample-size column on every row. This is the column the existing pages refuse to publish, because in many sub-industries the honest answer is "we observed four deals in the last twelve months." That number is useful. It tells the reader the row is directional, not a benchmark. A row with n=4 has to look different on the page than a row with n=180.
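One way a row could carry its own n is sketched below. The `MIN_BENCHMARK_N` cutoff of 30 is an arbitrary assumption chosen for the example, not a recommended threshold, and the multiples are made up.

```python
from statistics import median

# Hypothetical cutoff: below this many deals, the row is labeled directional.
MIN_BENCHMARK_N = 30


def summarize_row(industry, multiples):
    """Build a table row that always publishes its sample size."""
    n = len(multiples)
    return {
        "industry": industry,
        "n": n,
        "median": median(multiples) if n else None,
        "label": "benchmark" if n >= MIN_BENCHMARK_N else "directional",
    }


# Four made-up multiples: the honest "we observed four deals" case.
row = summarize_row("HVAC", [5.5, 6.0, 6.5, 7.0])
print(row)
```

The label is what lets the page render an n=4 row differently from an n=180 row.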

The third is an outlier rule that is documented and applied consistently. Public deal announcements skew toward the loud ones. A single $2B specialty-chemical roll-up will drag the mean, and in a thin sample it can move the median too. Trimming it without saying so is the trick most advisor blogs play. Two defensible answers: report the winsorized mean and the median side by side, with values clamped at the 5th and 95th percentiles uniformly across rows, or report the raw median with the IQR. Either is fine. Pretending the underlying distribution is symmetric is not.
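Both options can be sketched with the standard library. The percentile interpolation used here (`method="inclusive"` in `statistics.quantiles`) is one assumption among several reasonable ones; whichever is chosen would have to be documented on the page. The sample values are made up, with one deliberate outlier.

```python
from statistics import mean, median, quantiles


def winsorized_mean(values, lower_pct=5, upper_pct=95):
    """Clamp values to the [lower_pct, upper_pct] percentiles, then average."""
    cuts = quantiles(values, n=100, method="inclusive")   # 99 percentile cut points
    lo, hi = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return mean([min(max(v, lo), hi) for v in values])


def summarize(values):
    """Report the robust statistics side by side, next to the raw mean."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "median": median(values),
        "iqr": (q1, q3),
        "winsorized_mean": winsorized_mean(values),
        "raw_mean": mean(values),
    }


# Made-up multiples with one loud outlier.
sample = [5.0, 5.5, 6.0, 6.5, 7.0, 30.0]
stats = summarize(sample)
print(stats)
```

The outlier is clamped rather than dropped, so n stays at six and the rule applies identically to every row.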

The fourth is keeping public-company comps separate from announced-deal multiples. Damodaran's universe is one thing. Transaction multiples from M&A announcements are a different thing. Stack them in the same row and the reader cannot tell which they're looking at. Most pages currently in the top ten do exactly that.

The fifth is a size cut. "Industrials" with no size filter is not a benchmark. It's an average of a hundred deals where size dominates every other variable. The cut that matters for PE is EBITDA bands, roughly $1M-$10M, $10M-$50M, $50M-$250M, and $250M+. Different funds underwrite at different rungs, and the multiple is not continuous across them.
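Assigning a deal to one of those EBITDA bands is a short lookup. In this sketch the band edges come from the text, while the handling of exact boundaries (a $10M deal lands in the higher band) and of sub-$1M deals (no band) are assumptions made for the example.

```python
import bisect

# Upper edges of the first three bands, in $M of EBITDA (from the bands above).
BAND_EDGES_M = [10, 50, 250]
BAND_LABELS = ["$1M-$10M", "$10M-$50M", "$50M-$250M", "$250M+"]


def ebitda_band(ebitda_musd):
    """Return the size band for a deal, or None if it is below the smallest band."""
    if ebitda_musd < 1:
        return None  # assumption: sub-$1M deals are out of scope
    # bisect_right sends an exact edge value (e.g. 10) to the higher band.
    return BAND_LABELS[bisect.bisect_right(BAND_EDGES_M, ebitda_musd)]


for ebitda in (0.5, 4, 10, 49.9, 250, 900):
    print(ebitda, ebitda_band(ebitda))
```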

The sixth is a source link on every observation. If the underlying deal is in the 8-K, link to the 8-K. If it's in an LBO press release, link to the release. The reader should be able to click into any cell and see what built it. None of the pages currently ranking can do this, because their underlying data is licensed and the license forbids it.
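The separation of universes and the per-observation source link can both be enforced at the record level. The sketch below is one possible shape, not a finished schema: the `Observation` and `Universe` names are hypothetical, and the URLs are `example.com` placeholders rather than real filings.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Universe(Enum):
    PUBLIC_COMP = "public_comp"        # trading multiples of listed companies
    ANNOUNCED_DEAL = "announced_deal"  # transaction multiples from M&A announcements


@dataclass(frozen=True)
class Observation:
    industry: str
    universe: Universe
    as_of: date
    ev_to_ebitda: float
    source_url: str  # the 8-K, press release, or filing that built this cell

    def __post_init__(self):
        # Refuse to store a number nobody can click through to.
        if not self.source_url.startswith(("http://", "https://")):
            raise ValueError("every observation needs a linkable source")


def split_by_universe(observations):
    """Keep public comps and announced deals in separate buckets."""
    grouped = {u: [] for u in Universe}
    for obs in observations:
        grouped[obs.universe].append(obs)
    return grouped


# Placeholder rows with made-up values.
observations = [
    Observation("Trucking", Universe.PUBLIC_COMP, date(2026, 1, 5), 9.0,
                "https://example.com/filing/10-k"),
    Observation("Trucking", Universe.ANNOUNCED_DEAL, date(2026, 3, 20), 6.0,
                "https://example.com/press/deal-announcement"),
]
grouped = split_by_universe(observations)
print({u.value: len(rows) for u, rows in grouped.items()})
```

Because the universe is a required field, a row cannot silently mix the two kinds of multiple.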

What's actually scrapable today

For public-company comps, the answer is everything. The SEC's EDGAR is fully scrapable, 10-Ks publish EBITDA-adjacent line items, and the universe is bounded. The full filing index is open and the schema is stable.
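Once the line items are extracted, the public-company multiple is arithmetic. This sketch assumes the common approximations EBITDA ≈ operating income + depreciation and amortization, and EV = market cap + total debt - cash. The function names are hypothetical, the figures are made up, and market cap would have to come from price data rather than the filing itself.

```python
from typing import Optional


def ebitda_proxy(operating_income: float, d_and_a: float) -> float:
    """Approximate EBITDA from two reported line items."""
    return operating_income + d_and_a


def enterprise_value(market_cap: float, total_debt: float, cash: float) -> float:
    """Simplified EV: ignores minority interest, preferred stock, and leases."""
    return market_cap + total_debt - cash


def ev_to_ebitda(market_cap: float, total_debt: float, cash: float,
                 operating_income: float, d_and_a: float) -> Optional[float]:
    """Return the multiple, or None when EBITDA is zero or negative."""
    ebitda = ebitda_proxy(operating_income, d_and_a)
    if ebitda <= 0:
        return None  # a multiple on non-positive EBITDA is not meaningful
    return enterprise_value(market_cap, total_debt, cash) / ebitda


# Made-up figures in $M, for illustration only.
print(ev_to_ebitda(market_cap=900.0, total_debt=300.0, cash=100.0,
                   operating_income=80.0, d_and_a=20.0))
```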

Private transactions are messier but better than the existing pages let on. The full population of private M&A deals isn't public. The population that gets announced is: buyer press releases, target-company statements, banker tombstones, state filings on share transfers, and the occasional 8-K when the buyer is public. Coverage by sub-industry is uneven. Deals involving a public acquirer are the ones where a multiple is likeliest to surface in disclosure, because the buyer has filing obligations the seller side does not. Founder-led deals under $10M of EBITDA almost never report a multiple.

The honest version of this database would show that unevenness rather than smooth it over. Rows with thin coverage would carry a warning. Rows with structurally non-public deal flow (most lower-middle-market services rollups, for example) would say so on the page.

The incentive against shipping it

Look at who currently ranks for this query. None of them link a source. None of them publish a sample size. None of them tag a row by deal date. The omissions are not an accident. The pages exist to capture the search term, not to answer it, and the answer that would actually be useful is the one that exposes how thin the data underneath is.

That is the real position. The public-multiples industry publishes numbers that don't have a denominator. A row that says "HVAC: 6.2x" with no n, no date, no source, and no size cut is decoration. It looks like a benchmark and it functions as wallpaper. The reason the page-one results all look the same is that the honest version is harder to write, less profitable to host (no SaaS upsell sits behind a sample-size column), and harder to defend the first time a reader emails to ask which deals are in the bucket.

We can build the scraper. The scraper isn't the wedge. The wedge is being willing to publish a row that says n=4 next to a row that says n=180 and tell the reader the difference matters. If a firm we work with wants this table built against their thesis (specific sub-industries, specific size cuts, specific geographies, refreshed against their sourcing cadence rather than ours), we'll construct it from the same fleet. The narrow version earns its keep. The general-purpose page might come later, and when it does it will list its sample size in every row.

Alex Stepansky
