Post No. 24

What 11.4 million PPP loans tell you about every US small business with employees

The 2024 SBA Paycheck Protection Program FOIA release is the closest thing in the public domain to a registry of every US small business with employees. Here is what you can pull from it for M&A sourcing without enrichment.


The 2024 SBA FOIA release of the Paycheck Protection Program lists 11,365,188 loans totaling $787.5 billion. The dataset is a COVID-era snapshot, but it is also the closest thing in the public domain to a registry of every US small business with employees. With sector, geography, payroll size, and survival outcome attached to every row, the PPP loan archive is a strong top-of-funnel data source.

Bird's eye view

We pulled the data and crunched the numbers. Thirteen CSVs, ~960MB compressed on disk, and ~11.4 million rows. The outcomes break down cleanly:

Fig. 01 — Loan status outcomes across the full PPP archive
  • Paid in Full: 10,569,378 (93.0%)
  • Charged Off: 627,208 (5.5%)
  • Exemption 4: 168,602 (1.5%)

$758.5B forgiven · $17.9B charged off · $11.1B exempted from disclosure
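A minimal sketch of how a status tally like this could be reproduced, assuming the CSVs sit in a local folder and the outcome lives in a column named LoanStatus (both are assumptions, not details given above). The sample rows are made up so the snippet runs without the download.

```python
import csv
from collections import Counter
from pathlib import Path


def read_rows(csv_dir):
    """Yield one dict per loan from every CSV in csv_dir (a hypothetical local folder)."""
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with open(path, newline="", encoding="utf-8", errors="replace") as fh:
            yield from csv.DictReader(fh)


def status_shares(rows):
    """Return {status: (count, share_of_all_loans)} for the assumed LoanStatus column."""
    counts = Counter(row["LoanStatus"] for row in rows)
    total = sum(counts.values())
    return {status: (n, n / total) for status, n in counts.items()}


# Made-up rows for illustration; on the real file, pass read_rows("ppp_csvs/") instead.
sample = [
    {"LoanStatus": "Paid in Full"},
    {"LoanStatus": "Paid in Full"},
    {"LoanStatus": "Paid in Full"},
    {"LoanStatus": "Charged Off"},
]
shares = status_shares(sample)
for status, (count, share) in shares.items():
    print(f"{status}: {count} ({share:.1%})")
```

Streaming rows through a generator keeps memory flat, which matters more than speed when the full archive is in the 11M-row range.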

Two program rounds. PPP, the original 2020 wave, accounts for 8.52M loans and $580.4B. PPS, the Second Draw that opened in January 2021, accounts for 2.84M loans and $207.1B. A borrower with both is one that made it through a first hit, then documented a 25%+ revenue decline to qualify for the second relief check. The survival cohort is its own filter, which gets its own section below.

The loan-size distribution:

Fig. 02 — Loan size distribution

n = 11,365,188 loans

While the PPP cap was $10M, most loans land under $50K, and only half a percent break $1M. PPP skews hard toward sub-scale operators with small payrolls: the average borrower reported 7.9 jobs, the 90th percentile reported 16, and the 99th percentile reported 102. This is the universe that matters for lower-middle-market PE.

Loan amount is a revenue proxy

PPP loans were sized at 2.5x average monthly payroll. The formula reverses cleanly: CurrentApprovalAmount ÷ 2.5 × 12 is the implied annual payroll. Divide that by sector-typical labor-cost-of-revenue, and you have an implied revenue band without ever talking to the owner.

Take a residential general contractor with a $200K PPP loan. That's $80K monthly payroll, $960K annually. Residential construction runs ~30% labor cost of revenue, putting implied revenue at roughly $3.2M. Cross with JobsReported and you also get per-employee comp, which separates skilled trades from administrative shops within the same NAICS.
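The arithmetic above can be sketched in a few lines. The function names are illustrative, the 12-job headcount is a made-up input, and the multiplier is left as a parameter because Second Draw loans in Accommodation and Food Services (NAICS 72) were sized at 3.5x rather than 2.5x.

```python
def implied_annual_payroll(loan_amount, multiplier=2.5):
    """Reverse the sizing formula: loan = multiplier x average monthly payroll."""
    return loan_amount / multiplier * 12


def implied_revenue(loan_amount, labor_cost_ratio, multiplier=2.5):
    """Divide implied payroll by the sector's labor cost as a share of revenue."""
    return implied_annual_payroll(loan_amount, multiplier) / labor_cost_ratio


def payroll_per_job(loan_amount, jobs_reported, multiplier=2.5):
    """Implied annual payroll per reported job; None when no jobs were reported."""
    if jobs_reported <= 0:
        return None
    return implied_annual_payroll(loan_amount, multiplier) / jobs_reported


# The residential contractor example: $200K loan, ~30% labor cost of revenue.
loan = 200_000
print(round(implied_annual_payroll(loan)))   # annual payroll
print(round(implied_revenue(loan, 0.30)))    # implied revenue, roughly $3.2M
print(round(payroll_per_job(loan, 12)))      # per-job payroll with a hypothetical 12 jobs
```

Treat the output as a directional band. The ratio is a practitioner estimate, so a few points of error in it moves the revenue figure by hundreds of thousands of dollars.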

This conversion is gold. PPP has 11.4M records, but filtered to a thesis sector and size band, we can usually land on 20K-80K rows.

The labor-cost ratios that matter for the back-into (practitioner ranges; cross-reference with RMA Annual Statement Studies for the exact sub-industry):

  • Residential construction: 28-32%
  • Home services and trades: 35-45%
  • Full-service restaurants: 30-35% (food cost dominates)
  • Healthcare practices: 45-55%
  • Professional services: 50-65%
  • Manufacturing: 12-20%
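Because these ratios are ranges, the honest output of the back-into is a revenue band, not a point estimate. A small sketch, using the ranges listed above and the standard 2.5x sizing (the dictionary and function names are illustrative):

```python
# Labor cost as a share of revenue, (low, high), copied from the list above.
LABOR_COST_RANGES = {
    "Residential construction": (0.28, 0.32),
    "Home services and trades": (0.35, 0.45),
    "Full-service restaurants": (0.30, 0.35),
    "Healthcare practices": (0.45, 0.55),
    "Professional services": (0.50, 0.65),
    "Manufacturing": (0.12, 0.20),
}


def implied_revenue_band(loan_amount, sector):
    """Return (low, high) implied revenue for a loan sized at 2.5x monthly payroll."""
    low_ratio, high_ratio = LABOR_COST_RANGES[sector]
    annual_payroll = loan_amount / 2.5 * 12
    # A higher labor share means less revenue per payroll dollar, so it sets the low end.
    return annual_payroll / high_ratio, annual_payroll / low_ratio


low, high = implied_revenue_band(200_000, "Residential construction")
print(f"${low:,.0f} to ${high:,.0f}")
```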

A firm hunting $1-3M revenue HVAC contractors filters PPP to NAICS 238220 (plumbing, heating, AC), loan amount $100K-$400K, BorrowerState... and that returns roughly 19K leads nationally. No banker needed.
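A sketch of that screen as a reusable filter. CurrentApprovalAmount and BorrowerState are named above; NAICSCode as the sector column name is an assumption, and the sample rows are invented. The state filter is left optional and unset here.

```python
def thesis_filter(rows, naics_code, min_loan, max_loan, states=None):
    """Yield rows matching a sector code, a loan-size band, and optionally a set of states."""
    for row in rows:
        try:
            amount = float(row["CurrentApprovalAmount"])
        except (KeyError, ValueError):
            continue  # skip rows with a missing or unparseable amount
        if row.get("NAICSCode") != naics_code:
            continue
        if not (min_loan <= amount <= max_loan):
            continue
        if states is not None and row.get("BorrowerState") not in states:
            continue
        yield row


# Made-up rows for illustration only.
sample = [
    {"NAICSCode": "238220", "CurrentApprovalAmount": "150000", "BorrowerState": "OH"},
    {"NAICSCode": "238220", "CurrentApprovalAmount": "45000", "BorrowerState": "OH"},
    {"NAICSCode": "722511", "CurrentApprovalAmount": "150000", "BorrowerState": "TX"},
]
leads = list(thesis_filter(sample, "238220", 100_000, 400_000))
print(len(leads))
```

Passing a set such as `states={"OH"}` narrows the same screen geographically without changing the rest of the logic.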

The roll-up substrate problem

Most "fragmented sector ready for consolidation" posts argue from intuition. PPP lets you argue from operator count.

The top NAICS-2 sectors by dollar volume:

Fig. 03 — Top 10 NAICS-2 sectors by PPP dollar volume
  • Construction: $97.5B · 1.01M loans · $96K avg · 97.6% forgiven · 5.24% charge-off
  • Professional / Scientific / Technical: $94.2B · 1.29M loans · $73K avg · 96.4% forgiven · 3.75% charge-off
  • Healthcare & Social Assistance: $94.0B · 984K loans · $96K avg · 97.6% forgiven · 3.40% charge-off
  • Accommodation & Food Services: $83.1B · 815K loans · $102K avg · 95.9% forgiven · 5.36% charge-off
  • Other Services: $56.1B · 1.57M loans · $36K avg · 93.1% forgiven · 8.91% charge-off
  • Manufacturing (durable): $47.0B · 251K loans · $187K avg · 98.3% forgiven · 4.22% charge-off
  • Retail Trade: $41.8B · 568K loans · $74K avg · 97.6% forgiven · 4.82% charge-off
  • Admin & Support / Waste Mgmt: $38.7B · 613K loans · $63K avg · 95.1% forgiven · 8.05% charge-off
  • Wholesale Trade: $37.6B · 346K loans · $109K avg · 97.5% forgiven · 5.61% charge-off
  • Transportation: $28.8B · 781K loans · $37K avg · 92.5% forgiven · 8.40% charge-off


Other Services has the highest loan count by 285K but the lowest average loan, because that bucket is single-shop salons, repair places, dry cleaners, and small religious orgs. Drill into the 6-digit NAICS codes across sectors and the actual roll-up substrates surface:

Sub-sector | Loans | Median loan | Total ($B) | Avg jobs
Full-service restaurants | 325,221 | $61,246 | 42.4 | 22.4
Offices of lawyers | 199,704 | $25,000 | 16.4 | 5.9
Offices of dentists | 172,907 | $59,452 | 14.0 | 8.5
Residential remodelers | 164,640 | $17,956 | 5.1 | 3.1
Plumbing / HVAC contractors | 103,381 | $33,380 | 13.0 | 10.5
General automotive repair | 93,018 | $20,833 | 3.9 | 4.8
Drycleaning and laundry services | 28,087 | $17,750 | 1.1 | 7.0
Roofing contractors | 27,844 | $20,833 | 2.9 | 9.6

The fragmentation is real and quantified. 103K plumbing and HVAC operators with a median loan around $33K and ~10 employees, scattered across every metro in the country, is the roll-up substrate. So is 173K dental offices with ~8 employees. So is 93K auto repair shops with ~5.
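A sub-sector table like the one above is a group-by on the sector code. A hedged sketch, assuming the code column is named NAICSCode and using invented rows:

```python
from collections import defaultdict
from statistics import mean, median


def subsector_summary(rows):
    """Per NAICS code: loan count, median loan, total dollars, average reported jobs."""
    loans = defaultdict(list)
    jobs = defaultdict(list)
    for row in rows:
        code = row["NAICSCode"]
        loans[code].append(float(row["CurrentApprovalAmount"]))
        if row.get("JobsReported"):  # blank job counts are left out of the average
            jobs[code].append(float(row["JobsReported"]))
    return {
        code: {
            "loans": len(amounts),
            "median_loan": median(amounts),
            "total_dollars": sum(amounts),
            "avg_jobs": mean(jobs[code]) if jobs[code] else None,
        }
        for code, amounts in loans.items()
    }


# Made-up rows for illustration only.
sample = [
    {"NAICSCode": "238220", "CurrentApprovalAmount": "30000", "JobsReported": "10"},
    {"NAICSCode": "238220", "CurrentApprovalAmount": "40000", "JobsReported": "12"},
    {"NAICSCode": "238220", "CurrentApprovalAmount": "20000", "JobsReported": "8"},
    {"NAICSCode": "811111", "CurrentApprovalAmount": "21000", "JobsReported": ""},
]
summary = subsector_summary(sample)
print(summary["238220"])
```

Median is the better headline than mean here, since a handful of near-cap loans can drag a sector average well above what the typical operator borrowed.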

PPP also gives you the geographic distribution at county and CD level for each row, so the same query that surfaces the sector also tells you which metros have the operator density to support a regional roll-up versus the ones too thin to bother with.
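One way to sketch the density check is a count of operators per county for a given sector, keeping only counties above a floor. The column names ProjectState, ProjectCountyName, and NAICSCode are assumptions about the file layout, and the minimum-operator threshold is a hypothetical knob to tune per thesis.

```python
from collections import Counter


def dense_counties(rows, naics_code, min_operators):
    """Return [((state, county), count)] for counties with enough operators, densest first."""
    counts = Counter(
        (row["ProjectState"], row["ProjectCountyName"])
        for row in rows
        if row.get("NAICSCode") == naics_code
    )
    return [(place, n) for place, n in counts.most_common() if n >= min_operators]


# Made-up rows for illustration only.
sample = [
    {"NAICSCode": "238220", "ProjectState": "OH", "ProjectCountyName": "FRANKLIN"},
    {"NAICSCode": "238220", "ProjectState": "OH", "ProjectCountyName": "FRANKLIN"},
    {"NAICSCode": "238220", "ProjectState": "OH", "ProjectCountyName": "VINTON"},
    {"NAICSCode": "722511", "ProjectState": "OH", "ProjectCountyName": "VINTON"},
]
print(dense_counties(sample, "238220", min_operators=2))
```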

Quantifying resilience

96.3% of PPP dollars were forgiven program-wide, but that alone is not informative. Almost every operator that took a PPP loan also kept their payroll up and got the loan turned into a grant. The filter that matters is the inverse: 5.5% of loans (627,208) were charged off, meaning the borrower failed to qualify for forgiveness and could not repay.

Charge-off rate by NAICS-2 ranges from 0.85% in agriculture to 8.91% in Other Services and 8.40% in transportation. That is roughly a ten-fold spread in failure rate across sectors, and it lines up with intuition: sectors with stable contracts or established client bases (agriculture, manufacturing, professional services) charged off below the program-wide 5.5%, while fragmented owner-operator sectors with weak balance sheets (Other Services, last-mile transportation) charged off well above it.
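The sector-level rate is a count of charged-off loans over all loans, grouped by the first two digits of the NAICS code. A sketch under the assumption that the columns are named NAICSCode and LoanStatus and that the status value reads "Charged Off" as in Fig. 01:

```python
from collections import Counter


def charge_off_rate_by_sector(rows):
    """Return {two-digit NAICS prefix: share of loans with status 'Charged Off'}."""
    totals = Counter()
    charged = Counter()
    for row in rows:
        prefix = str(row.get("NAICSCode") or "")[:2]
        if len(prefix) < 2:
            continue  # no usable sector code on this row
        totals[prefix] += 1
        if row.get("LoanStatus") == "Charged Off":
            charged[prefix] += 1
    return {prefix: charged[prefix] / totals[prefix] for prefix in totals}


# Made-up rows for illustration only.
sample = [
    {"NAICSCode": "812112", "LoanStatus": "Charged Off"},
    {"NAICSCode": "812112", "LoanStatus": "Paid in Full"},
    {"NAICSCode": "811111", "LoanStatus": "Paid in Full"},
    {"NAICSCode": "811111", "LoanStatus": "Paid in Full"},
    {"NAICSCode": "111110", "LoanStatus": "Paid in Full"},
    {"NAICSCode": "", "LoanStatus": "Charged Off"},
]
rates = charge_off_rate_by_sector(sample)
print(rates)
```

Some NAICS sectors span more than one two-digit prefix (manufacturing is 31-33, retail is 44-45, transportation and warehousing is 48-49), so those prefixes need to be merged before comparing against sector-level figures.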

The stronger second-order filter is the PPS cohort. PPS qualification required documenting a 25%+ drop in gross receipts in at least one 2020 quarter versus the same quarter of 2019. The 2.84M PPS loans went to operators who cleared that hurdle, and the ones with a forgiveness record on file are a battle-tested cohort. They got hit. They qualified for relief. They kept payroll running long enough to earn forgiveness. Whether they still operate today has to be confirmed outside the file, but for thesis investors who care about resilience over growth, this is the highest-signal cohort the file produces.
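Finding borrowers with both draws takes some care, because the file carries no tax identifier to join on. The sketch below matches on a normalized business name plus five-digit ZIP, which is a heuristic that will miss renamed or relocated businesses and can merge unrelated ones. The column names BorrowerName, BorrowerZip, and ProcessingMethod (with values "PPP" and "PPS") are assumptions about the file layout.

```python
import re
from collections import defaultdict


def borrower_key(row):
    """Heuristic identity: uppercased name with punctuation collapsed, plus 5-digit ZIP."""
    name = re.sub(r"[^A-Z0-9]+", " ", row["BorrowerName"].upper()).strip()
    return name, row["BorrowerZip"][:5]


def both_draw_borrowers(rows):
    """Return the set of borrower keys that appear with both a PPP and a PPS loan."""
    draws = defaultdict(set)
    for row in rows:
        draws[borrower_key(row)].add(row["ProcessingMethod"])
    return {key for key, methods in draws.items() if {"PPP", "PPS"} <= methods}


# Made-up rows for illustration only.
sample = [
    {"BorrowerName": "Acme HVAC, LLC", "BorrowerZip": "30301-1234", "ProcessingMethod": "PPP"},
    {"BorrowerName": "ACME HVAC LLC", "BorrowerZip": "30301", "ProcessingMethod": "PPS"},
    {"BorrowerName": "Solo Plumbing Inc", "BorrowerZip": "30302", "ProcessingMethod": "PPP"},
]
cohort = both_draw_borrowers(sample)
print(cohort)
```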

Originating lender for relationship-oriented sourcing

The single most underused column in PPP is OriginatingLender. The top 15:

Fig. 04 — Top 15 originating lenders by loan count
  • Bank of America: 490,733 loans · $34.4B approved · Big bank, mid loans
  • Cross River Bank: 478,533 loans · $12.9B approved · Fintech aggregator
  • JPMorgan Chase: 452,127 loans · $43.9B approved · Big bank, big loans
  • Harvest Small Business Finance: 408,120 loans · $8.2B approved · Fintech aggregator
  • Wells Fargo: 280,309 loans · $13.7B approved · Big bank, smaller avg
  • U.S. Bank: 174,531 loans · $10.7B approved · Mid-bank
  • TD Bank: 132,867 loans · $12.1B approved · Mid-bank
  • PNC Bank: 118,902 loans · $17.2B approved · Mid-bank, bigger loans
  • Truist Bank: 117,549 loans · $16.1B approved · Mid-bank, bigger loans
  • Manufacturers and Traders Trust: 89,774 loans · $13.2B approved · Regional, bigger loans
  • Huntington National Bank: 82,943 loans · $11.1B approved · Regional
  • Zions Bank: 76,351 loans · $9.8B approved · Regional
  • KeyBank: 69,112 loans · $11.0B approved · Regional
  • BMO Bank: 65,664 loans · $10.3B approved · Regional
  • Fifth Third Bank: 65,501 loans · $7.3B approved · Regional


Two signatures dominate. JPMorgan, BofA, Wells Fargo, Truist, and PNC wrote loans the standard way: a branch banker took the application, the average loan ran $30-150K, and the borrower had an existing depository relationship. Cross River and Harvest Small Business Finance wrote a different kind of loan: fintech-distributed, averaging around $20-27K. Here, the borrower came in through Square, BlueVine, Kabbage, or one of a dozen Bento/Womply-style aggregators.

The two cohorts behave differently. Bank-originated borrowers tend to respond to phone outreach and warm intros from their banker. Fintech-aggregator borrowers tend to respond to email with a Calendly link and a Stripe-themed landing page. The cold-outreach motion is not the same.

For sourcing-by-relationship: filter PPP to whichever lender your fund has a working relationship with, intersect with thesis sector and size, and you have a warm-intro pipeline that does not require buying a list.
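A sketch of both steps: profile each lender by loan count and average loan to see which signature it fits, then intersect one lender with a sector and size band. OriginatingLender and CurrentApprovalAmount are named above; NAICSCode is an assumed column name, and the lender names and rows below are invented.

```python
from collections import defaultdict


def lender_profile(rows):
    """Return {lender: (loan_count, average_loan)} from the OriginatingLender column."""
    amounts = defaultdict(list)
    for row in rows:
        amounts[row["OriginatingLender"]].append(float(row["CurrentApprovalAmount"]))
    return {lender: (len(a), sum(a) / len(a)) for lender, a in amounts.items()}


def warm_intro_pipeline(rows, lender, naics_code, min_loan, max_loan):
    """Rows from one lender that also match the thesis sector and loan-size band."""
    return [
        row
        for row in rows
        if row["OriginatingLender"] == lender
        and row["NAICSCode"] == naics_code
        and min_loan <= float(row["CurrentApprovalAmount"]) <= max_loan
    ]


# Made-up rows and lender names for illustration only.
sample = [
    {"OriginatingLender": "Example Regional Bank", "NAICSCode": "238220", "CurrentApprovalAmount": "120000"},
    {"OriginatingLender": "Example Regional Bank", "NAICSCode": "621210", "CurrentApprovalAmount": "80000"},
    {"OriginatingLender": "Example Fintech Lender", "NAICSCode": "238220", "CurrentApprovalAmount": "20000"},
]
profile = lender_profile(sample)
pipeline = warm_intro_pipeline(sample, "Example Regional Bank", "238220", 100_000, 400_000)
print(profile)
print(len(pipeline))
```

Lender names in the raw file may appear with inconsistent suffixes or casing, so an exact string match like this one is a starting point, not a guarantee of full coverage.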

Business age plus sector is a succession proxy

BusinessAgeDescription comes down to three buckets that together cover more than 99.9% of rows:

Fig. 05 — Borrower business age at time of loan
  • Existing 2+ yrs: 10,142,927 (89.3%)
  • New (0-2 yrs): 686,241 (6.0%)
  • Unanswered: 530,517 (4.7%)

In trades with known retirement-cliff dynamics (HVAC, plumbing, accounting, dental, small-shop manufacturing, independent insurance brokerage), older businesses correlate strongly with older owners. A 35-year HVAC operator who took out a PPP loan with 12 reported jobs and $200K of payroll four years ago has a real chance of looking for a buyer in the next 24 months.

The dataset cannot tell you the owner's age directly, and the age bucket itself only says the business was at least two years old at loan time. What it can tell you is that the business was established, the sector is one where succession-driven sale events cluster, and the operator's PPP cohort is the same vintage as the wave of owners now selling. Cross it with Secretary of State filings for formation date and officer turnover, license-renewal data for ongoing operating status, or public records for ownership changes, and you have a hitlist.

Forgiveness rate as operational discipline signal

PPP loans got forgiven if the borrower documented payroll, rent, utilities, and a few other categories. This is pure admin: payroll registers, bank statements, tax forms, and lender questionnaires. Operators with a CPA, a controller, or even a bookkeeper got through it cleanly, whereas operators without an admin layer often did not.

Filter for loans where ForgivenessAmount is zero or far below CurrentApprovalAmount, and you have surfaced operators with weak back-office discipline. For roll-up platforms looking for clean acquisition targets, exclude the under-forgiven loans: they signal an operator whose books are likely a mess. For distressed and turnaround funds, invert the filter and you have a lead list of operationally weak businesses that might be acquirable at real discounts and improvable by dropping a controller in.
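A sketch of that split, using the ratio of ForgivenessAmount to CurrentApprovalAmount. The 0.9 cutoff is a hypothetical threshold to tune, and the sample rows are invented. The forgiven figure may include accrued interest, so ratios slightly above 1.0 should not be read as errors.

```python
def forgiveness_ratio(row):
    """ForgivenessAmount / CurrentApprovalAmount; a blank forgiveness field counts as zero."""
    approved = float(row["CurrentApprovalAmount"])
    forgiven = float(row.get("ForgivenessAmount") or 0)
    return forgiven / approved if approved > 0 else None


def split_by_forgiveness(rows, threshold=0.9):
    """Return (clean, under_forgiven) lists; threshold is an assumed cutoff."""
    clean, under_forgiven = [], []
    for row in rows:
        ratio = forgiveness_ratio(row)
        if ratio is None:
            continue  # no approved amount to compare against
        (clean if ratio >= threshold else under_forgiven).append(row)
    return clean, under_forgiven


# Made-up rows for illustration only.
sample = [
    {"CurrentApprovalAmount": "100000", "ForgivenessAmount": "101200"},
    {"CurrentApprovalAmount": "100000", "ForgivenessAmount": "40000"},
    {"CurrentApprovalAmount": "50000", "ForgivenessAmount": ""},
]
clean, under_forgiven = split_by_forgiveness(sample)
print(len(clean), len(under_forgiven))
```

A roll-up screen would keep the first list and a turnaround screen the second, so one pass over the file serves both plays.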

Both are real plays you can run from one column.

The limits of PPP data

PPP is the first-pass screen in an M&A sourcing funnel. It is not the funnel.

The file does not include EIN, TIN, or any IRS-issued identifier. Those are confidential under 26 USC §6103 and never released in SBA FOIA disclosures regardless of loan size. There is no owner identity, no person names, no contact information. The snapshot is 2020-2021, so four years of attrition, ownership changes, and consolidation have happened since. There is no financial detail beyond payroll: no revenue, no EBITDA, no margins, no balance sheet. The implied-payroll math gives a directional range, not a number to underwrite to.

So PPP narrows ~33M US businesses down to a thesis-shaped 20K-80K-row shortlist. The next layer (Secretary of State filings for current officers, OpenCorporates or a paid enrichment vendor for contact info, license records or web-presence checks for current operating status) converts the shortlist into a real outreach list. That stitching is the work. PPP alone is the substrate.

Filtered against a specific thesis, the dataset surfaces operators no banker is touching. The lift from "the SBA released some loan data" to "11,365,188 names, addresses, sectors, sizes, lenders, and survival outcomes, queryable from your laptop in five seconds" is a one-week piece of infrastructure that pays for itself the first quarter a partner takes a call from a name on the list before the banker shows up.

Alex Stepansky, Principal, Corridome
About the author

Alex Stepansky

Builder and engineer. Writes about the sourcing infrastructure firms build once they've outgrown the list broker.
