The 2024 SBA FOIA release of the Paycheck Protection Program lists 11,365,188 loans totaling $787.5 billion. The dataset is a COVID-era snapshot, but it also happens to be the closest thing in the public domain to a registry of every US small business with employees. With sector, geography, payroll size, and survival outcome attached to every row, the PPP loan archive is a great top-of-funnel data source.
Bird's eye view
We pulled the data and crunched the numbers. Thirteen CSVs, ~960MB compressed on disk, and ~11.4 million rows. The outcomes break down cleanly:
- Paid in Full: 10,569,378 (93.0%)
- Charged Off: 627,208 (5.5%)
- Exemption 4: 168,602 (1.5%)
$758.5B forgiven · $17.9B charged off · $11.1B exempted from disclosure
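A minimal sketch of how a tally like this could be reproduced from the raw CSVs, using only the standard library. The column name `LoanStatus` and the one-directory-of-CSVs layout are assumptions about the release, not something stated above; `CurrentApprovalAmount` is the loan-amount column used later in this piece.

```python
import csv
import glob
from collections import Counter, defaultdict


def iter_rows(pattern):
    """Yield one dict per loan from every CSV matching the glob pattern."""
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="", encoding="utf-8", errors="replace") as fh:
            yield from csv.DictReader(fh)


def outcome_breakdown(rows):
    """Return {status: (loan_count, share_of_loans, dollars_approved)}."""
    counts = Counter()
    dollars = defaultdict(float)
    for row in rows:
        # "LoanStatus" is an assumed column name; blank values go to "Unknown".
        status = row.get("LoanStatus") or "Unknown"
        counts[status] += 1
        dollars[status] += float(row.get("CurrentApprovalAmount") or 0)
    total = sum(counts.values())
    return {s: (n, n / total, dollars[s]) for s, n in counts.items()}
```

Usage would look like `outcome_breakdown(iter_rows("ppp_data/*.csv"))`, where `ppp_data/` is a hypothetical directory holding the thirteen files. Streaming row by row keeps memory flat, which matters at this row count.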
Two program rounds. PPP, the original 2020 wave, accounts for 8.52M loans and $580.4B. PPS, the Second Draw that opened in January 2021, accounts for 2.84M loans and $207.1B. A borrower with both is one that made it through a first hit, then documented a 25%+ revenue decline to qualify for the second relief check. The survival cohort is its own filter, which gets its own section below.
The loan-size distribution:
[Chart: loan-size distribution, n = 11,365,188 loans]
The PPP cap was $10M, but most loans land under $50K and only about half a percent break $1M. PPP skews hard toward sub-scale operators with small payrolls: the average business reported 7.9 jobs, the 90th percentile reported 16, and the 99th percentile reported 102. This is the universe that matters for lower-middle-market PE.
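Size statistics of this kind can be sketched in a few lines. This is an illustration under assumptions, not the pipeline behind the figures above: it uses a nearest-rank percentile (one convention among several) and takes any iterable of row dicts keyed by `CurrentApprovalAmount` and `JobsReported`.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending list; p is an integer 1-100."""
    if not sorted_values:
        raise ValueError("no values")
    # Integer ceiling of p * n / 100 avoids floating-point rank errors.
    rank = max(1, (p * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def size_profile(rows):
    """Summarise loan sizes and reported jobs for a non-empty set of rows."""
    amounts, jobs = [], []
    for row in rows:
        amounts.append(float(row.get("CurrentApprovalAmount") or 0))
        if row.get("JobsReported"):  # skip blank job counts
            jobs.append(float(row["JobsReported"]))
    jobs.sort()
    n = len(amounts)
    return {
        "share_under_50k": sum(a < 50_000 for a in amounts) / n,
        "share_over_1m": sum(a > 1_000_000 for a in amounts) / n,
        "mean_jobs": sum(jobs) / len(jobs),
        "p90_jobs": percentile(jobs, 90),
        "p99_jobs": percentile(jobs, 99),
    }
```

The sketch holds every value in memory, which is fine for a sector slice; on the full file a fixed-bucket histogram would be lighter.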
Loan amount is a revenue proxy
PPP loans were sized at 2.5x average monthly payroll. The formula reverses cleanly: CurrentApprovalAmount ÷ 2.5 × 12 is the implied annual payroll. Divide that by sector-typical labor-cost-of-revenue, and you have an implied revenue band without ever talking to the owner.
Take a residential general contractor with a $200K PPP loan. That's $80K monthly payroll, $960K annually. Residential construction runs ~30% labor cost of revenue, putting implied revenue at roughly $3.2M. Cross with JobsReported and you also get per-employee comp, which separates skilled trades from administrative shops within the same NAICS.
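The back-into is short enough to write down as a sketch. The function names are ours, and the 12-job headcount in the usage lines is a hypothetical value for illustration only; the 2.5x rule and the 30% labor share come from the example above.

```python
def implied_annual_payroll(loan_amount):
    """Reverse the sizing rule: loan = 2.5 x average monthly payroll."""
    return loan_amount / 2.5 * 12


def implied_revenue(loan_amount, labor_share):
    """Implied revenue, given labor cost as a fraction of revenue."""
    return implied_annual_payroll(loan_amount) / labor_share


def comp_per_employee(loan_amount, jobs_reported):
    """Implied annual comp per reported job, or None when jobs is 0 or blank."""
    if not jobs_reported:
        return None
    return implied_annual_payroll(loan_amount) / jobs_reported


loan = 200_000  # the $200K residential contractor from the example
print(implied_annual_payroll(loan))        # 960000.0
print(round(implied_revenue(loan, 0.30)))  # 3200000
print(comp_per_employee(loan, 12))         # 80000.0 (12 jobs is hypothetical)
```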
This conversion is gold. PPP has 11.4M records, but filtered to a thesis sector and size band, we can usually land on 20-80K rows.
The labor-cost ratios that matter for the back-into (practitioner ranges; cross-reference with RMA Annual Statement Studies for the exact sub-industry):
- Residential construction: 28-32%
- Home services and trades: 35-45%
- Full-service restaurants: 30-35% (food cost dominates)
- Healthcare practices: 45-55%
- Professional services: 50-65%
- Manufacturing: 12-20%
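Because each ratio is a range, the honest output is a revenue band rather than a point estimate. A minimal sketch, with the ranges above copied into a lookup table; the dictionary keys and function name are hypothetical labels of our own, not fields in the file.

```python
# Labor cost as a share of revenue: (low, high), copied from the list above.
LABOR_SHARE = {
    "residential_construction": (0.28, 0.32),
    "home_services_trades": (0.35, 0.45),
    "full_service_restaurants": (0.30, 0.35),
    "healthcare_practices": (0.45, 0.55),
    "professional_services": (0.50, 0.65),
    "manufacturing": (0.12, 0.20),
}


def revenue_band(loan_amount, sector):
    """Return (low, high) implied revenue for a loan in the given sector."""
    low_share, high_share = LABOR_SHARE[sector]
    payroll = loan_amount / 2.5 * 12  # implied annual payroll
    # A higher labor share means less revenue per payroll dollar.
    return payroll / high_share, payroll / low_share
```

For the $200K residential contractor, `revenue_band(200_000, "residential_construction")` brackets the $3.2M point estimate at roughly $3.0M to $3.4M.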
A firm hunting $1-3M revenue HVAC contractors filters PPP to NAICS 238220 (plumbing, heating, AC), loan amount $100K-$400K, BorrowerState... and that returns roughly 19K leads nationally. No banker needed.
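A thesis filter like this one can be sketched as a generator over row dicts. `NAICSCode` is an assumed column name; `CurrentApprovalAmount` and `BorrowerState` appear above. The state argument is left optional, since the choice of states depends on the fund.

```python
def thesis_filter(rows, naics, min_loan, max_loan, states=None):
    """Yield rows matching one NAICS code, a loan band, and optional states."""
    for row in rows:
        # "NAICSCode" is an assumed column name, compared as a string.
        if (row.get("NAICSCode") or "").strip() != naics:
            continue
        try:
            amount = float(row.get("CurrentApprovalAmount") or 0)
        except ValueError:
            continue  # skip rows with an unparseable amount
        if not (min_loan <= amount <= max_loan):
            continue
        if states is not None and row.get("BorrowerState") not in states:
            continue
        yield row
```

The HVAC screen would then read `thesis_filter(rows, "238220", 100_000, 400_000)`, with `rows` being any iterable of loan dicts, for example from `csv.DictReader`.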
The roll-up substrate problem
Most "fragmented sector ready for consolidation" posts argue from intuition. PPP lets you argue from operator count.
The top NAICS-2 sectors by dollar volume:
[Chart: $ approved by NAICS-2 sector, with forgiveness and charge-off detail]
Other Services has the highest loan count, by a margin of 285K, but the lowest average loan, because that bucket is single-shop salons, repair places, dry cleaners, and small religious orgs. Drill into the 6-digit NAICS codes underneath and the actual roll-up substrates surface:
| Sub-sector | Loans | Median loan | $B | Avg jobs |
|---|---|---|---|---|
| Full-service restaurants | 325,221 | $61,246 | 42.4 | 22.4 |
| Offices of lawyers | 199,704 | $25,000 | 16.4 | 5.9 |
| Offices of dentists | 172,907 | $59,452 | 14.0 | 8.5 |
| Residential remodelers | 164,640 | $17,956 | 5.1 | 3.1 |
| Plumbing / HVAC contractors | 103,381 | $33,380 | 13.0 | 10.5 |
| General automotive repair | 93,018 | $20,833 | 3.9 | 4.8 |
| Drycleaning and laundry services | 28,087 | $17,750 | 1.1 | 7.0 |
| Roofing contractors | 27,844 | $20,833 | 2.9 | 9.6 |
The fragmentation is real and quantified. The roll-up substrate is 103K plumbing and HVAC operators with a median loan around $33K and ~10 employees, scattered across every metro in the country. So are 173K dental offices with ~8 employees. So are 93K auto repair shops with ~5.
PPP also gives you the geography of each row at county and congressional-district (CD) level, so the same query that surfaces the sector also tells you which metros have the operator density to support a regional roll-up versus the ones too thin to bother with.
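A density count per county can be sketched with a `Counter`. The column names `ProjectState`, `ProjectCountyName`, and `NAICSCode` are assumptions about the file layout, and `min_count` is a hypothetical threshold for "dense enough".

```python
from collections import Counter


def operator_density(rows, naics, min_count=1):
    """Return [((state, county), loan_count)] for one NAICS code, densest first."""
    counts = Counter(
        # Assumed column names for the loan's project location.
        (row.get("ProjectState"), row.get("ProjectCountyName"))
        for row in rows
        if row.get("NAICSCode") == naics
    )
    return [(place, n) for place, n in counts.most_common() if n >= min_count]
```

This counts loans, not distinct businesses, so a borrower with both draws is counted twice. Rolling counties up to metros would need a separate county-to-metro crosswalk, which the file does not carry.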
Quantifying resilience
96.3% of PPP dollars were forgiven program-wide, but that alone is not informative. Almost every operator that took a PPP loan also kept their payroll up and got the loan turned into a grant. The filter that matters is the inverse: 5.5% of loans (627,208) were charged off, meaning the borrower failed to qualify for forgiveness and could not repay.
Charge-off rate by NAICS-2 spreads from 0.85% in agriculture to 8.91% in Other Services and 8.40% in transportation. That's a ten-fold spread in operational failure rate across sectors, and it lines up with intuition: capital-intensive sectors with stable contracts (agriculture, manufacturing, professional services) survived, while fragmented owner-operator sectors with weak balance sheets (Other Services, last-mile transportation) did not.
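A sketch of the charge-off-rate calculation, grouping on the first two digits of the NAICS code. `NAICSCode` and `LoanStatus` are assumed column names, and `"Charged Off"` is the status label from the breakdown at the top. One simplification to flag: a few NAICS sectors span several two-digit prefixes (manufacturing is 31-33, transportation is 48-49), so a real run would merge those prefixes before comparing sectors.

```python
from collections import defaultdict


def charge_off_rate_by_sector(rows):
    """Return {two_digit_naics_prefix: share of loans charged off}."""
    totals = defaultdict(int)
    charged = defaultdict(int)
    for row in rows:
        code = (row.get("NAICSCode") or "").strip()
        if len(code) < 2:
            continue  # no usable sector code on this row
        sector = code[:2]
        totals[sector] += 1
        if row.get("LoanStatus") == "Charged Off":
            charged[sector] += 1
    return {sector: charged[sector] / totals[sector] for sector in totals}
```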
The stronger second-order filter is the PPS cohort. PPS qualification required documenting a 25%+ decline in gross receipts, comparing a 2020 quarter with the same quarter in 2019, at the time of the second-draw application. The 2.84M operators who passed that hurdle and are still on the rolls four years later have a forgiveness record on file. They are a battle-tested cohort. They got hit. They qualified for relief. They recovered enough to operate today. For thesis investors who care about resilience over growth, this is the highest-signal cohort the file produces.
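Linking a first draw to its second draw takes some care, because the file carries no stable borrower identifier. One heuristic sketch is to match on a normalized business name plus 5-digit ZIP. The column names `BorrowerName`, `BorrowerZip`, and `ProcessingMethod` (with values `"PPP"` and `"PPS"`) are assumptions, and the match will miss renamed or relocated businesses and can merge distinct ones.

```python
import re
from collections import defaultdict


def borrower_key(row):
    """Heuristic identity: upper-cased name without punctuation, plus ZIP5."""
    name = re.sub(r"[^A-Z0-9 ]", "", (row.get("BorrowerName") or "").upper())
    name = " ".join(name.split())  # collapse repeated whitespace
    zip5 = (row.get("BorrowerZip") or "")[:5]
    return name, zip5


def two_draw_borrowers(rows):
    """Return the set of borrower keys seen with both a PPP and a PPS loan."""
    draws = defaultdict(set)
    for row in rows:
        key = borrower_key(row)
        if key[0]:  # ignore rows with no usable name
            draws[key].add(row.get("ProcessingMethod"))
    return {key for key, methods in draws.items() if {"PPP", "PPS"} <= methods}
```

Treat the result as a candidate list to verify in the enrichment step, not as a confirmed join.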
Originating lender for relationship-oriented sourcing
The single most underused column in PPP is OriginatingLender. The top 15:
[Chart: top 15 originating lenders by loan count, with $ volume and lender pattern]
Two signatures dominate. JPMorgan, BofA, Wells Fargo, Truist, PNC wrote loans the standard way: branch banker took the application, the average loan ran $30-150K, and the borrower had an existing depository relationship. Cross River and Harvest Small Business Finance wrote a different kind of loan: fintech-distributed, average around $20-27K. Here, the borrower came in through Square, BlueVine, Kabbage, or one of a dozen Bento/Womply-style aggregators. The two cohorts behave differently. Regional-bank borrowers respond to phone outreach and warm intros from their banker. Fintech-aggregator borrowers respond to email with a Calendly link and a Stripe-themed landing page. The cold-outreach motion is not the same.
For sourcing-by-relationship: filter PPP to whichever lender your fund has a working relationship with, intersect with thesis sector and size, and you have a warm-intro pipeline that does not require buying a list.
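Both steps, profiling lenders by loan count and average loan size, then cutting a pipeline for one lender, can be sketched from the `OriginatingLender` column. `NAICSCode` is an assumed column name, and exact case-insensitive name matching is a simplification, since lender names in the file may carry suffixes or spelling variants.

```python
from collections import defaultdict


def lender_profile(rows):
    """Return [(lender, loan_count, average_loan)], highest loan count first."""
    count = defaultdict(int)
    dollars = defaultdict(float)
    for row in rows:
        lender = (row.get("OriginatingLender") or "").strip()
        if not lender:
            continue
        count[lender] += 1
        dollars[lender] += float(row.get("CurrentApprovalAmount") or 0)
    ranked = sorted(count, key=lambda name: count[name], reverse=True)
    return [(name, count[name], dollars[name] / count[name]) for name in ranked]


def lender_pipeline(rows, lender, naics, min_loan, max_loan):
    """Rows from one lender, intersected with a NAICS code and loan band."""
    wanted = lender.strip().casefold()
    return [
        row
        for row in rows
        if (row.get("OriginatingLender") or "").strip().casefold() == wanted
        and row.get("NAICSCode") == naics
        and min_loan <= float(row.get("CurrentApprovalAmount") or 0) <= max_loan
    ]
```

The average-loan column in `lender_profile` is what separates the branch-bank signature from the fintech-distributed one.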
Business age plus sector is a succession proxy
BusinessAgeDescription has three buckets that together cover more than 99.9% of rows:
- Existing 2+ yrs: 10,142,927 (89.3%)
- New (0-2 yrs): 686,241 (6.0%)
- Unanswered: 530,517 (4.7%)
In trades with known retirement-cliff dynamics (HVAC, plumbing, accounting, dental, small-shop manufacturing, independent insurance brokerage), older-business filings correlate strongly with older owners. A 35-year-old HVAC business that took out a PPP loan with 12 reported jobs and $200K of payroll four years ago has a real chance of looking for a buyer in the next 24 months.
The dataset cannot tell you the owner's age directly. It can tell you the business is old, the sector is one where succession-driven sale events cluster, and the operator's PPP cohort is the same vintage as the wave of owners now selling. Cross it with Secretary of State filings for officer turnover, license-renewal data for ongoing operating status, or public records for ownership changes, and you have a hitlist.
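The first cut of that hitlist, before any cross-referencing, is a filter on age bucket, sector, and headcount. In this sketch the raw label for the "Existing 2+ yrs" bucket, the `NAICSCode` column name, and the `min_jobs` floor are all assumptions to check against the actual file.

```python
# Assumed raw label for the "Existing 2+ yrs" bucket; verify against the file.
ESTABLISHED = "Existing or more than 2 years old"


def succession_shortlist(rows, naics_codes, min_jobs=5):
    """Yield established businesses in the given sectors with enough staff."""
    for row in rows:
        if row.get("BusinessAgeDescription") != ESTABLISHED:
            continue
        if row.get("NAICSCode") not in naics_codes:
            continue
        try:
            jobs = float(row.get("JobsReported") or 0)
        except ValueError:
            continue  # skip rows with an unparseable job count
        if jobs >= min_jobs:  # min_jobs is a hypothetical floor
            yield row
```

Called as `succession_shortlist(rows, {"238220"})`, this narrows the file to established plumbing and HVAC operators. The age bucket only says "more than two years", so actual business age has to come from the Secretary of State layer.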
Forgiveness rate as operational discipline signal
PPP loans got forgiven if the borrower documented payroll, rent, utilities, and a few other categories. This is pure admin: payroll registers, bank statements, tax forms, and lender questionnaires. Operators with a CPA, a controller, or even a bookkeeper got through it cleanly, whereas operators without an admin layer often did not.
Filter for loans where ForgivenessAmount is zero or far below CurrentApprovalAmount, and you have surfaced operators with weak back-office discipline. For roll-up platforms looking for clean acquisition targets, exclude the under-forgiven loans: they signal an operator whose books are likely a mess. For distressed and turnaround funds, invert the filter and you have a lead list of operationally weak businesses that might be acquirable at real discounts and improvable by dropping a controller in.
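Both plays come down to one ratio and one cutoff. A minimal sketch, where the 0.9 threshold is a hypothetical choice of ours for "far below" and should be tuned against the data:

```python
def forgiveness_ratio(row):
    """ForgivenessAmount / CurrentApprovalAmount, or None if no approved amount."""
    approved = float(row.get("CurrentApprovalAmount") or 0)
    if approved <= 0:
        return None
    forgiven = float(row.get("ForgivenessAmount") or 0)  # blank counts as zero
    return forgiven / approved


def split_by_forgiveness(rows, threshold=0.9):
    """Return (clean, weak) lists; weak means forgiven below the threshold."""
    clean, weak = [], []
    for row in rows:
        ratio = forgiveness_ratio(row)
        if ratio is None:
            continue
        (clean if ratio >= threshold else weak).append(row)
    return clean, weak
```

A roll-up platform would keep the `clean` list; a turnaround fund would start from `weak`.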
Both are real plays you can run from one column.
The limits of PPP data
PPP is the first-pass screen in an M&A sourcing funnel. It is not the funnel.
The file does not include EIN, TIN, or any IRS-issued identifier. Those are confidential under 26 USC §6103 and never released in SBA FOIA disclosures regardless of loan size. There is no owner identity, no person names, no contact information. The snapshot is 2020-2021, so four years of attrition, ownership changes, and consolidation have happened since. There is no financial detail beyond payroll: no revenue, no EBITDA, no margins, no balance sheet. The implied-payroll math gives a directional range, not a number to underwrite to.
So PPP narrows ~33M US businesses down to a thesis-shaped 30K-80K-row shortlist. The next layer (Secretary of State filings for current officers, OpenCorporates or a paid enrichment vendor for contact info, license records or web-presence checks for current operating status) converts the shortlist into a real outreach list. That stitching is the work. PPP alone is the substrate.
Filtered against a specific thesis, the dataset surfaces operators no banker is touching. The lift from "the SBA released some loan data" to "11,365,188 names, addresses, sectors, sizes, lenders, and survival outcomes, queryable from your laptop in five seconds" is a one-week piece of infrastructure that pays for itself the first quarter a partner takes a call from a name on the list before the banker shows up.
