Every "M&A target identification" article on the first page of Google walks through the same five-step diagram: set criteria, scan the market, screen, prioritize, approach. The diagrams are fine. They are also useless on the day a sourcing team has to actually find operators no one else has, because the diagram is mute on the part that does the work: where the data comes from.
So this is the other article. A catalog of public records beyond LinkedIn and the paid platforms, written from the data layer.
LinkedIn, Grata, and PitchBook are not bad data. They are the data every other buyer in your bracket is already running through the same filters. If you want target lists no one else has, you read records the platforms have not bothered to index.
A note on the shape of "signal"
The signals below are not deal-flow promises. They are the raw material a scoring function uses to rank operators. The pattern is the same under every source: one public record alone is noise, two records that agree are a signal, three that agree are a thesis. Each section names a public source, the columns it returns, and the extraction shape you'd use to convert it into operator-level scores.
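The agreement rule above can be sketched in a few lines. This is a minimal illustration, not a scoring model: the function names and the source labels in the example are hypothetical, and "agree" is reduced to a boolean per source.

```python
def count_agreeing(flags: dict) -> int:
    """Count how many independent public-record sources flag the operator."""
    return sum(1 for hit in flags.values() if hit)


def signal_tier(agreeing_sources: int) -> str:
    """Map the number of agreeing records to the article's three tiers."""
    if agreeing_sources >= 3:
        return "thesis"
    if agreeing_sources == 2:
        return "signal"
    return "noise"


# Hypothetical operator: two of three sources point the same way.
flags = {"permits": True, "state_filing": True, "osha": False}
tier = signal_tier(count_agreeing(flags))
```

In practice each flag would come from one of the extraction steps described in the sections that follow.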
State and local permitting records
Building permits, electrical, plumbing, mechanical. Issued by municipalities and aggregated unevenly across the country. The cleanest pulls come from the metros that publish to Socrata-backed open data portals. Austin, Seattle, NYC, and LA County hand back permit number, issue date, contractor name and license, owner, BBL or BIN, estimated job cost, lat/lon, and status as JSON over the SODA API. The NYC DOB NOW endpoint and the LA Building & Safety endpoint are the two highest-volume taps. For mid-tier markets, the canonical access is the portal vendor: Accela and Tyler Technologies' EnerGov host the per-city interfaces, and a county-by-county scrape covers the gaps.
Permits are the closest thing to a leading indicator of installation demand in public data. They precede revenue by three to nine months in trades, construction, and home services. The usable signal isn't the raw count; it's permit-issuance velocity in an operator's primary metro, normalized against that metro's annual baseline. When the rolling 90-day permit count runs hot against baseline, operators in the metro hit the ceiling of techs they can hire, and that's the point in the cycle where owners start picking up buyer calls.
Extraction is mechanical: pull permits by metro, bucket by trade code, compute the ratio against the metro's baseline, score the metro, propagate the score to the operators headquartered there. The ratio is what does the work, not the raw count.
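A minimal sketch of that ratio, assuming the permits have already been pulled and filtered to one metro and one trade code. The function names, the 365-day baseline window, and the choice to include the recent window inside the baseline are all assumptions made for illustration.

```python
from datetime import date, timedelta


def permit_velocity_ratio(issue_dates, as_of, window_days=90, baseline_days=365):
    """Rolling-window permit count divided by the count the annual baseline implies.

    A ratio near 1.0 means the metro is running at its own baseline;
    above 1.0 means the recent window is running hot.
    """
    window_start = as_of - timedelta(days=window_days)
    baseline_start = as_of - timedelta(days=baseline_days)
    recent = sum(1 for d in issue_dates if window_start < d <= as_of)
    annual = sum(1 for d in issue_dates if baseline_start < d <= as_of)
    if annual == 0:
        return None  # no baseline, so no meaningful ratio
    expected = annual * window_days / baseline_days
    return recent / expected


def propagate_to_operators(metro_scores, operator_metros):
    """Copy each metro's score onto the operators headquartered there."""
    return {op: metro_scores.get(metro) for op, metro in operator_metros.items()}
```

Because the score lives at the metro level, every operator in the same metro receives the same value; it separates metros, not operators within one.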
State corporation and LLC filings
Every Secretary of State runs a corporations division. Quality varies widely by jurisdiction, which is why the canonical source depends on what you want.
For officer and tenure data, California's Statement of Information is the cleanest path. Every entity in CA refiles it on a fixed cycle (annually for stock corporations, every two years for LLCs), and the historical SI-200 (for corporations) and LLC-12 (for limited liability companies) are downloadable as PDFs from bizfileonline.sos.ca.gov. Parse the PDF stack and you have a structured tenure series. For Delaware-domiciled entities, the icis.corp.delaware.gov entity search returns entity name, file number, formation date, and registered agent; registered-agent continuity is the tenure proxy for DE LLCs and corporations. New York's Department of State portal returns formation date and the DOS process address for any active entity. For officer-level coverage outside CA, the ~25 states that mandate officer listing on annual reports (TX, FL, GA, NC, OH among them) carry it through their respective bulk feeds. Florida's Sunbiz is the verbose end of the spectrum and lists every member ever filed.
Tenure is what these filings give you. An LLC formed in 2003 whose listed manager has not changed is a different operator than an LLC formed in 2019 with three manager changes. Tenure correlates with the population the roll-up thesis is actually about: long-held, owner-operated firms whose owners are at the point in the curve where they want to sell.
The usable tenure feature pairs two numbers: the years since the LLC formation date in the state filing, and the years since the most recent role change on the owner's LinkedIn entry. When both exceed twenty years, you're looking at the long-held, owner-operated cohort the succession-cliff thesis is actually about.
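One way to read that feature, as a sketch: measure both dates against a common as-of date and flag the operator when both spans clear the twenty-year line. The function names, the dictionary keys, and the 365.25-day year are illustrative assumptions.

```python
from datetime import date


def years_between(start: date, end: date) -> float:
    """Elapsed time in years, using a 365.25-day year."""
    return (end - start).days / 365.25


def tenure_feature(formation_date, last_role_change, as_of, threshold_years=20.0):
    """Return entity age, owner tenure, and whether both exceed the threshold."""
    entity_age = years_between(formation_date, as_of)
    owner_tenure = years_between(last_role_change, as_of)
    return {
        "entity_age_years": entity_age,
        "owner_tenure_years": owner_tenure,
        "long_held": entity_age > threshold_years and owner_tenure > threshold_years,
    }
```

An LLC formed in 2003 with no role change since comes back flagged; one formed in 2019 does not, whatever the owner's profile says.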
FCC licenses
The FCC's Universal Licensing System publishes every commercial wireless license, broadcast license, and amateur record in the country. Per license: call sign, licensee name, FRN, service code, status, grant and expiration dates, frequency, transmitter coordinates, address, county, and zip. The canonical pipe is the weekly bulk archive of pipe-delimited files (l_micro.zip for microwave, l_paging.zip for paging, and so on), with daily transaction files covering the changes between full dumps. For one-off lookups, the ULS Advanced License Search filters by state, up to 40 counties, and multiple service codes at a time.
Whole verticals depend on FCC-licensed equipment: paging, marine and aviation services, two-way radio operators, microwave-backhaul tower companies, broadcast. The license itself is an asset on the balance sheet and a fingerprint of the operator.
A regional wireless ISP scan starts with FCC Part 101 microwave-link licenses crossed against state corporation filings. The licenses give you towers and links; the corporate filings give you ownership; a join on address and licensee normalizes the operator. The output is a map of licensed wireless operators in a geography that no vendor has bothered to assemble.
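The join on address and licensee might look something like the sketch below. It assumes both datasets are already parsed into dictionaries; the field names (licensee_name, entity_name, zip, file_number) and the suffix list are hypothetical, and a five-digit zip stands in for a full address match.

```python
import re

# Legal suffixes stripped before comparing names (an illustrative, incomplete list).
_SUFFIXES = {"inc", "llc", "corp", "co", "ltd", "lp",
             "incorporated", "corporation", "company"}


def normalize_name(name):
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    cleaned = name.lower().replace(".", "")
    tokens = re.sub(r"[^a-z0-9 ]", " ", cleaned).split()
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def join_licenses_to_filings(licenses, filings):
    """Attach a state file number to each license whose name and zip match a filing."""
    index = {(normalize_name(f["entity_name"]), f["zip"][:5]): f for f in filings}
    joined = []
    for lic in licenses:
        key = (normalize_name(lic["licensee_name"]), lic["zip"][:5])
        filing = index.get(key)
        if filing is not None:
            joined.append({**lic, "file_number": filing["file_number"]})
    return joined
```

Exact matching on a normalized key is the conservative choice: it misses some true matches but rarely fuses two different operators, which matters more when the output feeds a ranking.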
For unlicensed-spectrum WISPs running on Part 15 bands (5, 6, 24, 60 GHz), the canonical sources are FCC Form 477 broadband deployment filings, state broadband registries, BroadbandNow's coverage dataset, and ARIN IP-block assignments. Join those on operator name and the resulting universe covers both halves of the fixed-wireless market.
FDA registrations
FDA's establishment registration database covers every facility that manufactures, processes, packages, or holds drugs, devices, food, or cosmetics for the US market. Three access paths run in parallel, and all return clean data. The legacy CDRH establishment registration CGI returns paginated tables with FEI numbers, registration year, product codes, and establishment types. The 510(k) detail pages return applicant name, address, device name, product code, decision date and code, and the registration numbers of every firm currently manufacturing the cleared device. openFDA returns structured JSON at api.fda.gov/device, paginated and with no auth required, which is the right path when you want a single year of cleared devices in one batch. For bulk loads, the monthly PMN96CUR.ZIP and the establishment-registration ZIP cover the full pull.
Contract manufacturers, copackers, and private-label producers are some of the hardest operators to source through standard channels. They don't market. Their websites are intentionally vague about who they make products for, because every line on the floor is somebody else's brand under an NDA. FDA registration is the one place they have to identify themselves.
For a copacker thesis in personal care, you pull every establishment under the relevant cosmetic product codes, filter to contract manufacturers via the establishment-identifier flag, then cross against state corporation filings for revenue proxy and tenure. The output is an operator universe of contract producers that don't surface in any LinkedIn search because they're contractually invisible by design.
OSHA inspection records
OSHA publishes inspection records, citations, and establishment data through its public enforcement database. For ad-hoc inspection lookups, the Establishment Search returns inspection number, date opened, establishment name, site and mailing address, union status, SIC and NAICS, inspection type, ownership, case status, violation counts, initial and current penalty totals, and the full citation table with cited standard, issuance date, and abatement date.
The deeper headcount data lives in the bulk dataset. data.dol.gov hosts the osha_inspection table with Nr Employees, Nr Controlled, and Nr Covered columns: headcount recorded the day the inspector walked the floor. That is closer to ground truth than commercial employee-count vendors, especially for industrial-services theses (abatement, demolition, scaffolding, industrial cleaning) where the vendors systematically conflate parent-company headcount with operating-establishment headcount.
Sizing an industrial-services rollup looks like this: pull the OSHA bulk file in the target NAICS codes over a rolling five-year window, key by establishment, deduplicate against state filings, and rank by inspection-derived employee count. Because the field is recorded at the site the inspector visited, it gives you operating-establishment headcount rather than the parent-company roll-up that vendors typically ship.
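A rough sketch of that sizing pass, assuming the bulk file has already been loaded into a list of dictionaries. The key names (estab_name, site_zip, naics, open_year, nr_employees) are placeholders, not the bulk file's actual column names, and the deduplication here keys on establishment name and zip only; the cross-check against state filings is left out.

```python
def rank_by_inspected_headcount(inspections, naics_prefixes, first_year):
    """Keep the latest in-window inspection per establishment, ranked by headcount."""
    prefixes = tuple(naics_prefixes)
    latest = {}
    for row in inspections:
        if not str(row["naics"]).startswith(prefixes):
            continue  # outside the target industries
        if row["open_year"] < first_year:
            continue  # outside the rolling window
        key = (row["estab_name"].strip().lower(), row["site_zip"])
        if key not in latest or row["open_year"] > latest[key]["open_year"]:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: r["nr_employees"], reverse=True)
```

Taking the most recent inspection per site, rather than the maximum headcount ever recorded, keeps the ranking tied to current operating size.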
FAA aircraft registry
The FAA registry covers every civil aircraft registered in the US, with owner, address, make, model, year, and registration status. Per record: registrant, street, city, county, state, zip, region, certificate issue and expiration dates, mode S code, manufacturer, model, serial, engine make and model, and airworthiness classification and date. For per-tail lookups, the N-Number Inquiry on registry.faa.gov returns the record in semantic data-label attributes. For bulk pulls, the Releasable Aircraft Database is a nightly zip refreshed off the production registry.
Aircraft ownership is a wealth signal at the personal level and an operating signal at the corporate level. Owner-operated firms whose principals own aircraft are a specific population: concentrated, identifiable, and worth surfacing when the thesis involves successful regional operators in metros without commercial-airline density.
For a thesis in regional distribution where principals fly themselves, the registry gives you the tail number, which gives you flight history through public trackers, which tells you which facilities the principal visits and how often. It serves as a tiebreaker between two otherwise-equivalent operators when one principal flies and the other does not.
Court records
Federal court records sit in two places. pcl.uscourts.gov (PACER's Case Locator) charges $0.10 per page with a $3 cap per document, which makes it the right path for one-off pulls on a specific docket. The standing pipeline runs against CourtListener's RECAP archive, a free open-data archive that Free Law Project re-publishes from PACER with a documented REST API. RECAP carries millions of docket entries across federal district and bankruptcy courts; recent or sealed dockets occasionally need a paid PACER pull to complete the record. State court records are often free, though access rules, fees, and formats vary by jurisdiction.
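The PACER pricing above is simple enough to budget in code before a pull. This sketch uses only the two numbers quoted ($0.10 per page, $3 cap per document) and works in integer cents to avoid floating-point drift; the function names are illustrative.

```python
def pacer_document_cost_cents(pages, per_page_cents=10, cap_cents=300):
    """Cost of one document: per-page fee, capped per document."""
    return min(pages * per_page_cents, cap_cents)


def pacer_pull_cost_dollars(page_counts):
    """Total cost in dollars for a list of documents, given each one's page count."""
    total_cents = sum(pacer_document_cost_cents(p) for p in page_counts)
    return total_cents / 100
```

A 12-page filing costs $1.20 and a 45-page filing hits the $3.00 cap, so a docket pull of those two documents comes to $4.20.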
Two signals come out of this. Distressed signal at one end (Chapter 11, mechanic's liens, tax warrants). Competitive intelligence at the other (contract disputes between operator and competitor reveal both the relationship and the dollar amount).
A weekly pull of RECAP filings in NAICS-relevant Chapter 11s, joined to state filings, produces a small actionable list of operators where the conversation is not "do you want to sell" but "do you want a 363 sale to close in sixty days."
SEC EDGAR and ERISA Form 5500
EDGAR covers public filings (10-Ks, 10-Qs, 8-Ks, proxy statements, SC 13D, Form D for private placements) at the SEC's portal. The Department of Labor publishes ERISA Form 5500 filings for every employee benefit plan with more than 100 participants. Per filing: Ack ID, EIN, plan number, plan name, sponsor name and address, plan codes and type, plan year, participants beginning and end of year, assets beginning and end of year, and a link to the filed PDF. The auditor is one join away on Schedule C Part 1 Item 2 keyed by ACK_ID with service code 10. Bulk access runs against the annual CSV download and the data dictionary at efast.dol.gov; pre-2009 filings are not indexed.
EDGAR is obvious for public companies and rarely useful for the lower middle market. ERISA is the opposite. Form 5500 surfaces plan sponsor, address, EIN, participants (a hard headcount), total plan assets, and the plan's actuarial schedules. It is the cleanest headcount and benefits-spend signal for private companies above roughly 100 employees.
Sizing one of those operators: pull the Form 5500, read participant count (true headcount), plan asset growth year over year (a proxy for organic growth), and the auditor. The firm that audits a $40M plan is a different firm than the one that audits a $200M plan. This is the data the operator's CFO sees, and it is public.
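Those reads can be sketched as a small per-year summary, assuming the annual CSV rows for one plan have been loaded into dictionaries. The key names (plan_year, participants_eoy, assets_boy, assets_eoy) are placeholders, not the published column names. Growth is computed within each filing from beginning-of-year to end-of-year assets, so it also picks up market returns and contributions, which is why it is only a proxy.

```python
def plan_year_metrics(filings):
    """Summarize headcount and asset growth per plan year, oldest first."""
    metrics = []
    for f in sorted(filings, key=lambda row: row["plan_year"]):
        boy, eoy = f["assets_boy"], f["assets_eoy"]
        # Growth is undefined when the plan started the year with no assets.
        growth = (eoy - boy) / boy if boy else None
        metrics.append({
            "plan_year": f["plan_year"],
            "headcount": f["participants_eoy"],
            "asset_growth": growth,
        })
    return metrics
```

Lining up several plan years this way makes a sudden jump or drop in participants stand out more clearly than any single filing does.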
State license boards and trade-specific registrars
Every state has a contractor licensing board for at least some trades. There's a parallel set of trade-specific registries: the Nationwide Multistate Licensing System for mortgage originators, state pharmacy boards for compounding pharmacies, state DMV records for licensed dealers.
The high-quality bulk feeds live in roughly 20-25 states. Texas's TDLR ships every license (type, number, business name, owner name, business and mailing addresses with county, phone, expiration, continuing-education flag, and geocoded coordinates) across ~30 trades as JSON at data.texas.gov/resource/7358-krk7.json. California's Public Data Portal exposes a Master Contractor List plus classification and county slices; the per-license detail page on cslb.ca.gov is the right path for individual lookups. Florida DBPR, New York OpenData, Washington L&I, Colorado DORA, Ohio eLicense, and North Carolina LCB round out the canonical set. A top-10-states-by-population MVP covers about 54% of US population in two to three weeks. A unified national dataset runs six to ten weeks plus ongoing maintenance.
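For the Socrata-backed feeds, paging is mostly a matter of building the right query strings. The sketch below generates page URLs for the Texas endpoint named above using the standard SODA $limit, $offset, and $order parameters. The https scheme, the page size, and the assumption that the caller already knows the row count are mine; nothing here makes a network call.

```python
from urllib.parse import urlencode

# Endpoint from the text; the https:// scheme is assumed.
TDLR_URL = "https://data.texas.gov/resource/7358-krk7.json"


def soda_page_urls(base_url, total_rows, page_size=1000, order=":id"):
    """Build one URL per page, ordered on the system row id for stable paging."""
    urls = []
    for offset in range(0, total_rows, page_size):
        params = {"$limit": page_size, "$offset": offset, "$order": order}
        urls.append(f"{base_url}?{urlencode(params)}")
    return urls


# Three pages cover a hypothetical 2,500-row pull.
pages = soda_page_urls(TDLR_URL, 2500)
```

Ordering on a stable key matters here: without it, rows can shift between pages mid-pull and show up twice or not at all.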
What you get from this layer is coverage of the operator universe in trades that don't advertise on LinkedIn. A licensed plumbing contractor in Arizona is in the Registrar of Contractors database whether or not they have ever opened a LinkedIn account. The state board is the population frame.
BBB filings and state contractor registries are how you move a home-services universe from "everyone with a website" to "every licensed operator in the geography." That cleanup, dropping franchises and one-person mobile units with a logo, holding the universe to firms actually licensed to do the work, is what keeps a scoring function honest. In any licensed trade, the licensing board is the canonical universe and everything else is a partial reflection.
The pattern under the list
None of these sources are secret. They don't show up in most M&A target identification workflows because no single vendor has assembled them, and the assembly is the work. A platform sells you a join of two or three sources at a price that assumes you will not build it yourself.
A scoring function that reads three or four sources on this list outperforms one reading LinkedIn alone because each additional public-record join shrinks the false-positive set and raises the precision of the tier. Build the extraction layer first, then the scoring layer. The order matters: a clever model on top of one source is still one source.
State tax warrants, customs import/export records, USPTO assignment records, and Medicare provider files belong on a longer version of this list. Every other buyer in your bracket has the same Grata seat, the same PitchBook export, the same Sales Navigator filters. The differentiation is the records nobody else thought to pull and the join logic that lines them up against the vendor data you already have.
