Data Product

SEC Disclosure Narratives

5,507 public companies. 14 XBRL disclosure topics. Extracted text from SEC filings, plus a Parquet file for programmatic access. One ZIP, delivered via secure download link.

58,554 disclosures
5,507 companies
14 topics
~162 MB download

Package Contents

What's Inside the ZIP

disclosure-narratives.zip    ~200 MB
  income-taxes/              ~4,500 extracted text files
  revenue/
  debt/
  leases/
  ... 10 more topic folders
  disclosures.parquet        all data, one file
  companies.csv              ticker, CIK, name, SIC
  README.md
  metadata.json

Coverage

14 XBRL Disclosure Topics

Acquisitions

1,397 disclosures
1,388 companies

Commitments

4,603 disclosures
4,557 companies

Debt

4,146 disclosures
3,984 companies

Earnings Per Share

3,233 disclosures
3,215 companies

Fair Value

3,250 disclosures
3,232 companies

Goodwill & Intangibles

1,815 disclosures
1,805 companies

Income Taxes

5,108 disclosures
5,032 companies

Leases

3,716 disclosures
2,990 companies

New Accounting Standards

5,057 disclosures
4,982 companies

Property, Plant & Equipment

8,012 disclosures
4,372 companies

Revenue

4,200 disclosures
3,339 companies

Segments

4,101 disclosures
4,086 companies

Stock Compensation

6,786 disclosures
4,267 companies

Subsequent Events

3,130 disclosures
3,067 companies
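
The per-topic figures above can be cross-checked with a little arithmetic. The sketch below copies the listed counts into a dictionary (the variable names are illustrative, not part of the package), sums the disclosures, and ranks topics by average disclosures per company, which shows where a single company tends to contribute more than one disclosure.

```python
# (disclosures, companies) per topic, copied from the coverage figures above.
COVERAGE = {
    "Acquisitions": (1397, 1388),
    "Commitments": (4603, 4557),
    "Debt": (4146, 3984),
    "Earnings Per Share": (3233, 3215),
    "Fair Value": (3250, 3232),
    "Goodwill & Intangibles": (1815, 1805),
    "Income Taxes": (5108, 5032),
    "Leases": (3716, 2990),
    "New Accounting Standards": (5057, 4982),
    "Property, Plant & Equipment": (8012, 4372),
    "Revenue": (4200, 3339),
    "Segments": (4101, 4086),
    "Stock Compensation": (6786, 4267),
    "Subsequent Events": (3130, 3067),
}

# Total disclosures across every listed topic.
total_disclosures = sum(d for d, _ in COVERAGE.values())

# Average disclosures per company for each topic, highest first.
density = sorted(
    ((topic, d / c) for topic, (d, c) in COVERAGE.items()),
    key=lambda item: item[1],
    reverse=True,
)

print(f"{len(COVERAGE)} topics, {total_disclosures:,} disclosures")
for topic, ratio in density[:3]:
    print(f"{topic}: {ratio:.2f} disclosures per company")
```

A ratio close to 1.0 suggests roughly one disclosure per company for that topic; a higher ratio means row counts and company counts diverge, so it matters which of the two you report.
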
File Formats

What You Get

Extracted text per company

One .txt per company per topic. Text extracted from XBRL filings by the SEC — clean, structured, ready for NLP.
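
As a minimal sketch of reading these files, the helper below walks an unzipped package and yields one record per text file. It assumes the layout is one folder per topic directly under the root with `.txt` files inside; the function name is hypothetical, and since the file naming scheme is not documented here, the file stem is returned as-is rather than interpreted as a ticker or CIK.

```python
from pathlib import Path


def iter_disclosure_texts(root):
    """Yield (topic, file_stem, text) for every .txt file one level below root."""
    for topic_dir in sorted(Path(root).iterdir()):
        if not topic_dir.is_dir():
            continue  # skip top-level files such as README.md or companies.csv
        for txt_path in sorted(topic_dir.glob("*.txt")):
            # errors="replace" keeps the loop going if a file is not valid UTF-8.
            text = txt_path.read_text(encoding="utf-8", errors="replace")
            yield topic_dir.name, txt_path.stem, text
```

Calling `list(iter_disclosure_texts("disclosure-narratives"))` on the unzipped folder (the path is an assumption) would load everything into memory; iterating lazily is usually kinder for NLP preprocessing.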

disclosures.parquet

All disclosures in one columnar file. Filter by topic, company, or filing year. pandas / Polars / DuckDB ready.
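
A rough sketch of filtering by topic and filing year with DuckDB follows. The `topic` column appears in the Quick Start query; `filing_year` is a hypothetical column name used only for illustration, so check README.md or metadata.json for the real schema. Both function names are likewise made up for this example.

```python
def build_topic_query(parquet_path, topic, filing_year=None):
    """Return (sql, params) selecting one topic, optionally one filing year."""
    # The path is embedded as a SQL string literal, so escape single quotes.
    safe_path = str(parquet_path).replace("'", "''")
    sql = f"SELECT * FROM '{safe_path}' WHERE topic = ?"
    params = [topic]
    if filing_year is not None:
        # ASSUMPTION: 'filing_year' is a hypothetical column name.
        sql += " AND filing_year = ?"
        params.append(filing_year)
    return sql, params


def fetch_topic(parquet_path, topic, filing_year=None):
    """Run the query with DuckDB and return the matching rows as tuples."""
    import duckdb  # imported lazily so the query builder works without it

    sql, params = build_topic_query(parquet_path, topic, filing_year)
    return duckdb.connect().execute(sql, params).fetchall()
```

Passing the filter values as parameters rather than formatting them into the SQL string avoids quoting problems with topic names such as "Property, Plant & Equipment".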

companies.csv

Every company in the dataset: ticker, CIK, registrant name, SIC code, industry classification.
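
To illustrate how this file might be used as a lookup table, the sketch below indexes rows by CIK and filters them by SIC prefix using only the standard library. The header names `cik` and `sic` are assumptions (the actual column headers are not shown here), and both function names are hypothetical.

```python
import csv


def load_companies(csv_file):
    """Index company rows by CIK from an open companies.csv file object."""
    # ASSUMPTION: the header row contains a column named 'cik'.
    reader = csv.DictReader(csv_file)
    return {row["cik"]: row for row in reader}


def filter_by_sic_prefix(companies, prefix):
    """Return the rows whose SIC code starts with the given prefix."""
    # ASSUMPTION: the header row contains a column named 'sic'.
    return [row for row in companies.values() if row["sic"].startswith(prefix)]


# Typical use against the real file:
# with open("companies.csv", newline="", encoding="utf-8") as f:
#     companies = load_companies(f)
# banks = filter_by_sic_prefix(companies, "60")
```

SIC codes are kept as strings so that prefix matching works; `"60"` selects SIC major group 60 (depository institutions).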

Code samples

Python and DuckDB examples to load, query, and analyze the data. Get started in minutes.

Quick Start
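import duckdb

# Query the Parquet file in place using an in-memory DuckDB connection
db = duckdb.connect()
df = db.sql("SELECT * FROM 'disclosures.parquet' WHERE topic = 'Income Taxes'").df()
# Each row is one disclosure, so len(df) counts disclosures, not companies
print(f"{len(df)} income tax disclosures")
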
Use Cases

Built For

NLP / ML researchers

Fine-tune language models on extracted financial disclosure text. Organized by topic with code samples to get started fast.

Quant analysts

Build governance signals, tax strategy features, or debt structure classifiers from disclosure narratives.

Compliance teams

Benchmark disclosure language across an industry. Compare how peers report on the same topic.
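
One simple way to compare how two peers word the same topic is token overlap. The sketch below computes Jaccard similarity between two disclosure texts using only the standard library; it is a deliberately crude baseline rather than a recommended method, and the function names are illustrative.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(text_a, text_b):
    """Jaccard similarity of the two token sets: |A & B| / |A | B|."""
    a, b = tokens(text_a), tokens(text_b)
    if not a and not b:
        return 0.0  # two empty texts give no evidence of similarity
    return len(a & b) / len(a | b)
```

For example, "operating lease liabilities" and "finance lease liabilities" share two of four distinct tokens, giving a score of 0.5. Scores near 1.0 across peers tend to indicate shared boilerplate language.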

Custom Data

Looking for a different dataset?

We're building new SEC filing datasets — insider trades, financial statements, institutional holdings, and more. Tell us what you're working on and we'll help you get the data.

Request a Dataset →
$15 USD · one-time

Enter your email, pay once, download instantly.

Secure download link emailed within minutes. Valid 72 hours.