
XBRL mapping

SEC filers tag their financial statements using XBRL (eXtensible Business Reporting Language) — a taxonomy of thousands of concepts like us-gaap:Revenues, us-gaap:NetIncomeLoss, us-gaap:Assets. The problem is taxonomy drift: different filers use different concepts for the same economic fact, the same filer uses different concepts across years, and extension taxonomies proliferate. Raw XBRL is not cross-comparable out of the box.

Thesma normalises this into a stable canonical field schema. Every /v1/us/sec/companies/{cik}/financials response uses the same field names across every filer and every year.

| Canonical field | Statement | Unit |
|---|---|---|
| revenue | Income statement | USD |
| cost_of_revenue | Income statement | USD |
| gross_profit | Income statement | USD |
| research_and_development | Income statement | USD |
| selling_general_admin | Income statement | USD |
| operating_expenses | Income statement | USD |
| operating_income | Income statement | USD |
| pre_tax_income | Income statement | USD |
| income_tax | Income statement | USD |
| net_income | Income statement | USD |
| eps_basic | Income statement | USD/share |
| eps_diluted | Income statement | USD/share |
| total_assets | Balance sheet | USD |
| current_assets | Balance sheet | USD |
| current_liabilities | Balance sheet | USD |
| total_liabilities | Balance sheet | USD |
| stockholders_equity | Balance sheet | USD |
| cash_and_equivalents | Balance sheet | USD |
| long_term_debt | Balance sheet | USD |
| operating_cash_flow | Cash flow | USD |
| investing_cash_flow | Cash flow | USD |
| financing_cash_flow | Cash flow | USD |
| capital_expenditures | Cash flow | USD |
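To make the drift problem concrete, here is a minimal Python sketch of the kind of normalisation involved. It is not Thesma's implementation: the CANDIDATE_TAGS table and the normalise helper are hypothetical, and the preference order between concepts is an illustrative assumption. Each canonical field takes its value from the first candidate concept the filing reports.

```python
from typing import Optional

# Hypothetical mapping, for illustration only: for each canonical field, the
# US-GAAP concepts a normaliser might accept, in order of preference.
CANDIDATE_TAGS: dict[str, list[str]] = {
    "revenue": [
        "us-gaap:Revenues",
        "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
        "us-gaap:SalesRevenueNet",
    ],
    "net_income": ["us-gaap:NetIncomeLoss"],
}


def normalise(raw_facts: dict[str, float]) -> dict[str, Optional[float]]:
    """Map one filing's raw XBRL facts (concept -> value) to canonical fields.

    The first candidate concept present in the filing wins; a field with no
    matching concept comes back as None.
    """
    return {
        field: next((raw_facts[tag] for tag in tags if tag in raw_facts), None)
        for field, tags in CANDIDATE_TAGS.items()
    }


# Two hypothetical filings that tag revenue with different concepts.
filing_a = {"us-gaap:Revenues": 1000.0, "us-gaap:NetIncomeLoss": 120.0}
filing_b = {
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax": 1100.0,
    "us-gaap:NetIncomeLoss": 130.0,
}
print(normalise(filing_a))  # {'revenue': 1000.0, 'net_income': 120.0}
print(normalise(filing_b))  # {'revenue': 1100.0, 'net_income': 130.0}
```

Both filings come out with a canonical revenue value even though they tag it differently, which is the property a stable canonical schema is meant to provide across filers and years.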

The full list of canonical fields, with their XBRL source-tag mappings, is queryable at /v1/us/sec/financials/fields:

```shell
# List each canonical field with its first three US-GAAP source tags.
curl -H "X-API-Key: $THESMA_API_KEY" \
  "https://api.thesma.dev/v1/us/sec/financials/fields" \
  | jq '.data[] | {canonical: .name, source_tags: .us_gaap_tags[:3]}'
```

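For scripting, roughly the same query can be made from Python with only the standard library. This is a sketch: it assumes the response shape implied by the jq filter above (a data array whose entries carry name and us_gaap_tags), and fetch_fields and summarise_fields are hypothetical helper names, not part of any Thesma SDK.

```python
import json
import urllib.request

FIELDS_URL = "https://api.thesma.dev/v1/us/sec/financials/fields"


def fetch_fields(api_key: str) -> dict:
    """GET the field catalogue, passing the key in the X-API-Key header."""
    request = urllib.request.Request(FIELDS_URL, headers={"X-API-Key": api_key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarise_fields(payload: dict, max_tags: int = 3) -> list[dict]:
    """Mirror the jq filter: each field's canonical name and first few tags."""
    return [
        {
            "canonical": field["name"],
            # A missing tag list becomes [] here (jq would print null).
            "source_tags": (field.get("us_gaap_tags") or [])[:max_tags],
        }
        for field in payload["data"]
    ]
```

With os imported, summarise_fields(fetch_fields(os.environ["THESMA_API_KEY"])) returns the same objects the jq filter prints, as a list of dicts.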
Every financial value in a financials response carries metadata showing which XBRL concept it was derived from in that specific filing. This is the audit trail: if a filer used us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax in one 10-K and us-gaap:Revenues in the next, both map to canonical revenue, but metadata.source_tags records which concept was used each time.

```json
{
  "data": [
    {
      "fiscal_year": 2024,
      "period_type": "annual",
      "revenue": 391035000000,
      "net_income": 93736000000,
      "metadata": {
        "accession_number": "0000320193-24-000123",
        "filed_at": "2024-11-01",
        "source_tags": {
          "revenue": "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
          "net_income": "us-gaap:NetIncomeLoss"
        }
      }
    }
  ]
}
```
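A short Python sketch of reading that audit trail: audit_trail is a hypothetical helper that pairs every canonical value with the concept recorded for it in metadata.source_tags. It relies only on the response shape shown above, which SAMPLE reproduces in trimmed form.

```python
def audit_trail(response: dict) -> list[tuple]:
    """Return (fiscal_year, field, value, source_tag) for every field that
    has an entry in a period's metadata.source_tags."""
    rows = []
    for period in response["data"]:
        source_tags = period.get("metadata", {}).get("source_tags", {})
        for field, tag in source_tags.items():
            rows.append((period["fiscal_year"], field, period.get(field), tag))
    return rows


# Trimmed copy of the example response above.
SAMPLE = {
    "data": [
        {
            "fiscal_year": 2024,
            "period_type": "annual",
            "revenue": 391035000000,
            "net_income": 93736000000,
            "metadata": {
                "source_tags": {
                    "revenue": "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
                    "net_income": "us-gaap:NetIncomeLoss",
                }
            },
        }
    ]
}

for row in audit_trail(SAMPLE):
    print(row)
```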

Use source_tags when:

  • You need to reconcile a Thesma canonical field against a filer’s raw XBRL.
  • You’re building a model that weights periods differently based on mapping confidence.
  • A filer switches concepts between periods and you want to know which period used which concept.
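For the last case, a minimal sketch that scans a multi-period response for concept switches. concept_switches is a hypothetical helper; it assumes each entry in data is one period carrying the fiscal_year, period_type and metadata.source_tags keys shown in the example above, and by default it compares annual periods only.

```python
from collections import defaultdict


def concept_switches(
    response: dict, period_type: str = "annual"
) -> dict[str, list[tuple[int, str, str]]]:
    """For each canonical field, list (fiscal_year, previous_tag, new_tag)
    wherever the source concept differs from the most recent earlier period
    that recorded a concept for that field."""
    history: dict[str, list[tuple[int, str]]] = defaultdict(list)
    periods = [p for p in response["data"] if p.get("period_type") == period_type]
    for period in sorted(periods, key=lambda p: p["fiscal_year"]):
        for field, tag in period.get("metadata", {}).get("source_tags", {}).items():
            history[field].append((period["fiscal_year"], tag))

    switches: dict[str, list[tuple[int, str, str]]] = {}
    for field, seq in history.items():
        # Pair each period's concept with the one recorded before it.
        changes = [
            (year, prev_tag, tag)
            for (_, prev_tag), (year, tag) in zip(seq, seq[1:])
            if tag != prev_tag
        ]
        if changes:
            switches[field] = changes
    return switches
```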

Not every XBRL concept maps cleanly. The canonical fields are defined as a hierarchy: if a filer emits the most-specific concept, it maps directly; if it emits only a parent concept, the mapping walks up the taxonomy.
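A rough sketch of that walk, under loudly stated assumptions: the PARENT links and the MOST_SPECIFIC anchor below are hypothetical and illustrative (real parent/child relationships come from the taxonomy's linkbases), and resolve is not Thesma's code. Starting from the most-specific concept a canonical field is defined against, it climbs one parent at a time until it finds a concept the filing actually reports.

```python
from typing import Optional

# Hypothetical, illustrative parent links (child concept -> parent concept).
# Real relationships come from the taxonomy's linkbases, not this dict.
PARENT: dict[str, str] = {
    "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax": "us-gaap:Revenues",
}

# Hypothetical: the most-specific concept each canonical field is defined on.
MOST_SPECIFIC: dict[str, str] = {
    "revenue": "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
}


def resolve(field: str, facts: dict[str, float]) -> Optional[tuple[str, float, int]]:
    """Walk from the field's most-specific concept up through its parents.

    Returns (concept, value, steps_up) for the first concept the filing
    reports, where steps_up == 0 means a direct map, or None if no concept
    on the path is present.
    """
    concept = MOST_SPECIFIC.get(field)
    steps_up = 0
    while concept is not None:
        if concept in facts:
            return concept, facts[concept], steps_up
        concept = PARENT.get(concept)
        steps_up += 1
    return None
```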

The metadata.mapping_quality field (when present) encodes this:

| Value | Meaning |
|---|---|
| exact | Direct 1:1 map from a single concept. |
| derived | Computed from a sum or difference of other concepts (e.g., gross_profit = revenue - cost_of_revenue when the filer didn’t tag GrossProfit). |
| estimated | Inferred with non-trivial assumptions. Check source_tags to understand what was assumed. |
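The derived example in the table is simple arithmetic, so it is easy to reproduce when reconciling a response. A minimal sketch, assuming a period record that uses the canonical field names on this page; derive_gross_profit and check_gross_profit are hypothetical helpers, and the default tolerance of 0.5 USD for rounding is an arbitrary assumption.

```python
from typing import Optional


def derive_gross_profit(period: dict) -> Optional[float]:
    """gross_profit = revenue - cost_of_revenue; None if either is missing."""
    revenue = period.get("revenue")
    cost_of_revenue = period.get("cost_of_revenue")
    if revenue is None or cost_of_revenue is None:
        return None
    return revenue - cost_of_revenue


def check_gross_profit(period: dict, tolerance: float = 0.5) -> Optional[bool]:
    """Compare the reported gross_profit with the derivation.

    Returns None when the check cannot be made because a value is missing.
    """
    derived = derive_gross_profit(period)
    reported = period.get("gross_profit")
    if derived is None or reported is None:
        return None
    return abs(reported - derived) <= tolerance
```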

Most large-cap filers map exactly across the board. Smaller filers and IFRS 20-F filers are more likely to show derived or estimated mappings.
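For the weighting use case listed earlier, one possible approach is a quality-weighted average of a field across periods. The weights below are arbitrary placeholders, and the sketch assumes mapping_quality is a single string per period under metadata; if it is absent or shaped differently (for example, per field), the period falls back to UNKNOWN_WEIGHT. quality_weight and weighted_mean are hypothetical helpers.

```python
from typing import Optional

# Placeholder weights per mapping_quality value; tune them for your model.
QUALITY_WEIGHTS: dict[str, float] = {"exact": 1.0, "derived": 0.8, "estimated": 0.5}
UNKNOWN_WEIGHT = 0.7  # used when mapping_quality is absent or unrecognised


def quality_weight(period: dict) -> float:
    quality = period.get("metadata", {}).get("mapping_quality")
    # Only plain string values are looked up; anything else (None, or a
    # differently shaped value) gets the fallback weight.
    if isinstance(quality, str):
        return QUALITY_WEIGHTS.get(quality, UNKNOWN_WEIGHT)
    return UNKNOWN_WEIGHT


def weighted_mean(periods: list[dict], field: str) -> Optional[float]:
    """Quality-weighted mean of one canonical field, skipping periods where it
    is missing; None if no period has a value."""
    pairs = [(p[field], quality_weight(p)) for p in periods if p.get(field) is not None]
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        return None
    return sum(value * weight for value, weight in pairs) / total_weight
```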

IFRS 20-F filers — limited coverage, Q2 2026 target


The canonical schema is designed around US-GAAP concepts. Foreign private issuers that file 20-F with IFRS statements — Spotify, Nu Holdings, GlobalFoundries, and others — are partially normalised:

  • Company records exist in /v1/us/sec/companies.
  • Raw 20-F filings are indexed in /v1/us/sec/filings.
  • Canonical /v1/us/sec/companies/{cik}/financials responses have limited historical coverage — expect sparse coverage pre-2023 and occasional gaps post-2023.

Native-currency IFRS support (reporting in EUR, BRL and other currencies without lossy conversion) and full IFRS-concept mapping are targeted for Q2 2026. Until then, 20-F users should fetch the raw filings and parse the XBRL directly for any fields that are not yet fully mapped.
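Before falling back to raw filings, it helps to know exactly which years and fields are missing. A minimal sketch, assuming a financials response shaped like the example earlier on this page; coverage_gaps is a hypothetical helper that treats a year as missing when there is no annual period for it or the field is null.

```python
def coverage_gaps(
    response: dict, fields: list[str], first_year: int, last_year: int
) -> dict[str, list[int]]:
    """Per field, the fiscal years in [first_year, last_year] with no annual value."""
    # Index annual periods by fiscal_year (a later duplicate would overwrite
    # an earlier one).
    annual = {
        p["fiscal_year"]: p for p in response["data"] if p.get("period_type") == "annual"
    }
    gaps: dict[str, list[int]] = {}
    for field in fields:
        missing = [
            year
            for year in range(first_year, last_year + 1)
            if annual.get(year, {}).get(field) is None
        ]
        if missing:
            gaps[field] = missing
    return gaps
```

The years returned for each field are the ones to look up in the raw 20-F filings.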

Filers can define their own XBRL extension concepts (e.g., aapl:ProductsRevenue). Extensions are parsed but not mapped into canonical fields by default: you’ll see them as separate entries in the filing detail, but they don’t affect the canonical revenue number. This is intentional. Extensions are non-standard by definition, so aggregating them across filers would be meaningless.
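If you work with filing detail directly, a simple way to separate extension concepts from standard ones is by prefix. This is a sketch under an assumption: the STANDARD_PREFIXES set is illustrative, and prefixes are only a convention. XBRL identifies a taxonomy by its namespace URI, so a robust check should compare namespaces rather than prefixes. split_extensions is a hypothetical helper.

```python
# Assumption: prefixes treated as standard taxonomies. Adjust as needed.
STANDARD_PREFIXES = frozenset({"us-gaap", "dei", "srt", "ifrs-full"})


def split_extensions(concepts: list[str]) -> tuple[list[str], list[str]]:
    """Partition prefixed concept names into (standard, extension) lists,
    preserving input order within each list."""
    standard: list[str] = []
    extension: list[str] = []
    for name in concepts:
        prefix = name.split(":", 1)[0]
        (standard if prefix in STANDARD_PREFIXES else extension).append(name)
    return standard, extension
```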

Normalisation is most reliable from ~2013 onwards, when XBRL became mandatory for the full income statement and balance sheet. Pre-2013 filings are present but have lower mapping confidence — treat historical comparisons with pre-2013 data accordingly.
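One way to keep pre-2013 data from silently blending into a comparison is to split it out up front. A minimal sketch; keying the cutoff on fiscal_year is an assumption, since the page gives only an approximate year, and split_by_era is a hypothetical helper.

```python
CONFIDENT_FROM_YEAR = 2013  # approximate threshold described above


def split_by_era(periods: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split periods into (pre_2013, from_2013) by fiscal_year so that
    lower-confidence history can be reported or weighted separately."""
    early = [p for p in periods if p["fiscal_year"] < CONFIDENT_FROM_YEAR]
    recent = [p for p in periods if p["fiscal_year"] >= CONFIDENT_FROM_YEAR]
    return early, recent
```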

The canonical fiscal_year field is the year in which the reporting period ends, not the calendar year in which it was filed. For example, Apple’s fiscal_year: 2024 period ended in September 2024, and its 10-K was filed in November 2024. Don’t confuse fiscal_year with the calendar year of the filing date (metadata.filed_at).
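A small sketch that makes the distinction explicit by computing the gap between the two years. filing_lag_years is a hypothetical helper and assumes filed_at is an ISO date string, as in the example response. Apple’s fiscal_year 2024 10-K gives a lag of 0, while a filer with a December year end usually files its annual report early in the following calendar year, giving a lag of 1.

```python
from datetime import date


def filing_lag_years(period: dict) -> int:
    """Calendar year of metadata.filed_at minus fiscal_year.

    0 means the filing landed in the same calendar year the period ended;
    1 means it was filed in the following calendar year.
    """
    filed = date.fromisoformat(period["metadata"]["filed_at"])
    return filed.year - period["fiscal_year"]
```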