XBRL mapping

SEC filers tag their financial statements using XBRL (eXtensible Business Reporting Language) — a taxonomy of thousands of concepts like us-gaap:Revenues, us-gaap:NetIncomeLoss, us-gaap:Assets. The problem is taxonomy drift: different filers use different concepts for the same economic fact, the same filer uses different concepts across years, and extension taxonomies proliferate. Raw XBRL is not cross-comparable out of the box.

Thesma normalises this into a stable canonical field schema. Every /v1/us/sec/companies/{cik}/financials response uses the same field names across every filer and every year.

Canonical field            Statement          Unit
revenue                    Income statement   USD
cost_of_revenue            Income statement   USD
gross_profit               Income statement   USD
research_and_development   Income statement   USD
selling_general_admin      Income statement   USD
operating_expenses         Income statement   USD
operating_income           Income statement   USD
pre_tax_income             Income statement   USD
income_tax                 Income statement   USD
net_income                 Income statement   USD
eps_basic                  Income statement   USD/share
eps_diluted                Income statement   USD/share
total_assets               Balance sheet      USD
current_assets             Balance sheet      USD
current_liabilities        Balance sheet      USD
total_liabilities          Balance sheet      USD
stockholders_equity        Balance sheet      USD
cash_and_equivalents       Balance sheet      USD
long_term_debt             Balance sheet      USD
operating_cash_flow        Cash flow          USD
investing_cash_flow        Cash flow          USD
financing_cash_flow        Cash flow          USD
capital_expenditures       Cash flow          USD

The full list, with XBRL source-tag mappings, is queryable at /v1/us/sec/financials/fields:

curl -H "X-API-Key: $THESMA_API_KEY" \
"https://api.thesma.dev/v1/us/sec/financials/fields" | jq '.data[] | {canonical: .name, source_tags: .us_gaap_tags[:3]}'

Every financial value on a financials response carries metadata showing which XBRL concept it was derived from in that specific filing. This is the audit trail — if a filer used us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax in one 10-K and us-gaap:Revenues in the next, both get mapped to canonical revenue but the source_tags field records which concept was used each time.

{
  "data": [
    {
      "fiscal_year": 2024,
      "period_type": "annual",
      "revenue": 391035000000,
      "net_income": 93736000000,
      "metadata": {
        "accession_number": "0000320193-24-000123",
        "filed_at": "2024-11-01",
        "source_tags": {
          "revenue": "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
          "net_income": "us-gaap:NetIncomeLoss"
        }
      }
    }
  ]
}

Use source_tags when:

  • You need to reconcile a Thesma canonical field against a filer’s raw XBRL.
  • You’re building a model that weights periods differently based on mapping confidence.
  • A filer switches concepts between periods and you want to know which period used which concept.
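The last use case can be sketched in a few lines of Python. This is a minimal sketch, assuming rows shaped like the sample response above (source_tags nested under metadata and keyed by canonical field name). The helper name concept_switches is hypothetical, and the sample rows are invented purely to show a switch; they are not real filing data.

```python
def concept_switches(rows, field):
    """Return (fiscal_year, previous_tag, new_tag) for each period whose
    source tag for `field` differs from the preceding tagged period."""
    ordered = sorted(rows, key=lambda r: r["fiscal_year"])
    switches = []
    prev_tag = None
    for row in ordered:
        tag = row.get("metadata", {}).get("source_tags", {}).get(field)
        if tag is None:
            continue  # field has no recorded tag in this period; keep prev_tag
        if prev_tag is not None and tag != prev_tag:
            switches.append((row["fiscal_year"], prev_tag, tag))
        prev_tag = tag
    return switches


# Illustrative rows only (assumed shape, invented tag assignments).
sample_rows = [
    {"fiscal_year": 2023,
     "metadata": {"source_tags": {"revenue": "us-gaap:Revenues"}}},
    {"fiscal_year": 2024,
     "metadata": {"source_tags": {
         "revenue": "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax"}}},
]

print(concept_switches(sample_rows, "revenue"))
```

Sorting by fiscal_year first means the rows can be passed in any order; periods without a recorded tag for the field are skipped rather than treated as a switch.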

Not every XBRL concept maps cleanly. The canonical fields are defined as a hierarchy — if a filer emits the most-specific concept, it’s a direct map; if they emit a parent concept, the mapping walks up the taxonomy.

The metadata.mapping_quality field (when present) encodes this:

Value       Meaning
exact       Direct 1:1 map from a single concept.
derived     Computed from a sum or difference of other concepts (e.g., gross_profit = revenue - cost_of_revenue when the filer didn’t tag GrossProfit).
estimated   Inferred with non-trivial assumptions. Check source_tags to understand what was assumed.

Most large-cap filers are exact across the board. Smaller filers and IFRS 20-F filers are more likely to show derived or estimated.
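To make the difference between exact and derived concrete, the gross_profit example from the table can be mimicked client-side. This is only an illustrative sketch of the idea, not Thesma's actual mapping logic: the function name is hypothetical, it takes a flat dict of canonical fields, and the numbers in the usage lines are made up.

```python
def gross_profit_with_quality(fields):
    """Return (value, quality) for gross_profit from a dict of canonical fields.

    "exact"   -> gross_profit was supplied directly
    "derived" -> computed as revenue - cost_of_revenue
    (None, None) when neither path is available.
    """
    if fields.get("gross_profit") is not None:
        return fields["gross_profit"], "exact"
    revenue = fields.get("revenue")
    cost = fields.get("cost_of_revenue")
    if revenue is not None and cost is not None:
        return revenue - cost, "derived"
    return None, None


# Made-up numbers for illustration.
print(gross_profit_with_quality({"gross_profit": 400}))
print(gross_profit_with_quality({"revenue": 1000, "cost_of_revenue": 600}))
```

Returning the quality label alongside the value keeps the provenance attached to the number, which is the same reason mapping_quality exists on the response.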

Foreign private issuers file 20-F annual reports under the IFRS-Full taxonomy instead of US-GAAP. Roughly 400 or more companies on NYSE and NASDAQ file under IFRS today (Spotify, Nu Holdings, GlobalFoundries, Birkenstock, AngloGold Ashanti, Amer Sports, On Holding, Globant, XP Inc, Millicom, and others). They share the same /v1/us/sec/companies/{cik}/financials endpoint and the same canonical field names as US-GAAP filers, with three pieces of extra metadata on every response:

  • taxonomy: 'ifrs-full' (vs 'us-gaap')
  • currency: <ISO-4217> — e.g., 'EUR' for Spotify, 'USD' for Nu Holdings, 'BRL' for Brazilian IFRS filers
  • _reporting_notes — an object with two always-present keys: presentation_format (by_function | by_nature | unknown) and ifrs_18_applied (boolean). Five conditional keys may appear when the corresponding edge case fires: TAXONOMY_CHANGED_IN_AMENDMENT, CURRENCY_CHANGED_IN_AMENDMENT, TAXONOMY_DETECTION_AMBIGUOUS, CURRENCY_DETECTION_AMBIGUOUS, presentation_format_detection_note. Absent keys are dropped from wire output. For per-field confidence signals on derived values, see field_confidence (a separate sibling of line_items).
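A client can check which edge cases fired by testing for the conditional keys. The sketch below is a minimal example under stated assumptions: it takes the _reporting_notes object as a plain dict, the helper name edge_case_flags is hypothetical, and the value attached to the conditional key in the sample (True) is a guess, since only the key names are documented here. The helper therefore tests key presence only.

```python
# Conditional keys listed in the _reporting_notes description.
CONDITIONAL_KEYS = (
    "TAXONOMY_CHANGED_IN_AMENDMENT",
    "CURRENCY_CHANGED_IN_AMENDMENT",
    "TAXONOMY_DETECTION_AMBIGUOUS",
    "CURRENCY_DETECTION_AMBIGUOUS",
    "presentation_format_detection_note",
)


def edge_case_flags(reporting_notes):
    """Return the conditional keys present in a _reporting_notes dict."""
    return [key for key in CONDITIONAL_KEYS if key in reporting_notes]


# Hypothetical notes object; the value of the conditional key is assumed.
sample_notes = {
    "presentation_format": "by_function",
    "ifrs_18_applied": False,
    "CURRENCY_DETECTION_AMBIGUOUS": True,
}

print(edge_case_flags(sample_notes))
```

Because absent keys are dropped from the output, an empty list means no edge case fired for that response.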

IFRS and US-GAAP differ in ways that matter for cross-filer comparison. Thesma surfaces the divergences honestly rather than silently smoothing them. The six comparability caveats below apply at the line-item level — read them before comparing an IFRS filer’s canonical field to a US-GAAP filer’s canonical field.

  1. R&D capitalisation. IFRS allows capitalisation of development costs that meet the IAS 38 criteria; US-GAAP expenses most R&D. Compare research_and_development with care for companies that capitalise.
  2. LIFO inventory. US-GAAP permits LIFO; IFRS prohibits it. Inventory costs and gross margins are not directly comparable between a LIFO US filer and an IFRS filer.
  3. Operating subtotal optionality. Before IAS 1 revisions took effect, IFRS didn’t require a disclosed operating-income line. Where the filer didn’t report one, Thesma derives operating_income from available subtotals. Check the row’s field_confidence dict for a per-field confidence signal on operating_income when the derivation path is used — field_confidence is a sibling of line_items in the response and reports string confidence labels (e.g., "medium") for fields with non-default confidence.
  4. Finance costs scope. IFRS’s “finance costs” category may bundle items that US-GAAP reports separately (interest expense, FX losses, hedge ineffectiveness). Thesma maps to canonical interest_expense using the filer’s disclosed breakdown; where only a combined total exists, the value is flagged via source_tags.
  5. Dividends classification. IFRS permits dividends paid to be classified under operating or financing in the cash flow statement. Thesma preserves the filer’s chosen classification rather than forcing one location — inspect financing_cash_flow and operating_cash_flow together for IFRS filers.
  6. Bank extensions. IFRS banks (Nu Holdings today; other foreign IFRS banks are in progress) report many line items through bank-specific XBRL extension taxonomies. Thesma canonicalises the top-level totals — revenue, net income, total assets, equity, operating/investing/financing cash flows — but sub-line-item detail lives in company-specific extensions and is not normalised across banks in v1.
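For caveat 3, a lookup of the per-field confidence label might look like the following. This is a sketch, assuming a row that carries field_confidence as a dict next to line_items, as described above. The helper name and the sample row are hypothetical, and returning None for a field with no entry is a choice made here to represent "no non-default label reported".

```python
def confidence_label(row, field):
    """Return the confidence label for `field`, or None when the row
    reports no non-default confidence for it."""
    return (row.get("field_confidence") or {}).get(field)


# Hypothetical row; values are placeholders, not real filing data.
sample_row = {
    "line_items": {"operating_income": 100, "revenue": 1000},
    "field_confidence": {"operating_income": "medium"},
}

print(confidence_label(sample_row, "operating_income"))
print(confidence_label(sample_row, "revenue"))
```

A comparison pipeline could use this to exclude or down-weight operating_income values that came through the derivation path.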

Values are reported in the filer’s presentation currency with no USD conversion in v1. A Spotify row’s revenue: 17186000000 is €17.186B, not USD. The currency field on every response tells you the unit. Applications that need a unified USD view apply client-side FX conversion at a reference rate of their choosing.

USD-normalised IFRS values are on the roadmap for a future release; until then, the design choice is “honest native currency” over “lossy pre-conversion.”
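The client-side conversion described above can be sketched as follows. This is a minimal example, not a Thesma feature: the function name to_usd and the usd_per_unit rate table are hypothetical, and the EUR rate in the usage lines is a placeholder rather than a real reference rate. Decimal is used so that large monetary values are not subject to binary floating-point rounding.

```python
from decimal import Decimal


def to_usd(value, currency, usd_per_unit):
    """Convert a native-currency value to USD using caller-supplied rates.

    usd_per_unit maps ISO-4217 codes to the USD price of one unit of that
    currency (rates given as strings or Decimals).
    """
    if currency == "USD":
        return Decimal(value)
    if currency not in usd_per_unit:
        raise KeyError(f"no reference rate supplied for {currency}")
    return Decimal(value) * Decimal(str(usd_per_unit[currency]))


# Placeholder rate for illustration only.
rates = {"EUR": "1.10"}
print(to_usd(17186000000, "EUR", rates))
```

Raising on a missing rate, instead of silently passing the native value through, avoids mixing currencies in a series that is meant to be all USD.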

See the IFRS filer recipe for an end-to-end worked example using Spotify.

Filers can define their own XBRL extension concepts (e.g., aapl:ProductsRevenue). Extensions are parsed but not mapped into canonical fields by default — you’ll see them as separate entries in the filing detail but they don’t affect the canonical revenue number. This is intentional: extensions are by definition non-standard and aggregating them across filers would be meaningless.

Normalisation is most reliable from ~2013 onwards, when XBRL became mandatory for the full income statement and balance sheet. Pre-2013 filings are present but have lower mapping confidence, so treat any comparison that includes pre-2013 data with caution.

The canonical fiscal_year field is the year the reporting period ends, not the calendar year it was filed in. Apple’s fiscal_year: 2024 ended in September 2024 and was filed in November 2024. Don’t confuse this with the calendar year of the filing date (metadata.filed_at).