XBRL mapping
SEC filers tag their financial statements using XBRL (eXtensible Business Reporting Language) — a taxonomy of thousands of concepts like us-gaap:Revenues, us-gaap:NetIncomeLoss, us-gaap:Assets. The problem is taxonomy drift: different filers use different concepts for the same economic fact, the same filer uses different concepts across years, and extension taxonomies proliferate. Raw XBRL is not cross-comparable out of the box.
Thesma normalises this into a stable canonical field schema. Every /v1/us/sec/companies/{cik}/financials response uses the same field names across every filer and every year.
The canonical schema
| Canonical field | Statement | Unit |
|---|---|---|
revenue | Income statement | USD |
cost_of_revenue | Income statement | USD |
gross_profit | Income statement | USD |
research_and_development | Income statement | USD |
selling_general_admin | Income statement | USD |
operating_expenses | Income statement | USD |
operating_income | Income statement | USD |
pre_tax_income | Income statement | USD |
income_tax | Income statement | USD |
net_income | Income statement | USD |
eps_basic | Income statement | USD/share |
eps_diluted | Income statement | USD/share |
total_assets | Balance sheet | USD |
current_assets | Balance sheet | USD |
current_liabilities | Balance sheet | USD |
total_liabilities | Balance sheet | USD |
stockholders_equity | Balance sheet | USD |
cash_and_equivalents | Balance sheet | USD |
long_term_debt | Balance sheet | USD |
operating_cash_flow | Cash flow | USD |
investing_cash_flow | Cash flow | USD |
financing_cash_flow | Cash flow | USD |
capital_expenditures | Cash flow | USD |
Full list with XBRL source-tag mappings is queryable at /v1/us/sec/financials/fields:
```bash
curl -H "X-API-Key: $THESMA_API_KEY" \
  "https://api.thesma.dev/v1/us/sec/financials/fields" \
  | jq '.data[] | {canonical: .name, source_tags: .us_gaap_tags[:3]}'
```
metadata.source_tags on every response
Every financial value on a financials response carries metadata showing which XBRL concept it was derived from in that specific filing. This is the audit trail: if a filer used us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax in one 10-K and us-gaap:Revenues in the next, both get mapped to canonical revenue, but the source_tags field records which concept was used each time.
```json
{
  "data": [
    {
      "fiscal_year": 2024,
      "period_type": "annual",
      "revenue": 391035000000,
      "net_income": 93736000000,
      "metadata": {
        "accession_number": "0000320193-24-000123",
        "filed_at": "2024-11-01",
        "source_tags": {
          "revenue": "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax",
          "net_income": "us-gaap:NetIncomeLoss"
        }
      }
    }
  ]
}
```
Use source_tags when:
- You need to reconcile a Thesma canonical field against a filer’s raw XBRL.
- You’re building a model that weights periods differently based on mapping confidence.
- A filer switches concepts between periods and you want to know which period used which concept.
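For the last case, here is a minimal Python sketch. It walks the data array of a financials response (assumed to be shaped like the JSON example above, with one annual row per fiscal year) and reports which concept fed a given canonical field in each period, flagging periods where the concept changed. concept_history is a hypothetical helper, not part of any Thesma client library, and the sample rows are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def concept_history(
    rows: list[dict[str, Any]], field: str, period_type: str = "annual"
) -> list[tuple[int, str | None, bool]]:
    """Return (fiscal_year, source concept, changed) for one canonical field.

    `changed` is True when the concept differs from the last period that
    reported one. Rows without a source tag for `field` yield None.
    """
    history: list[tuple[int, str | None, bool]] = []
    previous: str | None = None
    selected = [r for r in rows if r.get("period_type") == period_type]
    for row in sorted(selected, key=lambda r: r["fiscal_year"]):
        tag = row.get("metadata", {}).get("source_tags", {}).get(field)
        changed = tag is not None and previous is not None and tag != previous
        history.append((row["fiscal_year"], tag, changed))
        if tag is not None:
            previous = tag
    return history


# Made-up rows shaped like the response example (values omitted for brevity).
rows = [
    {"fiscal_year": 2019, "period_type": "annual",
     "metadata": {"source_tags": {"revenue": "us-gaap:RevenueFromContractWithCustomerExcludingAssessedTax"}}},
    {"fiscal_year": 2018, "period_type": "annual",
     "metadata": {"source_tags": {"revenue": "us-gaap:Revenues"}}},
]

for year, tag, changed in concept_history(rows, "revenue"):
    print(year, tag, "<- concept changed" if changed else "")
```

Sorting by fiscal_year inside the helper means the result does not depend on the order in which rows arrive.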
Mapping confidence
Not every XBRL concept maps cleanly. The canonical fields are defined as a hierarchy: if a filer emits the most-specific concept, it’s a direct map; if it emits a parent concept instead, the mapping walks up the taxonomy.
The metadata.mapping_quality field (when present) encodes this:
| Value | Meaning |
|---|---|
exact | Direct 1:1 map from a single concept. |
derived | Computed from a sum or difference of other concepts (e.g., gross_profit = revenue - cost_of_revenue when the filer didn’t tag GrossProfit). |
estimated | Inferred with non-trivial assumptions. Check source_tags to understand what was assumed. |
Most large-cap filers map as exact across the board. Smaller filers and IFRS 20-F filers are more likely to show derived or estimated values.
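One way to act on these labels is sketched below in Python. The shape of metadata.mapping_quality beyond its documented labels is an assumption: the helper accepts either a single label for the whole row or a dict keyed by canonical field (like source_tags). Fields with no label are treated as failing the check, which is a deliberately conservative choice. gross_profit_consistent re-checks the identity quoted in the derived row of the table. All function names are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Documented labels, ordered from most to least reliable.
QUALITY_ORDER = {"exact": 0, "derived": 1, "estimated": 2}


def mapping_quality(row: dict[str, Any], field: str) -> str | None:
    """Return the mapping_quality label for `field`, or None when absent.

    Assumption: metadata.mapping_quality is either one label for the whole
    row or a dict keyed by canonical field name.
    """
    quality = row.get("metadata", {}).get("mapping_quality")
    if isinstance(quality, dict):
        return quality.get(field)
    return quality


def at_least(row: dict[str, Any], field: str, worst_allowed: str = "derived") -> bool:
    """True if the field's label is no worse than `worst_allowed`; unlabelled fields fail."""
    label = mapping_quality(row, field)
    if label not in QUALITY_ORDER:
        return False
    return QUALITY_ORDER[label] <= QUALITY_ORDER[worst_allowed]


def gross_profit_consistent(row: dict[str, Any]) -> bool:
    """Re-check the derived identity gross_profit == revenue - cost_of_revenue."""
    gross, revenue, cost = (row.get(k) for k in ("gross_profit", "revenue", "cost_of_revenue"))
    if gross is None or revenue is None or cost is None:
        return False
    return gross == revenue - cost


# Made-up row for illustration.
row = {
    "revenue": 1_000, "cost_of_revenue": 600, "gross_profit": 400,
    "metadata": {"mapping_quality": {"revenue": "exact", "gross_profit": "derived"}},
}
print(at_least(row, "gross_profit"), at_least(row, "gross_profit", "exact"), gross_profit_consistent(row))
```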
Known limitations
IFRS 20-F — live
Foreign private issuers file 20-F annual reports under the IFRS-Full taxonomy instead of US-GAAP. Roughly 400+ companies on NYSE and NASDAQ file under IFRS today (Spotify, Nu Holdings, GlobalFoundries, Birkenstock, AngloGold Ashanti, Amer Sports, On Holding, Globant, XP Inc, Millicom, and others). They share the same /v1/us/sec/companies/{cik}/financials endpoint and the same canonical field names as US-GAAP filers, and every response carries extra metadata:
- taxonomy: 'ifrs-full' (vs 'us-gaap')
- currency: an ISO 4217 currency code, e.g., 'EUR' for Spotify, 'USD' for Nu Holdings, 'BRL' for Brazilian IFRS filers
- _reporting_notes: an object with two always-present keys, presentation_format (by_function | by_nature | unknown) and ifrs_18_applied (boolean). Five conditional keys may appear when the corresponding edge case fires: TAXONOMY_CHANGED_IN_AMENDMENT, CURRENCY_CHANGED_IN_AMENDMENT, TAXONOMY_DETECTION_AMBIGUOUS, CURRENCY_DETECTION_AMBIGUOUS, presentation_format_detection_note. Absent keys are dropped from wire output.

For per-field confidence signals on derived values, see field_confidence (a separate sibling of line_items).
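A minimal Python sketch of reading these keys defensively follows. Because absent keys are dropped from the wire, the code tests for key presence rather than expecting nulls. This section does not pin down where the keys sit in the response envelope, so the function takes whichever dict carries taxonomy, currency, and _reporting_notes. reporting_warnings is a hypothetical name, and the values of the conditional keys are not assumed to have any particular type; they are printed as-is.

```python
from __future__ import annotations

from typing import Any

# Conditional _reporting_notes keys that only appear when an edge case fires.
CONDITIONAL_KEYS = (
    "TAXONOMY_CHANGED_IN_AMENDMENT",
    "CURRENCY_CHANGED_IN_AMENDMENT",
    "TAXONOMY_DETECTION_AMBIGUOUS",
    "CURRENCY_DETECTION_AMBIGUOUS",
    "presentation_format_detection_note",
)


def reporting_warnings(meta: dict[str, Any]) -> list[str]:
    """Turn the IFRS-specific metadata into human-readable caveats."""
    warnings: list[str] = []
    if meta.get("taxonomy") == "ifrs-full":
        currency = meta.get("currency", "an unstated currency")
        warnings.append(f"IFRS filer; values are in {currency}, not converted to USD")
    notes = meta.get("_reporting_notes", {})
    if notes.get("presentation_format") == "unknown":
        warnings.append("income statement presentation format (by function/by nature) not detected")
    if notes.get("ifrs_18_applied"):
        warnings.append("IFRS 18 presentation applied")
    # Absent keys are dropped from the wire, so presence itself is the signal.
    for key in CONDITIONAL_KEYS:
        if key in notes:
            warnings.append(f"{key}: {notes[key]}")
    return warnings


# Illustrative metadata only, not copied from a real response.
example_meta = {
    "taxonomy": "ifrs-full",
    "currency": "EUR",
    "_reporting_notes": {"presentation_format": "by_function", "ifrs_18_applied": False},
}
for warning in reporting_warnings(example_meta):
    print(warning)
```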
Reporting basis and comparability
IFRS and US-GAAP differ in ways that matter for cross-filer comparison. Thesma surfaces the divergences honestly rather than silently smoothing them. The six comparability caveats below apply at the line-item level; read them before comparing an IFRS filer’s canonical field to a US-GAAP filer’s canonical field.
- R&D capitalisation. IFRS (IAS 38) requires development costs that meet its recognition criteria to be capitalised; US-GAAP expenses most R&D. Compare research_and_development with care for companies that capitalise.
- LIFO inventory. US-GAAP permits LIFO; IFRS prohibits it. Inventory costs and gross margins are not directly comparable between a LIFO US filer and an IFRS filer.
- Operating subtotal optionality. IAS 1 does not require a disclosed operating-income line; a mandatory operating-profit subtotal arrives only with IFRS 18, which replaces IAS 1. Where the filer didn’t report one, Thesma derives operating_income from available subtotals. When that derivation path is used, check the row’s field_confidence dict for a per-field confidence signal on operating_income. field_confidence is a sibling of line_items in the response and reports string confidence labels (e.g., "medium") for fields with non-default confidence.
- Finance costs scope. IFRS’s “finance costs” category may bundle items that US-GAAP reports separately (interest expense, FX losses, hedge ineffectiveness). Thesma maps to canonical interest_expense using the filer’s disclosed breakdown; where only a combined total exists, the value is flagged via source_tags.
- Dividends classification. IFRS permits dividends paid to be classified under either operating or financing activities in the cash flow statement. Thesma preserves the filer’s chosen classification rather than forcing one location, so inspect financing_cash_flow and operating_cash_flow together for IFRS filers.
- Bank extensions. IFRS banks (Nu Holdings today; other foreign IFRS banks are in progress) report many line items through bank-specific XBRL extension taxonomies. Thesma canonicalises the top-level totals (revenue, net income, total assets, equity, operating/investing/financing cash flows), but sub-line-item detail lives in company-specific extensions and is not normalised across banks in v1.
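To keep these caveats next to the numbers, a small Python sketch can annotate the fields you plan to compare. The field-to-caveat mapping below is this guide's own summary of the points above. Bank extensions are left out because they affect sub-line-item detail rather than a single canonical field. The row shape (field_confidence as a sibling of line_items) follows the description above. comparison_notes and IFRS_CAVEATS are hypothetical names. The taxonomy is passed in explicitly because its position in the response is not pinned down here.

```python
from __future__ import annotations

from typing import Any

# This guide's summary of which canonical fields each IFRS vs US-GAAP caveat touches.
IFRS_CAVEATS = {
    "research_and_development": "IAS 38 requires capitalising qualifying development costs",
    "cost_of_revenue": "no LIFO under IFRS; compare with care against LIFO US filers",
    "gross_profit": "no LIFO under IFRS; compare with care against LIFO US filers",
    "operating_income": "may be derived when no operating subtotal was disclosed",
    "interest_expense": "IFRS finance costs may bundle FX and hedge items",
    "operating_cash_flow": "dividends paid may be classified here",
    "financing_cash_flow": "dividends paid may be classified here",
}


def comparison_notes(
    row: dict[str, Any], taxonomy: str, fields: list[str]
) -> dict[str, list[str]]:
    """Collect caveats and non-default confidence labels for the requested fields."""
    confidence = row.get("field_confidence", {})
    notes: dict[str, list[str]] = {}
    for field in fields:
        field_notes = []
        if taxonomy == "ifrs-full" and field in IFRS_CAVEATS:
            field_notes.append(IFRS_CAVEATS[field])
        if field in confidence:  # only fields with non-default confidence appear
            field_notes.append(f"confidence: {confidence[field]}")
        if field_notes:
            notes[field] = field_notes
    return notes


# Made-up row for illustration.
row = {
    "line_items": {"revenue": 1_000, "operating_income": 120},
    "field_confidence": {"operating_income": "medium"},
}
print(comparison_notes(row, "ifrs-full", ["revenue", "operating_income"]))
```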
Native-currency reporting
Values are reported in the filer’s presentation currency with no USD conversion in v1. A Spotify row’s revenue: 17186000000 is €17.186B, not USD. The currency field on every response tells you the unit. Applications that need a unified USD view apply client-side FX conversion at a reference rate of their choosing.
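A minimal client-side conversion in Python follows. It uses decimal.Decimal so that large integer amounts are not rounded through binary floats. The rate is a placeholder you would replace with your own reference source, and to_usd and the rates table are hypothetical. A common convention is an average rate for income-statement and cash-flow values and a period-end rate for balance-sheet values, but the choice is yours.

```python
from decimal import ROUND_HALF_UP, Decimal


def to_usd(amount: int, currency: str, usd_per_unit: dict[str, Decimal]) -> Decimal:
    """Convert a native-currency amount to whole USD at a caller-supplied rate."""
    if currency == "USD":
        return Decimal(amount)
    if currency not in usd_per_unit:
        raise KeyError(f"no reference rate supplied for {currency}")
    usd = Decimal(amount) * usd_per_unit[currency]
    return usd.quantize(Decimal("1"), rounding=ROUND_HALF_UP)


# Placeholder rate for illustration only, not a real quote for any date.
rates = {"EUR": Decimal("1.08")}
print(to_usd(17186000000, "EUR", rates))  # the Spotify revenue value quoted above
```

Keeping the rate table outside the function makes the reference rate an explicit input. This matches the “honest native currency” stance: the conversion assumption lives in your code, not in the data.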
USD-normalised IFRS values are on the roadmap for a future release; until then, the design choice is “honest native currency” over “lossy pre-conversion.”
See the IFRS filer recipe for an end-to-end worked example using Spotify.
Extension taxonomies
Filers can define their own XBRL extension concepts (e.g., aapl:ProductsRevenue). Extensions are parsed but not mapped into canonical fields by default: you’ll see them as separate entries in the filing detail, but they don’t affect the canonical revenue number. This is intentional; extensions are by definition non-standard, and aggregating them across filers would be meaningless.
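If you work with raw concept names (for example from source_tags), a quick Python check can separate standard concepts from company extensions by namespace prefix. This is a heuristic sketch. The prefix set is an assumption: it includes us-gaap and ifrs-full from this guide, plus the dei and srt taxonomies that SEC filings also use. XBRL prefixes are only bindings to namespace URIs, so a robust implementation would compare namespace URIs instead. is_extension is a hypothetical helper.

```python
# Prefixes treated as standard taxonomies. An assumption, not an exhaustive list.
STANDARD_PREFIXES = frozenset({"us-gaap", "ifrs-full", "dei", "srt"})


def is_extension(concept: str) -> bool:
    """True if a prefixed concept name (e.g. 'aapl:ProductsRevenue') is a company extension."""
    prefix, sep, local_name = concept.partition(":")
    if not sep or not local_name:
        raise ValueError(f"expected 'prefix:LocalName', got {concept!r}")
    return prefix not in STANDARD_PREFIXES


for name in ("us-gaap:Revenues", "ifrs-full:Revenue", "aapl:ProductsRevenue"):
    print(name, "extension" if is_extension(name) else "standard")
```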
Historical coverage breadth
Normalisation is most reliable from ~2013 onwards, when XBRL became mandatory for the full income statement and balance sheet. Pre-2013 filings are present but have lower mapping confidence, so treat any comparison that reaches back before 2013 with corresponding caution.
Period-end date alignment
The canonical fiscal_year field is the year the reporting period ends, not the calendar year it was filed in. Apple’s fiscal_year: 2024 ended in September 2024 and was filed in November 2024; a filer whose fiscal year ends in December typically files its fiscal_year: 2024 10-K in early 2025. Don’t confuse fiscal_year with the calendar year of the filing date (metadata.filed_at).
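When lining up annual periods, key them on fiscal_year and use metadata.filed_at only to choose between multiple filings for the same year. The Python sketch below keeps the most recently filed annual row per fiscal_year. It assumes that a later filing for the same year, such as an amendment, should win. That policy, the helper name, and the sample rows (one hypothetical December year-end filer) are illustrative assumptions.

```python
from __future__ import annotations

from datetime import date
from typing import Any


def latest_annual_by_fiscal_year(rows: list[dict[str, Any]]) -> dict[int, dict[str, Any]]:
    """Index annual rows by fiscal_year, keeping the most recently filed row per year."""
    latest: dict[int, dict[str, Any]] = {}
    for row in rows:
        if row.get("period_type") != "annual":
            continue
        year = row["fiscal_year"]
        filed = date.fromisoformat(row["metadata"]["filed_at"])
        current = latest.get(year)
        if current is None or filed > date.fromisoformat(current["metadata"]["filed_at"]):
            latest[year] = row
    return latest


# Hypothetical December year-end filer: each fiscal_year is filed in the next calendar year.
rows = [
    {"fiscal_year": 2023, "period_type": "annual", "metadata": {"filed_at": "2024-02-20"}},
    {"fiscal_year": 2024, "period_type": "annual", "metadata": {"filed_at": "2025-02-14"}},
    {"fiscal_year": 2024, "period_type": "annual", "metadata": {"filed_at": "2025-06-30"}},  # e.g. an amendment
]

for year, row in sorted(latest_annual_by_fiscal_year(rows).items()):
    print(year, "filed", row["metadata"]["filed_at"])
```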
See also
- SEC EDGAR dataset: full endpoint list and coverage
- SEC financials recipe: end-to-end example
- API Reference: /v1/us/sec/financials/fields, queryable mapping table