Pipeworx vs Exa

structured primary-source records vs neural web search for AI

Pipeworx is for

structured, citable records from 877 authoritative sources — the filing itself, not pages about it.

Exa is for

neural web search built for AI — finding and extracting from web pages at scale, plus Websets for structured web research.

Exa is the strongest of the search APIs for AI: neural retrieval over the web, content extraction, and Websets for turning web research into structured tables. Pipeworx starts one layer deeper. Instead of searching pages ABOUT a thing, it returns the authoritative record OF the thing (the SEC filing, the FRED observation, the FDA entry) as structured JSON with provenance metadata and a stable pipeworx:// citation. Web search is the right tool when the answer lives on arbitrary web pages; a data gateway is the right tool when the answer lives in a known authoritative system. Most capable agents need both. The failure mode we exist to prevent is reaching for web search when a primary source exists.

Side-by-side

Unit of retrieval
  • Pipeworx: The authoritative record (structured JSON from the source system)
  • Exa: Web pages/passages (neural search + extraction)

Provenance
  • Pipeworx: _meta.source + fetched_at + stable pipeworx:// citation URI on every response
  • Exa: Result URLs; no provenance/citation scheme

Grounded answering
  • Pipeworx: ask_pipeworx_grounded (extractive answer + verbatim evidence + explicit refusal when the data doesn't say)
  • Exa: Returns search results; answering is your model's job

Structured output
  • Pipeworx: Native; every tool returns typed JSON with outputSchema
  • Exa: Websets builds structured tables from web research

Monitoring
  • Pipeworx: Data-event subscriptions (8-Ks, FRED, markets, patents, trials) with push delivery
  • Exa: Not an event-subscription product

Interface
  • Pipeworx: MCP gateway (one URL, NL router) + REST
  • Exa: API + MCP servers for search/Websets
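
As a rough illustration of the provenance fields in the comparison, here is a minimal Python sketch that checks a response for them before an agent cites it. The payload layout is an assumption, not a documented schema: the comparison only names _meta.source, fetched_at, and a pipeworx:// citation URI, so placing all three under "_meta", the "citation" key name, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Provenance:
    source: str
    fetched_at: datetime
    citation: str


def read_provenance(response: dict) -> Provenance:
    """Extract provenance from a response dict, failing loudly if any field is missing."""
    # Assumption: all three fields sit under "_meta"; adjust to the real payload.
    meta = response.get("_meta") or {}
    missing = [key for key in ("source", "fetched_at", "citation") if key not in meta]
    if missing:
        raise ValueError(f"response lacks provenance fields: {missing}")

    citation = meta["citation"]
    if not citation.startswith("pipeworx://"):
        raise ValueError(f"unexpected citation scheme: {citation!r}")

    # Normalise a trailing "Z" so datetime.fromisoformat accepts it on older Pythons.
    fetched_at = datetime.fromisoformat(meta["fetched_at"].replace("Z", "+00:00"))
    return Provenance(meta["source"], fetched_at, citation)


# Hypothetical payload, for illustration only.
sample = {
    "data": {"value": 1},
    "_meta": {
        "source": "example-source",
        "fetched_at": "2025-01-15T12:00:00Z",
        "citation": "pipeworx://example/record",
    },
}
provenance = read_provenance(sample)
```

Rejecting a response without provenance, rather than passing it through, keeps an agent from citing a record it cannot point back to.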

When to use which

Use Exa if

  • The answer lives on arbitrary web pages — news commentary, company sites, long-tail content
  • You need web-scale retrieval and extraction as a building block
  • You're assembling research tables from open-web sources (Websets)

Use Pipeworx if

  • The answer lives in a known authoritative system — filings, statistics, registries, markets
  • You need citations that resolve to the primary record, not to a page that mentions it
  • You want grounded extraction with refusal semantics instead of raw results
  • You want to subscribe to data events rather than re-searching

Connect Pipeworx with one URL

Add this to your MCP client's configuration (Claude Desktop, Cursor, VS Code, Claude Code, etc.). No API keys are required for public data sources.

{
  "mcpServers": {
    "pipeworx": {
      "url": "https://gateway.pipeworx.io/mcp"
    }
  }
}
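
If the client already has other servers configured, the entry needs to be merged in rather than pasted over the file. A minimal Python sketch of that merge follows. It assumes the client reads a JSON file with a top-level "mcpServers" object, as in the snippet above. The helper name add_pipeworx is hypothetical, and where the config file lives and whether a client accepts a "url" entry vary by client.

```python
import json

# The server entry from the snippet above.
PIPEWORX_ENTRY = {"url": "https://gateway.pipeworx.io/mcp"}


def add_pipeworx(config_text: str) -> str:
    """Return config JSON with a "pipeworx" server added, keeping existing servers."""
    # Treat an empty or whitespace-only file as an empty config.
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["pipeworx"] = dict(PIPEWORX_ENTRY)
    return json.dumps(config, indent=2)


# Hypothetical existing config with one unrelated server.
existing = '{"mcpServers": {"other": {"command": "other-server"}}}'
merged = add_pipeworx(existing)
```

Working on the parsed JSON rather than editing text by hand avoids the usual trailing-comma and brace mistakes.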

Common questions

When should an agent use Exa vs Pipeworx?

Decision rule: if a primary source exists for the question (a filing, a statistical series, a registry), use Pipeworx. You get the record itself, structured, with a verifiable citation. If the question is about open-web content (what people are saying, niche pages, long-tail research), use a web-search API like Exa. Agents that route this way have fewer chances to hallucinate a figure and can cite the primary record directly.
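
The decision rule can be sketched as a small agent-side router. This is an illustration under stated assumptions, not how Pipeworx or Exa route internally: the keyword hints, the Route enum, and choose_route are all hypothetical, and a real agent would more likely let its model pick the tool than match substrings.

```python
from enum import Enum


class Route(Enum):
    PRIMARY_SOURCE = "pipeworx"
    WEB_SEARCH = "web_search"


# Hypothetical hints that a primary source likely exists for the question.
PRIMARY_SOURCE_HINTS = (
    "10-k",
    "8-k",
    "filing",
    "xbrl",
    "patent",
    "registry",
    "statistical series",
)


def choose_route(question: str) -> Route:
    """Prefer the primary-source route when the question names a known record type."""
    lowered = question.lower()
    if any(hint in lowered for hint in PRIMARY_SOURCE_HINTS):
        return Route.PRIMARY_SOURCE
    # Default to open-web search for commentary, niche pages, and long-tail research.
    return Route.WEB_SEARCH


filing_route = choose_route("What revenue did Apple report in its FY2024 10-K?")
chatter_route = choose_route("What are people saying about the new iPhone?")
```

Substring matching is crude and will misroute some questions. The point is only the ordering: check for a primary source first and fall back to web search second.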

Isn't web search strictly more general?

More general, less authoritative. A web search for "Apple FY2024 revenue" returns pages that paraphrase the 10-K, possibly outdated or wrong. Pipeworx returns the XBRL figure from the filing with a pipeworx:// citation to it. Generality is a feature for discovery; provenance is the feature for answers an agent will be held to.