
Builder-Validator with Pipeworx: structural fact-checking for AI agents

The Builder-Validator pattern catches AI hallucination by giving two agents structurally incompatible incentives. Pipeworx ships the validator side as a meta-tool — validate_claim grounds natural-language financial claims against SEC XBRL.

The Builder-Validator pattern is having a moment. The idea is simple: don’t try to write one prompt that gets everything right — run two agent calls with structurally incompatible incentives, and let the tension do the work.

builder = claude.complete(
    system="Senior analyst. Make the strongest argument.",
    prompt=task,
)
validator = claude.complete(
    system="Auditor. Find every error, edge case, unsupported claim.",
    prompt=f"Review this and flag any unsupported or factually wrong claim:\n{builder.output}",
)

The pattern works because the validator’s job description forces a different read of the same content. A builder optimizing for “best argument” will paper over weak evidence; a validator optimizing for “find the flaw” won’t.

There’s a limit, though. A validator that only re-reads the builder’s prose can catch logical inconsistencies and obvious overstatements, but it can’t catch a confidently wrong number. If the builder claims “Apple’s 2024 revenue was $412 billion”, a prose-only validator has no way to know that figure is wrong (the FY2024 10-K reports $391.0 billion).

The fix is to give the validator a primary source. That’s where validate_claim fits.

validate_claim is the validator side, built in

Pipeworx ships validate_claim as a gateway-native meta-tool. It takes a plain-English claim about a public US company’s financials and checks it against SEC XBRL data — the structured financial statements that filers themselves submit to the SEC.

const result = await px.call('validate_claim', {
  claim: "Apple's 2024 revenue was $412 billion",
});

Output shape (verdict, citation, actual_value, discrepancy):

{
  "verdict": "refuted",
  "actual_value": { "value": 391035000000, "unit": "USD", "period": "2024-09-28" },
  "discrepancy": "Claim states $412B; SEC XBRL reports $391.04B (FY2024).",
  "citation": "pipeworx://edgar/company/0000320193/filings",
  "claim_type": "company_financial"
}

The verdict is one of five values: confirmed, approximately_correct, refuted, inconclusive, or unsupported. The citation is a stable Pipeworx resource URI an agent can embed in its final output for audit later.
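For typed clients, the output shape can be written down directly. Here's a minimal sketch in TypeScript, assuming the field names from the example above (an official SDK may ship its own, richer types):

```typescript
// Possible result shape for validate_claim, inferred from the
// example output above.
type Verdict =
  | 'confirmed'
  | 'approximately_correct'
  | 'refuted'
  | 'inconclusive'
  | 'unsupported';

interface ValidateClaimResult {
  verdict: Verdict;
  actual_value?: { value: number; unit: string; period: string };
  discrepancy?: string;
  citation: string;   // stable pipeworx:// resource URI for audit trails
  claim_type: string; // e.g. "company_financial"
}

// Narrowing helper: did the primary source contradict the claim?
function isRefuted(r: ValidateClaimResult): boolean {
  return r.verdict === 'refuted';
}
```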

Used as the validator step:

const draft = await builderAgent(task);

// Parse every numerical claim out of the draft and validate each.
const claims = extractClaims(draft);
const verdicts = await Promise.all(
  claims.map((c) => px.call('validate_claim', { claim: c })),
);

const refuted = verdicts.filter((v) => v.verdict === 'refuted');
if (refuted.length) {
  // Send the draft back to the builder with the refutations.
  return regenerateWith(draft, refuted);
}

The structural property is the same as the original pattern — incompatible incentives between the two roles — but now the validator is grounded against the actual filing instead of just rereading the builder’s argument.

Why this matters more for agents than for humans

A human analyst reading “Apple’s 2024 revenue was $412 billion” might catch the error if they happen to remember the actual figure. An AI agent reading the same sentence will quote it confidently downstream, and may even cite a fabricated 10-K page number to justify it. The hallucination risk compounds: each downstream tool call that consumes the wrong figure carries it further.

Structural validation breaks the chain. Either the claim survives a check against the source of record, or it gets flagged before the next step runs. There’s no in-between.

This is also more reliable than prompt-level fact-checking. Telling a model “double-check your numbers” produces the appearance of verification without the underlying check — the model usually re-asserts the same number with more confidence. Telling it “call validate_claim on each numerical claim before responding” produces an actual lookup with an actual verdict.

What the v1 covers (and what it doesn’t)

validate_claim currently supports claim_type: "company_financial" — revenue, net income, cash, debt, and other items reported in SEC XBRL filings for US public companies. The meta-tool itself tells you when a claim type isn’t supported by returning verdict: "unsupported" along with a hint about what is.
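A pipeline consuming these verdicts shouldn't collapse them into true/false; in particular, unsupported means "couldn't be checked," not "wrong." One possible routing policy, sketched as a pure function (the action names are illustrative, not part of the API):

```typescript
type Verdict =
  | 'confirmed'
  | 'approximately_correct'
  | 'refuted'
  | 'inconclusive'
  | 'unsupported';

// Illustrative policy: only a refutation forces a rewrite; claims the
// validator couldn't check get surfaced as unverified rather than
// silently dropped or treated as false.
function triage(verdict: Verdict): 'keep' | 'rewrite' | 'flag_unverified' {
  switch (verdict) {
    case 'confirmed':
    case 'approximately_correct':
      return 'keep';
    case 'refuted':
      return 'rewrite';
    case 'inconclusive':
    case 'unsupported':
      return 'flag_unverified';
  }
}
```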

Coming next:

  • Drug claims against FDA / DailyMed / RxNorm (does Ozempic’s label list this side effect? Is this drug interaction documented?)
  • Clinical-trial claims against ClinicalTrials.gov (does this Phase 3 trial actually exist? Is its primary endpoint X?)
  • Economic claims against FRED (was unemployment 4.1% in March 2025?)
  • Trade claims against UN Comtrade / Census (is the US-China trade deficit really that figure?)

The pattern generalizes — any structured authoritative source can become a primary-source validator. We’re prioritizing the claim types where AI hallucination causes the most real-world damage: financial statements, medical info, and economic indicators.

Try it

If you already have Pipeworx connected to your MCP client (Claude Desktop, Claude Code, Cursor, ChatGPT, etc.), validate_claim is one of the gateway-native meta-tools — call it the same way as any other tool. If you don’t, the stack guide shows the one-line config.

The Builder-Validator pattern is real and structurally sound. What was missing for AI agents was a validator with a primary source to check against. That’s the gap validate_claim fills.