The one-line filter that hid 200 lines of working ranking code
Telemetry showed our semantic-search boosts were scoring correctly. The API returned the wrong order anyway. The bug was four characters of TypeScript.
A short, embarrassing-on-purpose story from the Pipeworx gateway. Lesson at the end if you want to skip the setup.
What we were trying to do
Pipeworx is an MCP gateway with about 1,070 tools — 295 packs of single-API wrappers, plus 8 gateway-native “compound” meta-tools that fan out across packs. entity_profile returns SEC filings + fundamentals + patents + news + LEI in one call. validate_claim fact-checks a natural-language claim against SEC XBRL. compare_entities compares 2–5 companies or drugs in parallel.
Compound tools are the differentiated bit. They turn 15 sequential agent calls into one. But they’re worthless if agents can’t find them.
Telemetry from the past 24 hours: across 1,500+ calls to our discovery meta-tools (ask_pipeworx, discover_tools), validate_claim had 3 calls — all of them our own internal testing, zero organic. Same story for entity_profile and recent_changes.
The discovery layer was getting plenty of traffic (~98% of meta-tool calls). It just wasn’t routing anyone to the compound tools.
The diagnosis
discover_tools does semantic search over the catalog. For a fact-check query like “verify that Apple revenue was $400 billion”, validate_claim was ranking around position 17–20 of 20 results. Pack tools like get_company_facts won on literal keyword overlap (revenue, Apple, billion). The compound tool’s description couldn’t compete on that axis no matter how it was worded.
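For context, the scoring underneath is plain cosine similarity over precomputed embeddings. A minimal sketch of that scorer’s shape — not the gateway’s actual implementation, which may pre-normalize vectors:

```ts
// Standard cosine similarity between two embedding vectors.
// A real scorer may normalize vectors up front and skip the square roots.
function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // `|| 0` guards against NaN from an all-zero vector.
  return dot / (Math.sqrt(normA) * Math.sqrt(normB)) || 0;
}
```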
The fix seemed straightforward.
The work
Three layers of progressively harder boost:

- Description rewrites. Every meta-tool got rewritten to lead with user phrasings (“Use when a user says ‘compare X and Y’ / ‘tell me about Y’ / ‘is it true that…’”). Domain keywords mixed in to compete on literal overlap.
- A general meta-boost. All meta-tools get +0.04 added to their cosine score — they’re cross-cutting, almost always cheaper than the alternative.
- Intent-based boosts. Regex on the query for specific verbs. If the query contains “fact-check / verify / validate”, add +0.30 to validate_claim. If “compare / vs / versus”, boost compare_entities. And so on (a sketch of this layer follows the list).
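A minimal sketch of what that intent layer can look like, assuming a static rule table — INTENT_RULES and intentBoostsFor are illustrative names, not the gateway’s actual code:

```ts
// Hypothetical shape of the intent-boost layer: regex rules that map
// query verbs to per-tool score bumps. Names are illustrative.
const INTENT_RULES: Array<{ pattern: RegExp; tool: string; boost: number }> = [
  { pattern: /\b(fact-?check|verify|validate)\b/i, tool: "validate_claim", boost: 0.3 },
  { pattern: /\b(compare|vs|versus)\b/i, tool: "compare_entities", boost: 0.3 },
];

// Collect the boosts for one query; consumed downstream as intentBoosts[name].
function intentBoostsFor(query: string): Record<string, number> {
  const boosts: Record<string, number> = {};
  for (const rule of INTENT_RULES) {
    if (rule.pattern.test(query)) {
      boosts[rule.tool] = (boosts[rule.tool] ?? 0) + rule.boost;
    }
  }
  return boosts;
}
```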
All of it shipped. Each piece tested correctly in isolation. Total: a couple hundred lines of new code across description rewrites, boost configuration, and intent regex.
The rankings didn’t move.
validate_claim was still at position 4–5 for fact-check queries. Pack tools were still winning. Same exact ordering as before the work.
The investigation
We bumped the validate_claim boost to 1.0 — an absurd value that should have put it at score 1.5+, beating anything else by a wide margin.
Same ranking.
At that point it was clear the boost code wasn’t firing. Or wasn’t being read. Or wasn’t reaching the response.
We added a console.log and tailed the worker:
```
[discover] intentBoosts: {"validate_claim":1} | top: validate_claim@1.685
```
The log was unambiguous. validate_claim was the top-scored tool. Score 1.685. The intent boost was firing. The math was right.
Then the API response came back with validate_claim at position 5.
The bug
Here’s the function that returns tool definitions from discover_tools:
```ts
const scored = toolEmbeds
  .map((te) => ({
    name: te.name,
    score: cosine(te.vec, queryEmbed)
      + (META_NAMES.has(te.name) ? META_BOOST : 0)
      + (intentBoosts[te.name] ?? 0),
  }))
  .sort((a, b) => b.score - a.score)
  .slice(0, limit);

const topNames = new Set(scored.map((s) => s.name));
return [...ALL_TOOLS, ...META_TOOLS].filter((t) => topNames.has(t.name));
```
Look at the last two lines. The scoring is correct. The sort is correct. The slice is correct. Then we throw away the order entirely: build a Set of the top N names, and filter() the original arrays for membership.
Array.prototype.filter preserves source-array order. Not score order. Set.has lookup gives you membership, not rank.
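Here is the trap in isolation — a toy repro, not the gateway code:

```ts
// filter() walks the source array, so the ranked order is lost.
const source = ["a", "b", "c", "d"]; // registration order
const ranked = ["d", "b"];           // what the sort produced
const top = new Set(ranked);

console.log(source.filter((x) => top.has(x)));
// => ["b", "d"] — source order, not ranked order
```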
So the API returned the top N tools — but in the order they happened to appear in ALL_TOOLS (registration order from the workspace pack loop), then META_TOOLS (a static list at the bottom of the file). validate_claim’s real rank was #1 by score. But its response position was pinned by its slot in META_TOOLS (#7) and by which other tools happened to make the cut — in practice, position 5, no matter what its score was.
The fix
```ts
const byName = new Map<string, McpToolDefinition>();
for (const t of [...ALL_TOOLS, ...META_TOOLS]) byName.set(t.name, t);

return scored
  .map((s) => byName.get(s.name))
  .filter((t): t is McpToolDefinition => Boolean(t));
```
Iterate scored (already sorted by score) and look each tool up by name — the sorted array now drives the output order, and the Map keeps each lookup O(1).
One commit. After deploy, every intent query landed its target meta-tool at position 1.
| Query | Top result before | Top result after |
|---|---|---|
| “fact-check Apple revenue $400B” | get_company_facts | validate_claim |
| “compare Apple and Microsoft” | get_company_facts | compare_entities |
| “tell me about Acme Corp” | fintech_company_deep_dive | entity_profile |
| “any updates on Tesla lately” | what_happened | recent_changes |
The lesson
This is the dumb kind of bug that takes hours to find because every component looks right in isolation. Embedding scoring tests passed. Boost calculations tested correctly. Description rewrites improved cosine similarity measurably. Telemetry showed the right tool was the top-scored tool.
But scoring and ordering are different things. If you sort a list, slice it, then re-filter from a different source, you’ve lost the sort. Obvious in hindsight; invisible in review when each piece looks correct.
Practical takeaways for anyone shipping semantic ranking:
- Instrument the response, not just the scoring. The intermediate logs all looked right. The bug was in the last step before the wire.
- A Set is for membership testing. Don’t substitute it for ordering.
- When rankings don’t match expectations, the bug is usually in the post-scoring step. Scoring is easy to test; ordering is easy to drop.
- The “this worked in my test” trap. Each piece worked. The composition didn’t. That’s the integration bug class, and the only defense is an end-to-end test on the actual API output (sketch below).
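What that last defense can look like — a hedged sketch using Node’s built-in test runner, asserting on response order rather than intermediate scores. discoverTools is a hypothetical stand-in for however your client calls the deployed endpoint:

```ts
// End-to-end check: assert on the response *order*, not the scores.
import { test } from "node:test";
import assert from "node:assert/strict";

// Stand-in for a real call to the deployed discovery endpoint.
declare function discoverTools(query: string): Promise<Array<{ name: string }>>;

test("fact-check intent ranks validate_claim first", async () => {
  const tools = await discoverTools("fact-check Apple revenue $400B");
  assert.equal(tools[0]?.name, "validate_claim");
});
```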
After the fix, validate_claim is reachable for fact-check intents. compare_entities wins on comparison queries. entity_profile wins on “tell me about” framings. Compound tools are finally findable by the agents that need them.
We’ll know in 24 hours whether agents start actually using them.
You can try the gateway at gateway.pipeworx.io/mcp. Free anonymous tier, no signup. If your agent has been asking “fact-check this claim about a public US company’s revenue,” validate_claim is the call.