Card Network Architecture / Whitepaper / v1.0 / April 2026

Your AI bill is broken.
Here is the math, the fix, and the proof.

A 5,000-word founder-direct whitepaper on Card Network Architecture. Five real save-a-business scenarios, real ROI math, real Magic Sprint pricing, and a live retrieval benchmark you can run yourself before you talk to anyone here.

Last month a founder I know got pulled into a budget review. Series-A SaaS, five product features, all of them now wired through ChatGPT, Claude, or GPT-4 vision. Twenty months ago he had zero AI line items. This month his finance team flagged the OpenAI invoice as the third-largest infrastructure cost in the business, ahead of every database he runs. Token spend was up roughly 25% month over month, and nobody could tell the CFO when that curve was going to flatten.

His head of support told a similar story from a different angle. They had bolted Fin AI on top of Intercom to deflect tickets. The CFO had been promised that Fin would cut the support team in half. Instead Fin was kicking 60% of conversations back to humans because the knowledge articles it pulled from were 8,000 tokens long and nothing useful fit a single context window. The support team was still four full-time engineers. The AI bill was an extra $15,000 a month on top of headcount they had not been able to reduce. The AI was supposed to make him cheaper. It was making him more expensive.

Both of these stories have the same root cause. The substrate underneath their AI agents is the wrong shape. They are shoving entire documents into prompts because the chunking happens at retrieval time, in a hurry, with no edge typing and no semantic boundaries. The agent gets too much context for the question and not enough of the right context. Tokens go up. Quality goes down. The CFO calls a meeting.

This whitepaper is about the fix. It is not about a new model. It is not about a new vector database. It is about pre-chunking your knowledge at authoring time into ~250-token cards with typed edges, so that every model and every agent retrieves three to five cards instead of paragraphs of unstructured noise. We call the substrate Card Network. The math underneath it is mechanical, the savings show up in 60 days, and you do not have to take our word for it. Section 4 of this paper links to a benchmark you can run live, against the same model and the same corpus, side by side, in your browser.

Here is the rest of the paper. Section 1 puts a number on the hidden tax of runtime RAG. Section 2 explains what changes when you pre-chunk and type your edges. Section 3 walks through five real scenarios where Card Network has saved real money. Section 4 sends you to the live benchmark. Section 5 maps each scenario to a Magic Sprint tier and a price. Section 6 makes the case for why this window matters now, not in 2027. Section 7 gives you three doors. Pick one.

1. The hidden tax of runtime RAG

Most retrieval-augmented generation pipelines were built in 2023 by engineers who had two weeks to ship a prototype. They wrote a chunker that cuts every document into 1,000-character or 1,500-token windows, embedded those windows into a vector database, and called it good. Two years later that prototype is in production, serving every agent in the company, and the cost curve is the line item that nobody wants to defend in front of the CFO.

Here is the part nobody quantifies. When you chunk at retrieval time without edges, you have no semantic boundary. The chunker cuts mid-paragraph, mid-table, mid-thought. To compensate, your retrieval layer pulls more chunks per query, which means a larger context window, which means more input tokens at the LLM, which means a bigger bill at the end of the month. We have audited customer pipelines where 60 to 80% of the input tokens going to the model were context the model did not need. The model paid full freight on every one of those tokens anyway.

Treat that as a P&L line item, not an architecture diagram. If your agent makes 200,000 calls per month and each call sends 8,000 tokens of context to the model, you are paying for 1.6 billion input tokens a month. At today's commercial Workers AI rates that is meaningful spend. If 70% of those tokens are noise, you are paying for 1.12 billion tokens of noise every month and getting nothing for it. The CFO is right to flag the number. Your engineers are not lying about the tokens. They built the system in two weeks and never went back.

The other tax is invisible until it bites. Quality degrades when you stuff a model with too much context. Modern LLMs lose precision when their input window is full of marginally relevant text. We have seen support deflection rates fall from 60% to 35% in customer deployments simply because the retrieval layer started returning too many chunks per query as the corpus grew. The model has the right answer somewhere in the prompt; it cannot find it in time. The customer ends up routed to a human, who costs an order of magnitude more than the AI was supposed to.

Add it up. Retrieval-time chunking without edges produces three line items the CFO can quantify: an AI input bill that should be 30 to 60% smaller, a support headcount that should be 30 to 50% smaller, and an engineering team that spends a third of its time maintaining retrieval plumbing instead of shipping features. None of these are projected savings. Each one is something we have measured against actual customer telemetry inside the past nine months.

See the live benchmark on /proof for the latest numbers. The benchmark uses the same Workers AI model on the same article corpus on both runs, so you are not comparing apples to oranges. You can also run your own question against it.

2. What changes

Card Network is a substrate, not a library. The substrate has two pieces. First: every piece of content in your corpus gets chunked at authoring time into atomic cards roughly 250 tokens each. Each card is self-contained. It has a header, a body, and metadata. It can stand alone, mean something on its own, and be retrieved on its own. Second: every relationship between cards is expressed as a typed edge. We support seven edge types — stack, link, embed, branch, depends, produces, and sync — and they cover roughly every relationship a corpus has in practice.

The shift from runtime to authoring time is the one that pays. Authoring time means the human or the agent that wrote the content also chunked it, set its boundaries, and labeled its edges. By the time a query arrives, the work is done. Retrieval is a graph traversal across an index of pre-chunked cards, not a panicked guess about where to cut a 12,000-token document. Three to five cards come back. The model gets exactly what it needs. The input context shrinks 60 to 80%. The cost line item shrinks the same percentage.

This is not theory. We run Card Network internally across twelve different D1 systems for our own products. We have used it to serve agent context in our customer-support stack, our sales-enablement stack, and the live demo at /read. The /read page is the proof-of-concept you can scroll right now. It is the same article you would read in a docs site, only it is composed of 12 cards with typed edges, and an agent can retrieve any subset of those cards in under a second without an LLM call. Read the article like a human. Watch the dock on the right show you which edges connect each card to its neighbors. Then ask yourself how much your retrieval pipeline would cost if every doc on your site was shaped like this.

The edge types matter as much as the chunk size. A naive vector database will give you a similarity score and stop. Card Network gives you the same similarity score plus the ability to traverse along typed edges. If a customer asks "what depends on this?", you can answer in microseconds with a graph walk; you do not need an LLM to figure it out. If a sales rep asks "what is the right battle-card for fintech 250-employee ICP?", you can filter cards by an edge typed "ICP=fintech-mid-market" before similarity ranks the rest. Typed edges turn a flat vector index into a real query substrate. Your agents stop guessing.

Pre-chunking also unlocks aggressive caching. Cards are immutable; they have stable hash IDs. Once a card is in your CDN cache it can stay there until you republish it. We have customers serving 60% of their agent retrieval traffic from edge cache, paying nothing for the model call, paying nothing for the vector search, paying nothing for the embedding. The model gets called only when the cards have changed. That is the same cost shift cloud providers got from CDNs in 2008. We did it for AI context.

3. The five save-a-business scenarios

Every consultant who pitches Card Network opens with the same question: "Where is the bleed?" A CFO cannot buy "your AI agents will be 8-second-cards-aware." A CFO can buy "you stop paying $80,000 a quarter for support agents that hallucinate because their context window is broken." These are the five scenarios we use as opening pitches. Each one is real. Each one ends with a Magic Sprint tier and a price. Find the one that sounds like you.

Scenario A / Support burn

Your AI customer-support agent burns $40K/quarter and still routes 60% of tickets to humans

Symptom: SaaS with 5K MAU runs Intercom + Fin AI. Fin escalates 60%+ of tickets because the knowledge base it pulls from has 8K-token articles that don't fit a single context window. AI-spend is $15K/mo ($45K/quarter). Human-team is still 4 FTE because Fin can't actually deflect anything substantive.

CARD Network fix: migrate 80-200 KB articles into 800-1500 CARDs (250 tokens each). Retrieval pulls 3-8 cards per query; total context under 2K tokens. Fin's deflection rate jumps from 40% to 75% in 60 days.

Math: support team goes from 4 FTE to 2 FTE = $200K/yr saved. AI-spend drops by 30-50% as token count per query shrinks = $60K/yr saved. Total: $260K/yr saved.

CARD Network price: Magic Sprint Custom Medium $80K-$130K one-time. Payback under 6 months.

Consultant pitch: "We save you a quarter-million a year on support. Pays for itself before you finish onboarding the next product manager."

Scenario B / Battle-card mess

Your sales team can't find the right battle-card so reps freelance their pricing

Symptom: B2B SaaS with 20-rep sales team. Battle-cards live in Notion (long), Highspot (org chart-shaped), Slack (ephemeral). Rep on a call with prospect can't pull "what's our differentiator vs Competitor X for 250-employee fintech ICP" in real-time. Rep guesses or quotes wrong. Conversion rate sits 6-9 points below industry benchmark.

CARD Network fix: ingest battle-card library into CARD Network with edges typed by ICP, competitor, deal-size. Sales-rep agent (Magic Compass / Magic Hub variant) pulls 3-5 relevant cards in under 2 seconds during the call.

Math: lift conversion 4-6 points = $2-3M new ARR on a $30M ARR base. CARD migration $95K-$130K.

CARD Network price: Magic Sprint Custom Medium $80K-$130K one-time. Payback in the first quarter.

Consultant pitch: "We turn your battle-card mess into a real-time co-pilot. 5-point conversion lift pays this back in the first quarter."

Scenario C / RAG pipeline rebuild

Your engineering team rewrites the same RAG pipeline three times because each AI tool is a different shape

Symptom: Mid-stage AI startup runs multiple model providers and embedding stacks. Each integration team rebuilds chunking + embeddings + retrieval. Three RAG pipelines to maintain, three eval harnesses, three drift-detection systems. Eng team spends 30-40% of their time on retrieval-layer plumbing.

CARD Network fix: one CARD-shaped corpus, one embedding pipeline, one retrieval layer. Every model/agent reads the same substrate.

Math: reclaim 30-40% of an 8-eng team = $1.2M/yr in eng-time recovered. CARD migration $130K-$250K (Magic Sprint Custom Medium-to-Large depending on scope).

CARD Network price: Magic Sprint Custom Medium-Large $130K-$250K one-time. Payback under 4 months.

Consultant pitch: "Your engineers should ship features, not maintain three RAG pipelines. We give you one substrate that makes the model-of-the-month interchangeable."

Scenario D / AI-bill spike

Your AI features hit $8K/mo in model bills and the CFO is asking why

Symptom: Series-A SaaS bolted ChatGPT + Claude + GPT-4 vision into 5 product features in 6 months. Token-spend climbs 20-30% MoM. CFO asking when this stabilizes; founder doesn't know.

CARD Network fix: authoring-time pre-chunking means context windows shrink 60-80%. Same answer quality, fraction of the tokens. Plus ability to cache cards (immutable hash IDs).

Math: $8K/mo → $2-3K/mo after CARD-ification = $60-72K/yr saved.

CARD Network price: Magic Sprint Custom Small $30K-$60K one-time. Payback under 12 months and CFO sleeps again.

Consultant pitch: "We cut your AI bill 60-70% in 60 days. CFO loves you. You can finally green-light the next AI feature."

Scenario E / Call recordings

Your sales-call recordings are gold and you can't use them

Symptom: Series-B SaaS records 200+ calls/month. Gong + Chorus give playback. No way to ask "across all my Q3 calls, which 5-token objections came up most often when we pitched Tier 2 to fintechs?" The data is there; the question is unanswerable.

CARD Network fix: call-transcript ingester chunks each transcript at semantic boundaries into CARDs typed by speaker / sentiment / topic / objection / outcome. Agent retrieves cross-call patterns in seconds.

Math: product + sales + CS team gets a real-time research engine. Hard to math directly; usually drives 2-3 product-roadmap pivots a quarter that compound. Most CFOs buy this on "we already paid for the data, it's malpractice not to use it."

CARD Network price: Magic Sprint Custom Large $150K-$250K one-time. Payback in research-driven roadmap pivots over 2-3 quarters.

Consultant pitch: "You're sitting on 2,400 hours of recorded customer truth and your sales VP can't query it. We turn that pile into a real-time research engine."

The pattern across all five scenarios is the same. The customer is bleeding money in a way they can quantify. The bleed is because their AI substrate is wrong-shaped — too-big chunks, no edges, no retrieval discipline. Card Network fixes the shape. Payback lands inside six months in every scenario except E, where the payoff is research-driven and longer-tailed. The consultant's job on the first call is just to tell you which scenario you are in. Sometimes you are in two of them at once.

4. Live benchmark: see for yourself

Stop reading and run the benchmark. We built it specifically because every founder on a 30-minute call asks the same question: "Where is the proof?" The proof is at cardnetwork.dev/proof. Same Workers AI model on both runs. Same article corpus on both runs. Same question on both runs. The traditional run stuffs the entire 12-card corpus into the prompt. The Card Network run retrieves the top 3 cards by cosine similarity. We capture token counts, latency, cost, and the actual answers in real time.

The numbers we have measured most often: roughly 75% fewer input tokens, faster end-to-end response, lower cost per call. The Card Network run produces an answer of equivalent quality on every question we have tried, sometimes a sharper one because the model is not distracted by surrounding noise. You can ask your own question on the page — we cache the result for seven days so the second person who asks the same question pays nothing. That cache pattern alone is one of the cost-shifts you get from immutable card IDs.

The benchmark is intentionally honest. We tell you which model fired, what the embedding model is, what the cost band assumption is, and what the cache key looks like. There is no place for a vendor to hide. If your engineering team wants to reproduce the result on their own corpus before signing a Sprint, that is precisely the conversation we want to have. The whole point of running the benchmark in the open is to make the next call shorter.

Numbers from the live benchmark belong on the page where they are running, not in a static PDF. Go run it. Then come back and read section 5.

5. Implementation path: the Magic Sprint tiers

Card Network ships through Magic Sprints. A Sprint is a fixed-scope, fixed-price engagement that takes a corpus from where it is today to a Card Network deployment in production. We have nine tiers. Five are productized — same shape every time, fixed price, Stripe checkout, no SOW negotiation. Four are custom Sprints scoped to your bleed. Each tier maps to a state-change scope: E (edge — one feature), S (substrate — one knowledge function), M (multi-system — multiple corpora and agents), B (business-wide — company-level retrieval substrate).

Productized Sprints / Stripe checkout / fixed scope

  • AI Readiness Audit — $1,997. 1-week scoped audit of your existing AI/RAG stack. Output: bleed map, scenario fit, recommended Sprint tier. Payback: pays for itself if it routes you to the right tier.
  • Marketing Diagnostic — $2,997. 1-week diagnostic on your marketing-content corpus. Output: chunking plan, edge-typing plan, agent integration plan. For founders running content + AI marketing.
  • Data + Brain Setup — $4,997. 2-week productized setup of a small Card Network corpus, deploy on your infra. Up to 100 cards + edges. For pre-Series-A teams that want a working substrate before they scale.
  • Sales Pipeline Setup — $4,997. 2-week productized battle-card library migration. Up to 200 sales cards typed by ICP, competitor, deal-size. Pairs with Scenario B.
  • Operations Stack Setup — $7,997. 3-week ops-corpus migration: SOPs, runbooks, escalation matrices. Up to 400 cards. For teams replacing internal-knowledge wikis with retrievable substrate.

Custom Sprints / Talk to Mike / scoped to your bleed

  • Custom Small — $30K-$60K (Scope E). One feature shrunk: cut a runaway model bill, ship one corpus, prove the substrate. Scenario D fit. Payback under 12 months.
  • Custom Medium — $80K-$130K (Scope S). Whole knowledge function migrates: customer support, sales enablement, internal-search. Scenarios A and B fit. The most common Sprint. Payback under 6 months.
  • Custom Large — $150K-$250K (Scope M). Multiple corpora, multiple agents, multiple model providers. Scenarios C and E fit. Eng team reclaims 30-40% of their time. Payback under 4 months on eng-time alone.
  • Custom Enterprise — $250K+ (Scope B). Company-wide substrate, multi-business-unit deploy, compliance + audit-grade isolation. Long-tail strategic engagement. 12-month operations + roadmap retainer included.

Pick by scope, not by price. If you are migrating one feature, Custom Small is the right tier even if your business has $500M in revenue. If you are replacing your support corpus, Custom Medium is the right tier even if your team is six people. The price sits inside the scope; we do not bill by the hour. Mike's time inside any custom Sprint is buried inside the fixed fee — your finance team gets a clean line item, not a timesheet.

Every Sprint follows the same shape. Week one: discovery and edge-mapping. Weeks two through four: card authoring at semantic boundaries. Weeks five through six: agent integration and eval harness. Weeks seven through eight: production deploy and 30 to 180 days of operations support depending on tier. By week eight your retrieval substrate is in production, your token bill has dropped, and your CFO is asking what else we can plug into Card Network.

See the full Magic Sprint tier table at /sprint. Productized tiers have a "Buy now" button. Custom tiers have a "Talk to Mike" button. Both close the same week.

6. Why now

Agent deployment is scaling roughly 10x year over year inside the customers we audit. The teams that are quietly winning are the ones that figured out their substrate first — pre-chunking, edge typing, retrieval discipline. The teams that are bleeding are the ones still running 2023-vintage RAG with a vector database and a chunker that nobody has revisited. The gap between the two camps is widening every month. Card Network is the bridge.

Token costs compound the same way headcount compounds. Six months from now, the same agent doing the same job is sending more tokens because someone added a new feature, the corpus grew, the chunker started returning more chunks per query. CFOs who flagged the AI line item in Q4 will not stop flagging it. Boards are starting to ask CTOs the question that used to be reserved for cloud bills: "What is your unit economics on AI inference?" The teams with a Card Network answer have a number. The teams without one have a story.

The defensibility window matters too. Card Network is patent-pending. The protocol ships dual-licensed: PolyForm Noncommercial for the spec, per-organization Commercial tiers for production deployment. Customers who deploy now lock in early-mover commercial pricing and reference-customer status. We will not retroactively raise the rate on existing customers; the trajectory shows up in new customer pricing as the public artifacts ship — patent provisional, arXiv paper, first case studies. By summer, every claim in this whitepaper will have a public artifact behind it. By fall, our reference rate will be higher.

The compliance window is the third reason. Audit teams are starting to ask AI-using companies to demonstrate per-tenant isolation, retrieval determinism, and token-spend forecasting. Card Network was designed against those requirements from day one. Per-tenant card stores share the protocol but never the substrate. Hash IDs on cards make every retrieval deterministic and auditable. Cost forecasting is built in because token usage is bounded by chunk size. Card Network is the substrate that survives an audit.

7. Three doors. Pick one.

You have read 5,000 words. You either recognize one of the five scenarios or you do not. If you do, here are three doors to walk through. They are sequenced from cheapest to most committed, but you can walk through them in any order. We honor all three.

If none of the three doors fit, drop your email in the form below and tell us the scenario you are working on. We read every submission. If your scenario does not exist in our 5-scenario list, we want to hear it — that is how the next scenario gets added.

End of whitepaper. Source: Card Network Business Plan v2 §1-§6, Card Network Architecture v1.0 spec, live benchmark on /proof.

Three doors / pick one

Did this paper sell you?

Drop your email. We send a PDF copy, schedule a 30-minute discovery call with Mike, and run a free CARD-readiness audit on your stack.

Book a 30-min call instead

No spam. We use your email to send the Migration Playbook + whitepaper, schedule the call, and follow up on the audit. That is it.