
Why your AI agents need authoring-time pre-chunking

Or: how to give a knowledge worker a second brain that the LLM can actually read.

Tags: RAG · knowledge graphs · second brain · business data for LLMs · AI for business · token efficiency · agent context window · authoring-time pre-chunking

Scroll to read. 12 cards. ~3000 tokens. Each card is its own URL.

Card 1 of 12 content card:intro
[Diagram: one 8,000-token doc split into six ~250-token cards]

The problem in one sentence

If your AI agent is making things up, it isn't because the model is dumb. It's because you handed it the wrong 8,000 tokens. Modern AI for business depends on getting the right 250-token context in front of the model at the right moment, every time. That problem has a name in retrieval circles: chunking. And nearly every team is doing it at the wrong time.

We call the right approach authoring-time pre-chunking. The cards you'll read in this article are themselves a working example of the protocol it describes. Each card is roughly 250 tokens, has typed edges to other cards, and was authored in this shape on purpose.

Card 2 of 12 content card:runtime-rag-tax
8K → 250: tokens shipped per agent question, before vs. after authoring-time pre-chunking.

The runtime-RAG tax

Most AI for business shipped in the last 18 months runs the same playbook: take an existing pile of documents, embed them at request time, retrieve the top-k chunks, stuff them in the prompt. We call that runtime RAG. It's quick to ship, it demos well, and it imposes a quiet tax on every single query.

The tax shows up in three places: token spend (you ship 8K tokens to answer a 250-token question), accuracy (the chunker breaks paragraphs at the wrong place), and engineering hours (every model swap forces a re-tune of the chunker). Token efficiency, second-brain quality, and LLM support burn all live or die at this layer.
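To make the tax concrete, here is the runtime-RAG playbook as a minimal TypeScript sketch. Everything in it is illustrative (the chunk size, the top-k of 16, the embed and llm callables); it is the pattern, not any particular team's pipeline.

// Runtime RAG: chunk, embed, retrieve, and stuff the prompt at request time.
type Chunk = { text: string; tokens: number };

function naiveChunk(doc: string, sizeTokens: number): Chunk[] {
  // Fixed-width cuts that know nothing about meaning: a paragraph
  // gets split mid-sentence whenever the window says so.
  const step = sizeTokens * 4; // ~4 chars per token
  const out: Chunk[] = [];
  for (let i = 0; i < doc.length; i += step) {
    const text = doc.slice(i, i + step);
    out.push({ text, tokens: Math.ceil(text.length / 4) });
  }
  return out;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

async function answerWithRuntimeRag(
  question: string,
  documents: string[],
  embed: (text: string) => Promise<number[]>, // paid API call
  llm: (prompt: string) => Promise<string>
): Promise<string> {
  const chunks = documents.flatMap(d => naiveChunk(d, 512));
  const qVec = await embed(question);
  // Score every chunk against the question (worst case: re-embedding
  // the corpus on every single request).
  const scored = await Promise.all(
    chunks.map(async c => ({ c, score: cosine(qVec, await embed(c.text)) }))
  );
  // Top-k is tuned by vibes; 16 chunks x 512 tokens is the 8K-token tax.
  const topK = scored.sort((a, b) => b.score - a.score).slice(0, 16);
  const context = topK.map(s => s.c.text).join("\n\n");
  return llm(`${context}\n\nQ: ${question}`);
}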

Card 3 of 12 content card:what-pre-chunking-is
[Diagram: four ~250-token cards fit inside a 2K-token window; the runtime-RAG dump overflows it]

What authoring-time pre-chunking actually means

Authoring-time pre-chunking flips the runtime-RAG model on its head. Instead of asking a chunker at retrieval time "where should I cut this 8K-token doc?", you author the corpus as discrete cards from day one. Each card is a self-contained ~250-token unit with metadata, an explicit type, and typed edges to other cards.

The payoff: retrieval becomes deterministic. You ask the graph for cards that match a query, the graph returns 3-8 cards totaling under 2K tokens, and the LLM gets exactly the substrate it needs. No 8K-token padding. No mid-paragraph cuts. No crossed-fingers eval pass.
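A sketch of that retrieval loop, in the same hedged spirit: the Card shape and the tag-overlap scoring below are illustrative stand-ins, but the budget logic is the point. Cards, not a runtime chunker, are the unit of retrieval.

interface Card {
  id: string;
  tokens: number;  // ~250 by authoring convention
  tags: string[];  // query-matching surface; a real store may use embeddings
  text: string;
}

function retrieveCards(query: string, corpus: Card[], budget = 2000): Card[] {
  const terms = query.toLowerCase().split(/\s+/);
  // Deliberately dumb scoring: count the query terms a card's tags cover.
  // The matching mechanics can be anything; what matters is that whole
  // cards are retrieved and the token budget is a hard ceiling.
  const scored = corpus
    .map(card => ({
      card,
      score: terms.filter(t => card.tags.some(tag => tag.includes(t))).length,
    }))
    .filter(s => s.score > 0)
    .sort((a, b) => b.score - a.score);

  const picked: Card[] = [];
  let spent = 0;
  for (const { card } of scored) {
    if (spent + card.tokens > budget) break; // never overflow the window
    picked.push(card);
    spent += card.tokens;
  }
  return picked; // typically 3-8 cards, always under budget
}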

Card 4 of 12 content card:second-brain

A second brain for AI agents

When people say "second brain" they usually mean a personal knowledge system. The same idea applies to AI agents at work. Your agent's first brain is the LLM weights. Its second brain is whatever knowledge graph it can query at runtime. Today most agents have a chaotic second brain made of long PDFs, ephemeral chat snippets, and an embedding store that hallucinates structure.

A Card Network second brain looks different. It is shaped like the questions agents ask. It composes laterally via typed edges. And it lets the agent tell the difference between "reference material" and "prerequisite knowledge" without reading every paragraph.
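Here is this very card, as the protocol sees it: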

{
  "id": "card-second-brain",
  "type": "content",
  "header": { "title": "Second brain card" },
  "edges": [
    { "type": "stack",   "to": "edge-types"    },
    { "type": "depends", "to": "what-pre-chunking-is" },
    { "type": "link",    "to": "support-burn"  }
  ]
}
Card 5 of 12 content card:edge-types
[Diagram: the seven edge types (stack, link, depends, branch, produces, embed, sync) connecting nodes A through F]

The seven edge types

Card Network defines seven typed edges, each with a meaning the agent can act on without LLM inference:

  • stack: A then B (linear sequence)
  • link: A mentions B (lateral reference)
  • embed: A contains B (compositional)
  • branch: if X then B (conditional)
  • depends: A requires B (prerequisite)
  • produces: A creates B (output)
  • sync: A and B mirror each other

This is the difference between a knowledge graph and a knowledge bag. Untyped links say "there's a connection here." Typed edges say "this is the *kind* of connection." Agents can traverse the graph without paying token cost on every step.
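Because the edge type is data, an agent can act on it with plain graph code. A minimal sketch, assuming the card-and-edges shape from the JSON example above: gather everything a card transitively requires before reading it, at zero token cost.

type EdgeType =
  | "stack" | "link" | "embed" | "branch" | "depends" | "produces" | "sync";

interface Edge { type: EdgeType; to: string }
interface GraphCard { id: string; edges: Edge[] }

function prerequisites(startId: string, graph: Map<string, GraphCard>): string[] {
  const seen = new Set<string>();
  const frontier = [startId];
  while (frontier.length > 0) {
    const card = graph.get(frontier.pop()!);
    if (!card) continue;
    for (const edge of card.edges) {
      // Only `depends` means "read this first". A `link` is a lateral
      // mention and would be noise in a prerequisite set.
      if (edge.type === "depends" && !seen.has(edge.to)) {
        seen.add(edge.to);
        frontier.push(edge.to);
      }
    }
  }
  return [...seen];
}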

Card 6 of 12 content card:support-burn

Case A: AI customer-support agent burning $40K/quarter

A real bleed-scenario from a SaaS customer with 5,000 monthly active users. Their AI support stack: an agent sitting on top of a knowledge base of 8K-token articles. Symptoms: 60%+ of tickets escalate to humans because the agent's context window keeps missing the key paragraph. The AI bill is $15K/month. The human team is still 4 FTE because the agent can't deflect anything substantive.

The Card Network fix is mechanical: migrate 80-200 knowledge-base articles into 800-1500 cards (~250 tokens each). Retrieval pulls 3-8 cards per query, and total context stays under 2K tokens. Deflection lifts from 40% to 75% in 60 days. The support team drops from 4 FTE to 2 FTE. The AI bill drops 30-50%. Total annual savings: $260K. The migration is a Magic Sprint Custom Medium ($80K-$130K). Payback under 6 months.
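A back-of-envelope check on that $260K. The case gives the FTE delta and the AI-bill range; the loaded cost per support FTE is our assumption.

// Case A savings, reconstructed. Only loadedFteCost is assumed.
const fteSaved = 4 - 2;                  // support team: 4 FTE -> 2 FTE
const loadedFteCost = 95_000;            // ASSUMPTION: $/yr per support FTE
const aiBillBefore = 15_000 * 12;        // $15K/month
const aiSavingsLow = aiBillBefore * 0.3; // bill drops 30-50%
const aiSavingsHigh = aiBillBefore * 0.5;

const low = fteSaved * loadedFteCost + aiSavingsLow;   // $244,000
const high = fteSaved * loadedFteCost + aiSavingsHigh; // $280,000
// A $244K-$280K/yr band, bracketing the quoted $260K.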

Card 7 of 12 content card:battle-cards
[Diagram: battle-cards scattered across a long-form doc tool, an org-chart-shaped enablement platform, and an ephemeral chat tool, consolidated into Card Network at 3-5 cards per query]

Case B: battle-cards your reps can't find

Different shape, same root cause. A B2B SaaS with a 20-rep sales team has battle-cards living in three different tools: a doc tool (long), an enablement platform (org-chart shaped), and a chat tool (ephemeral). On a live call, a rep cannot pull "what's our differentiator vs Competitor X for a 250-employee fintech ICP" in real time. The rep guesses. Conversion sits 6-9 points below industry benchmark.

Ingest the battle-card library into Card Network with edges typed by ICP, competitor, and deal-size. The sales-rep agent pulls 3-5 relevant cards in under 2 seconds during the call. Lift conversion 4-6 points and that's $2-3M new ARR on a $30M ARR base. Card migration: $95K-$130K. Five-point conversion lift pays this back in the first quarter.
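What the call-time lookup might look like, assuming battle-cards carry competitor, ICP, and deal-size bands as typed metadata (all field names here are illustrative):

interface BattleCard {
  id: string;
  competitor: string;             // e.g. "competitor-x"
  icp: string;                    // e.g. "fintech"
  dealSizeBand: [number, number]; // employee-count range, e.g. [100, 500]
  tokens: number;
  text: string;
}

function cardsForCall(
  library: BattleCard[],
  competitor: string,
  icp: string,
  employees: number
): BattleCard[] {
  return library
    .filter(c =>
      c.competitor === competitor &&
      c.icp === icp &&
      employees >= c.dealSizeBand[0] &&
      employees <= c.dealSizeBand[1]
    )
    .slice(0, 5); // 3-5 cards is the whole context, not a 40-page deck
}

// "Differentiator vs Competitor X for a 250-employee fintech":
// cardsForCall(library, "competitor-x", "fintech", 250)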

Card 8 of 12 content card:rag-pipeline-rebuild
3 → 1: RAG pipelines collapsed into one card-shaped substrate. Reclaim 30-40% of an 8-engineer team.

Case C: rebuilding the same RAG pipeline three times

Mid-stage AI startups end up with three RAG pipelines because they bolted on three model providers without a shared substrate. Each integration team rebuilds chunking + embeddings + retrieval. Three eval harnesses, three drift-detection systems, three on-call rotations. Engineering spends 30-40% of their time on retrieval-layer plumbing.

Card Network collapses this. One card-shaped corpus, one embedding pipeline, one retrieval layer. Every model and every agent reads the same substrate. The model-of-the-month becomes interchangeable, not a quarterly migration. Reclaim 30-40% of an 8-engineer team and you've recovered $1.2M/yr in eng-time. Card migration is a Magic Sprint Custom Medium-to-Large ($130K-$250K).
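The architectural point in miniature: one retrieval layer, many interchangeable model clients. A sketch that reuses the Card and retrieveCards stand-ins from the earlier sketch.

interface ModelClient {
  name: string;
  complete(prompt: string): Promise<string>;
}

// One substrate, N providers. No client owns a chunker, an embedding
// pipeline, or an eval harness of its own.
async function ask(
  question: string,
  corpus: Card[], // Card / retrieveCards as sketched earlier
  client: ModelClient
): Promise<string> {
  const cards = retrieveCards(question, corpus);
  const context = cards.map(c => `[${c.id}]\n${c.text}`).join("\n\n");
  // Swapping `client` is a parameter change, not a quarterly migration.
  return client.complete(`${context}\n\nQ: ${question}`);
}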

Card 9 of 12 content card:ai-bill-spike
[Chart: monthly AI bill, $8,000/mo with runtime RAG vs. $2,500/mo card-ified, a 69% drop]

Case D: the AI bill is climbing 30% MoM

Series-A SaaS bolted multiple LLMs into 5 product features in 6 months. Each feature shoves entire docs into prompts because nobody has time to redesign the substrate. Token-spend climbs 20-30% MoM. CFO is asking when this stabilizes. Founder doesn't know.

Authoring-time pre-chunking shrinks context windows 60-80%. Same answer quality, fraction of the tokens. Plus agent-friendly caching on immutable card hash IDs (every card has a stable content hash, so the LLM-cache hit rate jumps). $8K/month becomes $2-3K/month after card-ification, $60-72K/yr saved. Card migration is a Magic Sprint Custom Small ($30K-$60K).
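A sketch of the hash-ID caching trick using Node's built-in crypto. The in-memory map stands in for whatever prompt cache the LLM provider actually offers.

import { createHash } from "node:crypto";

// A card's ID is a hash of its content: same content, same ID, same
// prompt prefix, cache hit. Edit the card and the ID changes, so a
// stale entry can never be served.
function cardId(content: string): string {
  return "card:" + createHash("sha256").update(content).digest("hex").slice(0, 16);
}

const promptCache = new Map<string, string>();

async function cachedAnswer(
  cardContents: string[],
  question: string,
  llm: (prompt: string) => Promise<string>
): Promise<string> {
  // Key = sorted card IDs + question: stable across requests for as
  // long as the underlying cards do not change.
  const key = cardContents.map(cardId).sort().join("|") + "::" + question;
  const hit = promptCache.get(key);
  if (hit !== undefined) return hit;
  const answer = await llm(cardContents.join("\n\n") + "\n\nQ: " + question);
  promptCache.set(key, answer);
  return answer;
}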

Card 10 of 12 content card:call-recordings
2,400 hours of sales calls already paid for. Today: unsearchable. After card-ification: queryable in seconds.

Case E: 2,400 hours of call recordings nobody can query

Series-B SaaS records 200+ sales calls a month. Playback exists; search does not. Nobody can ask "across all my Q3 calls, which five objections came up most often when we pitched Tier 2 to fintechs?" The data is there. The question is unanswerable. Most CFOs buy the fix on a single sentence: "we already paid for the data, it's malpractice not to use it."

The Card Network fix: a transcript ingester that chunks each call at semantic boundaries into cards typed by speaker, sentiment, topic, objection, and outcome. The agent retrieves cross-call patterns in seconds. Product, sales, and customer-success all get a real-time research engine. Card migration is a Magic Sprint Custom Medium-to-Large ($95K-$250K).
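One plausible shape for that ingester's output, mirroring the typing axes named above (speaker, topic, objection, outcome). The semantic classifier is stubbed as a parameter, since that one step genuinely needs a model.

interface CallCard {
  id: string;
  type: "transcript";
  callId: string;
  speaker: "rep" | "prospect";
  topic: string;            // e.g. "tier-2-pricing"
  objection: string | null; // e.g. "too-expensive"; null if none raised
  outcome: string;          // e.g. "advanced", "stalled"
  tokens: number;
  text: string;
}

function toCallCard(
  callId: string,
  segment: { speaker: "rep" | "prospect"; text: string },
  classify: (text: string) => { topic: string; objection: string | null; outcome: string },
  hashOf: (text: string) => string // content hash, as in Case D
): CallCard {
  const labels = classify(segment.text); // the one step that needs a model
  return {
    id: `card:${callId}:${hashOf(segment.text)}`,
    type: "transcript",
    callId,
    speaker: segment.speaker,
    ...labels,
    tokens: Math.ceil(segment.text.length / 4), // rough ~4 chars/token
    text: segment.text,
  };
}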

Card 11 of 12 content card:tldr-cfo
Pricing bands: Small $30K-$60K · Medium $80K-$130K · Large $150K-$250K · Enterprise $250K+. Payback: under 6 months.

TL;DR for the CFO

Five real bleed-scenarios. Same root cause: business data for LLMs is shaped wrong. The fix is the same shape every time: pre-chunk at authoring time, type your edges, give the agent a substrate it can actually walk.

Magic Sprint pricing maps to the bleed: Small $30K-$60K (one runaway AI feature), Medium $80K-$130K (a whole knowledge function), Large $150K-$250K (multi-corpus, multi-agent), Enterprise $250K+ (company-wide substrate). Payback band: under 6 months on every Sprint we've modeled. Run the numbers yourself on the ROI calculator.

Card 12 of 12 action card:next-step

What to do next

Three doors:

  1. CFO door: open the ROI calculator at /roi. Plug in your real numbers. See your savings band and your recommended Sprint tier in 30 seconds.
  2. Engineer door: read the whitepaper at /whitepaper. The protocol is documented down to the JSON Schema.
  3. Sales door: scope a Magic Sprint at /sprint. Email sales or book a discovery call.

The substrate works the same way no matter which door you walk through. Every door eventually leads to the same Card Network.

card://cardnetwork.dev/roi          # CFO door
card://cardnetwork.dev/whitepaper   # Engineer door
card://cardnetwork.dev/sprint       # Sales door
Topology card:graph

The whole article, as a graph

Each box is one card. The arrows are typed edges. Click any node to jump back to that card.

Want to see what the same question looks like through traditional runtime RAG? Run it side by side against this corpus.


Three doors / pick one

Read it. Now run it on your stack.

Drop your email. We send the full whitepaper, schedule a 30-minute discovery call with Mike, and run a free CARD-readiness audit on your domain.

Book a 30-min call instead

No spam. We use your email to send the whitepaper, schedule the call, and follow up on the audit. That is it.