Designing Data for AI-First Architectures
For many years, JSON has been the default choice for data exchange. It works well for APIs, browsers, and human-readable payloads.
But modern systems are no longer just data-driven — they are AI-driven.
When Large Language Models, embeddings, RAG pipelines, and autonomous agents become first‑class citizens, tokens become more important than bytes.
This is where TOON (Token‑Oriented Object Notation) fits naturally.
Why I Started Thinking About TOON
In real projects — AI assistants, RAG platforms, observability pipelines — I repeatedly faced the same problems:
- JSON structures are verbose for LLMs
- Deep nesting increases token noise
- Small schema changes break embedding consistency
- Logs are hard to reuse as AI context
- Prompt construction becomes fragile
TOON is not meant to replace JSON everywhere.
It is meant to sit between your system and AI.
What Is TOON?
Token‑Oriented Object Notation (TOON) is a structured format where:
- Tokens are the primary design unit
- Structure is expressed via token paths
- Data is flattened but semantically precise
- Each line is deterministic and embedding‑friendly
Conceptually, every value is a single line:
:path.to.value <scalar>
That’s it.
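To make the idea concrete, here is a minimal sketch in TypeScript that flattens a nested object into lines of this form. The `toToon` name, the quoting rules, and the array handling are illustrative assumptions, not an official TOON library.

```typescript
// Minimal sketch: flatten a nested object into ":path.to.value <scalar>" lines.
// Quoting (strings quoted, numbers/booleans/null bare) and array handling
// (index inside the path) are assumptions for illustration, not a spec.
function toToon(value: unknown, path: string[] = [], lines: string[] = []): string[] {
  if (value !== null && typeof value === "object") {
    const entries = Array.isArray(value)
      ? value.map((item, i) => [String(i), item] as const)
      : Object.entries(value as Record<string, unknown>);
    for (const [key, child] of entries) {
      toToon(child, [...path, key], lines);
    }
  } else {
    const scalar = typeof value === "string" ? JSON.stringify(value) : String(value);
    lines.push(`:${path.join(".")} ${scalar}`);
  }
  return lines;
}

// Example:
// toToon({ order: { id: 9821, currency: "AUD" } })
// -> [':order.id 9821', ':order.currency "AUD"']
```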
JSON vs TOON — Practical View
JSON
{
  "order": {
    "id": 9821,
    "total": 1499.99,
    "currency": "AUD",
    "customer": {
      "id": 77,
      "type": "business"
    }
  }
}
TOON
:order.id 9821
:order.total 1499.99
:order.currency "AUD"
:order.customer.id 77
:order.customer.type "business"
Why this matters:
- LLM tokenization becomes stable
- Embeddings become more consistent
- Partial context works reliably
- Logs become reusable AI input
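The round trip stays simple as well. Here is a hedged sketch of the reverse direction, building on the `toToon` helper above and assuming the line grammar shown earlier; because every line carries its full path, a partial subset of lines still parses, which is what makes partial context practical.

```typescript
// Sketch of the reverse mapping: ":path value" lines back into a nested object.
// Assumes the illustrative grammar above; malformed or blank lines are skipped.
function fromToon(lines: string[]): Record<string, unknown> {
  const root: Record<string, unknown> = {};
  for (const line of lines) {
    const match = line.match(/^:(\S+)\s+(.+)$/);
    if (!match) continue;
    const [, path, rawValue] = match;
    const keys = path.split(".");
    let node = root;
    for (const key of keys.slice(0, -1)) {
      node = (node[key] ??= {}) as Record<string, unknown>;
    }
    node[keys[keys.length - 1]] = JSON.parse(rawValue); // "AUD" stays a string, 1499.99 a number
  }
  return root;
}
```

Feeding only the `:order.customer.*` lines into `fromToon` yields just the customer object, which is exactly the partial-context property listed above.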
Where TOON Belongs in Architecture
Backend (Primary Target)
TOON is ideal for backend intelligence layers:
AI & RAG Pipelines
- Context normalization
- Prompt assembly (see the sketch after this list)
- Chunking before embeddings
- Semantic indexing
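As a concrete illustration of the normalization and prompt-assembly steps, here is a hedged sketch that turns retrieved records into a TOON context block. The `RetrievedRecord` shape and the `buildContext` name are assumptions, not part of any particular RAG framework; it reuses `toToon` from the earlier sketch.

```typescript
// Illustrative sketch: normalize retrieved records into TOON lines and join
// them into a single prompt context. Shapes and names are assumptions.
interface RetrievedRecord {
  source: string;                    // e.g. a document ID or URL
  payload: Record<string, unknown>;  // the structured data behind the hit
}

function buildContext(records: RetrievedRecord[]): string {
  return records
    .map((record) =>
      [
        `:context.source ${JSON.stringify(record.source)}`,
        ...toToon(record.payload, ["context", "data"]),
      ].join("\n"),
    )
    .join("\n\n"); // a blank line between records keeps chunk boundaries obvious
}
```

The resulting block can go straight into a system prompt or be chunked for embeddings, since every line already carries its full path.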
Event Sourcing & Logs
:event.type "PaymentCaptured"
:event.order.id 9821
:event.amount 1499.99
:event.provider "Stripe"
Logs stop being dead data — they become AI knowledge.
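One hedged way to get there is to emit TOON lines alongside whatever structured logging you already do. The `logEvent` helper and the `writeLine` sink below are illustrative assumptions, again reusing `toToon` from the earlier sketch.

```typescript
// Sketch: write each domain event as TOON lines so the log can later be
// dropped into an LLM context without transformation. Names are illustrative.
interface DomainEvent {
  type: string;
  [field: string]: unknown;
}

function logEvent(event: DomainEvent, writeLine: (line: string) => void): void {
  for (const line of toToon(event, ["event"])) {
    writeLine(line); // e.g. :event.type "PaymentCaptured"
  }
}

// Example:
// logEvent({ type: "PaymentCaptured", order: { id: 9821 }, amount: 1499.99 }, console.log);
```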