Designing Data for AI-First Architectures

For many years, JSON has been the default choice for data exchange. It works well for APIs, browsers, and human-readable payloads.
But modern systems are no longer just data-driven — they are AI-driven.

When Large Language Models, embeddings, RAG pipelines, and autonomous agents become first‑class citizens, tokens become more important than bytes.

This is where TOON (Token‑Oriented Object Notation) fits naturally.


Why I Started Thinking About TOON

In real projects — AI assistants, RAG platforms, observability pipelines — I repeatedly faced the same problems:

  • JSON structures are verbose for LLMs
  • Deep nesting increases token noise
  • Small schema changes break embedding consistency
  • Logs are hard to reuse as AI context
  • Prompt construction becomes fragile

TOON is not meant to replace JSON everywhere.
It is meant to sit between your system and AI.


What Is TOON?

Token‑Oriented Object Notation (TOON) is a structured format where:

  • Tokens are the primary design unit
  • Structure is expressed via token paths
  • Data is flattened but semantically precise
  • Each line is deterministic and embedding‑friendly

Conceptually, every value becomes one line:

:path.to.value <scalar>

That’s it.
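
The scalar side of a line can simply borrow JSON's own quoting rules: strings stay quoted, numbers and booleans stay bare. A minimal Python sketch (the `toon_scalar` name is hypothetical, not part of any spec):

```python
import json

def toon_scalar(value):
    """Render the scalar part of a TOON line.

    json.dumps already gives us string quoting/escaping and canonical
    forms for numbers, booleans and null, so we reuse it directly.
    """
    return json.dumps(value)

print(toon_scalar("AUD"))    # "AUD"
print(toon_scalar(1499.99))  # 1499.99
print(toon_scalar(True))     # true
```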


JSON vs TOON — Practical View

JSON

{
  "order": {
    "id": 9821,
    "total": 1499.99,
    "currency": "AUD",
    "customer": {
      "id": 77,
      "type": "business"
    }
  }
}

TOON

:order.id 9821
:order.total 1499.99
:order.currency "AUD"
:order.customer.id 77
:order.customer.type "business"

Why this matters:

  • LLM tokenization becomes stable
  • Embeddings become more consistent
  • Partial context works reliably
  • Logs become reusable AI input
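
The JSON-to-TOON mapping above is mechanical, so it is easy to automate. A minimal sketch, assuming dict keys recurse into dotted paths and scalars use JSON quoting (the `to_toon` name is my own, not a standard API):

```python
import json

def to_toon(obj, prefix=""):
    """Flatten a nested dict into TOON lines of the form
    ':path.to.value <scalar>'. Dicts recurse into dotted paths;
    scalars are rendered with JSON quoting so strings stay quoted
    and numbers stay bare."""
    lines = []
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            lines.extend(to_toon(value, path))
        else:
            lines.append(f":{path} {json.dumps(value)}")
    return lines

order = {"order": {"id": 9821, "total": 1499.99, "currency": "AUD",
                   "customer": {"id": 77, "type": "business"}}}
print("\n".join(to_toon(order)))
```

Running this on the order above reproduces the five TOON lines shown, one deterministic line per leaf value.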

Where TOON Belongs in Architecture

Backend (Primary Target)

TOON is ideal for backend intelligence layers:

AI & RAG Pipelines

  • Context normalization
  • Prompt assembly
  • Chunking before embeddings
  • Semantic indexing
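
Because each TOON line carries its full path, context normalization and prompt assembly reduce to line filtering. A sketch of partial-context selection by path prefix (the `select_context` helper is illustrative, not an established API):

```python
def select_context(toon_lines, prefixes):
    """Keep only TOON lines whose path matches one of the given
    prefixes. Assumes the ':path.to.value <scalar>' line form."""
    selected = []
    for line in toon_lines:
        path = line.split(" ", 1)[0].lstrip(":")
        if any(path == p or path.startswith(p + ".") for p in prefixes):
            selected.append(line)
    return selected

lines = [
    ':order.id 9821',
    ':order.customer.id 77',
    ':order.customer.type "business"',
    ':order.total 1499.99',
]
# Build a prompt fragment containing only customer context.
print("\n".join(select_context(lines, ["order.customer"])))
```

Each selected line remains meaningful on its own, which is what makes partial context and chunking before embeddings reliable.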

Event Sourcing & Logs

:event.type "PaymentCaptured"
:event.order.id 9821
:event.amount 1499.99
:event.provider "Stripe"

Logs stop being dead data — they become AI knowledge.
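
Reusing such logs as structured AI input means parsing the lines back into objects. A minimal sketch, assuming the ':path value' form with JSON-style scalars (the `parse_toon` name is hypothetical):

```python
import json

def parse_toon(lines):
    """Rebuild a nested dict from TOON lines.

    Each line splits into a dotted path and a JSON-style scalar;
    intermediate path segments become nested dicts.
    """
    root = {}
    for line in lines:
        path, raw = line.lstrip(":").split(" ", 1)
        keys = path.split(".")
        node = root
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = json.loads(raw)
    return root

event = [
    ':event.type "PaymentCaptured"',
    ':event.order.id 9821',
    ':event.amount 1499.99',
    ':event.provider "Stripe"',
]
print(parse_toon(event))
```

The round trip is lossless for dict-shaped data, so the same event lines can feed dashboards, embeddings, or prompts without a separate log schema.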