PURE-SYMBOLIC NLG · TRACE EVERY SENTENCE · 0% PRE-TRAINED

Onion Composer

Compositional natural-language generation for Thai and English · driven entirely by symbolic frames + lexicon · every output ships with an audit trail.

No pre-trained weights. No LLM API call. No neural model in the runtime path. The composer reads a frame definition, picks a register-aware construction, fills slots, and applies Thai sentence-final particles + classifiers + English subject-verb agreement — all from rules.

512 frames · 19,757 lexicon entries · 100% end-to-end pass · 600+ tests green · 0% pre-trained

All 12 examples below were computed offline by running the live composer module. No backend call from this page.

Live examples · 12 pre-computed compositions

Run via compose(frame_id, bindings, lang, register, ...) — output, construction id, and confidence shown verbatim.
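As a toy illustration of the call shape and the `ComposeResult(sentence, trace[], confidence, warnings[])` structure described on this page — the frame id, bindings, and all logic below are hypothetical stand-ins, not the real module:

```python
from dataclasses import dataclass, field

# Hypothetical mirror of the interface described above; field names follow
# the page's ComposeResult(sentence, trace[], confidence, warnings[]) shape.
@dataclass
class ComposeResult:
    sentence: str
    trace: list = field(default_factory=list)
    confidence: float = 1.0
    warnings: list = field(default_factory=list)

def compose(frame_id, bindings, lang="en", register="neutral"):
    # Toy: one hard-coded frame, just to show the audit-trail shape.
    trace = [f"frame:{frame_id}", f"lang:{lang}", f"register:{register}"]
    if frame_id == "MOTION" and lang == "en":
        sentence = f"{bindings['FIGURE']} moves to {bindings['GOAL']}."
        trace.append("construction:EN_MOTION_DECL")
        return ComposeResult(sentence, trace)
    return ComposeResult("", trace, 0.0, ["unknown frame"])

r = compose("MOTION", {"FIGURE": "The cat", "GOAL": "the roof"})
print(r.sentence)   # -> The cat moves to the roof.
print(r.trace)      # every decision is listed, so it can be replayed
```

The point of the shape: the trace is built alongside the sentence, so auditability costs nothing extra.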

Pipeline · how a sentence is built

Pure data-flow · no learned parameters anywhere along this path.

frame_id ──► load_frame ──► validate_bindings (CORE FE coverage)
                                 │
                                 ▼
                        pick_lu_by_register
                                 │
                                 ▼
        pick_construction (lang + purpose + register · LU-anchored)
                                 │
                                 ▼
              slot_fill ([FIGURE] [GROUND] [Count]...)
                                 │
                ┌────────────────┼────────────────┐
                ▼                ▼                ▼
         TH classifier     EN agreement     TH plural
         (3 หนังสือ →       (book is /       marker (หลายๆ)
         หนังสือ 3 เล่ม)     books are)
                                 │
                                 ▼
        TH sentence-final particle (ครับ / ค่ะ / ไหม / หน่อย)
                                 │
                                 ▼
        ComposeResult(sentence, trace[], confidence, warnings[])
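The stages above can be sketched as plain function composition. Function names mirror the diagram, but the bodies here are toy stand-ins (one hypothetical frame and construction), not the real composer:

```python
# Toy frame and construction tables; the real library has 512 frames.
FRAMES = {"PLACING": {"core": ["FIGURE", "GROUND"]}}
CONSTRUCTIONS = {("PLACING", "en", "statement"): "{FIGURE} is on {GROUND}."}

def load_frame(frame_id):
    return FRAMES[frame_id]

def validate_bindings(frame, bindings):
    # CORE frame-element coverage check; missing FEs would surface
    # downstream as __MISSING_X__ placeholders plus a warning.
    return [fe for fe in frame["core"] if fe not in bindings]

def pick_construction(frame_id, lang, purpose):
    return CONSTRUCTIONS[(frame_id, lang, purpose)]

def slot_fill(template, bindings):
    return template.format(**bindings)

bindings = {"FIGURE": "The lamp", "GROUND": "the table"}
frame = load_frame("PLACING")
assert validate_bindings(frame, bindings) == []   # all CORE FEs covered
sentence = slot_fill(pick_construction("PLACING", "en", "statement"), bindings)
print(sentence)   # -> The lamp is on the table.
```

Every step is a dictionary lookup or a string substitution — no learned parameters anywhere on the path.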

For multi-sentence output (paragraphs), compose_paragraph() wraps the above per-sentence pipeline, then layers anaphora resolution (adjacent-sentence pronoun replacement) and discourse markers (ดังนั้น · แล้ว · So · Then · But) inferred from frame relations.
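A minimal sketch of the discourse-marker layer, assuming a relation-to-marker table (the markers come from the page; the relation names and inference are hypothetical stand-ins):

```python
# Toy sketch of the paragraph layer: per-sentence compose output is joined
# with a discourse marker inferred from the relation between frames.
MARKERS = {"causal": "So", "sequence": "Then", "contrast": "But"}

def compose_paragraph(sentences, relations):
    out = [sentences[0]]
    for sent, rel in zip(sentences[1:], relations):
        marker = MARKERS.get(rel)
        # Prefix the marker and lower-case the old sentence-initial word.
        out.append(f"{marker} {sent[0].lower()}{sent[1:]}" if marker else sent)
    return " ".join(out)

print(compose_paragraph(
    ["It rained.", "The match was cancelled."], ["causal"]))
# -> It rained. So the match was cancelled.
```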

Five axes where this differs from an LLM

Traceability
Every sentence carries its own audit log

The composer returns a trace[] listing the frame loaded, the lexical unit picked, the construction selected, every slot fill, and any post-processing rule applied. You can replay the decision path. An LLM cannot.

Zero hallucination
Output is a function of input

Given the same frame + bindings + register, the composer returns the same surface form, byte-identical. Missing CORE frame elements show up as __MISSING_X__ placeholders with a warning — the composer never invents content.
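Both guarantees are easy to state as code. A toy illustration (the frame template is hypothetical; the `__MISSING_X__` convention is the page's own):

```python
# Determinism + explicit placeholders: missing CORE frame elements become
# __MISSING_X__ markers with a warning -- never invented content.
def compose(bindings):
    core = ["FIGURE", "GROUND"]
    warnings = []
    filled = dict(bindings)
    for fe in core:
        if fe not in filled:
            filled[fe] = f"__MISSING_{fe}__"
            warnings.append(f"missing CORE FE: {fe}")
    return "{FIGURE} is on {GROUND}.".format(**filled), warnings

a = compose({"FIGURE": "The lamp"})
b = compose({"FIGURE": "The lamp"})
assert a == b   # byte-identical on identical input
print(a[0])     # -> The lamp is on __MISSING_GROUND__.
print(a[1])     # -> ['missing CORE FE: GROUND']
```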

Thai-native
SFP, classifiers, register, gender at the core

ครับ / ค่ะ politeness, ไหม / หน่อย question and request softeners, and 38 lexical classifiers (เล่ม / คัน / คน / ตัว / ใบ ...) are first-class concepts — not bolted-on filters over English-trained weights.
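The classifier rewrite from the pipeline diagram (3 หนังสือ → หนังสือ 3 เล่ม) can be sketched with a noun→classifier table and one regex pass. The table here is a four-entry toy using standard Thai pairings, not the real 38-lexeme lexicon:

```python
import re

# Toy noun -> classifier table (standard Thai pairings).
CLASSIFIERS = {"หนังสือ": "เล่ม", "รถ": "คัน", "แมว": "ตัว", "ใบไม้": "ใบ"}

def apply_classifier(sentence):
    # Rewrite "<number> <noun>" to "<noun> <number> <classifier>" when the
    # noun has a known classifier; unknown nouns pass through untouched
    # (no fabricated classifier), matching the fall-through described later.
    def repl(m):
        num, noun = m.group(1), m.group(2)
        clf = CLASSIFIERS.get(noun)
        return f"{noun} {num} {clf}" if clf else m.group(0)
    return re.sub(r"(\d+)\s*(\S+)", repl, sentence)

print(apply_classifier("ฉันมี 3 หนังสือ"))   # -> ฉันมี หนังสือ 3 เล่ม
print(apply_classifier("มี 2 ปากกา"))        # unknown noun: unchanged
```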

Privacy
Runs entirely on-device

Composer is pure Python + JSON files. No external API call, no telemetry, no token going off-host. The same module runs identically online and offline.

$0 / query
No metered cost per call

compose() is a few hundred microseconds of dictionary lookup and regex substitution per sentence. There is no per-token billing because there are no tokens being inferred — only patterns being filled.

Honest limits
Quality ceiling is bounded by the frame library

We trade open-domain fluency for trace + privacy + determinism. If a frame for a given event type has not been authored yet, the composer cannot say it. That is a feature, not a bug.

Honest caveats

transparency

What the composer does not do well yet. We surface this so you can decide where to deploy and where not to.

18 frames flagged PENDING_THAI_NATIVE

Across the 512-frame library, 18 frames need a Thai-native review pass before being safe for production Thai output. They cover fragile pragmatics (REQUEST scaling, register-coupling around politeness, gender-marked particles in mixed-register paragraphs).

Register-coupling residual

When the lexicon picker and the construction picker disagree on register (e.g. only a casual LU is available for a frame whose constructions are formal-leaning), the composer falls back gracefully but emits a warning. This is honest behaviour rather than silent smoothing — some sentence cards below show the warning verbatim.
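The fallback-with-warning behaviour can be sketched as follows. The picker, data shapes, and warning text are hypothetical; only the behaviour (degrade gracefully, warn loudly) comes from the page:

```python
# Toy LU picker: prefer an exact register match; otherwise fall back to
# whatever is available and surface the mismatch as a warning instead of
# silently pretending the register fit.
def pick_lu(lus, register):
    warnings = []
    matches = [lu for lu in lus if lu["register"] == register]
    if not matches:
        matches = lus   # graceful fallback to the available LUs
        warnings.append(f"register mismatch: wanted {register}, "
                        f"using {lus[0]['register']}")
    return matches[0], warnings

# Only a casual LU exists, but a formal construction was requested:
lu, warns = pick_lu([{"lemma": "กิน", "register": "casual"}], "formal")
print(lu["lemma"], warns)
```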

NL → frame parser is scaffold

This page demonstrates the generation half (frame → sentence) only. The reverse direction — free-form Thai/English sentence → frame + bindings — is currently a scaffold, not a benchmarked module. End-to-end conversational use therefore still leans on the wider Onion brain stack, not the composer alone.

Long-tail classifier coverage

The Thai classifier system covers ~150-200 nouns harvested from lex_quantity (38 classifier lexemes with example noun lists). Nouns outside this set fall through gracefully (no rewrite, no fabricated classifier) rather than guess.

English agreement is main-verb only

The agreement layer rewrites the first verb after the subject NP for ~30 common verbs. Auxiliary chains, second clauses, and modals (will / can / must) are intentionally left invariant. Out-of-table verbs pass through unchanged rather than being guessed.
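A toy sketch of such a table-driven, first-verb-only rewrite (three verbs instead of ~30; the table layout and function are hypothetical):

```python
# Toy agreement table: lemma -> (singular form, plural form).
AGREE = {"be": ("is", "are"), "have": ("has", "have"), "go": ("goes", "go")}
# Map every known surface form (and lemma) back to its lemma.
LEMMA = {f: l for l, forms in AGREE.items() for f in forms}
LEMMA.update({l: l for l in AGREE})

def agree(subject_plural, tokens):
    # Rewrite only the first table verb found after the subject NP;
    # everything after it passes through unchanged, and out-of-table
    # verbs are never guessed.
    out = list(tokens)
    for i, tok in enumerate(out):
        lemma = LEMMA.get(tok)
        if lemma:
            out[i] = AGREE[lemma][1 if subject_plural else 0]
            break   # main verb only
    return out

print(agree(False, ["the", "book", "are", "new"]))   # are -> is
print(agree(True, ["the", "books", "is", "here"]))   # is -> are
```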

ASCII numerals trigger a known false-positive lang warning

Inserting an ASCII digit ("3") into a Thai construction slot raises a "lang mismatch" warning that currently dampens the confidence score even though the surface output is correct. This is on the Phase 2.x list as W_NUM_LANG_PRIOR. The Thai classifier examples below show this honestly — correct output, conservative score.

Onion · pure-symbolic NLG demo · static page · works offline · macllm.ai