Valid JSON, Wrong Answer: Per-Role Regression Detection and Linguistic Presupposition Labeling for Structured Output

Breck Baldwin, validjson.com.

PDF preprint · GitHub · DOI 10.5281/zenodo.20075999

Abstract

Generating reliable structured output from large language models (JSON such as "refundable":"True", schema-conformant tool calls, database queries) remains difficult in production. One candidate fix pairs grammar-constrained decoding (for syntax) with LoRA fine-tuning (for semantics) and evaluates the result by aggregate loss. We observe counter-current fields: per-grammar-role components whose loss rises under fine-tuning even as aggregate loss falls. We propose two mitigations rooted in a uniqueness-presupposition view of training labels: margin gating (inference-time abstention) and presupposition labeling (training-time relabeling of evidentially underdetermined examples). Across Qwen 2.5 at 0.5B, 7B, and 32B on Schema-Guided Dialogue and CUAD, presupposition labeling reduces constrained-content loss by 21–58% and eliminates the boolean regression at 7B and 32B.
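The two evaluation- and inference-side ideas in the abstract, counter-current detection and margin gating, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the paper's implementation. Per-role loss is taken to be mean token cross-entropy grouped by a role label. The aggregate is the token-weighted mean over all roles. The margin is the gap between the two highest probabilities after renormalizing over the grammar-allowed tokens. The function names `counter_current_roles` and `margin_gate`, the role names, the threshold, and all numbers are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def counter_current_roles(
    base: Dict[str, List[float]], tuned: Dict[str, List[float]]
) -> List[str]:
    """Hypothetical helper: roles whose mean loss rose while aggregate loss fell.

    `base` and `tuned` map a grammar role to the per-token losses of tokens
    with that role, before and after fine-tuning, on the same eval set.
    """
    # Assumption: aggregate = token-weighted mean over every role.
    agg_base = mean([x for xs in base.values() for x in xs])
    agg_tuned = mean([x for xs in tuned.values() for x in xs])
    if agg_tuned >= agg_base:
        return []  # aggregate did not fall, so nothing is counter-current
    return sorted(
        role
        for role in base
        if role in tuned and mean(tuned[role]) > mean(base[role])
    )


def margin_gate(
    logits: Sequence[float], allowed: Sequence[int], threshold: float
) -> Optional[int]:
    """Hypothetical helper: pick the best grammar-allowed token, or abstain.

    Probabilities are renormalized over `allowed` only (the constrained
    decoder's mask); returns None when top-1 minus top-2 < `threshold`.
    """
    masked = [logits[i] for i in allowed]
    m = max(masked)
    exps = [math.exp(z - m) for z in masked]  # numerically stable softmax
    total = sum(exps)
    ranked = sorted(
        ((e / total, tok) for e, tok in zip(exps, allowed)), reverse=True
    )
    top_p, top_tok = ranked[0]
    second_p = ranked[1][0] if len(ranked) > 1 else 0.0
    return None if top_p - second_p < threshold else top_tok


# Toy numbers, invented for illustration only.
base = {"key": [0.5, 0.3], "boolean": [0.2, 0.2], "string": [1.0, 1.2]}
tuned = {"key": [0.1, 0.1], "boolean": [0.4, 0.3], "string": [0.6, 0.8]}
print(counter_current_roles(base, tuned))  # ['boolean']

# Token 2 has the highest raw logit but is masked out by the grammar.
print(margin_gate([2.0, 1.9, 5.0], allowed=[0, 1], threshold=0.2))  # None
print(margin_gate([3.0, 0.0, 5.0], allowed=[0, 1], threshold=0.2))  # 0
```

In the toy data the aggregate loss falls from about 0.57 to about 0.38 while the "boolean" role rises from 0.20 to 0.35, which is the counter-current pattern. The gate abstains on the near-tie between two allowed tokens rather than emitting a low-margin value; how the paper actually defines the margin and chooses its threshold is not stated in the abstract.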

Reproduce every paper number on a laptop

The supplementary material on GitHub ships pre-computed result JSONs and the table-builder scripts, so every number in the paper can be derived without a GPU:

git clone https://github.com/validjson/valid-json-wrong-answer
cd valid-json-wrong-answer
python3 scripts/build_tables.py

A full from-scratch reproduction of the experiments takes ~5 hours on a single A100 80GB.

Citation

@misc{baldwin2026validjson,
  title  = {Valid JSON, Wrong Answer: Per-Role Regression Detection and
            Linguistic Presupposition Labeling for Structured Output},
  author = {Baldwin, Breck},
  year   = {2026},
  doi    = {10.5281/zenodo.20075999},
  url    = {https://zenodo.org/records/20075999}
}