Consulting
We work with teams who have a structured-output pipeline that mostly works, and a small set of fields that quietly do not.
What we offer
- Diagnostic engagement. Run valjson against your model and data. Identify per-field regressions, calibration gaps, and prompt-truncation effects. Output: a written report and a reproducible analysis you keep. This is the starting point of every engagement.
- Prompt, schema, and gating fixes. Often the cheapest fix is a schema-level `ambiguous` pattern, a prompt redesign, or inference-time margin gating, with no fine-tuning required. We diagnose, prescribe, and verify against held-out data.
- Presupposition labeling, when fine-tuning is warranted. For customers running open weights on their own infrastructure (regulated or proprietary-data domains), training-time relabeling fixes per-field regressions that the cheaper paths cannot. The technique is described in our paper; applying it to a customer’s schema requires judgement about which fields to relabel, what cues to use, and how to split.
- Per-field deployment gates. Configure margin thresholds per grammar role for production. The deployed model abstains when evidence is thin rather than committing a confident-wrong answer.
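To make the gating idea concrete, here is a minimal sketch of per-field margin gating. It is an illustration under stated assumptions, not valjson's API: the margin is assumed to be the gap between the two most probable candidate values for a field, and the role names, threshold values, and the `FieldCandidates` type are all hypothetical. In practice the thresholds would be set from held-out data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical thresholds per grammar role (assumed names and values).
# Real values would be calibrated against held-out data.
MARGIN_THRESHOLDS: Dict[str, float] = {
    "enum": 0.20,
    "number": 0.35,
    "string": 0.50,
}
DEFAULT_THRESHOLD = 0.50  # conservative fallback for roles not listed above


@dataclass
class FieldCandidates:
    """Candidate values for one output field, with model probabilities."""
    name: str
    role: str                # grammar role, e.g. "enum" or "number"
    probs: Dict[str, float]  # candidate value -> probability


def margin(probs: Dict[str, float]) -> float:
    """Gap between the top two candidate probabilities (assumed definition)."""
    ranked = sorted(probs.values(), reverse=True)
    if not ranked:
        return 0.0
    if len(ranked) == 1:
        return ranked[0]
    return ranked[0] - ranked[1]


def gate_field(field: FieldCandidates) -> Optional[str]:
    """Return the top candidate, or None (abstain) when the margin is too thin."""
    threshold = MARGIN_THRESHOLDS.get(field.role, DEFAULT_THRESHOLD)
    if margin(field.probs) < threshold:
        return None
    return max(field.probs, key=field.probs.get)


def gate_record(fields: List[FieldCandidates]) -> Dict[str, Optional[str]]:
    """Apply the per-role gate to every field of one record."""
    return {f.name: gate_field(f) for f in fields}


# Example: a confident enum field passes, a near-tie numeric field abstains.
record = [
    FieldCandidates("status", "enum", {"open": 0.90, "closed": 0.08}),
    FieldCandidates("amount", "number", {"120": 0.48, "1200": 0.45}),
]
print(gate_record(record))  # {'status': 'open', 'amount': None}
```

Returning `None` for a gated field keeps abstention explicit, so downstream code can route that field to review instead of consuming a confident-looking guess.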
Who it is for
Consider us if:
- Your pipeline produces schema-valid JSON but downstream systems still reject some fraction of it.
- Your evaluation says aggregate accuracy improved, but a stakeholder is reporting field-specific regressions.
- You run open-weights models in a regulated or proprietary-data setting and need a per-field quality story for stakeholders.
- You need calibrated abstention behavior, not just confident output.
You probably do not need us if valjson and the documented free-tier patterns address your case, and in most cases they do. See How to work with us for the DIY-first ladder.
Contact
Breck Baldwin — breck@validjson.com