Consulting

We work with teams who have a structured-output pipeline that mostly works, and a small set of fields that quietly do not.

What we offer

Diagnostic engagement. We run valjson against your model and data to identify per-field regressions, calibration gaps, and prompt-truncation effects. Output: a written report and a reproducible analysis that you keep. Every engagement starts here.
Prompt, schema, and gating fixes. Often the cheapest fix is a schema-level ambiguous pattern, a prompt redesign, or inference-time margin gating — no fine-tuning required. We diagnose, prescribe, and verify against held-out data.
Presupposition labeling — when fine-tuning is warranted. For customers running open weights on their own infrastructure (regulated or proprietary-data domains), training-time relabeling fixes per-field regressions that the cheaper paths cannot. The technique is described in our paper; applying it to a customer’s schema requires judgement about which fields to relabel, what cues to use, and how to split.
Per-field deployment gates. We configure margin thresholds per grammar role for production. The deployed model abstains when evidence is thin rather than committing to a confidently wrong answer.
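
As a rough illustration of per-field margin gating, the sketch below abstains on a field when the gap between its top two candidate values falls under a threshold chosen for that field's role. This is not valjson's API. The names FieldCandidates, MARGIN_THRESHOLDS, and gate_field, the role labels, and the threshold values are all hypothetical, and "margin" is assumed here to mean the top-1 minus top-2 probability.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical thresholds per grammar role; real values would be tuned
# on held-out data for each deployment.
MARGIN_THRESHOLDS: Dict[str, float] = {"enum": 0.30, "number": 0.20, "string": 0.10}
DEFAULT_THRESHOLD = 0.25  # assumed fallback for roles not listed above


@dataclass
class FieldCandidates:
    name: str                     # field name in the output schema
    role: str                     # grammar role of the field (assumed labels)
    candidates: Dict[str, float]  # candidate value -> model probability


def margin(candidates: Dict[str, float]) -> float:
    """Gap between the best and second-best candidate probabilities."""
    ranked = sorted(candidates.values(), reverse=True)
    if not ranked:
        return 0.0
    if len(ranked) == 1:
        return ranked[0]
    return ranked[0] - ranked[1]


def gate_field(field: FieldCandidates) -> Optional[str]:
    """Return the top candidate, or None (abstain) when the margin is thin."""
    threshold = MARGIN_THRESHOLDS.get(field.role, DEFAULT_THRESHOLD)
    if margin(field.candidates) < threshold:
        return None
    return max(field.candidates, key=field.candidates.get)


status = FieldCandidates("status", "enum", {"open": 0.55, "closed": 0.40})
amount = FieldCandidates("amount", "number", {"120": 0.90, "12": 0.05})
print(gate_field(status))  # thin margin for an enum field, so the gate abstains
print(gate_field(amount))  # wide margin, so the top candidate is kept
```

A downstream system can then treat None as "needs review" instead of accepting a low-margin value as if it were certain.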

Who it is for

Consider us if:

Your pipeline produces schema-valid JSON but downstream systems still reject some fraction of it.
Your evaluation says aggregate accuracy improved, but a stakeholder is reporting field-specific regressions.
You run open-weights models in a regulated or proprietary-data setting and need a per-field quality story for stakeholders.
You need calibrated abstention behavior, not just confident output.

You probably do not need us if valjson and the documented free-tier patterns address your case. Most do — see How to work with us for the DIY-first ladder.

Contact

Breck Baldwin — breck@validjson.com