Validation Runs

Benchmark and integration runs

JOB 2026-02-21

lis-swap-contamination-triage · GPT-5 · terminus-2

3 runs · $0.3562 total

1 of 3 runs failed Layer 2 audit

RUN ID    STEPS  COST   LAYER 1  LAYER 2  RECORDING
hJQzBJW   4      $0.08  L1 PASS  L2 FAIL
HsPAVBJ   5      $0.12  L1 PASS  L2 PASS  ◉ yes
Zo4iCGU   6      $0.15  L1 PASS  L2 PASS

JOB 2026-04-25

hl7-canonicalization-demo · claude-sonnet-4-6 · claude-code

AI-governed autoverification from a real HL7 instrument message — KG-bounded decision, complete evidence trail.

1 run · $0.0000 total

All runs passed Layer 2 audit

RUN ID        STEPS  COST   LAYER 1  LAYER 2  RECORDING
HL7 v2 Input  4      $0.00  L1 PASS  L2 PASS

JOB 2026-04-28

allotrope-canonicalization-demo · claude-sonnet-4-6 · claude-code

Same KG-governed triage engine, Allotrope ASM input — format is the variable, clinical reasoning is unchanged.

1 run · $0.0000 total

All runs passed Layer 2 audit

RUN ID               STEPS  COST   LAYER 1  LAYER 2  RECORDING
Allotrope ASM Input  4      $0.00  L1 PASS  L2 PASS

JOB 2026-05-05

delta-check-demo · claude-sonnet-4-6 · claude-code

KG-governed delta check detection — creatinine 664% rise (1.1→8.4 mg/dL) triggers EP33-grounded HOLD. Same framework, different clinical pattern.
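The 664% figure follows from the two creatinine values quoted above: (8.4 - 1.1) / 1.1 ≈ 6.64. A minimal Python sketch of that arithmetic is shown here. The function names, the `RELEASE` label, and the 50% limit are hypothetical placeholders for illustration; they are not taken from the demo or from EP33, and the actual engine may decide differently.

```python
def percent_change(previous: float, current: float) -> float:
    """Relative change from previous to current, in percent."""
    if previous == 0:
        raise ValueError("previous result must be non-zero for a relative delta")
    return (current - previous) / abs(previous) * 100.0


def delta_check(previous: float, current: float, limit_pct: float) -> tuple[str, float]:
    """Return (decision, change_pct); HOLD when the change exceeds the limit."""
    change = percent_change(previous, current)
    # "RELEASE" is a hypothetical label for the non-HOLD outcome.
    decision = "HOLD" if abs(change) > limit_pct else "RELEASE"
    return decision, change


# Creatinine values from the run description, in mg/dL.
# HYPOTHETICAL_LIMIT_PCT is an illustrative threshold, not a value from EP33 or the demo.
HYPOTHETICAL_LIMIT_PCT = 50.0
decision, change = delta_check(1.1, 8.4, HYPOTHETICAL_LIMIT_PCT)
print(f"{decision}: creatinine changed by {change:.0f}%")
```

With these inputs the sketch prints `HOLD: creatinine changed by 664%`, matching the figure in the description. Any limit below roughly 663% would give the same decision.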

1 run · $0.0000 total

All runs passed Layer 2 audit

RUN ID       STEPS  COST   LAYER 1  LAYER 2  RECORDING
Delta Check  4      $0.00  L1 PASS  L2 PASS