Validation Runs
Benchmark and integration runs
lis-swap-contamination-triage · GPT-5 · terminus-2
3 runs · $0.3562 total
1 of 3 runs failed Layer 2 audit
| RUN ID | STEPS | COST | LAYER 1 | LAYER 2 | RECORDING |
|---|---|---|---|---|---|
| hJQzBJW | 4 | $0.08 | L1PASS | L2FAIL | — |
| HsPAVBJ★ | 5 | $0.12 | L1PASS | L2PASS | ◉ yes |
| Zo4iCGU | 6 | $0.15 | L1PASS | L2PASS | — |
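The summary lines above can be re-derived from the table rows. The following is a minimal sketch, assuming the rows are transcribed by hand into a hypothetical `Run` record; the `Run` class and `summarize` function are illustrative names, not part of any tool mentioned on this page.

```python
from dataclasses import dataclass


@dataclass
class Run:
    run_id: str
    steps: int
    cost: float    # cost as displayed in the table, rounded to cents
    layer1: bool   # True when the Layer 1 column shows a pass
    layer2: bool   # True when the Layer 2 column shows a pass


# Rows transcribed from the lis-swap-contamination-triage table.
runs = [
    Run("hJQzBJW", 4, 0.08, True, False),
    Run("HsPAVBJ", 5, 0.12, True, True),
    Run("Zo4iCGU", 6, 0.15, True, True),
]


def summarize(runs):
    """Count runs, sum the displayed costs, and list Layer 2 failures."""
    return {
        "runs": len(runs),
        "displayed_cost_sum": round(sum(r.cost for r in runs), 2),
        "layer2_failures": [r.run_id for r in runs if not r.layer2],
    }


summary = summarize(runs)
print(summary)
```

The displayed per-run costs sum to $0.35, slightly below the reported $0.3562 total. The likely explanation is that the table shows each cost rounded to cents, though the page does not state this.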
hl7-canonicalization-demo · claude-sonnet-4-6 · claude-code
AI-governed autoverification from a real HL7 instrument message: a knowledge-graph (KG) bounded decision with a complete evidence trail.
1 run · $0.0000 total
All runs passed Layer 2 audit
| RUN ID | STEPS | COST | LAYER 1 | LAYER 2 | RECORDING |
|---|---|---|---|---|---|
| HL7 v2 Input | 4 | $0.00 | L1PASS | L2PASS | — |
allotrope-canonicalization-demo · claude-sonnet-4-6 · claude-code
Same KG-governed triage engine with Allotrope ASM input: the format is the variable; the clinical reasoning is unchanged.
1 run · $0.0000 total
All runs passed Layer 2 audit
| RUN ID | STEPS | COST | LAYER 1 | LAYER 2 | RECORDING |
|---|---|---|---|---|---|
| Allotrope ASM Input | 4 | $0.00 | L1PASS | L2PASS | — |
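The idea that the input format varies while the reasoning stays fixed can be sketched as two adapters feeding one canonical record. This is an illustration under stated assumptions, not the demo's actual code: `CanonicalResult`, `from_hl7_obx`, `from_asm_like`, and `triage` are hypothetical names, the sample values are made up, the dictionary is a simplified stand-in rather than the real Allotrope ASM schema, and the fixed threshold stands in for the KG-governed logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalResult:
    analyte: str
    value: float
    unit: str


def from_hl7_obx(segment: str) -> CanonicalResult:
    """Map an HL7 v2 OBX segment onto the canonical record."""
    fields = segment.split("|")
    # OBX-3 is the observation identifier (code^text^coding system),
    # OBX-5 the observation value, OBX-6 the units.
    analyte = fields[3].split("^")[1]
    return CanonicalResult(analyte, float(fields[5]), fields[6])


def from_asm_like(doc: dict) -> CanonicalResult:
    """Map a simplified ASM-like dict (NOT the real ASM schema)."""
    return CanonicalResult(doc["analyte"], float(doc["value"]), doc["unit"])


def triage(result: CanonicalResult, upper_limit: float) -> str:
    """Placeholder decision rule; the real engine is KG-governed."""
    return "HOLD" if result.value > upper_limit else "RELEASE"


# Hypothetical sample inputs carrying the same measurement.
hl7 = from_hl7_obx("OBX|1|NM|2160-0^Creatinine^LN||8.4|mg/dL|0.6-1.3|H|||F")
asm = from_asm_like({"analyte": "Creatinine", "value": 8.4, "unit": "mg/dL"})

print(hl7 == asm)
print(triage(hl7, upper_limit=1.3), triage(asm, upper_limit=1.3))
```

Because both adapters produce the same record, the decision function never sees the source format, which is the property the two canonicalization demos are described as exercising.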
delta-check-demo · claude-sonnet-4-6 · claude-code
KG-governed delta check: a 664% rise in creatinine (1.1→8.4 mg/dL) triggers an EP33-grounded HOLD. Same framework, different clinical pattern.
1 run · $0.0000 total
All runs passed Layer 2 audit
| RUN ID | STEPS | COST | LAYER 1 | LAYER 2 | RECORDING |
|---|---|---|---|---|---|
| Delta Check | 4 | $0.00 | L1PASS | L2PASS | — |
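The 664% figure in the delta-check description follows from the two creatinine values. A minimal sketch of that arithmetic is shown here; the function names, the `RELEASE` label, and the 50% limit are assumptions chosen for illustration and are not taken from EP33 or from the demo itself.

```python
def percent_change(previous: float, current: float) -> float:
    """Relative change from the previous result, in percent."""
    if previous == 0:
        raise ValueError("previous result must be non-zero")
    return (current - previous) / previous * 100.0


def delta_check(previous: float, current: float, limit_pct: float):
    """Return a decision and the computed change.

    limit_pct is a hypothetical threshold for this sketch only.
    """
    change = percent_change(previous, current)
    decision = "HOLD" if abs(change) > limit_pct else "RELEASE"
    return decision, change


# Creatinine values from the demo description, in mg/dL.
decision, change = delta_check(1.1, 8.4, limit_pct=50.0)
print(decision, round(change))
```

With these inputs the change is (8.4 - 1.1) / 1.1 × 100, which rounds to 664%, matching the value reported for the run.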