Validation Runs

Benchmark and integration runs

JOB 2026-02-21

lis-swap-contamination-triage · GPT-5 · terminus-2

3 runs · $0.3562 total

1 of 3 runs failed Layer 2 audit

RUN ID	STEPS	COST	LAYER 1	LAYER 2	RECORDING
hJQzBJW	4	$0.08	PASS	FAIL	—
HsPAVBJ★	5	$0.12	PASS	PASS	◉ yes
Zo4iCGU	6	$0.15	PASS	PASS	—
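The summary lines for this job (run count, total cost, Layer 2 failures) can be re-derived from the table rows. The sketch below is illustrative only: the `RUNS` tuples and the `summarize` helper are hypothetical names, not part of the tool that produced this page. Note that the per-run costs shown are rounded to cents, so they sum to $0.35; the $0.3562 total presumably comes from unrounded per-run costs, which is an assumption.

```python
from decimal import Decimal

# Rows copied from the table above: (run id, steps, displayed cost, layer 1, layer 2).
# The tuple layout is an assumption made for this sketch.
RUNS = [
    ("hJQzBJW", 4, Decimal("0.08"), "PASS", "FAIL"),
    ("HsPAVBJ", 5, Decimal("0.12"), "PASS", "PASS"),
    ("Zo4iCGU", 6, Decimal("0.15"), "PASS", "PASS"),
]


def summarize(runs):
    """Return (run count, sum of displayed costs, ids of runs failing Layer 2)."""
    total = sum(cost for _, _, cost, _, _ in runs)
    failed_l2 = [run_id for run_id, _, _, _, layer2 in runs if layer2 != "PASS"]
    return len(runs), total, failed_l2


count, total, failed_l2 = summarize(RUNS)
print(f"{count} runs, ${total} displayed total, {len(failed_l2)} failed Layer 2: {failed_l2}")
```

Decimal is used instead of float so that the cent values add exactly.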

JOB 2026-04-25

hl7-canonicalization-demo · claude-sonnet-4-6 · claude-code

AI-governed autoverification from a real HL7 instrument message — KG-bounded decision, complete evidence trail.

1 run · $0.0000 total

All runs passed Layer 2 audit

RUN ID	STEPS	COST	LAYER 1	LAYER 2	RECORDING
HL7 v2 Input	4	$0.00	PASS	PASS	—

JOB 2026-04-28

allotrope-canonicalization-demo · claude-sonnet-4-6 · claude-code

Same KG-governed triage engine, Allotrope ASM input — format is the variable, clinical reasoning is unchanged.

1 run · $0.0000 total

All runs passed Layer 2 audit

RUN ID	STEPS	COST	LAYER 1	LAYER 2	RECORDING
Allotrope ASM Input	4	$0.00	PASS	PASS	—

JOB 2026-05-05

delta-check-demo · claude-sonnet-4-6 · claude-code

KG-governed delta check detection — creatinine 664% rise (1.1→8.4 mg/dL) triggers EP33-grounded HOLD. Same framework, different clinical pattern.

1 run · $0.0000 total

All runs passed Layer 2 audit

RUN ID	STEPS	COST	LAYER 1	LAYER 2	RECORDING
Delta Check	4	$0.00	PASS	PASS	—
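The 664% figure quoted in the delta-check description follows from the usual percent-change formula, (current - previous) / previous. The sketch below reproduces that arithmetic. The function names, the `RELEASE` label, and the 50% threshold are hypothetical placeholders chosen for illustration; they are not taken from EP33 or from the triage engine, whose actual limits are not shown on this page.

```python
def percent_change(previous: float, current: float) -> float:
    """Relative change from the previous result to the current one, in percent."""
    if previous == 0:
        raise ValueError("previous result must be non-zero for a relative delta")
    return (current - previous) / previous * 100.0


def delta_check(previous: float, current: float, threshold_pct: float):
    """Return ("HOLD" or "RELEASE", percent change).

    threshold_pct is an assumed, illustrative limit, not a value from any guideline.
    """
    change = percent_change(previous, current)
    decision = "HOLD" if abs(change) > threshold_pct else "RELEASE"
    return decision, change


# Creatinine values from the description above, in mg/dL.
decision, change = delta_check(previous=1.1, current=8.4, threshold_pct=50.0)
print(f"{change:.0f}% change -> {decision}")  # prints "664% change -> HOLD"
```

A production delta check would typically also consider the time between the two results and analyte-specific limits; this sketch covers only the percent-change step.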