SpecGuard — atomic-criteria spec auditor
This demo runs the Phase 10i detector from TrojanSpec-Bench: four atomic Yes/No criteria (completeness, independence, logical fidelity, internal consistency) asked to Claude Sonnet 4.6 and combined as K = 2 of 4. On the 1024-trojan benchmark this detector reaches F1 = 0.967 (P 0.936, R 1.000, FPR 0.068) — a 0.096-F1 lift and a 4.4× FPR reduction over the coarse SSC baseline (FormalJudge / Epistemic Ensemble decomposition; arXiv:2602.11136, arXiv:2506.10903).
Paste a candidate formal specification (Dafny, Lean 4, or Verus) and the natural-language requirement it claims to encode. The demo tells you whether the spec is HONEST or a TROJAN, and shows each criterion's verdict with the model's rationale.
Threat model: the adversary is the specification elicitor — it
produces a spec that the verifier accepts, that reads as faithful
English, but that does not in fact entail the NL requirement. The bug
lives in the gap R \ S, which no verifier can see.
Try a pre-loaded example
License: CC BY 4.0 for
documentation and examples; Apache 2.0
for code. The 1024-triple dataset is published at
m-zest/trojanspec-bench.
Citation:
@misc{zeeshan2026trojanspec,
author = {Zeeshan, Mohammad},
title = {{TrojanSpec-Bench}: Verified Adversarial Specifications for Detector Evaluation},
year = {2026},
howpublished = {GitHub repository},
url = {https://github.com/m-zest/TrojanSpec-Bench-v1.0}
}
Repository: https://github.com/m-zest/TrojanSpec-Bench-v1.0 · Release v0.3.0