SpecGuard — atomic-criteria spec auditor

This demo runs the Phase 10i detector from TrojanSpec-Bench: four atomic Yes/No criteria (completeness, independence, logical fidelity, internal consistency) asked to Claude Sonnet 4.6 and combined as K = 2 of 4. On the 1024-trojan benchmark this detector reaches F1 = 0.967 (P 0.936, R 1.000, FPR 0.068) — a 0.096-F1 lift and a 4.4× FPR reduction over the coarse SSC baseline (FormalJudge / Epistemic Ensemble decomposition; arXiv:2602.11136, arXiv:2506.10903).

Paste a candidate formal specification (Dafny, Lean 4, or Verus) and the natural-language requirement it claims to encode. The demo tells you whether the spec is HONEST or a TROJAN, and shows each criterion's verdict with the model's rationale.

Threat model: the adversary is the specification elicitor — it produces a spec that the verifier accepts, that reads as faithful English, but that does not in fact entail the NL requirement. The bug lives in the gap R \ S, which no verifier can see.

Language

Try a pre-loaded example


License: CC BY 4.0 for documentation and examples; Apache 2.0 for code. The 1024-triple dataset is published at m-zest/trojanspec-bench.

Citation:

@misc{zeeshan2026trojanspec,
  author = {Zeeshan, Mohammad},
  title  = {{TrojanSpec-Bench}: Verified Adversarial Specifications for Detector Evaluation},
  year   = {2026},
  howpublished = {GitHub repository},
  url    = {https://github.com/m-zest/TrojanSpec-Bench-v1.0}
}

Repository: https://github.com/m-zest/TrojanSpec-Bench-v1.0 · Release v0.3.0