Interactive Research Demo
Unicode Steganography
How invisible characters and visual lookalikes can carry secret messages through ordinary text, and what that means for LLM safety.
A sufficiently capable model could embed covert signals in its outputs, invisible to human readers but recoverable by another model or process. Try it yourself below.
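The zero-width technique can be sketched in a few lines. This is a minimal illustration, not the demo's actual implementation: it encodes each secret byte as eight invisible characters (ZERO WIDTH SPACE for 0, ZERO WIDTH NON-JOINER for 1) appended to a cover sentence.

```python
# Minimal sketch of zero-width steganography (illustrative, not the
# demo's exact code): secret bits become invisible Unicode characters.
ZERO = "\u200b"  # ZERO WIDTH SPACE      -> bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER -> bit 1

def hide(cover: str, secret: str) -> str:
    bits = "".join(f"{b:08b}" for b in secret.encode("utf-8"))
    payload = "".join(ONE if bit == "1" else ZERO for bit in bits)
    # The result renders identically to `cover` in most fonts and UIs.
    return cover + payload

def reveal(text: str) -> str:
    bits = "".join("1" if ch == ONE else "0"
                   for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

stego = hide("Meet me at noon.", "hi")
print(len(stego) - len("Meet me at noon."))  # 16 invisible characters
print(reveal(stego))                         # -> hi
```

The carrier text survives copy-paste through most channels, though some input sanitizers strip format characters and destroy the payload.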
Technique Comparison
Both methods hide data inside normal-looking text, but they trade off differently: zero-width characters offer high capacity (arbitrary bits between any two letters) yet are trivially flagged by a character-category scan, while homoglyph substitution is limited to letters that have convincing lookalikes but blends in wherever mixed-script text is legitimate.
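Homoglyph encoding can be sketched the same way. In this illustrative version (again, an assumption about one reasonable scheme, not the demo's exact code), every Latin letter that has a Cyrillic lookalike carries one bit: the Latin form means 0, the Cyrillic form means 1.

```python
# Sketch of homoglyph steganography: each Latin letter with a Cyrillic
# lookalike carries one bit (Latin form = 0, Cyrillic form = 1).
HOMOGLYPHS = {"a": "\u0430", "c": "\u0441", "e": "\u0435",
              "o": "\u043e", "p": "\u0440", "x": "\u0445"}
CYRILLIC = set(HOMOGLYPHS.values())

def embed_bits(cover: str, bits: str) -> str:
    out, i = [], 0
    for ch in cover:
        if i < len(bits) and ch in HOMOGLYPHS:
            out.append(HOMOGLYPHS[ch] if bits[i] == "1" else ch)
            i += 1
        else:
            out.append(ch)
    if i < len(bits):
        raise ValueError("cover text has too few eligible letters")
    return "".join(out)

def extract_bits(text: str) -> str:
    # Unused eligible letters read as trailing zeros, so the payload
    # length must be agreed out of band; the bits come back as a prefix.
    return "".join("1" if ch in CYRILLIC else "0"
                   for ch in text if ch in HOMOGLYPHS or ch in CYRILLIC)

stego = embed_bits("a peace accord", "1011")
print(extract_bits(stego)[:4])  # -> 1011
```

Capacity is much lower than the zero-width scheme, since only letters with lookalikes can carry data, but the result contains no invisible characters at all.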
Why This Matters for AI Safety
LLM steganography is a concrete example of how AI deception could work in practice. Both techniques demonstrated above fool human readers, but neither survives a scanner built to look for them: a Unicode category check catches zero-width characters, and a homoglyph alphabet check catches Cyrillic substitutions.
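Both of those checks fit in one short pass over the text. The sketch below (a hypothetical scanner, assuming the expected script is Latin) flags any character in Unicode category Cf, which covers the zero-width characters, and any Cyrillic letter appearing in otherwise-Latin prose:

```python
import unicodedata

def scan(text: str) -> list[str]:
    """Flag invisible format characters and Cyrillic homoglyphs.

    Assumes the legitimate text is Latin-script; a real scanner would
    need a policy for documents where Cyrillic is expected.
    """
    findings = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            # Category Cf ("format") includes U+200B..U+200D and friends.
            findings.append(f"format char U+{ord(ch):04X} at index {i}")
        elif "CYRILLIC" in unicodedata.name(ch, ""):
            findings.append(f"Cyrillic letter U+{ord(ch):04X} at index {i}")
    return findings

print(scan("clean text"))         # -> []
print(scan("p\u0430\u200byload")) # flags one homoglyph and one format char
```

Neither check requires knowing the encoding scheme, only the character inventory it leaves behind, which is exactly why both demo techniques fail against it.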
The harder question for AI alignment is whether a model could invent an encoding that passes human review and beats automated scanners it hasn't seen before.