Interactive Research Demo

Unicode Steganography

by Patrick Vuscan

How invisible characters and visual lookalikes can carry secret messages through ordinary text, and what that means for LLM safety.

A sufficiently capable model could embed covert signals in its outputs, invisible to human readers but recoverable by another model or process. Try it yourself below.

Technique Comparison

Both methods hide data inside normal-looking text, but they make different tradeoffs in detectability, bandwidth, and robustness.

Detectability
- Zero-width characters: Trivially flagged by any Unicode scanner or hex editor. The non-printable code points are a dead giveaway.
- Homoglyph substitution: Harder to catch without a Latin↔Cyrillic reference. The characters render identically, and most spell-checkers accept them.

Bandwidth
- Zero-width characters: High. Any carrier works; the hidden stream is carrier-agnostic and scales with message length.
- Homoglyph substitution: Low. Capped by the number of substitutable characters in the carrier; only 21 Latin↔Cyrillic pairs exist in this implementation.

Robustness
- Zero-width characters: Fragile. Slack, Twitter/X, and many email clients strip zero-width characters before storing or displaying text.
- Homoglyph substitution: Durable. Cyrillic lookalikes survive copy-paste on essentially every platform, and no common stripping heuristic targets them.

Why This Matters for AI Safety

LLM steganography is a concrete example of how AI deception could work in practice. Both techniques demonstrated above fool human readers, but neither survives a scanner built to look for them: a Unicode category check catches zero-width characters, and a homoglyph alphabet check catches Cyrillic substitutions.
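Both checks fit in a few lines using Python's standard `unicodedata` module; this is a minimal sketch, not a production scanner, and the simple "flag all Cyrillic" rule would also flag legitimate Russian text in a mixed-language document.

```python
import unicodedata

def scan(text: str) -> list:
    """Flag format characters (which include zero-width characters)
    and Cyrillic code points in otherwise-Latin text."""
    findings = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":  # Cf covers ZWSP, ZWNJ, etc.
            findings.append((i, f"format character U+{ord(ch):04X}"))
        elif "CYRILLIC" in unicodedata.name(ch, ""):
            findings.append((i, f"Cyrillic lookalike U+{ord(ch):04X}"))
    return findings
```

For example, `scan("hel\u200blo")` flags the zero-width space at index 3, and `scan("h\u0435llo")` flags the Cyrillic е, while clean ASCII text produces no findings.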

The harder question for AI alignment is whether a model could invent an encoding that passes human review and beats automated scanners it hasn't seen before.