Interactive Research Demo

Unicode Steganography

by Patrick Vuscan

How invisible characters and visual lookalikes can carry secret messages through ordinary text, and what that means for LLM safety.

A sufficiently capable model could embed covert signals in its outputs, invisible to human readers but recoverable by another model or process. Try it yourself below.

Technique Comparison

Each technique hides data inside normal-looking text, but the tradeoffs around detectability, capacity, and robustness differ.

Detectability

Zero-Width Characters: Trivially flagged by any Unicode scanner or hex editor. The non-printable code points are a dead giveaway.
Homoglyph Substitution: Harder to catch without a Latin↔Cyrillic reference. The characters render identically, and most spell-checkers accept them.
Variation Selectors: Flagged by any walk over Unicode scalars or a variation-selector pass. The visible line can still look unchanged to a reader.

Bandwidth

Zero-Width Characters: High. Any carrier works: the hidden stream is carrier-agnostic and scales with message length.
Homoglyph Substitution: Low. Capacity is capped by the number of substitutable characters in the carrier, and only 21 Latin↔Cyrillic pairs exist in this implementation.
Variation Selectors: High in suffix mode, which concentrates bytes on one or a few anchor characters and scales with message length. Low in per-grapheme mode, which as implemented is capped by the number of graphemes in the carrier.

Robustness

Zero-Width Characters: Fragile. Slack, Twitter/X, and many email clients strip zero-width characters before storing or displaying text.
Homoglyph Substitution: Durable. Cyrillic lookalikes survive copy-paste on essentially every platform. No common stripping heuristic targets them.
Variation Selectors: Mixed. Often survives naive copy-paste better than zero-width characters do, but NFC and NFKC normalization may reorder or collapse parts of a sequence depending on the carrier, and some pipelines alter or drop selectors, especially in supplementary planes.
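
To make the bandwidth differences concrete, here is a minimal Python sketch of all three encoders. It is not the demo's implementation: the bit mapping for zero-width characters (U+200B for 0, U+200C for 1), the six-pair homoglyph subset, and the byte-to-selector mapping in suffix mode are all assumptions chosen for illustration, and every function name is hypothetical.

```python
# Assumed mapping: ZERO WIDTH SPACE encodes 0, ZERO WIDTH NON-JOINER encodes 1.
ZW0, ZW1 = "\u200b", "\u200c"

# Illustrative subset of Latin -> Cyrillic lookalikes (a, c, e, o, p, x).
# The demo's own 21-pair table is not reproduced here.
HOMOGLYPHS = {"a": "\u0430", "c": "\u0441", "e": "\u0435",
              "o": "\u043e", "p": "\u0440", "x": "\u0445"}
REVERSE = {cyr: lat for lat, cyr in HOMOGLYPHS.items()}


def zw_encode(carrier: str, secret: bytes) -> str:
    """Insert 8 zero-width characters per secret byte after the first carrier character."""
    bits = "".join(f"{byte:08b}" for byte in secret)
    hidden = "".join(ZW1 if bit == "1" else ZW0 for bit in bits)
    return carrier[:1] + hidden + carrier[1:]


def zw_decode(text: str) -> bytes:
    """Collect the zero-width characters and regroup them into bytes."""
    bits = "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def homoglyph_encode(carrier: str, bits: str) -> str:
    """One bit per substitutable letter: swap to Cyrillic for 1, keep Latin for 0."""
    out, pending = [], list(bits)
    for ch in carrier:
        if ch in HOMOGLYPHS and pending:
            ch = HOMOGLYPHS[ch] if pending.pop(0) == "1" else ch
        out.append(ch)
    if pending:
        raise ValueError(f"carrier too short: {len(pending)} bits left over")
    return "".join(out)


def homoglyph_decode(text: str) -> str:
    """Read one bit per substitutable position; the caller truncates to the message length."""
    return "".join("1" if ch in REVERSE else "0"
                   for ch in text if ch in REVERSE or ch in HOMOGLYPHS)


def vs_encode(anchor: str, secret: bytes) -> str:
    """Suffix mode: append one variation selector per byte to a single anchor character."""
    selectors = (chr(0xFE00 + b) if b < 16 else chr(0xE0100 + b - 16) for b in secret)
    return anchor + "".join(selectors)


def vs_decode(text: str) -> bytes:
    """Map each variation selector back to the byte value it stands for."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0xFE00 <= cp <= 0xFE0F:
            out.append(cp - 0xFE00)
        elif 0xE0100 <= cp <= 0xE01EF:
            out.append(cp - 0xE0100 + 16)
    return bytes(out)
```

With the carrier "hello world", zw_encode adds 16 invisible characters for the two-byte secret b"hi" regardless of what the carrier says, and vs_encode hangs the same two bytes off a single anchor character. Under this six-pair subset the same carrier offers only three substitutable letters (e, o, o), so homoglyph_encode can hide three bits and raises ValueError on a fourth.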

Why This Matters for AI Safety

LLM steganography is a concrete example of how AI deception could work in practice. Each of the techniques demonstrated above fools human readers, but none survives a scanner built to look for it: a Unicode category check catches zero-width characters, a homoglyph alphabet check catches Cyrillic substitutions, and code-point inspection catches payloads carried in variation selectors.
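
The three checks named above can be sketched in a few lines of Python. This is an illustrative scanner under stated assumptions, not the demo's code: the function names are hypothetical, the lookalike set is a six-character sample, and the variation-selector test covers only the two main blocks (U+FE00 to U+FE0F and U+E0100 to U+E01EF).

```python
import unicodedata

# Assumed sample of Cyrillic letters that resemble Latin a, c, e, o, p, x.
# A real check would use a fuller confusables table.
CYRILLIC_LOOKALIKES = set("\u0430\u0441\u0435\u043e\u0440\u0445")


def is_variation_selector(cp: int) -> bool:
    """True for code points in the two main variation-selector blocks."""
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def scan(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, reason) for every suspicious character."""
    findings = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if is_variation_selector(cp):
            reason = "variation selector"
        elif unicodedata.category(ch) == "Cf":
            # General category Cf (format) includes the zero-width characters.
            reason = "invisible format character"
        elif ch in CYRILLIC_LOOKALIKES:
            reason = "Cyrillic lookalike"
        else:
            continue
        findings.append((index, f"U+{cp:04X}", reason))
    return findings
```

A scanner this blunt will also flag legitimate text: Russian prose trips the lookalike check, and emoji sequences legitimately use zero-width joiners and variation selectors. A practical version would likely need context, such as flagging Cyrillic letters only inside otherwise Latin words.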

The harder question for AI alignment is whether a model could invent an encoding that passes human review and beats automated scanners it hasn't seen before.