Space Characters in Unicode: 20+ Invisible Characters Compared

Reference Symbol Showdown Oct. 24, 2023

The space bar produces one character: U+0020, the humble space. But Unicode defines more than 20 distinct characters that produce invisible whitespace of varying widths, breaking behaviors, and formatting properties. These characters look identical (or near-identical) in most contexts — a gap between characters, or no visible gap at all — yet they behave completely differently in text layout, HTML rendering, copy-paste operations, and security contexts.

Understanding this invisible landscape is essential for any developer working with internationalized text, precision typography, or content sanitization.

The Complete Space Character Reference

Character	Name	Unicode	HTML Entity	Width	Line-Break
 	Space	U+0020	`&#32;` (or a literal space)	Normal word spacing	Yes
 	No-Break Space	U+00A0	`&nbsp;`	Same as space	No
 	En Space	U+2002	`&ensp;`	1/2 em	Yes
 	Em Space	U+2003	`&emsp;`	1 em	Yes
 	3-Per-Em Space	U+2004	`&emsp13;`	1/3 em	Yes
 	4-Per-Em Space	U+2005	`&emsp14;`	1/4 em	Yes
 	6-Per-Em Space	U+2006	—	1/6 em	Yes
 	Figure Space	U+2007	`&numsp;`	Same as a digit	No
 	Punctuation Space	U+2008	`&puncsp;`	Same as period	Yes
 	Thin Space	U+2009	`&thinsp;`	1/5 em (approx.)	Yes
 	Hair Space	U+200A	`&hairsp;`	Thinnest visible	Yes
​	Zero-Width Space	U+200B	`&#8203;`	0 (invisible)	Yes (word-break opportunity)
‌	Zero-Width Non-Joiner	U+200C	`&zwnj;`	0	No
‍	Zero-Width Joiner	U+200D	`&zwj;`	0	No
 	Narrow No-Break Space	U+202F	`&#8239;`	About a thin space	No
 	Medium Mathematical Space	U+205F	`&#8287;`	4/18 em	Yes
⠀	Braille Pattern Blank	U+2800	—	One Braille cell (not a true space)	No
　	Ideographic Space	U+3000	`&#12288;`	Full em (CJK width)	Yes
ㅤ	Hangul Filler	U+3164	—	Full width	Yes
﻿	Zero-Width No-Break Space / BOM	U+FEFF	—	0	No

Note: Many of these characters render as invisible gaps in HTML. Use our Character Analyzer to detect which space characters are present in any text.
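As a rough cross-check of the table, Python's standard `unicodedata` module can report each character's official name and general category. The sketch below assumes only the standard library; the `SPACE_LIKE` list and the `describe` helper are illustrative names, and the list is a subset of the table rather than the full set.

```python
import unicodedata

# Illustrative subset of the table above (not exhaustive).
SPACE_LIKE = ["\u0020", "\u00A0", "\u2009", "\u200B", "\u200D", "\u202F", "\u3000", "\uFEFF"]

def describe(ch: str) -> tuple:
    """Return (code point, name, general category, str.isspace()) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
        ch.isspace(),
    )

for ch in SPACE_LIKE:
    print(*describe(ch), sep="\t")
```

The true spaces report category `Zs` and `isspace()` is true for them, while the zero-width characters are format characters (`Cf`). That difference is why Python's `str.strip()` leaves a trailing zero-width space in place.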

The Most Important Space Characters

U+0020 — Regular Space

The standard space character. Line-break is permitted here in most contexts. In HTML, consecutive spaces collapse to a single space (unless white-space: pre or similar is applied). This is the character produced by pressing the space bar.

<!-- In HTML, multiple spaces collapse to one -->
<p>One      space     or      many</p>
<!-- Renders as: One space or many -->

/* In CSS, control spacing with properties instead of extra space characters */
.spaced { letter-spacing: 0.2em; word-spacing: 0.5em; }
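The collapsing rule can be approximated outside the browser, which is handy when comparing stored text with what users actually see. This is a minimal Python sketch, assuming the default `white-space: normal` behavior; `collapse_like_html` is a hypothetical helper, and real browsers additionally trim whitespace at line starts and ends.

```python
import re

# HTML collapses runs of ASCII whitespace only: space, tab, LF, FF, CR.
_COLLAPSIBLE = re.compile(r"[ \t\n\f\r]+")

def collapse_like_html(text: str) -> str:
    """Approximate default HTML whitespace collapsing for inline text."""
    return _COLLAPSIBLE.sub(" ", text)

print(collapse_like_html("One      space     or      many"))
print(repr(collapse_like_html("a\u00A0\u00A0b")))  # no-break spaces are left alone
```

Note that U+00A0 is deliberately absent from the pattern: no-break spaces do not collapse, which is one reason they are sometimes misused for layout.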

U+00A0 — No-Break Space (NBSP)

The most widely used non-standard space. It looks identical to a regular space but prevents a line break at that position. Essential for:

Keeping number and unit together: 15 kg (should not break between 15 and kg)
Keeping title and name together: Dr. Smith, Mr. Jones
French typography: spacing before certain punctuation marks (:, ;, !, ?)
Preventing a lone "orphan" word from wrapping onto the last line of a paragraph or heading

<!-- Prevents "15" and "kg" from splitting across lines -->
<p>The package weighs 15&nbsp;kg.</p>

<!-- French-style spacing before colon -->
<p>Résultat&nbsp;: succès</p>

CSS alternative: For most layout-based line-break prevention, white-space: nowrap on a container is more maintainable than inserting NBSP characters in the content.
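When the content itself has to carry the no-break space (plain-text exports or CMS fields, for example), the number-and-unit case can be automated. The following Python sketch assumes a short, hypothetical `UNITS` list and a hypothetical `bind_units` helper; neither comes from any particular library.

```python
import re

NBSP = "\u00A0"
# Hypothetical, deliberately short unit list; extend it for real content.
UNITS = ("kg", "g", "km", "m", "cm", "mm", "ms", "s")
_NUMBER_UNIT = re.compile(r"(\d) (" + "|".join(UNITS) + r")\b")

def bind_units(text: str) -> str:
    """Replace the space between a digit and a known unit with U+00A0."""
    return _NUMBER_UNIT.sub(lambda m: m.group(1) + NBSP + m.group(2), text)

print(repr(bind_units("The package weighs 15 kg.")))
```

The trailing `\b` keeps the pattern from firing on ordinary words that merely start with a unit abbreviation, such as "3 minutes".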

U+200B — Zero-Width Space

The zero-width space is invisible — it takes up no width — but it provides a line-break opportunity. Its primary uses:

Allowing line breaks inside long words, identifiers, or other strings that have no natural break points
Permitting line breaks in scripts that do not mark word boundaries with spaces, such as Thai or Khmer (CJK text can usually break between characters without it)
Creating break opportunities inside long URLs for display purposes

<!-- Allow breaks in a very long URL displayed as text: &#8203; after each slash, in the visible text only -->
<a href="https://example.com/very/long/path/to/some/resource">
  https://example.com/&#8203;very/&#8203;long/&#8203;path/&#8203;to/&#8203;some/&#8203;resource
</a>

/* CSS alternative (more robust): */
.url { word-break: break-all; }
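If the break opportunities have to live in the text itself, they can be generated instead of typed by hand. A small Python sketch follows; `breakable_url` is a hypothetical helper, and its output is meant for the visible link text only, never for the `href`.

```python
ZWSP = "\u200B"

def breakable_url(url: str) -> str:
    """Insert U+200B after each path slash so a long URL can wrap when displayed."""
    scheme, sep, rest = url.partition("://")
    if not sep:  # no scheme present: treat the whole string as the path
        scheme, rest = "", url
    return scheme + sep + rest.replace("/", "/" + ZWSP)

display = breakable_url("https://example.com/very/long/path")
print(repr(display))
```

One caveat of this approach is that anyone copying the displayed URL also copies the zero-width spaces, so the CSS route is usually the safer default.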

Security warning: Zero-width spaces are invisible and are frequently used in:

Plagiarism watermarking (unique patterns of ZWSP embedded in documents)
Bypassing content filters (inserting ZWSP between characters to defeat keyword matching)
Obfuscating malicious content

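// Detect invisible formatting characters in user input:
// ZWSP, ZWNJ, ZWJ, BOM (U+FEFF), soft hyphen (U+00AD), and word joiner (U+2060)
function hasInvisibleChars(str) {
  return /[\u200B\u200C\u200D\uFEFF\u00AD\u2060]/.test(str);
}

// Remove the same characters from user input (for sanitization).
// Caution: stripping ZWJ/ZWNJ also breaks emoji sequences and some Persian or Indic text.
function removeInvisibleChars(str) {
  return str.replace(/[\u200B\u200C\u200D\uFEFF\u00AD\u2060]/g, '');
}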

U+200C — Zero-Width Non-Joiner (ZWNJ)

The ZWNJ prevents two characters from joining via ligature or cursive connection. It is invisible and takes no space but affects rendering:

Arabic/Persian text: Prevents certain characters from connecting in cursive script
Devanagari (Hindi): Controls ligature formation
Ligature prevention: Prevents fi, fl, ff ligatures where they would be wrong (e.g., across a morpheme boundary in German)

<!-- Prevent fi ligature in "shelf·ish" (across morpheme boundary) -->
<p>shelf&#8204;ish</p>

In most web contexts outside of RTL or Indic languages, you'll rarely need ZWNJ. Emoji sequences do not use it; they rely on its counterpart, the zero-width joiner (see ZWJ below).

U+200D — Zero-Width Joiner (ZWJ)

The zero-width joiner causes characters that would normally render separately to join as a ligature or combined form. It is the foundation of modern emoji sequences:

<!-- Family emoji composed with ZWJ sequences -->
👨‍👩‍👧‍👦
<!-- Decoded: 👨 (U+1F468) + ZWJ + 👩 (U+1F469) + ZWJ + 👧 (U+1F467) + ZWJ + 👦 (U+1F466) -->

<!-- Other ZWJ sequences: base emoji + ZWJ + second emoji -->
👩‍💻  <!-- Woman Technologist: 👩 + ZWJ + 💻 -->
👨‍🍳  <!-- Man Cook: 👨 + ZWJ + 🍳 -->
🏳️‍🌈  <!-- Rainbow Flag: 🏳 + U+FE0F (variation selector) + ZWJ + 🌈 -->

ZWJ sequences are why a single displayed emoji can be multiple code points. When building applications that handle user text, always treat ZWJ sequences as atomic units.

// Count user-perceived characters (grapheme clusters), not code points
// The Intl.Segmenter API (modern browsers)
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const text = '👨‍👩‍👧‍👦 Family';
const graphemes = [...segmenter.segment(text)];
console.log(graphemes.length); // 8 (1 emoji + 1 space + 6 letters)
console.log(text.length);       // 18 (UTF-16 code units; misleading for display)
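The same mismatch between what is displayed and what is stored shows up in other languages. In the Python sketch below, the emoji is written with explicit escapes so the ZWJ characters are visible in the source. The sketch counts code points and UTF-16 units only, because it assumes nothing beyond the standard library, which does not ship a grapheme segmenter.

```python
ZWJ = "\u200D"
# Man + ZWJ + Woman + ZWJ + Girl + ZWJ + Boy, displayed as one family emoji
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

code_points = len(family)                           # Python strings count code points
utf16_units = len(family.encode("utf-16-le")) // 2  # what JavaScript's .length reports
members = family.split(ZWJ)                         # the individual emoji in the sequence

print(code_points, utf16_units, len(members))
```

One displayed glyph is therefore 7 code points or 11 UTF-16 code units, depending on who is counting, which is why length limits and truncation should work on grapheme clusters.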

U+00AD — Soft Hyphen

Not technically a space character, but in the "invisible formatting characters" family: the soft hyphen (shy) is a hyphenation hint. It renders as nothing unless the word needs to break across a line, at which point it renders as a visible hyphen at the break point.

<!-- A long word with soft hyphens marking acceptable break points -->
<p>antidis&shy;establish&shy;mentarian&shy;ism</p>

/* CSS hyphenation (preferred for most use cases; requires a lang attribute on the content): */
p { hyphens: auto; }

U+202F — Narrow No-Break Space

A thinner non-breaking space, approximately half the width of a regular NBSP. Primarily used in:

French and Swiss typography: commonly used before !, ?, ;, and sometimes : (conventions vary by style guide)
Number formatting: 1 234 567 (thousands separator in many European countries uses thin space or NBSP)
Mathematical notation: spacing around operators

<!-- Number with narrow no-break space as thousands separator -->
<p>Population: 1&#8239;234&#8239;567</p>

<!-- French punctuation spacing -->
<p>Bonjour&#8239;!</p>

U+3000 — Ideographic Space

The full-width space used in CJK (Chinese, Japanese, Korean) typography. It is as wide as a full-width CJK character, which in most CJK fonts is twice the width of a half-width ASCII space. In CJK text, it is the traditional way to indent the first line of a paragraph.

<!-- Traditional CJK paragraph indentation: two ideographic spaces -->
<p>　　这是一个段落的开始。</p>

In CSS, this is better handled with text-indent:

p.cjk { text-indent: 2em; }

Security Implications of Invisible Characters

Content Filtering Bypass

Zero-width spaces and other invisible characters are commonly inserted into spam, phishing messages, and flagged keywords to bypass automated content filters:

viagra  ← Each letter separated by U+200B
# A filter looking for "viagra" as a string will not find this

Robust content moderation systems must normalize and strip invisible characters before applying keyword filters.
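The normalize-then-match order can be sketched in a few lines of Python. The blocklist, the character set, and the `contains_blocked` helper are illustrative assumptions, not a complete moderation pipeline.

```python
import re

# Zero-width and soft-hyphen characters often used to split keywords.
_INVISIBLE = re.compile("[\u200B\u200C\u200D\u2060\uFEFF\u00AD]")
BLOCKED = {"viagra"}  # hypothetical blocklist for illustration

def contains_blocked(text: str) -> bool:
    """Strip invisible characters, casefold, then look for blocked keywords."""
    cleaned = _INVISIBLE.sub("", text).casefold()
    return any(word in cleaned for word in BLOCKED)

obfuscated = "\u200B".join("viagra")  # every letter separated by U+200B
print("viagra" in obfuscated)         # naive substring check misses it
print(contains_blocked(obfuscated))   # check after stripping finds it
```

A production filter would likely also fold look-alike letters (confusables), which this sketch does not attempt.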

Watermarking and Tracking

Publishers and document tracking systems embed unique patterns of invisible characters (typically ZWSP, soft hyphen, or NBSP variations) in text to track document leaks and identify the source of unauthorized copies. This technique is called "text steganography" or "snow steganography" in some contexts.

import re

# Example: encode an 8-bit ID into invisible characters
ZWSP = '\u200B'   # bit 0
ZWNJ = '\u200C'   # bit 1

def encode_id(text: str, user_id: int, bits: int = 8) -> str:
    """Embed a binary user ID into text using invisible characters."""
    # Least significant bit first: ZWSP encodes 0, ZWNJ encodes 1
    marker = ''.join(ZWNJ if (user_id >> i) & 1 else ZWSP for i in range(bits))
    return text[:20] + marker + text[20:]  # Insert after first 20 chars

def has_watermark(text: str) -> bool:
    """Detect presence of invisible character watermark."""
    return bool(re.search(r'[\u200B\u200C]{4,}', text))
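Reading the ID back is the inverse operation. The sketch below repeats a compact version of the encoder so that it runs on its own. `decode_id` is a hypothetical companion function, and it assumes the text contains no ZWSP or ZWNJ characters other than the watermark.

```python
from typing import Optional

ZWSP = "\u200B"  # encodes bit 0
ZWNJ = "\u200C"  # encodes bit 1

def encode_id(text: str, user_id: int, bits: int = 8) -> str:
    """Insert `bits` invisible characters (least significant bit first) after 20 chars."""
    marker = "".join(ZWNJ if (user_id >> i) & 1 else ZWSP for i in range(bits))
    return text[:20] + marker + text[20:]

def decode_id(text: str) -> Optional[int]:
    """Rebuild the ID from the ZWSP/ZWNJ characters, or return None if there are none."""
    marks = [ch for ch in text if ch in (ZWSP, ZWNJ)]
    if not marks:
        return None
    return sum(1 << i for i, ch in enumerate(marks) if ch == ZWNJ)

marked = encode_id("Quarterly report, internal draft only.", 178)
print(decode_id(marked))
```

Because the marker survives copy-paste in most editors, the same decoder doubles as a quick way to check whether a pasted document carries such a mark.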

The Invisible Username Problem

If your application allows Unicode usernames, users can create accounts whose names appear identical to other users' names but contain different invisible characters:

alice     ← Regular username
alice    ← Username with a trailing ZWSP (looks identical)

Always normalize usernames before comparison:

import unicodedata
import re

def normalize_username_for_comparison(username: str) -> str:
    """Remove invisible chars and normalize for unique-username checking."""
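    # Remove zero-width and formatting characters (including the soft hyphen)
    cleaned = re.sub(r'[\u00AD\u200B-\u200F\u2028-\u202F\u2060-\u206F\uFEFF]',
                     '', username)
    # NFKC folds compatibility forms (e.g. NBSP and U+3000 become a plain space);
    # strip() then drops leading/trailing spaces, and casefold() ignores case
    return unicodedata.normalize('NFKC', cleaned).strip().casefold()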
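An alternative to listing code point ranges is to drop every character in Unicode's format category (`Cf`), which covers the zero-width characters, the soft hyphen, and the bidi controls. This is a sketch under that assumption; `username_key` is a hypothetical name, and the key is meant for uniqueness checks only, with the original spelling stored separately for display.

```python
import unicodedata

def username_key(username: str) -> str:
    """Build a comparison key: NFKC, drop format (Cf) characters, trim, casefold."""
    normalized = unicodedata.normalize("NFKC", username)
    visible = "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")
    return visible.strip().casefold()

taken = {username_key("alice")}
print(username_key("Alice\u200B") in taken)  # the ZWSP variant collides with "alice"
```

Dropping all `Cf` characters also removes ZWJ and ZWNJ, which carry meaning in some scripts, so this trade-off suits comparison keys better than stored display names.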

Typography Use Cases

Mathematical Spacing

In mathematical notation, thin spaces (U+2009) are used around binary operators and after commas in function arguments:

<!-- Properly spaced mathematical expression -->
<p><i>f</i>(<i>x</i>,&thinsp;<i>y</i>) = <i>x</i>&thinsp;+&thinsp;<i>y</i></p>

In practice, MathML or KaTeX handles this automatically.

French Typography

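French typographic rules require spaces before certain punctuation marks. A common convention, followed here, uses a narrow no-break space before !, ?, ;, :, » and after «; some style guides prefer a full no-break space (U+00A0) before : and inside guillemets: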

<p>Il a dit&#8239;: &#171;&#8239;Bonjour&#8239;!&#8239;&#187;</p>
<!-- Renders as: Il a dit : « Bonjour ! » -->
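Applying this spacing by hand is error-prone, so it is often automated. The Python sketch below follows the convention shown in the example above (U+202F everywhere). `french_spacing` is a hypothetical helper, and it is deliberately naive: it would also touch colons in URLs or times, so real text needs extra guards.

```python
import re

NNBSP = "\u202F"

def french_spacing(text: str) -> str:
    """Put U+202F before ! ? ; : » and after «, replacing any space already there."""
    # An existing ordinary, no-break, or narrow no-break space is replaced, not doubled.
    text = re.sub(r"[ \u00A0\u202F]?([!?;:»])", NNBSP + r"\1", text)
    text = re.sub(r"«[ \u00A0\u202F]?", "«" + NNBSP, text)
    return text

print(repr(french_spacing("Il a dit : « Bonjour ! »")))
```

Because an existing narrow no-break space is matched and replaced, running the function twice yields the same result as running it once.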

Number Formatting

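Different countries use different thousands separators. The SI brochure and ISO 80000-1 recommend separating groups of three digits with a thin space rather than a comma or period. On the web, the narrow no-break space (U+202F) is a common choice because it also keeps the number from wrapping. This avoids the ambiguity between the comma and the period, each of which is a decimal mark in some locales and a thousands separator in others: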

<p>1&#8239;234&#8239;567.89</p>   <!-- International -->
<p>1,234,567.89</p>               <!-- US format -->
<p>1.234.567,89</p>               <!-- German format -->
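Producing the space-separated form programmatically takes one substitution on top of standard number formatting. This is a minimal Python sketch; `group_digits` is a hypothetical helper that keeps the period as the decimal mark, as in the first example above.

```python
NNBSP = "\u202F"

def group_digits(value: float, decimals: int = 2) -> str:
    """Format with U+202F between thousands groups and a period as decimal mark."""
    return f"{value:,.{decimals}f}".replace(",", NNBSP)

print(repr(group_digits(1234567.89)))
```

For locale-aware output (a decimal comma, for instance), a dedicated formatting library is the better tool; this sketch only swaps the grouping character.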

Detecting Space Characters

Paste any text into our Character Analyzer to see every character's Unicode code point, including invisible ones. This is invaluable for debugging:

Text that "looks" the same but doesn't match in string comparison
Content that bypasses filters
Copy-pasted text with invisible formatting artifacts from Word or rich-text editors

In JavaScript:

// Find all non-standard whitespace and zero-width characters in a string
function findInvisibleChars(str) {
  // \s already covers U+00A0, U+2000–U+200A, U+202F, U+205F, U+3000, and U+FEFF;
  // the zero-width characters, soft hyphen, and word joiner are listed explicitly.
  const invisible = /[\s\u00AD\u200B-\u200D\u2060]/u;
  const results = [];
  let index = 0; // UTF-16 offset of the current code point
  // for...of iterates by code point, so surrogate pairs are never split
  for (const char of str) {
    // Skip the regular space (U+0020); report everything else that matches
    if (char !== ' ' && invisible.test(char)) {
      results.push({
        index,
        codePoint: char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
        char,
      });
    }
    index += char.length;
  }
  return results;
}
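For server-side checks, a comparable scanner can lean on Unicode general categories instead of an explicit list. This Python sketch assumes only the standard library; `find_invisible_chars` is a hypothetical name, and the reported indices are code point offsets rather than UTF-16 offsets.

```python
import unicodedata

def find_invisible_chars(text: str) -> list:
    """List (index, code point, name) for every space-like or format character
    other than the ordinary space U+0020."""
    hits = []
    for index, ch in enumerate(text):
        if ch == " ":
            continue
        # Zs = space separators, Cf = format characters (zero-width, soft hyphen, BOM)
        if ch.isspace() or unicodedata.category(ch) in ("Zs", "Cf"):
            hits.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits

for hit in find_invisible_chars("a\u00A0b\u200Bc"):
    print(hit)
```

Including the character name in the output makes log lines self-explanatory when the character itself cannot be seen.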

How to Type Space Characters

Mac

Character	Method
No-Break Space (U+00A0)	`⌥Space` (Option + Space)
Thin Space (U+2009)	Character Viewer → search "thin space"
Zero-Width Space (U+200B)	Character Viewer or HTML entity

Windows

Character	Method
No-Break Space	`Alt + 0160`
En Space	`Alt + 8194`
Em Space	`Alt + 8195`

Linux

Ctrl + Shift + U → hex code → Enter for any character. For NBSP specifically, many keyboard layouts support AltGr + Space.

HTML

&nbsp;         <!-- No-Break Space (U+00A0) -->
&ensp;         <!-- En Space (U+2002) -->
&emsp;         <!-- Em Space (U+2003) -->
&thinsp;       <!-- Thin Space (U+2009) -->
&hairsp;       <!-- Hair Space (U+200A) -->
&#8203;        <!-- Zero-Width Space (U+200B) -->
&#8239;        <!-- Narrow No-Break Space (U+202F) -->
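To confirm which code point an entity actually produces, Python's standard `html.unescape` can decode it. The sketch assumes each reference decodes to exactly one character, which holds for the entities listed above; `entity_code_point` is a hypothetical helper.

```python
import html

ENTITIES = ["&nbsp;", "&ensp;", "&emsp;", "&thinsp;", "&hairsp;", "&#8203;", "&#8239;"]

def entity_code_point(entity: str) -> str:
    """Decode one HTML character reference and return its code point as U+XXXX."""
    decoded = html.unescape(entity)
    return f"U+{ord(decoded):04X}"

for entity in ENTITIES:
    print(entity, entity_code_point(entity))
```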

Next in Series: The bullet character and its look-alikes — middle dot, interpunct, and other small round characters that serve surprisingly different roles. See Bullet vs Middle Dot.