Space Characters in Unicode: 20+ Invisible Characters Compared
The space bar produces one character: U+0020, the humble space. But Unicode defines more than 20 distinct characters that produce invisible whitespace of varying widths, breaking behaviors, and formatting properties. These characters look identical (or near-identical) in most contexts — a gap between characters, or no visible gap at all — yet they behave completely differently in text layout, HTML rendering, copy-paste operations, and security contexts.
Understanding this invisible landscape is essential for any developer working with internationalized text, precision typography, or content sanitization.
The Complete Space Character Reference
| Name | Unicode | HTML Reference | Width | Line break allowed? |
|---|---|---|---|---|
| Space | U+0020 | &#32; (or a literal space) | Normal word spacing | Yes |
| No-Break Space | U+00A0 | &nbsp; | Same as space | No |
| En Space | U+2002 | &ensp; | 1/2 em | Yes |
| Em Space | U+2003 | &emsp; | 1 em | Yes |
| 3-Per-Em Space | U+2004 | &emsp13; | 1/3 em | Yes |
| 4-Per-Em Space | U+2005 | &emsp14; | 1/4 em | Yes |
| 6-Per-Em Space | U+2006 | &#8198; | 1/6 em | Yes |
| Figure Space | U+2007 | &numsp; | Same as a digit | No |
| Punctuation Space | U+2008 | &puncsp; | Same as a period | Yes |
| Thin Space | U+2009 | &thinsp; | About 1/5 em | Yes |
| Hair Space | U+200A | &hairsp; | Thinnest visible | Yes |
| Zero-Width Space | U+200B | &#8203; | 0 (invisible) | Yes (word-break opportunity) |
| Zero-Width Non-Joiner | U+200C | &zwnj; | 0 | No |
| Zero-Width Joiner | U+200D | &zwj; | 0 | No |
| Narrow No-Break Space | U+202F | &#8239; | Thin | No |
| Medium Mathematical Space | U+205F | &MediumSpace; | 4/18 em | Yes |
| Braille Pattern Blank | U+2800 | &#10240; | Full width | Yes |
| Hangul Filler | U+3164 | &#12644; | Full width | Yes |
| Ideographic Space | U+3000 | &#12288; | 1 em (CJK full width) | Yes |
| Zero-Width No-Break Space / BOM | U+FEFF | &#65279; | 0 | No |
Note: Many of these characters render as invisible gaps in HTML. Use our Character Analyzer to detect which space characters are present in any text.
The Most Important Space Characters
U+0020 — Regular Space
The standard space character, produced by pressing the space bar. A line break is permitted at a regular space in most contexts. In HTML, consecutive spaces collapse to a single space unless white-space: pre (or a similar value) is applied.
```html
<!-- In HTML, multiple spaces collapse to one -->
<p>One    space    or    many</p>
<!-- Renders as: One space or many -->
```

```css
/* Use CSS to control spacing instead of inserting extra space characters */
.spaced { letter-spacing: 0.2em; word-spacing: 0.5em; }
```
U+00A0 — No-Break Space (NBSP)
The most widely used non-standard space. It looks identical to a regular space but prevents a line break at that position. Essential for:
- Keeping a number and its unit together: 15 kg (no break between 15 and kg)
- Keeping a title and name together: Dr. Smith, Mr. Jones
- French typography: spacing before certain punctuation marks (:, ;, !, ?)
- Preventing an "orphan" word from ending up alone on the last line of a paragraph
```html
<!-- Prevents "15" and "kg" from splitting across lines -->
<p>The package weighs 15&nbsp;kg.</p>
<!-- French-style spacing before a colon ("Result: success") -->
<p>Résultat&nbsp;: succès</p>
```
CSS alternative: For most layout-based line-break prevention, white-space: nowrap on a container is more maintainable than inserting NBSP characters in the content.
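When the text comes from a CMS or user input rather than hand-written markup, the NBSP usually has to be inserted programmatically. The sketch below is a minimal illustration, assuming a short hypothetical unit list and a regex that only converts existing regular spaces; bind_number_and_unit is an illustrative name, and real content would need a fuller unit list and locale awareness.

```python
import re

NBSP = "\u00A0"

# Hypothetical unit list for illustration only; extend it for real content.
UNITS = ("kg", "g", "km", "m", "cm", "mm", "ml", "l", "%")

# A digit, one or more regular spaces, then a known unit that is not followed
# by another letter (so "2 lines" is left alone).
_NUMBER_UNIT = re.compile(
    r"(\d)[ ]+(" + "|".join(map(re.escape, UNITS)) + r")(?!\w)"
)

def bind_number_and_unit(text: str) -> str:
    """Replace the space between a number and a known unit with U+00A0."""
    return _NUMBER_UNIT.sub(lambda m: m.group(1) + NBSP + m.group(2), text)

print(repr(bind_number_and_unit("The package weighs 15 kg.")))
# 'The package weighs 15\xa0kg.'
```

The lookahead is the design choice that matters here: without it, "in 2 lines" would be treated as a number followed by the unit "l".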
U+200B — Zero-Width Space
The zero-width space is invisible (it takes up no width), but it provides a line-break opportunity. Its primary uses:
- Allowing line breaks inside long words or identifiers that have no natural break points
- Marking break points in scripts that do not separate words with spaces, such as Thai and Khmer (CJK text already permits breaks between most ideographs)
- Creating break opportunities inside long URLs for display purposes
```html
<!-- Allow breaks in a very long URL displayed as text; the href stays untouched -->
<a href="https://example.com/very/long/path/to/some/resource">
  https://example.com/&#8203;very/&#8203;long/&#8203;path/&#8203;to/&#8203;some/&#8203;resource
</a>
```

```css
/* CSS alternative (more robust, no changes to the text itself): */
.url { word-break: break-all; }
```
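If the link markup is generated server-side, the break hints can be added automatically to the visible text only. A minimal sketch, assuming breaks after each "/" are acceptable; breakable_url_text is a hypothetical helper, not an existing library function.

```python
import html

def breakable_url_text(url: str) -> str:
    """Return HTML-escaped display text with &#8203; after each '/'.

    Only the visible link text gets the hints; the href must stay unmodified,
    or the link would point somewhere else.
    """
    escaped = html.escape(url)  # escape &, <, > and quotes first
    # Keep the "//" after the scheme together so "https://" is not split
    scheme, sep, rest = escaped.partition("://")
    if not sep:
        scheme, rest = "", escaped
    return scheme + sep + rest.replace("/", "/&#8203;")

url = "https://example.com/very/long/path"
print(f'<a href="{html.escape(url)}">{breakable_url_text(url)}</a>')
# <a href="https://example.com/very/long/path">https://example.com/&#8203;very/&#8203;long/&#8203;path</a>
```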
Security warning: Zero-width spaces are invisible and are frequently used for:
- Plagiarism watermarking (unique patterns of ZWSP embedded in documents)
- Bypassing content filters (inserting ZWSP between characters to defeat keyword matching)
- Obfuscating malicious content
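```javascript
// Detect common invisible characters in user input:
// ZWSP, ZWNJ, ZWJ, BOM, soft hyphen (U+00AD) and word joiner (U+2060)
function hasInvisibleChars(str) {
  return /[\u200B\u200C\u200D\uFEFF\u00AD\u2060]/.test(str);
}

// Remove the same characters from user input (for sanitization).
// Caution: stripping ZWJ/ZWNJ also breaks emoji sequences and scripts such as Persian.
function removeInvisibleChars(str) {
  return str.replace(/[\u200B\u200C\u200D\uFEFF\u00AD\u2060]/g, '');
}
```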
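A hand-maintained list like the one above is easy to outgrow. One alternative, sketched here in Python as an assumption rather than a complete policy, is to flag characters by Unicode general category: Cf (format characters such as ZWSP, ZWJ, soft hyphen, word joiner and BOM) plus every Zs space separator other than U+0020. Note that this does not catch look-alike blanks such as U+2800 or U+3164, which Unicode classifies as an ordinary symbol and letter.

```python
import unicodedata

def find_suspicious_chars(text: str) -> list[tuple[int, str, str]]:
    """Return (index, 'U+XXXX', name) for format chars and non-ASCII spaces.

    Cf covers ZWSP, ZWNJ, ZWJ, soft hyphen, word joiner, BOM and bidi controls;
    Zs covers NBSP, U+2000 to U+200A, U+202F, U+205F and U+3000.
    """
    found = []
    for i, ch in enumerate(text):  # Python strings index by code point
        category = unicodedata.category(ch)
        if category == "Cf" or (category == "Zs" and ch != " "):
            name = unicodedata.name(ch, "<unnamed>")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found

print(find_suspicious_chars("pass\u200Bword\u00A0!"))
# [(4, 'U+200B', 'ZERO WIDTH SPACE'), (9, 'U+00A0', 'NO-BREAK SPACE')]
```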
U+200C — Zero-Width Non-Joiner (ZWNJ)
The ZWNJ prevents two characters from joining via ligature or cursive connection. It is invisible and takes no space but affects rendering:
- Arabic/Persian text: Prevents certain characters from connecting in cursive script
- Devanagari (Hindi): Controls ligature formation
- Ligature prevention: Prevents fi, fl, and ff ligatures where they would be wrong (e.g., across a morpheme boundary in German)
```html
<!-- Prevent the fi ligature in "shelf·ish" (across a morpheme boundary) -->
<p>shelf&zwnj;ish</p>
```
In most web contexts outside of RTL or Indic languages, you'll rarely need ZWNJ. Emoji sequences rely on its counterpart, the zero-width joiner, described next.
U+200D — Zero-Width Joiner (ZWJ)
The zero-width joiner causes characters that would normally render separately to join as a ligature or combined form. It is the foundation of modern emoji sequences:
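```html
<!-- Family emoji composed with ZWJ sequences -->
&#x1F468;&#x200D;&#x1F469;&#x200D;&#x1F467;&#x200D;&#x1F466;
<!-- Decoded: 👨 (U+1F468) + ZWJ + 👩 (U+1F469) + ZWJ + 👧 (U+1F467) + ZWJ + 👦 (U+1F466) -->

<!-- Profession emoji: person + ZWJ + tool/symbol -->
&#x1F469;&#x200D;&#x1F4BB;  <!-- Woman Technologist: 👩 + ZWJ + 💻 -->
&#x1F468;&#x200D;&#x1F373;  <!-- Man Cook: 👨 + ZWJ + 🍳 -->
&#x1F3F3;&#xFE0F;&#x200D;&#x1F308;  <!-- Rainbow Flag: 🏳 + VS16 (U+FE0F) + ZWJ + 🌈 -->
```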
ZWJ sequences are why a single displayed emoji can be multiple code points. When building applications that handle user text, treat each ZWJ sequence as one atomic unit when counting, truncating, or reversing text; splitting one leaves stray emoji and joiners behind.
```javascript
// Count user-perceived characters (grapheme clusters), not code units
// Intl.Segmenter is available in modern browsers and Node.js
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const text = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466} Family';
const graphemes = [...segmenter.segment(text)];
console.log(graphemes.length); // 8 (1 emoji + 1 space + 6 letters)
console.log(text.length);      // 18 (UTF-16 code units, misleading for display)
```
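Python's standard library has no grapheme segmenter (third-party packages such as regex, with its \X pattern, fill that gap), but listing the code points makes the ZWJ structure visible. Python's len() counts code points, so it reports 7 for the family emoji, while JavaScript's length would report 11 UTF-16 code units for the same sequence. describe is an illustrative helper name.

```python
import unicodedata

# Family emoji: MAN + ZWJ + WOMAN + ZWJ + GIRL + ZWJ + BOY
FAMILY = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"

def describe(seq: str) -> list[str]:
    """List each code point of a sequence as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in seq]

for line in describe(FAMILY):
    print(line)

print(len(FAMILY))                           # 7 code points
print(len(FAMILY.encode("utf-16-le")) // 2)  # 11 UTF-16 code units, like JS .length
```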
U+00AD — Soft Hyphen
Not technically a space character, but in the "invisible formatting characters" family: the soft hyphen (&shy; in HTML) is a hyphenation hint. It renders as nothing unless the word has to break across a line, in which case a visible hyphen appears at the break point.
```html
<!-- A long word with soft hyphens marking acceptable break points -->
<p>antidis&shy;establish&shy;mentarian&shy;ism</p>
```

```css
/* CSS hyphenation (preferred approach for most use cases).
   The browser needs the text's language, e.g. <html lang="en">. */
p { hyphens: auto; }
```
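Soft hyphens survive copy-paste, so the stored string can differ from what the reader sees and quietly break search and comparison. A minimal Python illustration; stripping U+00AD before matching is one reasonable policy, assumed here, and strip_soft_hyphens is a hypothetical helper name.

```python
SOFT_HYPHEN = "\u00AD"

# The word from the HTML example, with its three soft hyphens as real characters
stored = "antidis\u00ADestablish\u00ADmentarian\u00ADism"

print(stored == "antidisestablishmentarianism")  # False: 3 hidden characters
print("disestablish" in stored)                  # False: match crosses a soft hyphen

def strip_soft_hyphens(text: str) -> str:
    """Remove U+00AD so comparisons match what the reader sees."""
    return text.replace(SOFT_HYPHEN, "")

print(strip_soft_hyphens(stored) == "antidisestablishmentarianism")  # True
```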
U+202F — Narrow No-Break Space
A thinner non-breaking space, approximately half the width of a regular NBSP. Primarily used in:
- French and Swiss typography: required before !, ?, ; and :
- Number formatting: 1 234 567 (many European countries use a thin space or NBSP as the thousands separator)
- Mathematical notation: spacing around operators
```html
<!-- Number with narrow no-break space as thousands separator -->
<p>Population: 1&#8239;234&#8239;567</p>
<!-- French punctuation spacing ("Hello!") -->
<p>Bonjour&#8239;!</p>
```
U+3000 — Ideographic Space
The full-width space used in CJK (Chinese, Japanese, Korean) typography. It is exactly as wide as a full-width CJK character (1 em in typical fonts); in fixed-width contexts such as terminals it occupies two columns, twice the width of an ASCII space. In CJK text, it is traditionally used for paragraph indentation.
```html
<!-- Traditional CJK paragraph indentation: two ideographic spaces -->
<!-- Text: "This is the beginning of a paragraph." -->
<p>&#12288;&#12288;这是一个段落的开始。</p>
```
In CSS, this is better handled with text-indent:
```css
p.cjk { text-indent: 2em; }
```
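The "two columns" behavior comes from Unicode's East Asian Width property, which terminals and monospace layouts consult. A quick check with Python's standard unicodedata module; column_width is a hypothetical, deliberately simplified helper (it treats W and F as two columns and everything else as one, ignoring zero-width and ambiguous-width characters).

```python
import unicodedata

def column_width(ch: str) -> int:
    """Columns a character occupies in a monospace grid (simplified: W/F = 2)."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

print(unicodedata.east_asian_width("\u0020"))       # Na (narrow)
print(unicodedata.east_asian_width("\u3000"))       # F (fullwidth)
print(column_width("\u3000") // column_width(" "))  # 2
```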
Security Implications of Invisible Characters
Content Filtering Bypass
Zero-width spaces and other invisible characters are commonly inserted into spam, phishing messages, and flagged keywords to bypass automated content filters:
```text
viagra   ← displays like this, but is stored as v U+200B i U+200B a U+200B g U+200B r U+200B a
# A filter looking for "viagra" as a string will not find this
```
Robust content moderation systems must normalize and strip invisible characters before applying keyword filters.
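A minimal sketch of that order of operations, assuming a hypothetical one-word blocklist and a simple policy: drop format characters (category Cf), apply NFKC, casefold, then run a substring check. Real moderation also has to handle homoglyphs and deliberate misspellings, which this does not attempt.

```python
import unicodedata

BLOCKLIST = {"viagra"}  # hypothetical keyword list

def normalize_for_matching(text: str) -> str:
    """Drop invisible format characters, then NFKC-normalize and casefold."""
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return unicodedata.normalize("NFKC", visible).casefold()

def contains_blocked_keyword(text: str) -> bool:
    normalized = normalize_for_matching(text)
    return any(word in normalized for word in BLOCKLIST)

evasive = "\u200B".join("viagra")         # v + ZWSP + i + ZWSP + ...
print("viagra" in evasive)                # False: the naive check misses it
print(contains_blocked_keyword(evasive))  # True
```

NFKC also folds fullwidth letters such as "ＶＩＡＧＲＡ" to ASCII, which closes a second common evasion route at no extra cost.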
Watermarking and Tracking
Publishers and document tracking systems embed unique patterns of invisible characters (typically ZWSP, soft hyphen, or NBSP variations) in text to track document leaks and identify the source of unauthorized copies. This is a form of text steganography; related tools such as SNOW hide data in trailing whitespace instead.
```python
import re

# Example: encode an 8-bit ID into invisible characters
ZWSP = '\u200B'  # bit 0
ZWNJ = '\u200C'  # bit 1

def encode_id(text: str, user_id: int, bits: int = 8) -> str:
    """Embed a binary user ID (least significant bit first) using invisible characters."""
    marker = ''
    for i in range(bits):
        marker += ZWSP if ((user_id >> i) & 1) == 0 else ZWNJ
    return text[:20] + marker + text[20:]  # Insert after the first 20 chars

def has_watermark(text: str) -> bool:
    """Detect a run of at least 4 marker characters (a likely watermark)."""
    return bool(re.search(r'[\u200B\u200C]{4,}', text))
```
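The example above embeds and detects a marker but does not read it back. Recovering the ID is the reverse mapping; a minimal sketch, assuming the same 8-bit, least-significant-bit-first layout and a single marker per text. extract_id is a hypothetical name, and encode_id is repeated so the block runs on its own.

```python
import re
from typing import Optional

ZWSP = "\u200B"  # bit 0
ZWNJ = "\u200C"  # bit 1

def encode_id(text: str, user_id: int, bits: int = 8) -> str:
    marker = "".join(ZWNJ if (user_id >> i) & 1 else ZWSP for i in range(bits))
    return text[:20] + marker + text[20:]

def extract_id(text: str, bits: int = 8) -> Optional[int]:
    """Return the embedded ID, or None if no marker of the expected length exists."""
    match = re.search("[\u200B\u200C]{%d}" % bits, text)
    if match is None:
        return None
    user_id = 0
    for i, ch in enumerate(match.group()):
        if ch == ZWNJ:  # bit i is set (LSB first, as encoded)
            user_id |= 1 << i
    return user_id

leaked = encode_id("Quarterly results are confidential until Friday.", 173)
print(extract_id(leaked))  # 173
```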
The Invisible Username Problem
If your application allows Unicode usernames, users can create accounts with names that appear identical to other users but contain different invisible characters:
```text
alice    ← Regular username
alice    ← Username with a trailing ZWSP (looks identical)
```
Always normalize usernames before comparison:
```python
import re
import unicodedata

def normalize_username_for_comparison(username: str) -> str:
    """Remove invisible chars and normalize for unique-username checking."""
    # Remove zero-width, bidi-control and other formatting characters
    cleaned = re.sub(r'[\u200B-\u200F\u2028-\u202F\u2060-\u206F\uFEFF]',
                     '', username)
    # NFKC folds compatibility forms (e.g. NBSP to space, fullwidth to ASCII);
    # casefold() then makes the comparison case-insensitive
    return unicodedata.normalize('NFKC', cleaned).casefold()
```
Typography Use Cases
Mathematical Spacing
In mathematical notation, thin spaces (U+2009) are used around binary operators and after commas in function arguments:
```html
<!-- Properly spaced mathematical expression, using thin spaces -->
<p><i>f</i>(<i>x</i>,&thinsp;<i>y</i>)&thinsp;=&thinsp;<i>x</i>&thinsp;+&thinsp;<i>y</i></p>
```
In practice, MathML or KaTeX handles this automatically.
French Typography
French typographic rules require spaces before certain punctuation marks. A common convention uses a narrow no-break space before !, ?, ;, :, and » and after «:
```html
<!-- "He said: « Hello! »" -->
<p>Il a dit&#8239;: «&#8239;Bonjour&#8239;!&#8239;»</p>
<!-- Renders as: Il a dit : « Bonjour ! », with spaces that never break -->
```
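Applying this convention to existing text can be sketched with two regular expressions. This assumes the narrow-no-break-space-everywhere convention described above and converts only spaces that are already present, so "12:30" and "https://" stay intact; french_spacing is a hypothetical helper, and a production version would also need to skip code and other special spans.

```python
import re

NNBSP = "\u202F"  # NARROW NO-BREAK SPACE

def french_spacing(text: str) -> str:
    """Turn regular spaces before ; : ! ? » and after « into U+202F.

    Only existing spaces are converted; missing spaces are not inserted.
    """
    text = re.sub(r" +([;:!?»])", NNBSP + r"\1", text)
    text = re.sub(r"« +", "«" + NNBSP, text)
    return text

print(repr(french_spacing("Il a dit : « Bonjour ! »")))
```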
Number Formatting
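Different countries use different thousands separators. The SI (International System of Units) recommends grouping digits with a thin space rather than a comma or period, which avoids ambiguity between the comma (European decimal) and the period (US decimal). In digital text, the narrow no-break space (U+202F) fits this role because it is thin and never breaks inside a number:
```html
<p>1&#8239;234&#8239;567.89</p>  <!-- International (SI style) -->
<p>1,234,567.89</p>  <!-- US format -->
<p>1.234.567,89</p>  <!-- German format -->
```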
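Python's format mini-language groups digits only with a comma or an underscore, so producing the space-grouped form takes one extra step. A minimal sketch; group_digits is a hypothetical helper and assumes a period as the decimal marker in the output.

```python
NNBSP = "\u202F"  # narrow no-break space: thin, and never breaks a number

def group_digits(value: float, decimals: int = 2) -> str:
    """Format with U+202F between thousands groups, e.g. 1 234 567.89."""
    us_style = f"{value:,.{decimals}f}"  # '1,234,567.89'
    return us_style.replace(",", NNBSP)

print(repr(group_digits(1234567.89)))
```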
Detecting Space Characters
Paste any text into our Character Analyzer to see every character's Unicode code point, including invisible ones. This is invaluable for debugging:
- Text that "looks" the same but doesn't match in string comparison
- Content that bypasses filters
- Copy-pasted text with invisible formatting artifacts from Word or rich-text editors
In JavaScript:
```javascript
// Find non-standard whitespace and invisible characters in a string.
// Tabs and line breaks are reported too, since \s matches them.
function findInvisibleChars(str) {
  // Invisible or blank-looking code points that \s does not match
  const extra = new Set([0x00AD, 0x200B, 0x200C, 0x200D, 0x2060, 0x2800, 0x3164]);
  const results = [];
  let index = 0; // UTF-16 offset, consistent with str.slice() and str[i]
  for (const ch of str) { // iterates by code point, so emoji are not split
    const cp = ch.codePointAt(0);
    // \s covers NBSP, U+2000-U+200A, U+202F, U+205F, U+3000 and U+FEFF
    if ((cp !== 0x0020 && /\s/.test(ch)) || extra.has(cp)) {
      results.push({
        index,
        codePoint: cp.toString(16).toUpperCase().padStart(4, '0'),
        char: ch,
      });
    }
    index += ch.length;
  }
  return results;
}
```
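When two strings "look the same but don't match", printing them with the invisible characters spelled out is often the fastest diagnosis. A minimal Python sketch; reveal is a hypothetical helper, and its selection rule (format characters, non-ASCII space separators, tabs and line breaks) is an assumption rather than a standard list.

```python
import unicodedata

def reveal(text: str) -> str:
    """Return text with invisible characters shown as <U+XXXX> tags.

    Flags format characters (Cf), every space separator (Zs) except U+0020,
    and tabs and line breaks, so look-alike strings can be compared by eye.
    """
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Cf" or (cat == "Zs" and ch != " ") or ch in "\t\r\n":
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)

a, b = "alice", "alice\u200B"
print(a == b)     # False, though both display as "alice"
print(reveal(b))  # alice<U+200B>
```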
How to Type Space Characters
Mac
| Character | Method |
|---|---|
| No-Break Space (U+00A0) | ⌥Space (Option + Space) |
| Thin Space (U+2009) | Character Viewer → search "thin space" |
| Zero-Width Space (U+200B) | Character Viewer or HTML entity |
Windows
| Character | Method |
|---|---|
| No-Break Space | Alt + 0160 |
| En Space | Alt + 8194 |
| Em Space | Alt + 8195 |
Linux
Ctrl + Shift + U → hex code → Enter for any character. For NBSP specifically, many keyboard layouts support AltGr + Space.
HTML
```html
&nbsp;     <!-- No-Break Space (U+00A0) -->
&ensp;     <!-- En Space (U+2002) -->
&emsp;     <!-- Em Space (U+2003) -->
&thinsp;   <!-- Thin Space (U+2009) -->
&hairsp;   <!-- Hair Space (U+200A) -->
&#8203;    <!-- Zero-Width Space (U+200B) -->
&#8239;    <!-- Narrow No-Break Space (U+202F) -->
```
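Because these characters are indistinguishable once rendered, it can help to keep them as numeric character references in HTML source, where reviewers can see them. A minimal Python sketch that converts non-ASCII spaces (Zs) and format characters (Cf) into hexadecimal references; the character selection and the helper name to_visible_references are assumptions for illustration.

```python
import unicodedata

def to_visible_references(text: str) -> str:
    """Replace non-ASCII spaces (Zs) and format characters (Cf) with &#xHHHH; refs."""
    out = []
    for ch in text:
        cat = unicodedata.category(ch)
        if cat == "Cf" or (cat == "Zs" and ch != " "):
            out.append(f"&#x{ord(ch):X};")
        else:
            out.append(ch)
    return "".join(out)

print(to_visible_references("15\u00A0kg"))  # 15&#xA0;kg
```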
Next in Series: The bullet character and its look-alikes — middle dot, interpunct, and other small round characters that serve surprisingly different roles. See Bullet vs Middle Dot.