Whitespace Characters in Web Development: Beyond the Space Bar
The space bar produces exactly one character: U+0020, SPACE. But Unicode defines over two dozen distinct whitespace characters, each with a different width, different line-breaking behavior, and different semantic meaning. Knowing when to reach for a thin space instead of a regular space, or a non-breaking space instead of white-space: nowrap, is the difference between text that behaves and text that occasionally breaks in embarrassing ways at production screen sizes.
How HTML Collapses Whitespace
Before examining individual space characters, you need to understand the browser's default whitespace model. HTML collapses whitespace aggressively:
- Multiple consecutive ASCII spaces (U+0020) collapse to one rendered space
- Tab characters (U+0009) collapse to one space
- Newlines in source HTML collapse to one space
- Leading and trailing whitespace at the start and end of each rendered line is removed
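This means each of the following paragraphs renders with a single space between "Hello" and "World":
<!-- Several consecutive spaces -->
<p>Hello     World</p>
<!-- A tab character (written here as a character reference) -->
<p>Hello&#9;World</p>
<!-- A newline in the source -->
<p>Hello
World</p>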
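The collapsing rules above can be approximated in a few lines of JavaScript. This is a minimal sketch for a single run of text, not the browser's actual algorithm; the function name collapseWhitespace is hypothetical, and the sketch ignores element boundaries and the white-space property.

```javascript
// Hypothetical helper: approximates white-space: normal for one text run.
function collapseWhitespace(source) {
  return source
    // Runs of ASCII spaces, tabs, and newlines become a single space.
    .replace(/[ \t\n\r\f]+/g, " ")
    // Drop the single space left at either end of the run.
    // String.prototype.trim is avoided because it would also strip U+00A0.
    .replace(/^ | $/g, "");
}

console.log(JSON.stringify(collapseWhitespace("Hello   \t\n World")));
console.log(JSON.stringify(collapseWhitespace("a\u00A0\u00A0b")));
```

Note that the character class lists only ASCII whitespace, so non-breaking spaces and the other Unicode spaces pass through untouched, which mirrors how they behave in HTML.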
The CSS white-space property controls this behavior:
/* Default: collapse runs, wrap at word boundaries */
p { white-space: normal; }
/* Preserve all whitespace, no wrapping */
pre { white-space: pre; }
/* Preserve whitespace, allow wrapping */
.formatted { white-space: pre-wrap; }
/* Collapse whitespace, prevent all wrapping */
.nowrap { white-space: nowrap; }
/* Collapse runs of spaces, preserve newlines, wrap */
.pre-line { white-space: pre-line; }
The newer white-space-collapse and text-wrap longhands (CSS Text Level 4) split these concerns:
.element {
  white-space-collapse: collapse; /* or preserve, preserve-breaks, preserve-spaces */
  text-wrap: wrap; /* or nowrap, balance, pretty */
}
The Unicode Whitespace Inventory
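Unicode's White_Space property identifies the characters that the standard formally considers whitespace. The tables below cover the subset that matters in web development, plus a few zero-width format characters that lack the property but act as invisible spacing controls: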
Fixed-Width Spaces
| Character | Code Point | HTML Entity | Width | Breaks? |
|---|---|---|---|---|
| Space | U+0020 | none | Font-dependent (often about ¼ em) | Yes |
| Non-Breaking Space | U+00A0 | &nbsp; | Same as Space | No |
| En Space | U+2002 | &ensp; | 1 en (½ em) | Yes |
| Em Space | U+2003 | &emsp; | 1 em | Yes |
| Three-Per-Em Space | U+2004 | &emsp13; | ⅓ em | Yes |
| Four-Per-Em Space | U+2005 | &emsp14; | ¼ em | Yes |
| Six-Per-Em Space | U+2006 | none | ⅙ em | Yes |
| Figure Space | U+2007 | &numsp; | Width of a digit | No |
| Punctuation Space | U+2008 | &puncsp; | Width of a period | Yes |
| Thin Space | U+2009 | &thinsp; | ~⅕ em (varies) | Yes |
| Hair Space | U+200A | &hairsp; | Thinner than thin | Yes |
| Narrow No-Break Space | U+202F | none | ~⅕ em | No |
| Medium Mathematical Space | U+205F | &MediumSpace; | ⁴⁄₁₈ em | Yes |
| Ideographic Space | U+3000 | none (use &#x3000;) | 1 em (full-width) | Yes |
Zero-Width Characters
| Character | Code Point | Purpose |
|---|---|---|
| Zero-Width Space | U+200B | Line break opportunity without visible gap |
| Zero-Width Non-Joiner | U+200C | Prevent ligature/joining between adjacent characters |
| Zero-Width Joiner | U+200D | Request ligature/joining between adjacent characters |
| Word Joiner | U+2060 | Prevent line break (invisible, like NBSP but zero-width) |
| Zero-Width No-Break Space | U+FEFF | BOM when at start of file; ZWNBS in content |
Control and Line-Separator Whitespace
| Character | Code Point | Description |
|---|---|---|
| Tab | U+0009 | Horizontal tab; collapses in HTML |
| Line Feed | U+000A | Unix newline |
| Vertical Tab | U+000B | Rarely used in text |
| Form Feed | U+000C | Page break; treated as whitespace in JS |
| Carriage Return | U+000D | Windows newline component |
| Next Line | U+0085 | NEL; C1 control code used as a newline on some legacy systems |
| Line Separator | U+2028 | Explicit line break (not paragraph) |
| Paragraph Separator | U+2029 | Explicit paragraph break |
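One practical consequence of these tables is that JavaScript's \s class does not line up exactly with them. The sketch below checks a handful of the characters listed above against \s; the labels in samples and the helper name matchesWhitespaceClass are illustrative choices, not part of any standard API.

```javascript
// Informal labels for a few characters from the tables above.
const samples = {
  "NO-BREAK SPACE": "\u00A0",
  "THIN SPACE": "\u2009",
  "IDEOGRAPHIC SPACE": "\u3000",
  "ZERO WIDTH NO-BREAK SPACE": "\uFEFF",
  "ZERO WIDTH SPACE": "\u200B",
  "WORD JOINER": "\u2060",
  "NEXT LINE": "\u0085",
};

// True when the regex class \s matches the single character passed in.
function matchesWhitespaceClass(char) {
  return /^\s$/u.test(char);
}

for (const [name, char] of Object.entries(samples)) {
  const verdict = matchesWhitespaceClass(char) ? "matched by \\s" : "not matched by \\s";
  console.log(`${name}: ${verdict}`);
}
```

The zero-width space, the word joiner, and NEL fall outside \s, so a cleanup step built only on \s or trim() will leave them in place.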
Use our Character Counter tool to identify which whitespace characters appear in text you are analyzing — pasted content from word processors and PDFs frequently contains unexpected space variants.
Non-Breaking Space (U+00A0)
The non-breaking space is the most commonly needed whitespace character in web work. It inserts a space at which the line-breaking algorithm will never break. Use cases:
Units and values: "100 km" should not break between the number and the unit.
100&nbsp;km
25&nbsp;°C
$1,299&nbsp;USD
Titles and names: "Mr. Smith" looks odd if "Mr." ends a line.
Mr.&nbsp;Smith
Dr.&nbsp;Patel
J.&nbsp;R.&nbsp;R.&nbsp;Tolkien
Labels and their numbers: "Chapter 7" should not leave the number stranded on the next line.
Chapter&nbsp;7
Figure&nbsp;3
Section&nbsp;4.2
French punctuation: French style requires a non-breaking space before the double punctuation marks (colon, semicolon, exclamation mark, question mark), so that the mark never starts a line on its own:
Bonjour&nbsp;!
Comment allez-vous&nbsp;?
Résultat&nbsp;: 42
Besides blocking line breaks, NBSP differs from a regular space in one more way: HTML's whitespace algorithm does not collapse it, even in white-space: normal mode. Consecutive NBSPs therefore produce visible gaps, which is useful for manual indentation but easy to introduce accidentally when pasting from a rich-text editor.
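The number-and-unit case lends itself to automation. The following is a minimal sketch, assuming the text is plain (not HTML) and that a short, hand-maintained unit list is acceptable; bindUnits and UNITS are hypothetical names introduced for illustration.

```javascript
const NBSP = "\u00A0";

// Illustrative unit list; extend it to match your own content.
const UNITS = ["km", "kg", "cm", "mm", "°C", "USD"];

// Replace the ordinary space between a digit and a known unit with NBSP.
function bindUnits(text) {
  const unitPattern = UNITS.join("|");
  // The lookahead requires a listed unit that is not followed by another letter,
  // so "100 kmh" or "3 cats" are left alone.
  const re = new RegExp(`(\\d) (?=(?:${unitPattern})(?![A-Za-z]))`, "g");
  return text.replace(re, `$1${NBSP}`);
}

console.log(JSON.stringify(bindUnits("The trail is 100 km long at 25 °C.")));
```

Running this at build time or in a template filter keeps the source text readable while the rendered output carries the non-breaking spaces.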
Thin Space (U+2009)
The thin space is approximately one-fifth of an em wide, though its exact width is font-dependent. It is ideal for:
Number grouping. SI and ISO 80000 conventions separate groups of three digits with a thin space rather than a comma or period: 1 000 000. Many European typographic traditions follow this. Use the narrow no-break space (U+202F) when the number must not break across lines:
<!-- Thin space (U+2009): the browser may still wrap inside the number -->
1&thinsp;000&thinsp;000
<!-- Narrow no-break space (U+202F): a thin-width gap that never breaks -->
1&#x202F;000&#x202F;000
Spacing around mathematical operators in inline text (as opposed to MathML, where spacing is automatic):
<i>f</i>&thinsp;(<i>x</i>)
<i>a</i>&thinsp;+&thinsp;<i>b</i>
After opening and before closing guillemets in some typographic traditions:
«&thinsp;Bonjour&thinsp;»
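Digit grouping with a thin or narrow no-break space can be sketched as a small formatter. This is an illustration under stated assumptions: the input is a finite number whose String() form is plain decimal (no exponent), and groupThousands is a hypothetical helper, not a built-in.

```javascript
const NNBSP = "\u202F"; // narrow no-break space
const THIN = "\u2009";  // thin space (breakable)

// Insert a separator between groups of three digits in the integer part.
function groupThousands(value, separator = NNBSP) {
  const [intPart, fracPart] = String(value).split(".");
  // \B(?=(\d{3})+(?!\d)) matches each position followed by a multiple of three digits.
  const grouped = intPart.replace(/\B(?=(\d{3})+(?!\d))/g, separator);
  return fracPart === undefined ? grouped : `${grouped}.${fracPart}`;
}

console.log(JSON.stringify(groupThousands(1000000)));
console.log(JSON.stringify(groupThousands(1234.5, THIN)));
```

Defaulting to U+202F rather than U+2009 means the formatted number cannot wrap mid-value unless the caller explicitly opts into the breakable thin space.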
Em Space and En Space (U+2003, U+2002)
Em and en spaces are fixed-width spaces equal to 1 em and 0.5 em respectively. They are not recommended for general content spacing — CSS margin, padding, and gap are more appropriate and responsive. However, they have niche uses:
Typographic alignment in plain-text contexts (email, terminal output, markdown tables):
Name Score
Alice 100
Bob  98
Inside <pre> blocks where you want semantic spacing that survives copy-paste better than ASCII art using regular spaces.
Note that em and en spaces allow line breaks (a line may wrap at them), which distinguishes them from NBSP. If you need a non-breaking em-width gap, wrap the em space and the text on both sides of it in a <span style="white-space: nowrap">. U+2007 (Figure Space) is non-breaking on its own, but it is only as wide as a digit.
Zero-Width Space (U+200B)
Zero-width space inserts an invisible line-break opportunity. The browser may wrap the line at a ZWSP position but renders no visible gap. This is useful for:
Long URLs and technical strings where you want to hint line breaks without changing the visual text:
<span>https://example.com/very/long/path&#8203;/that/needs/to/wrap</span>
CJK text mixed with non-CJK when automatic break opportunities are insufficient (though word-break: break-all is usually more appropriate — see Part 3 of this series).
Long compound words in German and Dutch where automatic hyphenation is unavailable:
Donau&#8203;dampf&#8203;schiff&#8203;fahrts&#8203;gesellschaft
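For the URL case, the break hints can be generated rather than typed by hand. The sketch below is one possible approach for display text only; softBreakUrl is a hypothetical name, and the result should never be used as an href because the inserted characters would become part of the address.

```javascript
const ZWSP = "\u200B";

// Add a zero-width space after each slash that starts a new path segment.
// The lookahead skips the first slash of "//" so the scheme separator stays intact.
function softBreakUrl(url) {
  return url.replace(/\/(?=[^/])/g, `/${ZWSP}`);
}

const display = softBreakUrl("https://example.com/very/long/path");
// Show the inserted characters as a visible marker for inspection.
console.log(display.replace(/\u200B/g, "[ZWSP]"));
```

Removing every U+200B from the result yields the original URL, so the transformation is reversible as long as the source URL contains no zero-width spaces of its own.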
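ZWSP can cause subtle bugs. If a user copies text containing ZWSP into a form field or a search box, the invisible character travels with it, potentially causing mismatches. It also breaks string comparisons in JavaScript:
// The strings differ because the first one contains U+200B
"hello\u200Bworld" === "helloworld"; // false
// Strip ZWSP and other invisible characters before comparison.
// Caution: this range also removes ZWNJ (U+200C) and ZWJ (U+200D), which changes
// emoji sequences and some Persian or Indic text; narrow it if that text must survive.
const stripInvisible = (str) => str.replace(/[\u200B-\u200D\uFEFF]/g, "");
stripInvisible("hello\u200Bworld") === "helloworld"; // true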
Zero-Width Non-Joiner and Zero-Width Joiner
ZWNJ (U+200C) prevents two adjacent characters from forming a ligature or joining shape. In Arabic-script languages such as Persian, and in Indic scripts such as Devanagari, this has orthographic significance: it controls whether letters take their joined or unjoined form. In Latin typography, it suppresses ligatures:
<!-- Prevent the fi ligature in "find" for brand-name styling -->
f&zwnj;ind
ZWJ (U+200D) requests joining. Its most visible use in web contexts is emoji sequences — many multi-person and family emoji are encoded as sequences of individual emoji joined with ZWJ:
👨‍💻 = U+1F468 ZWJ U+1F4BB (man + laptop)
🏳️‍🌈 = U+1F3F3 U+FE0F ZWJ U+1F308 (flag + rainbow)
Platforms that recognize the ZWJ sequence render the combined emoji; platforms that do not render the individual components side by side.
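The structure of a ZWJ sequence is easy to inspect in JavaScript. The sketch below builds the man-plus-laptop sequence from the example above and lists its code points; joinEmoji and toCodePoints are hypothetical helper names.

```javascript
const ZWJ = "\u200D";

// Join individual emoji into one ZWJ sequence.
function joinEmoji(...parts) {
  return parts.join(ZWJ);
}

// List the code points of a string in U+XXXX notation.
function toCodePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const technologist = joinEmoji("\u{1F468}", "\u{1F4BB}");
console.log(toCodePoints(technologist).join(" "));
// Splitting on ZWJ recovers the component emoji.
console.log(technologist.split(ZWJ).length);
```

Because the sequence is several code points long, its length property and any naive character count will report more than one character even when it renders as a single glyph.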
Word Break and Overflow Control
Unicode whitespace characters interact with the CSS line-breaking properties overflow-wrap and word-break (not text-overflow, which only controls how clipped text is signaled). The main options:
/* Break at any point, but only when the line would otherwise overflow;
   these break points also count toward min-content width */
.break-anywhere {
  overflow-wrap: anywhere;
}
/* Same overflow-only breaking, but min-content width ignores the extra break points */
.break-word {
  overflow-wrap: break-word;
}
/* CJK: break between any characters; non-CJK: no mid-word breaks */
.cjk-text {
  word-break: normal;
}
/* Break mid-word for any script (aggressive) */
.aggressive-break {
  word-break: break-all;
}
/* Keep CJK characters together as words (rare) */
.cjk-words {
  word-break: keep-all;
}
The <wbr> HTML element is the semantic equivalent of ZWSP — it marks a preferred line-break opportunity without inserting visible whitespace:
<p>The function <code>calculate<wbr>TransactionFee<wbr>WithDiscount()</code> returns a decimal.</p>
For URLs specifically, a CSS rule such as overflow-wrap: anywhere or word-break: break-all on the link is the more reliable solution; ZWSP insertion is fragile because it must be maintained manually when URLs change.
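For long identifiers, the <wbr> hints can be generated instead of hand-placed. This is a minimal sketch, assuming the input is a camelCase identifier made only of ASCII letters, digits, and underscores, so no HTML escaping is needed; insertWbr is a hypothetical name.

```javascript
// Insert <wbr> at each lowercase-to-uppercase boundary of a camelCase identifier.
// Assumes the identifier contains no characters that need HTML escaping.
function insertWbr(identifier) {
  return identifier.replace(/([a-z])([A-Z])/g, "$1<wbr>$2");
}

console.log(insertWbr("calculateTransactionFeeWithDiscount"));
```

Since the output is markup, it belongs in a template or innerHTML assignment only when the identifier comes from a trusted source; arbitrary user text would need escaping first.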
Practical Spacing Patterns
Phone Numbers and Codes
<!-- Non-breaking spaces keep phone numbers together -->
<a href="tel:+12025551234">+1&nbsp;(202)&nbsp;555-1234</a>
<!-- Narrow no-break spaces for international format -->
+1&#x202F;202&#x202F;555&#x202F;1234
Mathematical and Scientific Text
/* Keep inline formulas on one line; "calt" enables contextual alternates where the font provides them */
.math-inline {
  font-feature-settings: "calt" 1;
  white-space: nowrap;
}
<!-- Thin spaces around operators -->
<span class="math-inline">E&thinsp;=&thinsp;mc²</span>
<span class="math-inline">F&thinsp;=&thinsp;ma</span>
Quotation Marks and Punctuation
<!-- Typographic quotes with correct spacing -->
“Hello,” she said.
<!-- French guillemets with narrow no-break spaces -->
«&#x202F;Bonjour&#x202F;»
Currency and Financial Values
<!-- Keep currency symbol with amount -->
£&nbsp;1,299
€&nbsp;49.99
¥&nbsp;10,000
Detecting Whitespace Characters
When working with user-generated content or text from external sources, auditing for unexpected whitespace is important:
// Unicode whitespace regex: the White_Space set (including NEL, U+0085) plus U+FEFF
const unicodeWhitespace = /[\u0009-\u000D\u0020\u0085\u00A0\u1680\u2000-\u200A\u2028\u2029\u202F\u205F\u3000\uFEFF]/g;
// Count whitespace and invisible characters by type
function auditWhitespace(text) {
  // Characters to report, keyed by the character itself
  const whitespaceMap = {
    '\u0020': 'SPACE',
    '\u00A0': 'NO-BREAK SPACE',
    '\u200B': 'ZERO WIDTH SPACE',
    '\u200C': 'ZERO WIDTH NON-JOINER',
    '\u200D': 'ZERO WIDTH JOINER',
    '\u2009': 'THIN SPACE',
    '\u202F': 'NARROW NO-BREAK SPACE',
    '\u3000': 'IDEOGRAPHIC SPACE',
    '\uFEFF': 'ZERO WIDTH NO-BREAK SPACE',
  };
  const counts = {};
  // for...of iterates by code point, so astral characters are not split
  for (const char of text) {
    const name = whitespaceMap[char];
    if (name) {
      counts[name] = (counts[name] || 0) + 1;
    }
  }
  return counts;
}
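Counting tells you which characters are present but not where they are. A complementary sketch, assuming bracketed U+XXXX markers are an acceptable debugging format, replaces each invisible or non-ASCII space with a visible tag; revealInvisible is a hypothetical helper name.

```javascript
// Replace non-ASCII spaces and zero-width characters with visible [U+XXXX] markers.
function revealInvisible(text) {
  const invisible = /[\u00A0\u2000-\u200D\u202F\u205F\u2060\u3000\uFEFF]/g;
  return text.replace(invisible, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    return `[U+${hex}]`;
  });
}

console.log(revealInvisible("100\u00A0km and hello\u200Bworld"));
```

Ordinary ASCII spaces are deliberately left alone so the output stays readable while the unusual characters stand out.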
For a visual audit without writing code, paste your text into our Character Counter tool — it identifies every Unicode code point, including invisible whitespace characters that are otherwise impossible to spot.
Next in Series: Part 3, CJK Web Typography: Chinese, Japanese, and Korean Text on the Web, moves to the complexity of CJK typography: font stacks, ruby annotations, vertical writing modes, and the CSS properties that make Chinese, Japanese, and Korean text render correctly.