Whitespace in Unicode is not a single character but a set of characters that represent horizontal or vertical blank space. The authoritative definition is the binary White_Space property. The separator general categories Zs, Zl, and Zp cover only part of that set, because they omit control characters such as tab and line feed. How each whitespace character is treated varies significantly across programming languages, browsers, regular expression engines, and text processing tools.
Unicode Whitespace Characters
U+0009 CHARACTER TABULATION (horizontal tab)
U+000A LINE FEED
U+000B LINE TABULATION (vertical tab)
U+000C FORM FEED
U+000D CARRIAGE RETURN
U+0020 SPACE
U+0085 NEXT LINE
U+00A0 NO-BREAK SPACE
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
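Language runtimes do not always use this property directly. Python's str.isspace(), for example, is documented to return True when a character's general category is Zs or its bidirectional class is WS, B, or S. The sketch below compares the two sets. It hard-codes the White_Space code points, including U+0085 NEXT LINE, because the standard unicodedata module does not expose that property. On a recent CPython it should find no White_Space character that isspace() rejects. It should also report the information separators U+001C through U+001F as extras that isspace() accepts.

```python
import sys
import unicodedata

# Assumption: White_Space code points are hard-coded here; the standard
# unicodedata module does not expose the White_Space property.
WHITE_SPACE = frozenset(
    [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0, 0x1680]
    + list(range(0x2000, 0x200B))  # U+2000 EN QUAD .. U+200A HAIR SPACE
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x3000]
)


def isspace_set() -> set[int]:
    """Every code point for which str.isspace() returns True."""
    return {cp for cp in range(sys.maxunicode + 1) if chr(cp).isspace()}


def describe(cp: int) -> str:
    """Format a code point as 'U+XXXX NAME' (controls have no UCD name)."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '<control>')}"


if __name__ == "__main__":
    spaces = isspace_set()
    print("White_Space but not isspace():",
          [describe(cp) for cp in sorted(WHITE_SPACE - spaces)])
    print("isspace() but not White_Space:",
          [describe(cp) for cp in sorted(spaces - WHITE_SPACE)])
```

Scanning all code points takes a moment, but it avoids assuming which characters the runtime's Unicode database classifies as spaces.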
CSS White-Space Properties
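The CSS white-space property controls how whitespace in HTML source is rendered. Collapsing applies only to spaces (U+0020), tabs, and line breaks. NO-BREAK SPACE and the other Unicode space characters are not collapsed: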
/* Default: collapse runs of whitespace, wrap lines */
.normal { white-space: normal; }
/* Preserve all whitespace, no wrapping */
.pre { white-space: pre; }
/* Preserve whitespace, allow wrapping */
.pre-wrap { white-space: pre-wrap; }
/* Collapse whitespace, no wrapping */
.nowrap { white-space: nowrap; }
/* Preserve line breaks, collapse runs of spaces and tabs, allow wrapping */
.pre-line { white-space: pre-line; }
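The collapsing behavior of these values can be approximated in a few lines. The following is a hypothetical helper and not a browser algorithm. It ignores wrapping, the removal of spaces at the start and end of rendered lines, and language-specific line-break rules. It also assumes line breaks have already been normalized to LF, as the HTML parser does. Only space, tab, and LF are treated as collapsible, so NO-BREAK SPACE passes through unchanged.

```python
import re


def collapse(text: str, mode: str) -> str:
    """Hypothetical model of CSS white-space collapsing (no wrapping).

    Assumes line breaks are already normalized to LF, as the HTML parser does.
    Only space, tab, and LF are collapsible; NBSP and other spaces are kept.
    """
    if mode in ("pre", "pre-wrap"):
        return text  # all whitespace preserved
    if mode == "pre-line":
        text = re.sub(r"[ \t]+", " ", text)   # collapse spaces and tabs
        return re.sub(r" ?\n ?", "\n", text)  # drop spaces around kept newlines
    if mode in ("normal", "nowrap"):
        return re.sub(r"[ \t\n]+", " ", text)  # newlines become spaces, runs collapse
    raise ValueError(f"unsupported white-space value: {mode}")


print(collapse("Hello   \n  World", "normal"))  # Hello World
print(repr(collapse("a  \n  b", "pre-line")))   # 'a\nb'
```

The difference between normal and nowrap, or between pre and pre-wrap, lies only in wrapping, so the sketch maps each pair to the same branch.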
HTML Whitespace Collapsing
In HTML, the default rendering (white-space: normal) collapses any run of spaces, tabs, and newlines into a single space, so a newline in the source displays as an ordinary space:
<!-- These render identically -->
<p>Hello World</p>
<p>Hello      World</p>
<p>Hello
   World</p>
<!-- <pre> uses white-space: pre, so the run of spaces is kept -->
<pre>Hello      World</pre>
JavaScript Whitespace Handling
// \s matches Unicode whitespace with or without the u flag: tab, LF, VT, FF, CR,
// space, NBSP (U+00A0), U+FEFF, every other Zs space, U+2028, and U+2029
'hello \t\n  world'.replace(/\s+/g, ' '); // 'hello world'
// trim() removes the same set from both ends of the string, including NBSP
' hello '.trim(); // 'hello'
'\u00A0hello\u3000'.trim(); // 'hello'
// Check if char is any Unicode whitespace
function isUnicodeWhitespace(char) {
  return /^\p{White_Space}$/u.test(char);
}
// \p{White_Space} (u flag required) matches exactly the Unicode White_Space set;
// unlike \s, it includes U+0085 NEXT LINE and excludes U+FEFF
'hello\u2003world'.replace(/\p{White_Space}/gu, '_'); // 'hello_world'
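The practical difference between \s and \p{White_Space} is the set of characters each matches. The sketch below reconstructs both sets in Python as an illustrative model; it does not run a JavaScript engine. The \s set is built from the ECMAScript definition (WhiteSpace plus LineTerminator, where WhiteSpace includes U+FEFF and every Zs character), and White_Space is hard-coded. The Zs characters come from Python's bundled Unicode database, which may be a different Unicode version from a given JavaScript engine's.

```python
import unicodedata

# Hard-coded Unicode White_Space code points (an assumption; unicodedata
# does not expose the property).
WHITE_SPACE = frozenset(
    [0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0x85, 0xA0, 0x1680, 0x2028, 0x2029,
     0x202F, 0x205F, 0x3000] + list(range(0x2000, 0x200B))
)


def ecmascript_backslash_s() -> frozenset[int]:
    """Model of the set JS \\s matches: WhiteSpace plus LineTerminator."""
    zs = {cp for cp in range(0x110000) if unicodedata.category(chr(cp)) == "Zs"}
    ecma_whitespace = {0x09, 0x0B, 0x0C, 0xFEFF} | zs  # TAB, VT, FF, ZWNBSP, USP
    line_terminators = {0x0A, 0x0D, 0x2028, 0x2029}    # LF, CR, LS, PS
    return frozenset(ecma_whitespace | line_terminators)


js_s = ecmascript_backslash_s()
print("only in \\s:", sorted(hex(cp) for cp in js_s - WHITE_SPACE))  # ['0xfeff']
print("only in \\p{White_Space}:", sorted(hex(cp) for cp in WHITE_SPACE - js_s))  # ['0x85']
```

Text that may contain a byte order mark or NEL characters, for example from legacy EBCDIC-derived data, is where the choice between the two classes becomes visible.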
Typographic Space Characters
The variety of space widths allows fine typographic control:
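En space (U+2002): 1/2 em
Em space (U+2003): 1 em
Thin space (U+2009): usually 1/5 em, sometimes 1/6 em
Hair space (U+200A): narrower than a thin space
Narrow no-break space (U+202F): about as wide as a thin space, but prevents a line break
Figure space (U+2007): the width of a digit in a font with tabular figures; also non-breaking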
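Two of these characters have common numeric uses, sketched below as an illustration; the helper names are hypothetical. NARROW NO-BREAK SPACE can serve as a digit-group separator that never splits a number across lines, and FIGURE SPACE can pad numbers so their digits line up in a column. Both are Zs characters, so whitespace-aware splitting treats them as separators.

```python
NNBSP = "\u202f"  # NARROW NO-BREAK SPACE
FIGSP = "\u2007"  # FIGURE SPACE


def group_thousands(n: int) -> str:
    """Format an integer with narrow no-break spaces between digit groups."""
    return f"{n:,}".replace(",", NNBSP)


def pad_figures(n: int, width: int) -> str:
    """Right-align n with figure spaces, which match the width of a tabular digit."""
    return str(n).rjust(width, FIGSP)


print(repr(group_thousands(1234567)))    # '1\u202f234\u202f567'
print(group_thousands(1234567).split())  # ['1', '234', '567']
```

The last line is a reminder that a "non-breaking" space is non-breaking only for line layout; str.split() still splits on it.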
Python vs JavaScript Behavior
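# str.split() with no arguments splits on runs of any character for which
# str.isspace() is True: every White_Space character plus U+001C through U+001F
'hello\u2003world'.split()  # ['hello', 'world']
# str.strip() with no arguments removes the same set from both ends
'\u2003hello\u2003'.strip()  # 'hello'
# JavaScript's trim() agrees here but not everywhere: it strips U+FEFF,
# which Python keeps, and keeps U+0085, which Python strips
'\ufeffhello'.strip()  # '\ufeffhello'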
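Python's re module does not support \p{...} property escapes (the third-party regex package does), and its \s matches the same characters as str.isspace(). When Python code must agree exactly with \p{White_Space} in JavaScript, one option is to spell out the class. The following is a sketch with a hypothetical function name. It assumes the White_Space list above is the target set.

```python
import re

# Explicit White_Space character class; \s would also match U+001C..U+001F.
# "\u2000-\u200a" becomes a range inside the brackets.
_WS_CLASS = (
    "\t\n\x0b\x0c\r \x85\xa0\u1680\u2000-\u200a"
    "\u2028\u2029\u202f\u205f\u3000"
)
WHITE_SPACE_RUN = re.compile(f"[{_WS_CLASS}]+")


def normalize_whitespace(text: str) -> str:
    """Collapse runs of White_Space characters to one ASCII space and trim the ends."""
    return WHITE_SPACE_RUN.sub(" ", text).strip(" ")


print(normalize_whitespace("\u3000hello\u2003\u2003world\x85"))  # hello world
```

Characters outside the class, such as U+FEFF or U+001C, are left untouched, which matches the behavior of the JavaScript \p{White_Space} example.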
Understanding the full Unicode whitespace category is essential for building robust text processing systems that correctly handle international content.