What Is the Basic Multilingual Plane?
The Basic Multilingual Plane (BMP), also called Plane 0, is the first of Unicode's 17 planes. It covers the code points U+0000 through U+FFFF, a total of 65,536 positions. The original Unicode 1.0 standard was a 16-bit design, so its entire code space corresponds to what is now the BMP; the supplementary planes were added later. As a result, the BMP contains most of the characters used in everyday modern text: nearly all scripts in current use, punctuation, symbols, and the most common Han ideographs.
Why the BMP Matters
The BMP has outsized importance because of UTF-16, the encoding used internally by JavaScript, Java, the Windows NT APIs, and other systems designed when Unicode was still a 16-bit code. In UTF-16, every BMP code point is represented as a single 16-bit code unit, so BMP characters are simple to index and count. Code points outside the BMP require a surrogate pair (two 16-bit code units), which means that string length and indexing operate on code units rather than characters, complicating length calculations and iteration.
// BMP character — length is 1 as expected
console.log('A'.length); // 1 (U+0041)
console.log('\u4E2D'.length); // 1 (U+4E2D, '中')
// Non-BMP character — length is 2 (surrogate pair)
console.log('\u{1F600}'.length); // 2 (U+1F600, grinning face)
console.log([...'\u{1F600}'].length); // 1 (spread is code-point-aware)
// Correct iteration over non-BMP text
for (const char of '\u{1F600}') {
  console.log(char.codePointAt(0).toString(16)); // '1f600'
}
# Python 3 strings are sequences of code points — no surrogate issues
print(len('\U0001F600')) # 1 (U+1F600, grinning face)
print(len('A')) # 1
print(len('中')) # 1
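The surrogate-pair arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the helper names `to_surrogate_pair` and `utf16_length` are made up for this example, and the second one simply relies on Python's built-in `utf-16-le` codec to count code units.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into UTF-16 high/low surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use a surrogate pair")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


def utf16_length(text: str) -> int:
    """Count UTF-16 code units, i.e. what JavaScript's .length would report."""
    return len(text.encode("utf-16-le")) // 2


high, low = to_surrogate_pair(0x1F600)
print(f"{high:04X} {low:04X}")           # D83D DE00
print(utf16_length("A"))                 # 1
print(utf16_length("\U0001F600"))        # 2
```

Counting code units this way is useful when a Python service has to respect a length limit enforced by a UTF-16 based system, since `len()` alone counts code points.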
Key Regions of the BMP
Scripts and Language Characters
The BMP houses all major living scripts: Latin, Greek, Cyrillic, Arabic, Hebrew, Devanagari, CJK Unified Ideographs (20,902 characters in the original block, which has since been extended), Hangul syllables (11,172 characters), and many more. This makes the BMP sufficient for writing the vast majority of the world's modern languages.
Symbols and Technical Characters
The BMP contains extensive symbol ranges: arrows, mathematical operators, box-drawing characters, block elements, dingbats, and miscellaneous technical symbols. It also contains control characters (C0 and C1 controls), the space character, and many formatting characters.
Special Zones
- Private Use Area (U+E000-F8FF): 6,400 code points reserved for vendor/application-specific use.
- Surrogates (U+D800-DFFF): 2,048 code points reserved exclusively for UTF-16 encoding; they are not valid Unicode characters on their own.
- Specials (U+FFF0-FFFF): Contains the replacement character at U+FFFD and two noncharacters at U+FFFE and U+FFFF. The Byte Order Mark (BOM) at U+FEFF sits just below this block, at the end of Arabic Presentation Forms-B.
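As a rough sketch, the three zones listed above can be expressed as Python ranges, which also makes the stated sizes easy to check. The constant names and the `bmp_zone` function are hypothetical, chosen for this example only; they do not come from any standard library.

```python
# Zone boundaries taken from the list above (range end is exclusive).
SURROGATES = range(0xD800, 0xE000)
PRIVATE_USE = range(0xE000, 0xF900)
SPECIALS = range(0xFFF0, 0x10000)


def bmp_zone(code_point: int) -> str:
    """Name the special BMP zone a code point falls in, or 'other'."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("not a BMP code point")
    if code_point in SURROGATES:
        return "surrogate"
    if code_point in PRIVATE_USE:
        return "private use"
    if code_point in SPECIALS:
        return "specials"
    return "other"


print(len(SURROGATES))     # 2048
print(len(PRIVATE_USE))    # 6400
print(bmp_zone(0xD83D))    # surrogate
print(bmp_zone(0xE000))    # private use
print(bmp_zone(0xFFFD))    # specials
print(bmp_zone(0x0041))    # other
```

A check like this can be handy when validating input, for example to reject lone surrogates or to flag private-use characters that will not render consistently across systems.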
BMP vs. Supplementary Planes
Code points above U+FFFF reside in the 16 supplementary planes (Planes 1 through 16, up to U+10FFFF). These include historic scripts, rare CJK extension ideographs, musical notation, mathematical alphanumeric symbols, and most emoji. While supplementary plane characters are increasingly common because of emoji, the BMP remains the primary plane for text processing and the one that legacy systems and encodings are built around.
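Because each plane holds exactly 65,536 code points, the plane number is simply the code point shifted right by 16 bits. The sketch below illustrates that, plus a small filter for spotting supplementary-plane characters in a string; `plane_of` and `non_bmp_chars` are illustrative names, not existing functions.

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0 = BMP, 1-16 = supplementary planes)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return code_point >> 16              # each plane spans 0x10000 code points


def non_bmp_chars(text: str) -> list[str]:
    """List the characters in text that lie outside the BMP."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


print(plane_of(0x4E2D))                  # 0
print(plane_of(0x1F600))                 # 1
print(plane_of(0x10FFFF))                # 16
print(len(non_bmp_chars("A\u4E2D\U0001F600")))   # 1
```

A filter along these lines is one way to test whether a piece of text is safe to pass to a legacy system that only handles BMP characters.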