SymbolFYI

Basic Multilingual Plane (BMP)

Unicode Standard

Definition

The first 65,536 code points of Unicode (U+0000 to U+FFFF), containing the most commonly used characters.

What Is the Basic Multilingual Plane?

The Basic Multilingual Plane (BMP) is the first of Unicode's 17 planes, encompassing all code points from U+0000 to U+FFFF — a total of 65,536 positions. The BMP was the entirety of the original Unicode 1.0 standard, before the standard was expanded to include supplementary planes. As a result, it contains the vast majority of characters used in everyday modern text: virtually all living scripts, punctuation, symbols, and the most common Han ideographs.
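The plane layout described above can be sketched with a little arithmetic. This is a minimal illustration, not an official API: the helper names `plane_of` and `is_bmp` are made up for this example, and it assumes each plane spans exactly 0x10000 code points.

```python
BMP_MAX = 0xFFFF        # last code point of plane 0 (the BMP)
UNICODE_MAX = 0x10FFFF  # last code point of plane 16

def plane_of(ch: str) -> int:
    """Return the plane number (0-16) of a single character (hypothetical helper)."""
    return ord(ch) >> 16  # each plane covers 0x10000 code points

def is_bmp(ch: str) -> bool:
    """True if the character's code point lies in U+0000..U+FFFF (hypothetical helper)."""
    return ord(ch) <= BMP_MAX

print(plane_of("A"))                  # 0
print(plane_of("\u4E2D"))             # 0
print(plane_of("\U0001F600"))         # 1
print((UNICODE_MAX + 1) // 0x10000)   # 17 planes in total
```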

Why the BMP Matters

The BMP has outsized importance because of UTF-16, the encoding used internally by JavaScript, Java, Windows NT APIs, and older systems. In UTF-16, every BMP code point is represented as a single 16-bit code unit, making BMP characters fast and easy to handle. Code points outside the BMP require a surrogate pair — two 16-bit code units — which complicates string length calculations and iteration.

// BMP character — length is 1 as expected
console.log('A'.length);         // 1  (U+0041)
console.log('\u4E2D'.length);    // 1  (U+4E2D, '中')

// Non-BMP character — length is 2 (surrogate pair)
console.log('\u{1F600}'.length);        // 2  (U+1F600, grinning face)
console.log([...'\u{1F600}'].length);   // 1  (spread is code-point-aware)

// Correct iteration over non-BMP text
for (const char of '\u{1F600}') {
  console.log(char.codePointAt(0).toString(16)); // '1f600'
}

# Python 3 strings are sequences of code points — no surrogate issues
print(len('\U0001F600'))    # 1  (U+1F600, grinning face)
print(len('A'))    # 1
print(len('中'))    # 1
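To show where the two code units of a surrogate pair come from, here is a rough sketch of the UTF-16 split for a supplementary code point. The function names `to_surrogate_pair` and `utf16_length` are illustrative, not part of any library. The arithmetic follows the standard UTF-16 scheme: subtract 0x10000, then divide the remaining 20 bits into two 10-bit halves.

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

def utf16_length(text: str) -> int:
    """Number of 16-bit code units the text occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2

high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))           # 0xd83d 0xde00
print(utf16_length("\U0001F600"))    # 2, matching JavaScript's .length
print(utf16_length("\u4E2D"))        # 1
```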

Key Regions of the BMP

Scripts and Language Characters

The BMP houses all major living scripts: Latin, Greek, Cyrillic, Arabic, Hebrew, Devanagari, CJK Unified Ideographs (20,902 characters in the original repertoire, extended in later versions), Hangul syllables (11,172 characters), and many more. This makes the BMP sufficient for rendering the vast majority of the world's written languages.
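The Hangul figure can be checked with simple arithmetic. The sketch below assumes the Hangul Syllables range U+AC00 to U+D7A3 and the standard composition counts of 19 leading consonants, 21 vowels, and 28 trailing options (including "no trailing consonant"). Those details come from the Unicode Hangul composition scheme rather than from this entry.

```python
import unicodedata

HANGUL_FIRST = 0xAC00   # first precomposed Hangul syllable
HANGUL_LAST = 0xD7A3    # last precomposed Hangul syllable
LEADS, VOWELS, TAILS = 19, 21, 28   # TAILS includes "no final consonant"

count = HANGUL_LAST - HANGUL_FIRST + 1
print(count)                                # 11172
print(count == LEADS * VOWELS * TAILS)      # True
print(HANGUL_LAST <= 0xFFFF)                # True: the whole range fits in the BMP
print(unicodedata.name(chr(HANGUL_FIRST)))  # HANGUL SYLLABLE GA
```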

Symbols and Technical Characters

The BMP contains extensive symbol ranges: arrows, mathematical operators, box-drawing characters, block elements, dingbats, and miscellaneous technical symbols. It also contains control characters (C0 and C1 controls), the space character, and many formatting characters.

Special Zones

Private Use Area (U+E000-F8FF): 6,400 code points reserved for vendor/application-specific use.
Surrogates (U+D800-DFFF): 2,048 code points reserved exclusively for UTF-16 encoding; they are not valid Unicode characters on their own.
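Specials (U+FFF0-FFFF): Contains the replacement character at U+FFFD and two noncharacters at U+FFFE and U+FFFF. The Byte Order Mark (BOM), U+FEFF, is not in this block; it sits at the end of the Arabic Presentation Forms-B block.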
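As a rough sketch, the zones listed above can be expressed as ranges and used to classify a BMP code point. The function `bmp_zone` and its label strings are hypothetical names chosen for this example, and the classification covers only the zones named in this section.

```python
SURROGATES = range(0xD800, 0xE000)    # U+D800-DFFF
PRIVATE_USE = range(0xE000, 0xF900)   # U+E000-F8FF
SPECIALS = range(0xFFF0, 0x10000)     # U+FFF0-FFFF
NONCHARACTERS = {0xFFFE, 0xFFFF}      # the two noncharacters inside Specials

def bmp_zone(cp: int) -> str:
    """Label a BMP code point with the special zone it falls in, if any."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("not a BMP code point")
    if cp in SURROGATES:
        return "surrogate"
    if cp in PRIVATE_USE:
        return "private use"
    if cp in NONCHARACTERS:
        return "noncharacter"
    if cp in SPECIALS:
        return "specials"
    return "other"

print(len(SURROGATES), len(PRIVATE_USE))   # 2048 6400
print(bmp_zone(0xD83D))   # surrogate
print(bmp_zone(0xE000))   # private use
print(bmp_zone(0xFFFD))   # specials
print(bmp_zone(0xFFFF))   # noncharacter
print(bmp_zone(0x0041))   # other
```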

BMP vs. Supplementary Planes

Code points above U+FFFF reside in the 16 supplementary planes. These include historic scripts, rare CJK extension ideographs, musical notation, mathematical alphanumeric symbols, and nearly all emoji added in recent Unicode versions. While supplementary plane characters are increasingly common due to emoji, the BMP remains the primary plane for text processing and the one that legacy systems and encodings are built around.
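When text has to pass through a system that only handles the BMP, it can help to detect supplementary characters up front. The following is a minimal sketch. The helper names and the sample string are invented for illustration, and what to do with the characters it finds depends on the system in question.

```python
def supplementary_chars(text: str) -> list:
    """Return the characters in text whose code points lie above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]

def utf8_len(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# U+4E2D is a BMP ideograph; U+1F600 (emoji) and U+1D11E (musical G clef) are not.
sample = "BMP: \u4E2D, supplementary: \U0001F600 \U0001D11E"
found = supplementary_chars(sample)

print([f"U+{ord(ch):04X}" for ch in found])         # ['U+1F600', 'U+1D11E']
print(utf8_len("\u4E2D"), utf8_len("\U0001F600"))   # 3 4
```

BMP characters need at most three bytes in UTF-8, while supplementary characters always need four, which is one reason storage layers limited to three-byte UTF-8 reject them.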
