SymbolFYI

Unicode Plane

Unicode Standard

Definition

A group of 65,536 consecutive code points. Unicode has 17 planes (0–16), with Plane 0 being the BMP.

What Is a Unicode Plane?

A plane in Unicode is a contiguous group of 65,536 (2¹⁶) code points. Unicode is organized into 17 planes numbered 0 through 16, providing a total of 17 × 65,536 = 1,114,112 code points. Each plane spans a range from U+XX0000 to U+XXFFFF, where XX is the plane number in hexadecimal (00 through 10).
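The range rule can be checked with a few lines of Python. This is a minimal sketch: the helper name `plane_range` and the two constants are illustrative assumptions, not part of any library.

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane
PLANE_COUNT = 17      # planes 0 through 16


def plane_range(plane: int) -> tuple[int, int]:
    """Return the first and last code point of a plane (hypothetical helper)."""
    if not 0 <= plane < PLANE_COUNT:
        raise ValueError(f"plane must be in 0..{PLANE_COUNT - 1}, got {plane}")
    first = plane * PLANE_SIZE
    return first, first + PLANE_SIZE - 1


for plane in (0, 1, 16):
    first, last = plane_range(plane)
    print(f"Plane {plane:2d}: U+{first:04X} to U+{last:04X}")

total_code_points = PLANE_COUNT * PLANE_SIZE
print(total_code_points)  # 1114112
```

Multiplying the plane number by 0x10000 is the same as shifting it left by 16 bits, which is why the reverse operation (code point to plane) is a right shift by 16.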

The plane structure reflects the historical evolution of Unicode from a 16-bit system (one plane) to the current 21-bit system (17 planes).
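The bit widths mentioned here can be confirmed directly. The snippet below is only an illustration; the variable names are made up for this example.

```python
# Bits needed to represent the largest code point in each design.
bmp_max = 0xFFFF        # last code point of Plane 0
unicode_max = 0x10FFFF  # last code point of Plane 16

bmp_bits = bmp_max.bit_length()
unicode_bits = unicode_max.bit_length()
print(bmp_bits, unicode_bits)  # 16 21
```

Note that 21 bits could hold 2,097,152 values, but only the first 1,114,112 of them are valid Unicode code points.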

The 17 Planes

Plane	Range	Name	Contents
0	`U+0000–FFFF`	Basic Multilingual Plane (BMP)	Most living scripts, common symbols
1	`U+10000–1FFFF`	Supplementary Multilingual Plane (SMP)	Historic scripts, emoji, musical notation
2	`U+20000–2FFFF`	Supplementary Ideographic Plane (SIP)	Rare and historic CJK ideographs
3	`U+30000–3FFFF`	Tertiary Ideographic Plane (TIP)	CJK Unified Ideographs Extensions G and H
4–13	`U+40000–DFFFF`	Unassigned	Reserved for future use
14	`U+E0000–EFFFF`	Supplementary Special-purpose Plane (SSP)	Tag characters, supplementary variation selectors
15	`U+F0000–FFFFF`	Supplementary Private Use Area-A (SPUA-A)	Private use, meaning defined by the application
16	`U+100000–10FFFF`	Supplementary Private Use Area-B (SPUA-B)	Private use, meaning defined by the application
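As a rough illustration of how the table might be used in code, the sketch below maps a character to its plane and a short label. The dictionary `PLANE_NAMES` and the function `plane_label` are hypothetical names chosen for this example; the abbreviations are the commonly used ones for each named plane.

```python
# Commonly used abbreviations for the named planes; planes 4-13 are unassigned.
PLANE_NAMES = {
    0: "BMP",
    1: "SMP",
    2: "SIP",
    3: "TIP",
    14: "SSP",
    15: "SPUA-A",
    16: "SPUA-B",
}


def plane_label(char: str) -> str:
    """Describe the plane of a single character (hypothetical helper)."""
    plane = ord(char) >> 16  # the plane number is everything above the low 16 bits
    return f"Plane {plane} ({PLANE_NAMES.get(plane, 'unassigned')})"


print(plane_label("A"))           # Plane 0 (BMP)
print(plane_label(chr(0x1F600)))  # Plane 1 (SMP)
print(plane_label(chr(0xE0001)))  # Plane 14 (SSP)
print(plane_label(chr(0x50000)))  # Plane 5 (unassigned)
```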

Plane 0: Basic Multilingual Plane

The BMP contains the overwhelming majority of characters used in modern text worldwide. It was the original scope of Unicode before the standard was extended. All BMP characters are represented as single 16-bit code units in UTF-16.
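The single-code-unit property can be observed by encoding characters to UTF-16 and counting 16-bit units. This is a small sketch; `utf16_code_units` is an illustrative helper name, not a standard function.

```python
def utf16_code_units(char: str) -> int:
    """Count the 16-bit code units needed to encode one character in UTF-16."""
    # "utf-16-be" omits the byte order mark, so the byte length is exactly
    # two bytes per code unit.
    return len(char.encode("utf-16-be")) // 2


print(utf16_code_units("A"))           # 1  (U+0041, Plane 0)
print(utf16_code_units(chr(0x4E2D)))   # 1  (U+4E2D, Plane 0)
print(utf16_code_units(chr(0xFFFD)))   # 1  (U+FFFD, Plane 0)
print(utf16_code_units(chr(0x1F600)))  # 2  (U+1F600, Plane 1)
```

The surrogate code points U+D800–U+DFFF also sit inside the BMP range, but they are not characters, and Python raises `UnicodeEncodeError` if asked to encode one on its own with this codec.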

Plane 1: Supplementary Multilingual Plane

The SMP is the most actively developed supplementary plane. It contains:

- Historic scripts: Linear B, Cuneiform, Phoenician, Egyptian Hieroglyphs
- Emoji: Emoticons, pictographs, and symbol blocks
- Musical notation: Western musical symbols
- Mathematical Alphanumeric Symbols: Bold, italic, script mathematical letters
- Playing cards, dominos, chess symbols
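To make the list concrete, the sketch below takes one sample code point from each category and confirms that it falls in Plane 1. The choice of sample code points and the name `smp_samples` are this example's own. The printed character names come from Python's `unicodedata` module, whose coverage depends on the Unicode version bundled with the interpreter, so a fallback string is supplied.

```python
import unicodedata

# One sample code point per category named above (selection is illustrative).
smp_samples = {
    "Linear B": 0x10000,
    "Phoenician": 0x10900,
    "Cuneiform": 0x12000,
    "Egyptian Hieroglyphs": 0x13000,
    "Musical notation": 0x1D11E,
    "Mathematical Alphanumeric Symbols": 0x1D400,
    "Domino tiles": 0x1F030,
    "Playing cards": 0x1F0A1,
    "Emoji": 0x1F600,
    "Chess symbols": 0x1FA00,
}

for category, cp in smp_samples.items():
    # Fall back to a placeholder if this Python build predates the character.
    name = unicodedata.name(chr(cp), "<name not in this Unicode version>")
    print(f"U+{cp:X}  plane {cp >> 16}  {category}: {name}")

all_in_smp = all(cp >> 16 == 1 for cp in smp_samples.values())
print(all_in_smp)  # True
```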

Plane 2: Supplementary Ideographic Plane

The SIP houses CJK Unified Ideographs Extensions B through F, joined by Extension I in Unicode 15.1. These are rare, archaic, and dialect-specific Chinese characters needed by scholars and for processing historical texts.

Encoding Across Planes

# Python handles all planes natively
bmp_char = chr(0x4E2D)          # U+4E2D '中' — Plane 0
smp_emoji = chr(0x1F600)        # U+1F600 '😀' — Plane 1
sip_rare = chr(0x20000)         # U+20000 '𠀀' — Plane 2

print(f'Plane 0: {bmp_char}  (code point {hex(ord(bmp_char))})')
print(f'Plane 1: {smp_emoji}  (code point {hex(ord(smp_emoji))})')
print(f'Plane 2: {sip_rare}  (code point {hex(ord(sip_rare))})')

# Determine which plane a code point is in
cp = ord('😀')   # 128512
plane = cp >> 16
print(f'Plane: {plane}')  # 1

// JavaScript UTF-16: BMP chars are 1 code unit, others are 2 (surrogate pair)
const bmp = '\u4E2D';          // 1 code unit
const emoji = '\u{1F600}';    // 2 code units (surrogate pair)
console.log(bmp.length);       // 1
console.log(emoji.length);     // 2

// Determine plane from code point
const cp = emoji.codePointAt(0);  // 128512
const plane = cp >> 16;
console.log(plane);  // 1

The 17-Plane Limit

The maximum code point U+10FFFF is not arbitrary: it is the highest value that a UTF-16 surrogate pair can reach. A pair combines one high surrogate from U+D800–U+DBFF (1,024 values) with one low surrogate from U+DC00–U+DFFF (1,024 values), so it can address 1,024 × 1,024 = 1,048,576 supplementary code points. That is exactly 16 planes of 65,536 code points above the BMP, running from U+10000 to U+10FFFF.
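The arithmetic behind this limit can be sketched in Python. The two function names below are illustrative, not library APIs; the constants are the surrogate range boundaries given above.

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP use surrogate pairs")
    offset = cp - 0x10000            # 20-bit value in 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)   # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits select the low surrogate
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print([hex(unit) for unit in to_surrogate_pair(0x1F600)])  # ['0xd83d', '0xde00']

# The largest possible pair lands exactly on the last code point of Plane 16.
print(hex(from_surrogate_pair(0xDBFF, 0xDFFF)))  # 0x10ffff
```

Because each surrogate contributes 10 bits, a pair carries 20 bits on top of the 0x10000 offset, and 0x10000 + 0xFFFFF = 0x10FFFF.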

Related terms