SymbolFYI

Unicode Block

Unicode Standard
Definition

A contiguous range of code points defined by the Unicode standard, grouping related characters (e.g., 'Arrows' block: U+2190–U+21FF).

What Is a Unicode Block?

A Unicode block is a named, contiguous, non-overlapping range of code points within the Unicode standard. Blocks serve as an organizational tool, grouping characters that belong to the same script, symbol category, or historical/technical purpose. Every assigned character belongs to exactly one block. Each block starts at a code point that is a multiple of 16 (0x10) and ends one short of a multiple of 16, so block ranges always have the form U+xxx0 to U+xxxF.

For example:

  • Basic Latin spans U+0000 to U+007F (128 code points): the ASCII-compatible range.
  • CJK Unified Ideographs spans U+4E00 to U+9FFF (20,992 code points): the core Han ideograph block.
  • Emoticons spans U+1F600 to U+1F64F (80 code points): it contains the face emoji.
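
The sizes and the 16-code-point alignment quoted above can be checked with a little arithmetic. The following is a minimal sketch; the helper names `block_size` and `is_aligned` are made up for this illustration and do not come from any library.

```python
# Block ranges quoted in the examples above (inclusive endpoints).
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "CJK Unified Ideographs": (0x4E00, 0x9FFF),
    "Emoticons": (0x1F600, 0x1F64F),
}


def block_size(start: int, end: int) -> int:
    """Number of code points in an inclusive range."""
    return end - start + 1


def is_aligned(start: int, end: int) -> bool:
    """True if the range starts on a multiple of 16 and ends just before one."""
    return start % 16 == 0 and (end + 1) % 16 == 0


for name, (start, end) in BLOCKS.items():
    print(
        f"{name}: U+{start:04X}..U+{end:04X}, "
        f"{block_size(start, end):,} code points, "
        f"aligned={is_aligned(start, end)}"
    )
```

Because both endpoints are inclusive, the size is `end - start + 1`, which is why U+4E00 to U+9FFF gives 20,992 rather than 20,991.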

As of Unicode 16.0, there are 338 named blocks.

Block Structure and Naming

Blocks are defined in the Blocks.txt data file of the Unicode Character Database (UCD). The naming convention is descriptive and generally reflects the script or character category. Some blocks are densely populated (nearly every code point is assigned), while others are sparse, so a code point that falls inside a block's range is not guaranteed to be assigned to a character.
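
The difference between a dense and a sparse block can be observed with the standard library alone. This is a rough sketch that treats a code point as unassigned when its General_Category is `Cn`. The Greek and Coptic block (U+0370 to U+03FF) is chosen here only as an illustration, and `assigned_count` is a hypothetical helper name. Exact counts depend on the Unicode version bundled with your Python build.

```python
import unicodedata


def assigned_count(start: int, end: int) -> int:
    """Count code points in [start, end] whose General_Category is not Cn."""
    return sum(
        1
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)) != "Cn"
    )


# Ranges as listed in Blocks.txt.
basic_latin = assigned_count(0x0000, 0x007F)
greek_coptic = assigned_count(0x0370, 0x03FF)

print("Unicode data version:", unicodedata.unidata_version)
print(f"Basic Latin: {basic_latin} of 128 assigned")
print(f"Greek and Coptic: {greek_coptic} of 144 assigned")

# U+03A2 sits inside the Greek and Coptic range but has no character.
print(unicodedata.category("\u03A2"))
```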

import unicodedata

# Python's unicodedata module does not expose block names directly,
# but you can query character properties
char = 'A'
print(unicodedata.name(char))      # 'LATIN CAPITAL LETTER A'
print(unicodedata.category(char))  # 'Lu' (Letter, uppercase)

# Using the 'unicodeblock' third-party package:
# pip install unicodeblock
import unicodeblock.blocks
print(unicodeblock.blocks.of('A'))  # 'BASIC LATIN'
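print(unicodeblock.blocks.of('\U0001F600'))  # 'EMOTICONS' (U+1F600, a face emoji)

// JavaScript has no built-in block lookup. Regex Unicode property
// escapes (ES2018+) cover Script and General_Category but not Block,
// so this pattern tests for the Latin script, not the Basic Latin block.
const isLatinScript = /^\p{Script=Latin}$/u;
console.log(isLatinScript.test('A')); // true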
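
Since neither standard library offers a block lookup, one option is to build a small table from Blocks.txt yourself. The sketch below assumes the `START..END; Name` line format used by that file and embeds only a short excerpt, so any character outside the excerpt returns `None`. The names `parse_blocks` and `block_of` are hypothetical helpers, not part of any package.

```python
import bisect

# A short excerpt in the Blocks.txt line format ("START..END; Name").
# Only a handful of blocks are included; the real file lists all of them.
BLOCKS_TXT_EXCERPT = """\
# Blocks.txt excerpt
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
2000..206F; General Punctuation
2190..21FF; Arrows
2200..22FF; Mathematical Operators
4E00..9FFF; CJK Unified Ideographs
1F600..1F64F; Emoticons
"""


def parse_blocks(text):
    """Parse Blocks.txt-style lines into a sorted list of (start, end, name)."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_range, name = line.split(";", 1)
        start, end = code_range.split("..")
        blocks.append((int(start, 16), int(end, 16), name.strip()))
    blocks.sort()
    return blocks


BLOCKS = parse_blocks(BLOCKS_TXT_EXCERPT)
STARTS = [start for start, _, _ in BLOCKS]


def block_of(char):
    """Return the block name for one character, or None if not in the table."""
    cp = ord(char)
    i = bisect.bisect_right(STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and BLOCKS[i][0] <= cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return None


print(block_of("A"))           # Basic Latin
print(block_of("\u2192"))      # Arrows
print(block_of("\U0001F600"))  # Emoticons
print(block_of("\u0100"))      # None: Latin Extended-A is not in this excerpt
```

Because blocks never overlap, a binary search over the sorted start values is enough; the final range check handles code points that fall in a gap between blocks.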

Important Blocks for Web Developers

Text and Punctuation

  • Basic Latin (U+0000-007F): ASCII; the backbone of most web content.
  • Latin Extended-A/B (U+0100-024F): Accented and extended Latin letters.
  • General Punctuation (U+2000-206F): Em dashes, smart quotes, ellipsis, etc.

Symbols

  • Miscellaneous Symbols and Pictographs (U+1F300-1F5FF): Weather, nature, objects.
  • Supplemental Symbols and Pictographs (U+1F900-1F9FF): Newer emoji additions.
  • Mathematical Operators (U+2200-22FF): Math symbols like ∑, ∞, ≠.

CJK

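  • CJK Unified Ideographs (U+4E00-9FFF): The core Han ideographs. The block spans 20,992 code points, 20,902 of which formed the original repertoire.
  • CJK Unified Ideographs Extension A-I: Additional ideographs added in later Unicode versions.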

Blocks vs. Scripts

Blocks and scripts are related but distinct Unicode properties. A block is defined purely by code point range and is a static organizational division. A script is a property assigned to each individual code point based on the writing system it belongs to. Multiple scripts can appear within the same block, and a single script (like Latin) can span multiple blocks. For example, the Basic Latin block holds Latin letters alongside digits and punctuation whose script is Common, while the Latin script continues into Latin-1 Supplement, Latin Extended-A, and other blocks. When doing language detection or text analysis, scripts are generally more useful than blocks.
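
The mismatch between blocks and scripts can be illustrated roughly in Python. The standard library does not expose the Script property, so this sketch uses the first word of each character's name as a crude stand-in. That is an assumption made for illustration only and is not the real Script value. The three-block table and the helper names `block_of` and `name_prefix` are likewise hypothetical.

```python
import unicodedata

# Three blocks, hard-coded for this illustration (ranges as in Blocks.txt).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
]


def block_of(char):
    """Return the block name for one character, or None if not in the table."""
    cp = ord(char)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None


def name_prefix(char):
    """First word of the character name: a rough stand-in, NOT the Script property."""
    return unicodedata.name(char).split()[0]


# "A" and "1" share a block but differ in kind;
# "A", U+00E9 and U+0151 are all Latin letters in three different blocks.
for ch in ["A", "1", "\u00E9", "\u0151"]:
    print(f"U+{ord(ch):04X} block={block_of(ch)!r} name_prefix={name_prefix(ch)!r}")
```

For real script detection, a library that exposes the Script property directly would be a better fit than this name-based heuristic.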

Related Guides

How to Use the SymbolFYI Symbol Table Tool
A complete guide to SymbolFYI's Symbol Table — browse characters by Unicode block, filter by category, copy characters, and explore encoding details.
Box Drawing Characters: Building Text-Based UI with Unicode
Use Unicode box drawing characters to build tables, borders, and text-based interfaces — the complete reference with copy-paste examples and CSS tips.
CJK Web Typography: Chinese, Japanese, and Korean Text on the Web
Master CJK web typography — font stacks, line breaking rules, ruby annotation, vertical writing, CSS text-spacing, and mixed-script layout techniques.
The Private Use Area: Custom Characters in Unicode
Explore Unicode's Private Use Areas — how they work, why icon fonts use them, PUA in corporate fonts, and the risks of PUA characters in data exchange.
Mathematical Symbols in Unicode: A Complete Reference
The definitive reference for mathematical symbols in Unicode — operators, Greek letters, set theory, logic, arrows, and where to find them by block.
Mathematical Notation in Unicode: From Clay Tablets to Code Points
How mathematical symbols were standardized in Unicode — the history of +, −, ×, ÷, =, π, ∑, ∫ and the challenges of encoding mathematical notation.
Braille in Unicode: How a Tactile System Became Digital Text
The story of Braille's journey into Unicode — from Louis Braille's 1824 invention to the 256-character Braille Patterns block in Unicode.
Minus vs Hyphen vs Dash: Five Characters That Look Like a Line
Navigate the confusing world of horizontal line characters — hyphen-minus, en dash, em dash, minus sign, and horizontal bar.
CJK Unification: How Unicode Handles Chinese, Japanese, and Korean
Learn about Han Unification in Unicode — how shared CJK ideographs are unified, the controversy it creates, and how language tags affect rendering.
Unicode Planes and Blocks: How 1.1 Million Code Points Are Organized
Understand Unicode's 17 planes and hundreds of blocks — from the Basic Multilingual Plane to supplementary planes for emoji and historic scripts.