SymbolFYI

Code Point

Unicode Standard
Definition

A numerical value in the Unicode standard that maps to a specific character, written as U+ followed by hexadecimal digits (e.g., U+0041 for 'A').

What Is a Code Point?

A code point is the fundamental unit of the Unicode standard: a unique numerical value in the Unicode codespace. Every encoded character, symbol, and control code is identified by exactly one code point. A code point is written as U+ followed by four to six hexadecimal digits, such as U+0041 for the Latin capital letter A, U+1F600 for the grinning face emoji, or U+200D for the Zero Width Joiner.

The Unicode standard defines a total space of 1,114,112 code points, ranging from U+0000 to U+10FFFF. Not all of these are assigned to characters — some are reserved for future use, some are designated as private-use, and some are surrogates used by the UTF-16 encoding.
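The size of the code space follows from its layout as 17 planes of 65,536 code points each. The arithmetic can be sketched in Python as below; the constant names and the plane_of helper are illustrative only and not part of any library.

```python
# Illustrative constants; these names are assumptions, not a library API.
PLANE_SIZE = 0x10000        # 65,536 code points per plane
PLANE_COUNT = 17            # plane 0 (the BMP) through plane 16
MAX_CODE_POINT = 0x10FFFF

total = PLANE_SIZE * PLANE_COUNT
surrogates = 0xDFFF - 0xD800 + 1    # range set aside for UTF-16 surrogates

print(total)                        # 1114112
print(total == MAX_CODE_POINT + 1)  # True
print(surrogates)                   # 2048


def plane_of(cp: int) -> int:
    """Return the plane number (0-16) that contains the code point."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("value is outside the Unicode code space")
    return cp >> 16


print(plane_of(0x0041))    # 0
print(plane_of(0x1F600))   # 1
```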

Code Point Notation

By convention, code points up to and including U+FFFF are written with exactly four hex digits, padded with leading zeros where needed: U+0041. Code points in the supplementary planes use five or six digits: U+1F600, U+10FFFF. The U+ prefix is always uppercase and there are no spaces within the notation.
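As a rough sketch of this convention, the two helpers below convert between an integer and U+ notation. Both function names are hypothetical, and the parser is deliberately strict (it rejects anything other than U+ followed by four to six hex digits), which is one possible reading of the convention rather than a rule taken from the standard.

```python
HEX_DIGITS = set("0123456789ABCDEFabcdef")


def format_code_point(cp: int) -> str:
    """Format an integer as U+XXXX, padding to at least four hex digits."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("value is outside the Unicode code space")
    return f"U+{cp:04X}"


def parse_code_point(text: str) -> int:
    """Parse U+ notation back into an integer (hypothetical strict parser)."""
    if not text.startswith("U+"):
        raise ValueError("missing U+ prefix")
    digits = text[2:]
    if not 4 <= len(digits) <= 6 or not set(digits) <= HEX_DIGITS:
        raise ValueError("expected four to six hexadecimal digits")
    cp = int(digits, 16)
    if cp > 0x10FFFF:
        raise ValueError("value is outside the Unicode code space")
    return cp


print(format_code_point(0x41))       # U+0041
print(format_code_point(0x1F600))    # U+1F600
print(parse_code_point("U+200D"))    # 8205
```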

In programming, you will often encounter code points in different bases: most functions return them as plain integers, which print in decimal by default, while source code and documentation usually favor hexadecimal:

# Python: get the code point of a character
code_point = ord('A')          # 65 (decimal)
print(hex(code_point))         # '0x41'
print(f'U+{code_point:04X}')   # 'U+0041'

# Convert a code point back to a character
char = chr(0x1F600)            # returns the grinning face emoji
print(char)                    # prints the emoji

// JavaScript: get the code point of a character
const cp = '😀'.codePointAt(0);  // 128512
console.log(cp.toString(16));  // '1f600'
console.log(`U+${cp.toString(16).toUpperCase().padStart(4, '0')}`); // 'U+1F600'

// Convert a code point back to a string
const char = String.fromCodePoint(0x1F600); // grinning face emoji

Code Points vs. Characters vs. Glyphs

These three terms are often confused but represent distinct concepts:

  • A code point is the abstract number assigned by Unicode.
  • A character is the semantic entity the code point represents (a letter, digit, symbol).
  • A glyph is the visual representation rendered on screen by a specific font.

One code point may map to multiple glyphs depending on context (e.g., Arabic letters change shape based on their position in a word), and one visible character as perceived by a user (a grapheme cluster) may require multiple code points — for instance, a base letter combined with a diacritical mark.
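The base-letter-plus-diacritic case can be illustrated with Python's standard unicodedata module. This is a minimal sketch: it shows one user-perceived character stored as two code points and its precomposed equivalent after NFC normalization. Python's standard library does not segment text into grapheme clusters, so the example only counts code points.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character,
# two code points.
decomposed = "e\u0301"

# NFC normalization composes the pair into a single code point where a
# precomposed form exists.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))                            # 2
print([f"U+{ord(c):04X}" for c in decomposed])    # ['U+0065', 'U+0301']
print(len(composed))                              # 1
print(f"U+{ord(composed):04X}")                   # U+00E9
print(unicodedata.name(composed))                 # LATIN SMALL LETTER E WITH ACUTE
```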

Assigned vs. Unassigned Code Points

As of Unicode 16.0, the standard encodes 154,998 characters. The remaining code points fall into categories such as unassigned (reserved for future use), private-use, surrogates, and noncharacters. The 66 noncharacters (such as U+FFFE and U+FFFF) are permanently reserved and will never be assigned to a character; they are intended for internal use within applications.
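These categories can be told apart in code. The sketch below uses Python's unicodedata.category, whose answers depend on the Unicode version bundled with the interpreter (see unicodedata.unidata_version), so a code point assigned in a newer Unicode release may still be reported as unassigned. The function names and the returned labels are illustrative.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify(cp: int) -> str:
    """Return a coarse, illustrative label for a code point."""
    if is_noncharacter(cp):          # checked first: these also report 'Cn'
        return "noncharacter"
    category = unicodedata.category(chr(cp))
    if category == "Cs":
        return "surrogate"
    if category == "Co":
        return "private-use"
    if category == "Cn":
        return "unassigned"
    return "assigned"


print(classify(0x0041))    # assigned
print(classify(0xD800))    # surrogate
print(classify(0xE000))    # private-use
print(classify(0xFFFE))    # noncharacter

# Counting noncharacters across the whole code space gives 66.
print(sum(is_noncharacter(cp) for cp in range(0x110000)))
```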

Practical Importance for Developers

Understanding code points is essential when working with string length calculations, text processing, and encoding. For example, in JavaScript, '😀'.length returns 2 because the emoji sits outside the Basic Multilingual Plane and is represented as a surrogate pair in UTF-16. Using [...'😀'].length (spread syntax, which iterates by code point) correctly returns 1. Similarly, in Python 3, strings are sequences of code points, so len('😀') returns 1 as expected.
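To make the surrogate-pair arithmetic concrete, here is a small Python sketch that splits a code point into its UTF-16 code units and compares the result with Python's own encoder. The helper name utf16_code_units is hypothetical.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Return the UTF-16 code units (one or two) for a code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp < 0x10000:
        return [cp]                      # BMP: a single code unit
    offset = cp - 0x10000                # 20-bit value split into two halves
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return [high, low]


emoji = chr(0x1F600)
units = utf16_code_units(ord(emoji))

print([f"{u:04X}" for u in units])             # ['D83D', 'DE00']
print(len(emoji))                              # 1 (code points)
print(len(emoji.encode("utf-16-le")) // 2)     # 2 (UTF-16 code units)
```

The last line matches what JavaScript's length property reports, since that property counts UTF-16 code units rather than code points.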
