SymbolFYI

Unicode Escape Sequence

Encoding
Definition

A way to represent characters by their code point in programming languages (\u2603 in JS/Java, \u{2603} in ES6+, \U00002603 in Python).

A Unicode escape sequence is a textual notation for representing a Unicode character using only ASCII characters. Escape sequences appear in source code strings, HTML markup, URLs, JSON, and configuration files -- anywhere a Unicode character might be inconvenient or impossible to type directly.

Escape Sequence Formats

Different languages and formats use different syntaxes:

Context                       Format        Example
JavaScript / JSON (BMP)       \uXXXX        \u0041 (A)
JavaScript (any code point)   \u{XXXXX}     \u{1F600} (😀)
Python (BMP)                  \uXXXX        \u0041 (A)
Python (full range)           \UXXXXXXXX    \U0001F600 (😀)
HTML named entity             &name;        &amp; (&), &lt; (<)
HTML decimal entity           &#DDDDD;      &#65; (A)
HTML hex entity               &#xHHHH;      &#x41; (A)
CSS                           \HHHHHH       \000041 (A)
Java                          \uXXXX        \u0041 (A)
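
To make the differences between these notations concrete, here is a minimal Python sketch that prints several of them for one character. The helper name escape_forms and its dictionary keys are hypothetical, chosen only for this illustration. It covers numeric forms only and does not handle HTML named entities.

```python
def escape_forms(ch):
    """Return several escape notations for one character (illustrative helper)."""
    cp = ord(ch)  # the character's Unicode code point as an integer
    return {
        # ES2015 braced form accepts any code point
        "js_braced": f"\\u{{{cp:X}}}",
        # Python: 4 hex digits for the BMP, 8 for everything above it
        "python": f"\\u{cp:04x}" if cp <= 0xFFFF else f"\\U{cp:08x}",
        "html_decimal": f"&#{cp};",
        "html_hex": f"&#x{cp:X};",
        # CSS: backslash followed by up to six hex digits
        "css": f"\\{cp:06X}",
    }


for name, form in escape_forms("A").items():
    print(f"{name:13} {form}")
```

Calling escape_forms("\U0001F600") instead shows how the Python form switches to the 8-digit \U notation once the code point leaves the BMP.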

JavaScript Unicode Escapes

JavaScript originally supported only the 4-digit \uXXXX form, which covers BMP characters (U+0000-U+FFFF). ES2015 introduced the \u{...} form for code points up to U+10FFFF:

// 4-digit form (BMP only)
console.log('\u0041');       // 'A'
console.log('\u00e9');       // 'é'
console.log('\u4e2d');       // '中' (Chinese character for "middle")

// ES2015 braced form (any code point)
console.log('\u{1F600}');    // '😀'
console.log('\u{0041}');     // 'A'

// Emoji via surrogate pair escape (old style)
console.log('\uD83D\uDE00'); // '😀' (UTF-16 surrogate pair for U+1F600)
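
The surrogate pair in the last example follows from the standard UTF-16 arithmetic. As a rough sketch (written in Python for illustration, with to_surrogate_pair as a hypothetical helper name), the two 16-bit units for a code point above U+FFFF can be computed like this:

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"\\u{high:04X}\\u{low:04X}")      # \uD83D\uDE00
```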

Python Unicode Escapes

# \uXXXX for BMP, \UXXXXXXXX for full range
print('\u0041')           # A
print('\u00e9')           # é
print('\U0001F600')       # 😀
print('\N{SNOWMAN}')      # ☃ (named character, U+2603)

# In raw strings, escapes are not processed
print(r'\u0041')          # \u0041 (literal backslash)

# Encoding to escape form
print('\U0001F600'.encode('unicode_escape'))  # b'\\U0001f600'
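
Going the other way, the escaped bytes can be decoded back into the original string. The short sketch below also shows that Python's escape output uses the shorter \xNN form for code points below U+0100, which is easy to overlook. The variable names are illustrative only.

```python
# Round trip: string -> escaped bytes -> string
escaped = '\U0001F600'.encode('unicode_escape')   # b'\\U0001f600'
restored = escaped.decode('unicode_escape')
print(restored == '\U0001F600')                   # True

# ASCII-only text form of a string, using the backslashreplace error handler
ascii_text = 'caf\u00e9'.encode('ascii', 'backslashreplace').decode('ascii')
print(ascii_text)                                 # caf\xe9
```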

HTML Character References

HTML supports both named and numeric character references. Numeric references use either decimal or hexadecimal notation:

<!-- Named entity -->
&amp;    <!-- & -->
&lt;     <!-- < -->
&copy;   <!-- copyright symbol -->

<!-- Decimal reference -->
&#169;   <!-- copyright symbol -->
&#128512; <!-- smiley emoji -->

<!-- Hex reference -->
&#xA9;    <!-- copyright symbol -->
&#x1F600; <!-- smiley emoji -->
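
For comparison, here is a small Python sketch that converts between characters and HTML character references using the standard library's html module. It is meant as an illustration of the three reference styles above, not as a complete HTML sanitizer.

```python
import html

# Named, decimal, and hex references all decode to the same character
decoded = html.unescape('&copy; &#169; &#xA9;')
print(decoded)                          # © © ©

# References above the BMP decode to a single code point
print(html.unescape('&#128512;') == '\U0001F600')   # True

# Escape the characters that are ambiguous in markup
print(html.escape('a < b & c'))         # a &lt; b &amp; c

# Produce decimal references for anything outside ASCII
numeric = '\u00a9 2024'.encode('ascii', 'xmlcharrefreplace').decode('ascii')
print(numeric)                          # &#169; 2024
```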

URL Percent-Encoding vs Unicode Escapes

URL encoding (%XX) is not a Unicode escape -- it encodes bytes, not code points. To include a non-ASCII character in a URL, first encode it to UTF-8 bytes, then percent-encode each byte:

// encodeURIComponent handles the full pipeline
console.log(encodeURIComponent('café'));
// 'caf%C3%A9'  (é -> UTF-8 bytes C3 A9 -> percent-encoded)
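
The two-step pipeline can also be spelled out by hand. The Python sketch below encodes to UTF-8 first and then percent-encodes every byte outside the unreserved ASCII set, and checks the result against urllib.parse.quote. The UNRESERVED constant is an assumption made for this example, mirroring the characters that quote leaves untouched when called with safe=''.

```python
import string
from urllib.parse import quote, unquote

# Bytes allowed to appear unescaped (letters, digits, and - . _ ~)
UNRESERVED = set((string.ascii_letters + string.digits + "-._~").encode("ascii"))

text = "caf\u00e9"
utf8_bytes = text.encode("utf-8")       # step 1: code points -> bytes
manual = "".join(
    chr(b) if b in UNRESERVED else f"%{b:02X}"   # step 2: bytes -> %XX
    for b in utf8_bytes
)

print(manual)                           # caf%C3%A9
print(manual == quote(text, safe=""))   # True
print(unquote(manual) == text)          # True
```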

Practical Uses

Unicode escapes are useful when a source file must stay ASCII-only (legacy systems or toolchains), when embedding characters that would be ambiguous in a template (like < or & in HTML), or when communicating code points precisely in documentation and bug reports. For general source code in modern editors with UTF-8 support, typing the character directly is usually clearer than using an escape.
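
As one example of the ASCII-only case, Python's json module escapes every non-ASCII character by default. The sketch below assumes the default ensure_ascii=True setting. Note that characters above the BMP come out as a surrogate pair, matching the JSON \uXXXX format listed earlier.

```python
import json

text = "caf\u00e9 \U0001F600"

# ensure_ascii=True is the default: non-ASCII characters become \uXXXX escapes
encoded = json.dumps(text)
print(encoded)                      # "caf\u00e9 \ud83d\ude00"
print(encoded.isascii())            # True

# Decoding restores the original characters
print(json.loads(encoded) == text)  # True
```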
