A Unicode escape sequence is a textual notation for representing a Unicode character using only ASCII characters. Escape sequences appear in source code strings, HTML markup, URLs, JSON, and configuration files -- anywhere a Unicode character might be inconvenient or impossible to type directly.
Escape Sequence Formats
Different languages and formats use different syntaxes:
| Context | Format | Example |
|---|---|---|
| JavaScript / JSON (BMP) | \uXXXX | \u0041 |
| JavaScript (any code point) | \u{XXXXX} | \u{1F600} |
| Python (BMP) | \uXXXX | \u0041 |
| Python (full range) | \UXXXXXXXX | \U0001F600 |
| HTML named entity | &name; | &amp;, &lt; |
| HTML decimal entity | &#DDDDD; | &#65; |
| HTML hex entity | &#xHHHH; | &#x41; |
| CSS | \HHHHHH | \000041 |
| Java | \uXXXX | \u0041 |
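To make the table concrete, the following is a minimal Python sketch that spells one character in several of the formats above. The function name `escape_forms` and the dictionary keys are illustrative choices, not part of any library.

```python
def escape_forms(ch: str) -> dict:
    """Return several ASCII escape spellings for a single character."""
    cp = ord(ch)  # the character's Unicode code point as an integer
    forms = {
        "python_full": f"\\U{cp:08X}",   # \UXXXXXXXX, always 8 hex digits
        "js_braced": f"\\u{{{cp:X}}}",   # \u{...}, ES2015 and later
        "html_decimal": f"&#{cp};",      # decimal numeric reference
        "html_hex": f"&#x{cp:X};",       # hexadecimal numeric reference
        "css": f"\\{cp:06X}",            # CSS escape padded to 6 hex digits
    }
    if cp <= 0xFFFF:
        # The 4-digit form can only express BMP code points.
        forms["four_digit"] = f"\\u{cp:04X}"
    return forms


for label, text in escape_forms("A").items():
    print(f"{label:13} {text}")
```

For a character outside the BMP, such as U+1F600, the sketch omits the 4-digit form because a single \uXXXX escape cannot express it.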
JavaScript Unicode Escapes
JavaScript originally supported only the 4-digit \uXXXX form, which covers BMP characters (U+0000-U+FFFF). ES2015 introduced the \u{...} form for code points up to U+10FFFF:
// 4-digit form (BMP only)
console.log('\u0041'); // 'A'
console.log('\u00e9'); // 'é' (e with acute accent)
console.log('\u4e2d'); // '中' (Chinese character for "middle")
// ES2015 braced form (any code point)
console.log('\u{1F600}'); // emoji
console.log('\u{0041}'); // 'A'
// Emoji via surrogate pair escape (old style)
console.log('\uD83D\uDE00'); // emoji (surrogate pair for U+1F600)
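The surrogate pair in the last example follows from the standard UTF-16 arithmetic. As a rough sketch (written in Python for illustration; the helper name `to_surrogate_pair` is hypothetical), the two halves can be derived from the code point like this:

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Split a supplementary-plane code point into UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000              # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F600)
print(f"\\u{high:04X}\\u{low:04X}")  # \uD83D\uDE00
```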
Python Unicode Escapes
# \uXXXX for BMP, \UXXXXXXXX for full range
print('\u0041') # A
print('\u00e9') # e-acute
print('\U0001F600') # emoji
print('\N{SNOWMAN}') # snowman character (named character)
# In raw strings, escapes are not processed
print(r'\u0041') # \u0041 (literal backslash)
# Encoding to escape form
print('\U0001F600'.encode('unicode_escape')) # b'\\U0001f600'
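Going the other way, escaped ASCII text can be turned back into characters. The sketch below uses the standard `unicode_escape` codec and the `unicodedata` module; the variable names are illustrative. Note that decoding with `unicode_escape` treats the input bytes as Latin-1, so this round trip is only reliable when the input is pure ASCII.

```python
import unicodedata

# Round trip: character -> ASCII escape bytes -> character
escaped = '\U0001F600'.encode('unicode_escape')
restored = escaped.decode('unicode_escape')
print(escaped)                     # b'\\U0001f600'
print(restored == '\U0001F600')    # True

# Names used with \N{...} can be looked up in both directions
snowman_name = unicodedata.name('\u2603')
print(snowman_name)                             # SNOWMAN
print(unicodedata.lookup(snowman_name) == '\N{SNOWMAN}')  # True
```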
HTML Character References
HTML supports both named and numeric character references. Numeric references use either decimal or hexadecimal notation:
<!-- Named entity -->
&amp; <!-- & -->
&lt; <!-- < -->
&copy; <!-- copyright symbol -->
<!-- Decimal reference -->
&#169; <!-- copyright symbol -->
&#128512; <!-- smiley emoji -->
<!-- Hex reference -->
&#xA9; <!-- copyright symbol -->
&#x1F600; <!-- smiley emoji -->
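For processing character references programmatically, Python's standard library covers the common cases. The following sketch assumes plain strings rather than a parsed HTML document, and the variable names are illustrative.

```python
import html

# Decode named, decimal, and hexadecimal references in one pass
decoded = html.unescape('&amp; &lt; &copy; &#169; &#xA9; &#128512;')
print(decoded)

# Replace every non-ASCII character with a decimal numeric reference
text = 'caf\u00e9 \U0001F600'
ascii_markup = text.encode('ascii', 'xmlcharrefreplace').decode('ascii')
print(ascii_markup)  # caf&#233; &#128512;

# Escape the characters that are ambiguous in markup
safe = html.escape('<a & b>')
print(safe)  # &lt;a &amp; b&gt;
```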
URL Percent-Encoding vs Unicode Escapes
URL encoding (%XX) is not a Unicode escape -- it encodes bytes, not code points. To include a non-ASCII character in a URL, first encode it to UTF-8 bytes, then percent-encode each byte:
// encodeURIComponent handles the full pipeline
console.log(encodeURIComponent('café'));
// 'caf%C3%A9' (e-acute -> UTF-8 bytes C3 A9 -> percent-encoded)
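The two-step pipeline (UTF-8 encode, then percent-encode each byte) can be spelled out by hand. The Python sketch below is illustrative only: the helper `percent_encode_all` is hypothetical and deliberately encodes every byte, whereas real encoders such as `urllib.parse.quote` leave unreserved ASCII characters untouched.

```python
from urllib.parse import quote, unquote


def percent_encode_all(text: str) -> str:
    """Percent-encode every UTF-8 byte of text (simplified sketch)."""
    utf8_bytes = text.encode('utf-8')              # step 1: code points -> bytes
    return ''.join(f'%{b:02X}' for b in utf8_bytes)  # step 2: bytes -> %XX


manual = percent_encode_all('\u00e9')
print(manual)  # %C3%A9

# The standard library performs both steps and keeps plain ASCII readable
encoded = quote('caf\u00e9')
print(encoded)  # caf%C3%A9
print(unquote(encoded) == 'caf\u00e9')  # True
```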
Practical Uses
Unicode escapes are useful when a source file must stay ASCII-only (legacy systems or toolchains), when embedding characters that would be ambiguous in a template (like < or & in HTML), or when communicating code points precisely in documentation and bug reports. For general source code in modern editors with UTF-8 support, typing the character directly is usually clearer than using an escape.
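As one illustration of the ASCII-only case, Python's `json` module escapes non-ASCII characters by default, so its output can pass through systems that only handle ASCII. This is a small sketch with illustrative variable names; the `ensure_ascii` parameter is part of the standard `json.dumps` signature.

```python
import json

text = 'caf\u00e9 \U0001F600'

# Default behaviour: non-ASCII characters become \uXXXX escapes,
# and characters outside the BMP become surrogate pair escapes.
ascii_json = json.dumps(text)
print(ascii_json)            # "caf\u00e9 \ud83d\ude00"
print(ascii_json.isascii())  # True

# The escapes decode back to the original string
print(json.loads(ascii_json) == text)  # True

# With ensure_ascii=False the characters are written directly
utf8_json = json.dumps(text, ensure_ascii=False)
print(utf8_json.isascii())   # False
```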