SymbolFYI

Character Reference

Web & HTML

Definition

HTML markup for inserting characters by number (&#9731; or &#x2603;) or by name (&copy;), used for special or reserved characters.

Character Reference (HTML)

An HTML character reference is a special syntax for inserting a Unicode character into an HTML document using its name, decimal code point, or hexadecimal code point rather than the literal character. This is essential for characters that have special meaning in HTML, characters not available on a keyboard, or characters that might cause encoding issues.

Three Forms

1. Named Character References (Named Entities)

HTML defines a set of named references for commonly used characters:

&amp;    <!-- & (ampersand) -->
&lt;     <!-- < (less-than sign) -->
&gt;     <!-- > (greater-than sign) -->
&quot;   <!-- " (double quotation mark) -->
&apos;   <!-- ' (apostrophe; defined in HTML5 and XML, not in HTML 4) -->
&copy;   <!-- © (copyright sign) -->
&reg;    <!-- ® (registered sign) -->
&trade;  <!-- ™ (trade mark sign) -->
&nbsp;   <!-- non-breaking space -->
&euro;   <!-- € (euro sign) -->

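Named references are case-sensitive: &Eacute; (É) and &eacute; (é) are different characters. A few legacy uppercase aliases such as &AMP; and &COPY; are defined and resolve to the same characters as their lowercase forms, but most names have no such alias. Use the casing given in the HTML specification, which is lowercase for the common references listed above.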
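Case sensitivity can be checked against Python's copy of the HTML5 name table. This is a small sketch, assuming the standard-library `html.entities.html5` dictionary, whose keys include the trailing semicolon:

```python
from html.entities import html5

# Names differing only in case can map to different characters.
upper = html5["Eacute;"]
lower = html5["eacute;"]
print(upper, lower, upper == lower)

# A legacy uppercase alias resolves to the same character as the lowercase name.
print(html5["AMP;"] == html5["amp;"])

# Most names have no uppercase alias, so look a name up rather than assume.
print("EURO;" in html5)
```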

2. Decimal Numeric References

Any Unicode character can be referenced by its decimal code point:

&#169;   <!-- © (U+00A9, decimal 169) -->
&#8364;  <!-- € (U+20AC, decimal 8364) -->
&#128512; <!-- 😀 (U+1F600, decimal 128512) -->

The format is &# followed by the decimal number, followed by a semicolon.
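The decimal form can be derived from any character with `ord()`. The helper below is a minimal sketch; `decimal_reference` is a hypothetical name used only for illustration:

```python
def decimal_reference(char: str) -> str:
    """Return the decimal numeric character reference for a single character."""
    return f"&#{ord(char)};"

for ch in "©€😀":
    print(ch, decimal_reference(ch))
```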

3. Hexadecimal Numeric References

Code points can also be expressed in hexadecimal, which maps directly to Unicode notation:

&#x00A9;  <!-- © (U+00A9) -->
&#x20AC;  <!-- € (U+20AC) -->
&#x1F600; <!-- 😀 (U+1F600) -->

The format is &#x (case-insensitive x) followed by hex digits, followed by a semicolon.
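The hexadecimal form follows the same pattern. The sketch below pads to at least four hex digits to mirror U+ notation (the padding is a stylistic assumption, not a requirement), and uses the standard-library `html.unescape` to show that the `x` prefix and the hex digits are accepted in either case. `hex_reference` is a hypothetical helper name:

```python
import html

def hex_reference(char: str) -> str:
    """Return the hexadecimal numeric character reference for a single character."""
    return f"&#x{ord(char):04X};"

for ch in "©€😀":
    print(ch, hex_reference(ch))

# Decoding accepts lowercase or uppercase "x" and hex digits.
lower_form = html.unescape("&#x20ac;")
upper_form = html.unescape("&#X20AC;")
print(lower_form == upper_form)
```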

When to Use Character References

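Required escaping: these characters have syntactic meaning in HTML and should be escaped when they appear as content:

< → &lt; (would otherwise be parsed as the start of a tag)
> → &gt; (would otherwise be read as the end of a tag)
& → &amp; (would otherwise start a character reference)
" → &quot; (inside double-quoted attribute values)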

Optional but useful — when the document encoding does not support the character, or for readability:

<!-- Mathematical content -->
f(x) = x&sup2; + 2x + 1  <!-- x² + 2x + 1 -->

<!-- Currency -->
&pound;4.99  <!-- £4.99 -->
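When the output encoding cannot represent a character, numeric references can be generated automatically. The following sketch assumes Python's built-in `xmlcharrefreplace` error handler for `str.encode`, which substitutes a decimal reference for each unencodable character; the sample string is made up for illustration:

```python
text = "Price: €49.99 (was £59.99)"

# ASCII can represent neither € nor £, so both become decimal references.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes.decode("ascii"))

# Latin-1 contains £ but not €, so only € is replaced.
latin1_bytes = text.encode("latin-1", errors="xmlcharrefreplace")
print(latin1_bytes)
```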

Modern Best Practice

With UTF-8 now the standard encoding for web documents, most characters can be written literally without escaping:

<meta charset="UTF-8">
<!-- Now you can write directly: -->
<p>Copyright © 2024 — All rights reserved</p>
<p>Price: €49.99</p>
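Text that was authored with references can be rewritten to literal characters for a UTF-8 document. The sketch below is one possible approach, assuming the input is a plain text fragment (not markup containing tags): it decodes every reference, then re-escapes only the characters HTML syntax needs. `normalize_references` is a hypothetical helper name:

```python
import html

def normalize_references(fragment: str) -> str:
    """Decode all references, then re-escape only &, < and > for text content."""
    return html.escape(html.unescape(fragment), quote=False)

legacy = "&copy; 2024 Tom &amp; Jerry &lt;3 &euro;5"
print(normalize_references(legacy))
```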

Only &, <, >, and quotes (in attribute values) must be escaped. Using character references for everything else is unnecessary and reduces readability.
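The difference between text content and attribute values can be illustrated with the `quote` parameter of the standard-library `html.escape`. This is a sketch with a made-up input string, not a complete sanitization routine:

```python
import html

user_text = 'Tom & "Jerry" <3'

# Text content: quotation marks may stay literal.
as_text = html.escape(user_text, quote=False)

# Attribute value: quotation marks are escaped as well (quote=True is the default).
as_attr = html.escape(user_text)

print(f"<p>{as_text}</p>")
print(f'<input value="{as_attr}">')
```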

Python Example

import html

# Escape for safe HTML insertion
unsafe = '<script>alert("xss")</script>'
safe = html.escape(unsafe)
print(safe)  # &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;

# Unescape references back to characters
print(html.unescape('&copy; 2024 &mdash; All rights reserved'))
# © 2024 — All rights reserved
