HTML Entities
HTML entities are special text sequences used in HTML markup to represent characters that either have special meaning in HTML syntax or are not easily typed on a keyboard. They come in three forms: named entities, decimal numeric references, and hexadecimal numeric references. All three produce the same output—the specified Unicode character—but differ in readability and typing convenience.
Three Entity Forms
Named Entities
Named entities use a descriptive name preceded by & and followed by ;:
&amp; → & (U+0026 AMPERSAND)
&lt; → < (U+003C LESS-THAN SIGN)
&gt; → > (U+003E GREATER-THAN SIGN)
&quot; → " (U+0022 QUOTATION MARK)
&apos; → ' (U+0027 APOSTROPHE)
&copy; → © (U+00A9 COPYRIGHT SIGN)
&reg; → ® (U+00AE REGISTERED SIGN)
&trade; → ™ (U+2122 TRADE MARK SIGN)
&euro; → € (U+20AC EURO SIGN)
&nbsp; → (U+00A0 NO-BREAK SPACE; renders as a space that does not wrap)
&mdash; → — (U+2014 EM DASH)
&ndash; → – (U+2013 EN DASH)
&hellip; → … (U+2026 HORIZONTAL ELLIPSIS)
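One way to check a named entity against its code point is to decode it with Python's standard html.unescape. This is a minimal sketch; the helper name code_point_of is hypothetical and exists only for illustration.

```python
import html

def code_point_of(entity: str) -> str:
    """Decode one named entity and report its code point as U+XXXX.

    code_point_of is an illustrative helper, not a standard API.
    """
    char = html.unescape(entity)
    if len(char) != 1:
        # Unknown names are left undecoded by html.unescape, so they land here.
        raise ValueError(f"{entity!r} did not decode to a single character")
    return f"U+{ord(char):04X}"

for entity in ("&amp;", "&lt;", "&copy;", "&trade;", "&mdash;"):
    print(entity, code_point_of(entity))
```

An unrecognized name such as &nosuchname; raises ValueError here because html.unescape returns it unchanged.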
Decimal Numeric References
Decimal references use &# followed by the decimal Unicode code point and a closing ;:
&#169; → © (U+00A9)
&#8364; → € (U+20AC)
&#9731; → ☃ (U+2603 SNOWMAN)
&#128512; → 😀 (U+1F600 GRINNING FACE)
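A decimal reference can be derived from any character by taking its code point. The sketch below assumes a single-character input; decimal_ref is a hypothetical helper name, not a library function.

```python
def decimal_ref(char: str) -> str:
    """Return the decimal numeric reference for one character.

    decimal_ref is an illustrative helper; ord() raises TypeError
    if the input is not exactly one character.
    """
    return f"&#{ord(char)};"

for char in ("©", "€", "☃", "😀"):
    print(char, decimal_ref(char))
```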
Hexadecimal Numeric References
Hexadecimal references use &#x followed by the hexadecimal code point and a closing ;:
&#xA9; → © (U+00A9)
&#x20AC; → € (U+20AC)
&#x2603; → ☃ (U+2603)
&#x1F600; → 😀 (U+1F600)
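To illustrate that the three forms are interchangeable, the following sketch builds a hexadecimal reference and confirms that the named, decimal, and hexadecimal spellings decode to the same character. The helper hex_ref is hypothetical; html.unescape is from the standard library.

```python
import html

def hex_ref(char: str) -> str:
    """Return the hexadecimal numeric reference for one character (illustrative helper)."""
    return f"&#x{ord(char):X};"

# Named, decimal, and hexadecimal spellings of the copyright sign.
forms = ["&copy;", "&#169;", hex_ref("©")]
decoded = {html.unescape(form) for form in forms}

# A single-element set means every form produced the same character.
print(forms, decoded)
```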
When to Use Entities
Required Escaping
Four characters must be escaped when they appear as literal content in HTML:
<!-- Inside element content -->
&amp; <!-- instead of & -->
&lt; <!-- instead of < -->
<!-- Inside attribute values -->
&quot; <!-- inside double-quoted attributes -->
&apos; <!-- inside single-quoted attributes -->
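The split between element content and attribute values can be sketched with Python's html.escape, whose quote parameter controls whether quotation marks are escaped too. The wrapper names escape_text and escape_attr are assumptions made for this example.

```python
import html

def escape_text(value: str) -> str:
    """Escape for element content: &, < and > only (hypothetical wrapper)."""
    return html.escape(value, quote=False)

def escape_attr(value: str) -> str:
    """Escape for a quoted attribute value: also " and ' (hypothetical wrapper)."""
    return html.escape(value, quote=True)

title = 'Tom & "Jerry"'
print(f'<p title="{escape_attr(title)}">{escape_text(title)}</p>')
```

Note that html.escape writes the apostrophe as the numeric reference &#x27; rather than a named entity, and it also escapes > even though that is not strictly required.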
Optional Escaping
In modern UTF-8 encoded HTML, all other characters can be written directly without entities:
<!-- Both are equivalent in UTF-8 HTML -->
<p>Copyright &copy; 2024</p>
<p>Copyright © 2024</p>
<!-- Both are equivalent -->
<p>&#x2603;</p>
<p>☃</p>
Entities are still useful when the source file encoding cannot guarantee correct character storage, or when communicating characters in contexts where Unicode characters might be corrupted (such as some email systems or legacy tools).
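For the legacy case, one possible approach in Python is the built-in xmlcharrefreplace error handler, which turns every character outside the target encoding into a decimal numeric reference. The function name to_ascii_html is hypothetical.

```python
def to_ascii_html(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric reference.

    Illustrative helper: it does not escape &, < or >, so apply normal
    HTML escaping first if the text is untrusted.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_ascii_html("Copyright © 2024"))
```

The result contains only ASCII bytes, so it survives transports that would otherwise mangle UTF-8.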
HTML5 Named Entity Reference
HTML5 defines 2,231 named character references. The most commonly needed symbols:
| Entity | Character | Description |
|---|---|---|
| &hearts; | ♥ | Heart suit |
| &spades; | ♠ | Spade suit |
| &check; | ✓ | Check mark |
| &cross; | ✗ | Ballot X |
| &infin; | ∞ | Infinity |
| &sum; | ∑ | N-ary summation |
| &rArr; | ⇒ | Rightwards double arrow |
| &pi; | π | Greek pi |
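Python ships the HTML5 name table as the html.entities.html5 dictionary, which makes a reverse lookup (character to names) easy to sketch. The helper names_for is hypothetical; it assumes Python 3.9 or later for the list[str] annotation.

```python
from html.entities import html5

def names_for(char: str) -> list[str]:
    """List every semicolon-terminated name that decodes to `char`.

    Illustrative helper. Keys in html5 omit the leading "&"; a few legacy
    names also appear without the trailing ";", and those are skipped here.
    """
    return sorted(
        name for name, value in html5.items()
        if value == char and name.endswith(";")
    )

for char in "♥✓∞π":
    print(char, names_for(char))
```

Several characters have more than one name, which is why the helper returns a list rather than a single string.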
Entities in Code
When generating HTML programmatically, use your language's escaping utilities rather than manual entity replacement:
import html
html.escape('<script>alert("XSS")</script>')
# → '&lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt;'
html.unescape('&copy; 2024')
# → '© 2024'
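// JavaScript has no built-in HTML escaper, but a common pattern is:
function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;'); // needed for single-quoted attribute values
}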
Django templates auto-escape by default; use {{ value|safe }} only when the value is already trusted HTML.