SymbolFYI

HTML Entities: The Complete Guide to Character References

HTML entities are the mechanism HTML uses to represent characters that either cannot appear directly in markup or would be interpreted as markup syntax. Understanding them fully, beyond just &amp; and &lt;, makes you a more precise developer and helps you avoid subtle bugs in templates, APIs, and content pipelines.

What Is an HTML Entity?

An HTML entity is a text string that begins with & and ends with ;. The browser parser replaces it with the corresponding Unicode character before rendering. There are three forms:

Named entity references — human-readable names defined in the HTML specification:

&copy;   <!-- © -->
&mdash;  <!-- — -->
&hellip; <!-- … -->
&nbsp;   <!-- non-breaking space (U+00A0) -->

Decimal numeric character references — the Unicode code point in base 10:

&#169;   <!-- © (U+00A9) -->
&#8212;  <!-- — (U+2014) -->
&#8230;  <!-- … (U+2026) -->

Hexadecimal numeric character references — the code point in base 16, prefixed with x:

&#xA9;   <!-- © -->
&#x2014; <!-- — -->
&#x2026; <!-- … -->

All three forms for © are equivalent. The named form is the most readable; the hex form is convenient in generated output because it maps directly to Unicode code point notation (U+00A9 → &#xA9;).
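As a rough illustration of that equivalence, the Python sketch below builds all three forms for one character and decodes them again. The helper name reference_forms is hypothetical; html.unescape and html.entities.codepoint2name are standard-library APIs, and codepoint2name only knows the HTML 4 names.

```python
import html
from html.entities import codepoint2name


def reference_forms(char: str) -> dict:
    """Return the named, decimal, and hex references for one character."""
    code_point = ord(char)
    # codepoint2name covers only the HTML 4 entity names, so characters
    # that gained a name in HTML5 are reported with named=None here.
    name = codepoint2name.get(code_point)
    return {
        "named": f"&{name};" if name else None,
        "decimal": f"&#{code_point};",
        "hex": f"&#x{code_point:X};",
    }


forms = reference_forms("©")
print(forms)

# Decoding each form yields the same single character.
decoded = {html.unescape(ref) for ref in forms.values()}
print(decoded)
```

Because the set of decoded values has one member, the three spellings are interchangeable as far as the resulting text is concerned.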

When Escaping Is Required

The HTML specification only requires escaping in specific contexts. Knowing exactly where is important so you do not over-escape (breaking readability) or under-escape (introducing bugs or vulnerabilities).

In text content

The characters < and & must be escaped in text nodes because they start tag and entity syntax respectively:

<!-- Wrong: unescaped < and & can be misparsed as markup or as a character reference -->
<p>Use if (a < b) && (c > d) to compare.</p>

<!-- Correct -->
<p>Use if (a &lt; b) &amp;&amp; (c &gt; d) to compare.</p>

> does not technically need escaping in text content, but escaping it is harmless and many sanitizers do it anyway.

In attribute values

Inside quoted attributes, you must escape the quote character being used and &:

<!-- Double-quoted: escape " and & -->
<a href="search?q=rock+&amp;+roll&amp;lang=en" title="Rock &amp; Roll">

<!-- Single-quoted: escape ' and & -->
<a href='search?q=it&apos;s-fine'>

The &apos; entity is valid in HTML5 but was not in HTML 4. For maximum compatibility in HTML attributes, use &#39; or switch to double quotes.
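The difference between the two contexts can be sketched in Python with html.escape, whose quote parameter controls whether quote characters are escaped. The helper names escape_text and escape_attr are hypothetical wrappers, not part of any library.

```python
import html


def escape_text(value: str) -> str:
    """Escape for a text node: only &, <, and > are replaced."""
    return html.escape(value, quote=False)


def escape_attr(value: str) -> str:
    """Escape for a quoted attribute value: also replaces " and '."""
    return html.escape(value, quote=True)


label = 'Rock & Roll: "it\'s fine"'
link = f'<a href="/search" title="{escape_attr(label)}">{escape_text(label)}</a>'
print(link)
```

Note that html.escape emits the numeric form &#x27; for the apostrophe rather than &apos;, which sidesteps the HTML 4 compatibility question.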

In raw text elements

<script> and <style> are raw text elements — the parser does not process entities inside them. Do not use HTML entities inside JavaScript string literals embedded in <script> tags:

<!-- Wrong: the &amp; is NOT decoded inside <script> -->
<script>
  const name = "Rock &amp; Roll"; // literal string contains "&amp;"
</script>

<!-- Correct -->
<script>
  const name = "Rock & Roll";
</script>

If you need to embed user-controlled data into a <script> block, use JSON serialization, not HTML entity encoding.
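One way to do this, sketched in Python under the assumption that the data is JSON-serializable, is to serialize with json.dumps and then replace the characters that could end the script block with JSON \uXXXX escapes. The function name json_for_script is hypothetical.

```python
import json


def json_for_script(data) -> str:
    """Serialize data as JSON that can sit inside a <script> block."""
    text = json.dumps(data)
    # In json.dumps output, <, >, and & can only occur inside string
    # literals, where \uXXXX escapes are allowed. Replacing them keeps the
    # payload valid JSON while removing "</script>" and "<!--" from the markup.
    return (
        text.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


user = {"name": "</script><script>alert(1)</script>"}
payload = json_for_script(user)
page = f"<script>const user = {payload};</script>"
print(page)
```

The payload still parses back to the original data, so the consumer sees the raw string rather than an entity-encoded one.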

The &nbsp; Trap

&nbsp; (U+00A0, NO-BREAK SPACE) is one of the most misused entities. It looks identical to a regular space but behaves differently:

  • It prevents line breaking between adjacent words
  • It is not collapsed by CSS white-space: normal
  • Screen readers may announce it differently
  • It is indistinguishable from a regular space in most text editors

<!-- Avoid using &nbsp; for layout spacing -->
<td>&nbsp;&nbsp;&nbsp;Padded text</td>  <!-- use CSS padding instead -->

<!-- Legitimate use: prevent unwanted line breaks -->
<span>10&nbsp;kg</span>       <!-- keeps "10" and "kg" together -->
<span>§&nbsp;42</span>        <!-- section number and its symbol -->
<span>Dr.&nbsp;Smith</span>   <!-- title stays with name -->

For layout spacing, always use CSS padding, margin, or gap. Reserve &nbsp; for semantic no-break situations.
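Because a literal no-break space is hard to spot in source, a small check like the following Python sketch can make it visible. The helper name reveal_nbsp is hypothetical; it simply swaps the literal character for its entity.

```python
import html

NBSP = "\u00a0"  # NO-BREAK SPACE


def reveal_nbsp(text: str) -> str:
    """Replace literal no-break spaces with the visible &nbsp; entity."""
    return text.replace(NBSP, "&nbsp;")


pasted = "10\u00a0kg of rice"   # looks like "10 kg of rice" in an editor
print(pasted.count(NBSP))       # number of hidden no-break spaces
print(reveal_nbsp(pasted))

# Decoding the entity gives back the same literal character.
round_trip = html.unescape(reveal_nbsp(pasted))
```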

Named Entity Reference Table

The HTML5 specification defines over 2,000 named character references. Here are the ones you will actually use:

Punctuation and typography

Entity    Character  Unicode  Description
&amp;     &          U+0026   Ampersand
&lt;      <          U+003C   Less-than sign
&gt;      >          U+003E   Greater-than sign
&quot;    "          U+0022   Quotation mark
&apos;    '          U+0027   Apostrophe (HTML5)
&mdash;   —          U+2014   Em dash
&ndash;   –          U+2013   En dash
&hellip;  …          U+2026   Horizontal ellipsis
&laquo;   «          U+00AB   Left double angle quote
&raquo;   »          U+00BB   Right double angle quote
&ldquo;   “          U+201C   Left double quotation mark
&rdquo;   ”          U+201D   Right double quotation mark
&lsquo;   ‘          U+2018   Left single quotation mark
&rsquo;   ’          U+2019   Right single quotation mark

Special spaces

Entity    Character  Unicode  Description
&nbsp;    (NBSP)     U+00A0   No-break space
&ensp;    (EN SP)    U+2002   En space
&emsp;    (EM SP)    U+2003   Em space
&thinsp;  (THIN SP)  U+2009   Thin space
&zwnj;    (ZWNJ)     U+200C   Zero-width non-joiner
&zwj;     (ZWJ)      U+200D   Zero-width joiner

Symbols and currency

Entity    Character  Unicode  Description
&copy;    ©          U+00A9   Copyright sign
&reg;     ®          U+00AE   Registered sign
&trade;   ™          U+2122   Trade mark sign
&euro;    €          U+20AC   Euro sign
&pound;   £          U+00A3   Pound sign
&yen;     ¥          U+00A5   Yen sign
&deg;     °          U+00B0   Degree sign
&plusmn;  ±          U+00B1   Plus-minus sign
&times;   ×          U+00D7   Multiplication sign
&divide;  ÷          U+00F7   Division sign
&infin;   ∞          U+221E   Infinity
&ne;      ≠          U+2260   Not equal to

Arrows

Entity    Character  Unicode  Description
&larr;    ←          U+2190   Leftwards arrow
&rarr;    →          U+2192   Rightwards arrow
&uarr;    ↑          U+2191   Upwards arrow
&darr;    ↓          U+2193   Downwards arrow
&harr;    ↔          U+2194   Left right arrow
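For names not listed in these tables, Python's standard library ships the HTML5 list as the html.entities.html5 dictionary, whose keys include the trailing semicolon. The sketch below looks up a few names; the helper name lookup is hypothetical.

```python
from html.entities import html5


def lookup(name: str) -> str:
    """Return U+XXXX notation for a named reference such as 'mdash'."""
    chars = html5[name + ";"]
    # A few names map to two code points, so join them with spaces.
    return " ".join(f"U+{ord(c):04X}" for c in chars)


print(len(html5))  # size of the HTML5 list, including legacy no-semicolon names
for name in ("mdash", "hellip", "euro", "rarr", "ne"):
    print(f"&{name};", html5[name + ";"], lookup(name))
```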

HTML Entities vs. Direct Unicode Characters

Modern HTML documents are almost always UTF-8. In UTF-8, you can write most Unicode characters directly without entities:

<!-- Both are valid in UTF-8 HTML -->
<p>Copyright &copy; 2024 Acme Corp</p>
<p>Copyright © 2024 Acme Corp</p>

<!-- Both produce identical DOM -->
<p>Price: &euro;49.99</p>
<p>Price: €49.99</p>

The direct form is more readable in source and equally safe when your document is properly declared as UTF-8:

<meta charset="UTF-8">

Use entities when:

  • Your editor or build pipeline cannot reliably preserve certain Unicode characters
  • You are generating HTML in a context where the output encoding is not guaranteed to be UTF-8
  • The character is invisible or confusable (e.g., &nbsp; over a literal non-breaking space that looks identical to a regular space)
  • You need to store HTML snippets in a system that strips non-ASCII characters
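When the output has to survive an ASCII-only channel, Python's built-in "xmlcharrefreplace" error handler can produce the numeric references. This is a minimal sketch; the function name to_ascii_html is hypothetical.

```python
def to_ascii_html(text: str) -> str:
    """Return ASCII-only text with other characters as numeric references."""
    # "xmlcharrefreplace" swaps each non-ASCII character for &#NNNN;.
    # It does not escape &, <, or >, so run html.escape first if the
    # text has not been escaped yet.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


converted = to_ascii_html("Price: €49.99 © Acme")
print(converted)
```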

Escaping in Template Engines

Most major template engines auto-escape HTML by default, but not all of them do, so know what your engine escapes:

# Django templates: auto-escapes &, <, >, ", '
{{ user_input }}             # safe: escaped automatically
{{ user_input|safe }}        # unsafe: disables escaping
{% autoescape off %}...{% endautoescape %}  # unsafe block

// Jinja2 (Python): same syntax as Django, but autoescaping is off by default
// in a bare jinja2.Environment; Flask turns it on for .html templates
{{ user_input }}        // escaped only when autoescape is enabled
{{ user_input|safe }}   // raw

// Handlebars (JS)
{{ user_input }}        // escaped: & < > " ' ` =
{{{ user_input }}}      // raw: triple curly means unescaped

// React JSX: auto-escapes text content
<p>{userInput}</p>                          // safe
<p dangerouslySetInnerHTML={{__html: raw}} /> // unsafe, as the name says

The common mistake is double-escaping: taking already-escaped HTML and running it through the escaper again, producing &amp;lt; instead of &lt;. If you see literal entity strings appearing in your rendered page, this is almost always the cause.
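The effect is easy to reproduce with Python's html.escape. The sketch below escapes a string once and twice, then decodes one level, as a browser would when rendering.

```python
import html

raw = "a < b"
once = html.escape(raw)    # correct: escaped exactly once
twice = html.escape(once)  # bug: the & of &lt; gets escaped again

print(once)
print(twice)

# The browser decodes one level of references when rendering.
shown_once = html.unescape(once)
shown_twice = html.unescape(twice)
print(shown_once)
print(shown_twice)  # the literal entity string leaks into the page
```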

Generating Entities Programmatically

When building HTML in code, use your language's dedicated escaping function rather than implementing your own:

# Python — html module (stdlib)
import html

html.escape('<script>alert("xss")</script>')
# → '&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;'

html.escape("Rock & Roll", quote=False)  # don't escape quotes
# → 'Rock &amp; Roll'

html.unescape('&lt;p&gt;Hello&lt;/p&gt;')
# → '<p>Hello</p>'

// JavaScript: no built-in function, but this pattern is reliable
function escapeHtml(str) {
  return str
    .replace(/&/g, '&amp;')   // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Or use the DOM itself (browser only). Serializing a text node escapes
// &, <, and > but not quotes, so use this for text content only,
// never for attribute values.
function escapeHtmlText(str) {
  const div = document.createElement('div');
  div.textContent = str;
  return div.innerHTML;
}

Never build your own escaper by replacing just < and >. Skipping & means input that already contains text such as &lt; is rendered as < instead of being shown literally, and skipping " opens attribute injection vulnerabilities.
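Both failure modes can be shown with a deliberately incomplete escaper. In this Python sketch, naive_escape is a hypothetical stand-in for such a hand-rolled function, compared against the standard library's html.escape.

```python
import html


def naive_escape(value: str) -> str:
    """Deliberately incomplete: handles only < and >."""
    return value.replace("<", "&lt;").replace(">", "&gt;")


# 1. Missing quote handling: the value breaks out of the attribute.
attack = '" onmouseover="alert(1)'
bad = f'<input value="{naive_escape(attack)}">'
good = f'<input value="{html.escape(attack)}">'
print(bad)
print(good)

# 2. Missing & handling: text that already looks like an entity is
#    decoded by the browser instead of being shown literally.
literal = "type &lt; here"
shown = html.unescape(naive_escape(literal))
print(shown)
```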

Common Pitfalls

Entities in JSON: JSON does not use HTML entities. If you are storing HTML-escaped content in JSON, the consumer must un-escape it. Prefer storing raw content in JSON and escaping at render time.

Entities in email: Many email clients have inconsistent HTML support. Numeric entities are safer than named entities in email HTML, as some older clients do not implement the full named entity list.

The semicolon is not always required: HTML parsers still accept a fixed set of legacy named references without the closing semicolon (&amp parses as &). Always include the semicolon. Omitting it causes subtle breakage when the reference runs into the text that follows; for example, &notit; is decoded as ¬it; because the parser matches the legacy name &not.

Not all named entities are in HTML 4: &apos; and many mathematical entities (&mapsto;, &nexist;, etc.) were added in HTML5, although common ones such as &forall; and &part; were already part of HTML 4. If you need to support very old parsers, use numeric references instead.
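The semicolon and HTML 4 pitfalls above can be checked from Python, whose html.unescape follows the HTML5 parsing rules and whose html.entities module exposes both the HTML 4 names (name2codepoint) and the HTML5 names (html5). This is a quick sketch for exploration, not a compatibility test for any particular browser or mail client.

```python
import html
from html.entities import html5, name2codepoint

# Legacy names decode even without the semicolon.
no_semicolon = html.unescape("&amp")
# The parser matches the legacy name "not" and leaves "it;" as text.
swallowed = html.unescape("&notit;")
# With a complete, known name the full reference wins.
full_name = html.unescape("&notin;")
print(no_semicolon, swallowed, full_name)

# Names missing from the HTML 4 list but present in HTML5.
for name in ("apos", "nexist"):
    print(name, name in name2codepoint, name + ";" in html5)
```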

Practical Checklist

Before shipping HTML content, verify:

  1. Text content is escaped for < and & at minimum
  2. Attribute values are escaped for the quote character in use and &
  3. No HTML entities appear inside <script> or <style> blocks
  4. Template auto-escaping is enabled and not accidentally disabled with |safe or {{{ }}}
  5. No double-escaping in content pipelines that pass data through multiple layers

Use the SymbolFYI Encoding Converter to inspect code points and generate the correct entity form for any character.


Next in Series: CSS Content Property: Using Unicode Symbols in Stylesheets — how to inject Unicode characters via CSS ::before and ::after, write correct escape sequences, and handle accessibility when decorating with symbols.
