SymbolFYI

The Private Use Area: Custom Characters in Unicode

Reference August 27, 2024

Unicode contains three special zones where the standard deliberately assigns no meaning — leaving the space open for private agreements between parties. These are the Private Use Areas (PUA). Understanding them explains why icon fonts like Font Awesome work the way they do, why some characters look correct in one application and appear as rectangles in another, and why PUA characters require careful handling in data exchange.

What Is the Private Use Area?

The Unicode standard defines the Private Use Area as code point ranges whose interpretation is not specified by Unicode — they are available for use by applications, organizations, or private agreements. The standard guarantees only that these code points will never be assigned an official Unicode meaning.

There are three PUAs in total:

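| Block | Range | Size | Location |
|---|---|---|---|
| BMP Private Use Area | U+E000–U+F8FF | 6,400 code points | Basic Multilingual Plane |
| Supplementary Private Use Area-A | U+F0000–U+FFFFD | 65,534 code points | Plane 15 |
| Supplementary Private Use Area-B | U+100000–U+10FFFD | 65,534 code points | Plane 16 |

The last two code points of each supplementary plane (U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF) are noncharacters, not private use, which is why each supplementary area holds 65,534 code points and not 65,536.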

The BMP range (U+E000–U+F8FF) is by far the most commonly used, because BMP characters are more widely supported by fonts and older systems.
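The three ranges can be checked directly in code. The following is a minimal Python sketch; the names `PUA_RANGES` and `is_private_use` are illustrative and not part of any library. It cross-checks against the standard library's `unicodedata` module, which reports the general category `Co` (private use) for these code points. Note that the last two code points of planes 15 and 16 are noncharacters, so each supplementary range ends at ...FFFD.

```python
import unicodedata

# The three Private Use Areas as (name, first, last) inclusive ranges.
PUA_RANGES = [
    ("BMP Private Use Area", 0xE000, 0xF8FF),
    ("Supplementary Private Use Area-A", 0xF0000, 0xFFFFD),
    ("Supplementary Private Use Area-B", 0x100000, 0x10FFFD),
]

def is_private_use(ch: str) -> bool:
    """Return True if the single character ch lies in any PUA range."""
    cp = ord(ch)
    return any(first <= cp <= last for _, first, last in PUA_RANGES)

# Print the size of each range, computed from its endpoints.
for name, first, last in PUA_RANGES:
    print(f"{name}: {last - first + 1:,} code points")

# unicodedata agrees: private-use code points have general category "Co".
assert unicodedata.category("\uE000") == "Co"
assert is_private_use("\uF8FF") and not is_private_use("A")
```

In everyday code the `unicodedata.category(ch) == "Co"` test is usually enough; the explicit table is shown here only to make the ranges visible.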

The Design Purpose

The PUA was intentionally included in Unicode to solve a real problem: some communities, industries, and applications need custom characters that Unicode does not define — and they need those characters to work in Unicode text streams.

Examples of legitimate PUA uses:
- A corporate font that includes a company logo character
- A language research project documenting an undocumented script
- A video game that has proprietary symbols in its lore
- An internal enterprise application with domain-specific symbols
- A font that maps decorative dingbats to PUA slots for historical compatibility

The key principle is that PUA characters only make sense within a closed system where all parties have agreed on the same mapping. Send a PUA character outside that system, and it becomes meaningless — or worse, gets mapped to something entirely different.

Icon Fonts: The Most Common PUA Use Case

The most widespread use of PUA characters in web development is icon fonts. Libraries like Font Awesome, Material Icons, and Google's Material Symbols encode their icons at PUA code points and distribute a custom font that maps those code points to icon glyphs.

How icon fonts work

  1. The icon font file maps PUA code points (e.g., U+F015) to icon glyphs (e.g., a home icon)
  2. CSS applies the icon font to a specific element
  3. The PUA character is inserted into the element, either directly in the HTML or through the CSS content property; without the icon font it has no glyph of its own
  4. When the browser renders the element with the icon font loaded, the character becomes the icon

/* Font Awesome pattern (simplified) */
@font-face {
  font-family: 'FontAwesome';
  src: url('fontawesome.woff2') format('woff2');
}

.fa-home::before {
  font-family: 'FontAwesome';
  content: '\f015';  /* U+F015 in the PUA */
}

<!-- An icon rendered via PUA character -->
<i class="fa fa-home"></i>

The character U+F015 has no Unicode name or meaning. In Font Awesome's private agreement, it means "home icon." In any other font, it renders as a missing glyph (□) or nothing.
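To see what the CSS escape in the example refers to, this Python sketch converts it to a code point and inspects it with the standard library. `css_escape_to_char` is a hypothetical helper that handles only the simple hex form (`\f015`), not the full CSS escape grammar.

```python
import unicodedata

def css_escape_to_char(escape: str) -> str:
    """Convert a simple CSS hex escape such as '\\f015' to its character.

    Handles only a backslash followed by hex digits; this is an
    illustrative simplification of the CSS escape syntax.
    """
    hex_digits = escape.lstrip("\\").strip()
    return chr(int(hex_digits, 16))

icon = css_escape_to_char(r"\f015")
print(f"U+{ord(icon):04X}")                  # code point behind the escape
print(unicodedata.category(icon))            # general category; "Co" means private use
print(unicodedata.name(icon, "<no name>"))   # falls back because PUA characters have no name
print(icon.encode("utf-8").hex(" "))         # the bytes that travel in a UTF-8 file
```

Nothing in this output says "home icon"; that meaning exists only in the font that ships with the stylesheet.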

Common PUA mappings in icon fonts

Icon libraries choose their PUA ranges independently, and the ranges frequently overlap:

| Library | PUA Range Used |
|---|---|
| Font Awesome 4 | U+F000–U+F2E0 |
| Font Awesome 5/6 | U+E000–U+EFFF (extended) |
| Material Icons (Google) | U+E000–U+EFFF |
| Glyphicons (Bootstrap) | U+E001–U+E638 |
| IcoFont | U+EF00–U+F1A4 |

There is no coordination between these libraries, so different icon fonts may assign different icons to the same PUA code point. This is exactly the "private agreement" nature of PUA — it only works within one font's context.
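The overlap can be made concrete with the ranges from the table above. This is a small Python sketch; the ranges are copied from the table as approximate values, and `overlapping_pairs` is an illustrative name.

```python
from itertools import combinations

# Ranges as listed in the table above (inclusive, approximate).
ICON_FONT_RANGES = {
    "Font Awesome 4": (0xF000, 0xF2E0),
    "Font Awesome 5/6": (0xE000, 0xEFFF),
    "Material Icons": (0xE000, 0xEFFF),
    "Glyphicons": (0xE001, 0xE638),
    "IcoFont": (0xEF00, 0xF1A4),
}

def overlapping_pairs(ranges):
    """Yield (name_a, name_b, first, last) for each pair of ranges that intersect."""
    for (a, (a_lo, a_hi)), (b, (b_lo, b_hi)) in combinations(ranges.items(), 2):
        lo, hi = max(a_lo, b_lo), min(a_hi, b_hi)
        if lo <= hi:  # a non-empty intersection means shared code points
            yield a, b, lo, hi

for a, b, lo, hi in overlapping_pairs(ICON_FONT_RANGES):
    print(f"{a} and {b} share U+{lo:04X}-U+{hi:04X}")
```

A shared range does not mean every code point in it is assigned by both fonts, only that a collision is possible wherever both happen to use the same slot.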

The shift away from icon fonts

Modern development has largely moved from icon fonts to SVG icons and CSS icon systems (using background-image with SVG data URIs or separate SVG sprite files). The reasons include:
- SVG icons are not affected by font loading failures
- SVG icons scale and color with CSS without font hacks
- Screen readers handle SVG icons more predictably
- SVG icons avoid the PUA accessibility problems described below

That said, icon fonts remain common in large legacy codebases and in frameworks that built their icon system before SVG alternatives matured.

Corporate and Specialized Fonts

Beyond icon fonts, PUA characters appear in:

Corporate brand fonts: Many companies create custom fonts that include a trademarked logo character, a brand symbol, or specialized product glyphs in the PUA. These work reliably within brand-controlled environments (printed materials, internal apps) but are meaningless externally.

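Apple's PUA usage: Apple's fonts use PUA code points for symbols specific to Apple platforms; the Apple logo (U+F8FF) is the best-known example. The logo occupied byte 0xF0 in the Mac OS Roman character set, and Apple's Unicode mapping places it at U+F8FF, the last code point of the BMP Private Use Area. On systems without an Apple font, that code point shows a missing glyph or whatever the local font assigns to it.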

Emoji and symbol extensions: Some platforms historically extended emoji or symbol sets using PUA before Unicode assigned official code points. Characters that once lived in PUA may have been migrated to official Unicode code points as the standard expanded.

The ConScript Unicode Registry

One organized approach to PUA use is the ConScript Unicode Registry (CSUR), an unofficial registry that coordinates PUA assignments for constructed scripts and languages — Tolkien's Tengwar and Cirth, Klingon, Shavian (before Unicode added it officially), and dozens of other writing systems invented for fiction, linguistic experiments, or artistic purposes.

CSUR assigns specific PUA ranges to each script so that fonts implementing constructed scripts can interoperate:

| Script | CSUR Range |
|---|---|
| Tengwar (Tolkien) | U+E000–U+E07F |
| Cirth (Tolkien) | U+E080–U+E0FF |
| Klingon | U+F8D0–U+F8FF |
| Unifon | U+E740–U+E77F |

CSUR gives constructed script enthusiasts a common framework, but it still requires everyone involved to use CSUR-compliant fonts.

Risks of PUA Characters

PUA characters are safe within their closed context but become problematic when text crosses system boundaries.

Data exchange

If a document containing PUA characters is sent to someone who uses a different font (or no font) for those code points, the characters will render incorrectly:
- Displayed as empty boxes or question marks
- Rendered as whatever glyph the recipient's default font assigns to that code point (possibly a different icon, or a different script entirely)

This is a common issue when:
- Exporting data from an enterprise system that uses a custom font
- Sharing documents across organizations
- Indexing or processing content with a search engine or NLP pipeline
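Before text leaves a closed system, it can help to know whether it contains private-use characters at all. The following Python sketch reports their positions; `find_private_use` is an illustrative name, and the sample string is made up for the example.

```python
import unicodedata

def find_private_use(text: str):
    """Return (index, 'U+XXXX') for every private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" is the private-use category
    ]

# Hypothetical exported record containing two PUA characters.
record = "Status: \uE001 approved by \uF8FF team"
for index, code_point in find_private_use(record):
    print(f"private-use character {code_point} at index {index}")
```

A report like this only flags the characters; deciding what they meant still requires the mapping used by the originating system.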

Accessibility

Screen readers cannot interpret PUA characters because there is no Unicode name to vocalize. An icon font character like U+F015 is completely meaningless to a screen reader — it will either skip it, read a generic "unknown character" announcement, or attempt to read the raw code point.

This is a significant accessibility concern for icon fonts. The WCAG-compliant approach is to ensure icon font characters have aria-hidden="true" and that meaningful labels are provided separately:

<!-- Accessible icon font usage -->
<button>
  <i class="fa fa-home" aria-hidden="true"></i>
  <span class="sr-only">Home</span>
</button>

Without aria-hidden="true", screen readers may announce something meaningless or disruptive when encountering the PUA character.
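A rough way to catch missing attributes is to scan markup for icon elements that lack aria-hidden="true". The sketch below uses Python's standard `html.parser`; the class name `IconAuditor`, the function `audit_icons`, and the assumption that icons are `<i>` elements carrying an `fa` class are all illustrative, so adjust them to the icon library in use.

```python
from html.parser import HTMLParser

class IconAuditor(HTMLParser):
    """Collect <i> elements with an 'fa' class that lack aria-hidden="true"."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "i" and "fa" in classes:
            if attributes.get("aria-hidden") != "true":
                line, _column = self.getpos()
                self.problems.append((line, " ".join(classes)))

def audit_icons(html: str):
    """Return a list of (line, classes) for icon elements missing aria-hidden."""
    auditor = IconAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.problems

markup = """<button>
  <i class="fa fa-home"></i>
  <span class="sr-only">Home</span>
</button>
<i class="fa fa-cog" aria-hidden="true"></i>"""

for line, classes in audit_icons(markup):
    print(f'line {line}: <i class="{classes}"> lacks aria-hidden="true"')
```

This checks only the attribute; it cannot tell whether a meaningful text label accompanies the icon, which still needs a manual or tool-assisted accessibility review.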

Search and indexing

Search engines and full-text search systems that do not know about a specific PUA mapping cannot index PUA characters semantically. An icon font character in a heading has zero SEO value and may actually confuse the search engine's understanding of the page.

Copy-paste behavior

When a user copies text containing PUA characters from a web page and pastes it elsewhere, the PUA code points travel with the copied text. In the destination application — which likely uses a different font — the PUA characters will render as unknown glyphs or replacement characters. This is a subtle usability problem: text that looks fine on screen produces garbled output when pasted into an email or document.

Best Practices for PUA Usage

If you are using icon fonts:
- Always add aria-hidden="true" to icon elements
- Provide visible or screen-reader-accessible text labels
- Ensure the icon font loads reliably; font-display: block is usually a safer choice than swap, because a fallback font has no glyph for PUA code points and would briefly show empty boxes
- Consider migrating to SVG icons for new projects

If you are creating a font with PUA characters:
- Document your PUA assignments clearly
- Use CSUR ranges if relevant to your use case
- Do not place user-readable content (text that should be searchable, copyable, or accessible) in PUA characters
- Consider whether your characters could be submitted to Unicode for official assignment

If you are processing text data:
- Be aware that PUA characters in input data are meaningful only in the context of the originating system
- Strip or replace PUA characters if you are normalizing or sanitizing text for output in a different context
- Do not use PUA characters as a general-purpose encoding trick. Unicode has no separate class of official "user-defined" characters; if a standard code point exists for a symbol, use it, and keep PUA assignments inside the system that defines them
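Stripping or replacing private-use characters during sanitization can be sketched in a few lines of Python. `replace_private_use` is an illustrative name, and using U+FFFD (the Unicode replacement character) as the default substitute is an assumption; a pipeline might prefer an empty string or a visible placeholder.

```python
import unicodedata

def replace_private_use(text: str, replacement: str = "\uFFFD") -> str:
    """Replace each private-use character; pass replacement="" to strip instead."""
    return "".join(
        replacement if unicodedata.category(ch) == "Co" else ch
        for ch in text
    )

# Hypothetical input scraped from a page that uses an icon font.
raw = "Home \uF015 | Settings \uF013"
print(replace_private_use(raw))                  # PUA characters become U+FFFD
print(replace_private_use(raw, replacement=""))  # PUA characters are removed
```

If the originating system's mapping is known, translating each code point to a text label is usually better than discarding it, since the label preserves the meaning the icon carried.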

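In databases:
- Store PUA characters faithfully if you need to preserve them. UTF-8 can encode every Unicode code point, but supplementary-plane PUA characters need four bytes, so the column character set must allow 4-byte sequences (in MySQL, utf8mb4 rather than the 3-byte utf8)
- Check that export and import round-trip PUA characters intact if your data pipeline involves format conversion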
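A round-trip check is easy to sketch in Python. The function name `survives_round_trip` and the sample string are illustrative; `errors="replace"` is used here to imitate a lossy export step that silently substitutes characters the target encoding cannot represent.

```python
def survives_round_trip(text: str, encoding: str) -> bool:
    """Encode and decode text, reporting whether it comes back unchanged."""
    encoded = text.encode(encoding, errors="replace")
    return encoded.decode(encoding, errors="replace") == text

# One BMP PUA character and one supplementary-plane PUA character.
sample = "logo \uF8FF and glyph \U000F0001"

for encoding in ("utf-8", "utf-16", "cp1252"):
    print(encoding, survives_round_trip(sample, encoding))

# Supplementary-plane PUA characters occupy four bytes in UTF-8, so the
# storage layer must accept 4-byte sequences.
print(len("\U000F0001".encode("utf-8")))
```

The Unicode encodings preserve the text, while a legacy single-byte code page does not; the same kind of comparison can be run against a real export and import path using a few known PUA test strings.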

Use the SymbolFYI Symbol Table tool to browse Unicode ranges, and the Unicode Lookup tool to inspect specific code points — including those in the PUA range.
