SymbolFYI

Web Fonts and Unicode Subsetting: Loading Only What You Need

A full-featured variable font for a single typeface — covering Latin, Greek, Cyrillic, and common symbols — can easily exceed 500 KB. A CJK font covering Chinese, Japanese, and Korean characters may run to several megabytes. Loading all of this for a page that uses only Latin text with a few mathematical symbols is wasteful. The unicode-range CSS descriptor, combined with the browser's lazy font loading behavior, solves this.

How unicode-range Works

The unicode-range descriptor in a @font-face rule tells the browser which Unicode code points a given font file contains. The browser downloads the font file only when the page actually needs to render one of those characters:

@font-face {
  font-family: 'MyFont';
  src: url('myfont-latin.woff2') format('woff2');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC,
                 U+02C6, U+02DA, U+02DC, U+2000-206F, U+2074,
                 U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215,
                 U+FEFF, U+FFFD;
}

@font-face {
  font-family: 'MyFont';
  src: url('myfont-cyrillic.woff2') format('woff2');
  unicode-range: U+0301, U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}

@font-face {
  font-family: 'MyFont';
  src: url('myfont-greek.woff2') format('woff2');
  unicode-range: U+0370-03FF;
}

With these declarations:

- A page in English downloads only myfont-latin.woff2
- A page in Russian downloads only myfont-cyrillic.woff2 (plus Latin if needed)
- A page mixing languages downloads only the subsets it actually uses
- None of the subset files is downloaded until a character in its range appears in the page content
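The selection rule can be sketched in a few lines of Python. This is only a model of the matching logic, not browser code: the SUBSETS table is a hypothetical, abbreviated copy of the ranges in the declarations above, and files_needed is an illustrative helper name.

```python
# Hypothetical model of unicode-range matching; ranges abbreviated from the rules above.
SUBSETS = {
    "myfont-latin.woff2": [(0x0000, 0x00FF), (0x2000, 0x206F), (0x20AC, 0x20AC)],
    "myfont-cyrillic.woff2": [(0x0301, 0x0301), (0x0400, 0x045F), (0x2116, 0x2116)],
    "myfont-greek.woff2": [(0x0370, 0x03FF)],
}


def files_needed(text: str) -> list[str]:
    """Return the subset files whose ranges contain at least one character of text."""
    needed = []
    for filename, ranges in SUBSETS.items():
        if any(start <= ord(ch) <= end for ch in text for start, end in ranges):
            needed.append(filename)
    return needed


print(files_needed("Hello, world"))   # Latin letters and punctuation only
print(files_needed("Привет"))         # Cyrillic letters only
print(files_needed("Привет, мир"))    # the comma and space fall in the Latin range
```

The last call shows why a Russian page often pulls in the Latin subset too: spaces and ASCII punctuation sit in U+0000-00FF.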

Unicode Range Notation

The unicode-range value accepts several forms:

/* Single code point */
unicode-range: U+26;           /* U+0026 AMPERSAND */

/* Contiguous range */
unicode-range: U+0025-00FF;    /* code points from U+0025 to U+00FF */

/* Wildcard (? matches any hex digit) */
unicode-range: U+4??;          /* U+0400 to U+04FF — Cyrillic block */
unicode-range: U+26??;         /* U+2600 to U+26FF — Miscellaneous Symbols */

/* Comma-separated list */
unicode-range: U+0025-00FF, U+0131, U+0152-0153;

/* Supplementary plane (5+ hex digits) */
unicode-range: U+1F300-1F9FF;  /* emoji ranges */
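To make the notation concrete, here is a minimal Python sketch that expands a unicode-range value into inclusive (start, end) pairs. The function name parse_unicode_range is an assumption for illustration, and the parser handles only the forms listed above.

```python
def parse_unicode_range(value: str) -> list[tuple[int, int]]:
    """Expand a unicode-range value into inclusive (start, end) code point pairs."""
    ranges = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        if token[:2].upper() != "U+":
            raise ValueError(f"not a unicode-range token: {token!r}")
        body = token[2:]
        if "?" in body:
            # Wildcard: each '?' stands for any hex digit, so 4?? spans 400..4FF
            start = int(body.replace("?", "0"), 16)
            end = int(body.replace("?", "F"), 16)
        elif "-" in body:
            low, high = body.split("-")
            start, end = int(low, 16), int(high, 16)
        else:
            start = end = int(body, 16)
        if start > end or end > 0x10FFFF:
            raise ValueError(f"invalid range: {token!r}")
        ranges.append((start, end))
    return ranges


print(parse_unicode_range("U+26"))
print(parse_unicode_range("U+4??"))
print(parse_unicode_range("U+0025-00FF, U+0131, U+0152-0153"))
```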

Google Fonts Subsetting Model

Google Fonts is the most visible implementation of unicode-range subsetting in production. When you embed a Google Font, the CSS response contains multiple @font-face declarations — one per subset — each with a precise unicode-range. The browser then downloads only the subsets needed for the current page's content.

Inspect a Google Fonts CSS URL to see this in action:

/* Example from Google Fonts CSS API (simplified) */

/* Devanagari subset — only downloaded for pages with Devanagari text */
@font-face {
  font-family: 'Noto Sans';
  src: url('noto-sans-v26-devanagari-regular.woff2') format('woff2');
  unicode-range: U+0900-097F, U+1CD0-1CF6, U+1CF8-1CF9, U+200C-200D,
                 U+20A8, U+20B9, U+25CC, U+A830-A835, U+A8E0-A8F7;
}

/* Latin extended subset */
@font-face {
  font-family: 'Noto Sans';
  src: url('noto-sans-v26-latin-ext-regular.woff2') format('woff2');
  unicode-range: U+0100-024F, U+0259, U+1E00-1EFF, U+2020, U+20A0-20AB,
                 U+20AD-20CF, U+2113, U+2C60-2C7F, U+A720-A7FF;
}

/* Latin core subset — always needed for most pages */
@font-face {
  font-family: 'Noto Sans';
  src: url('noto-sans-v26-latin-regular.woff2') format('woff2');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+02C6,
                 U+02DA, U+02DC, U+2000-206F, U+2074, U+20AC, U+2122,
                 U+2191, U+2193, U+2212, U+2215, U+FEFF, U+FFFD;
}
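If you have saved such a stylesheet locally, a rough Python sketch like the following can list which file serves which range. It assumes simple @font-face blocks with no nested braces or comments inside them; list_subsets and SAMPLE_CSS are illustrative names, and the sample is shortened from the example above.

```python
import re

FACE_RE = re.compile(r"@font-face\s*\{(.*?)\}", re.DOTALL)
URL_RE = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)")
RANGE_RE = re.compile(r"unicode-range\s*:\s*([^;}]+)")


def list_subsets(css: str) -> dict[str, str]:
    """Map each font file URL to its unicode-range value (whitespace normalised)."""
    subsets = {}
    for body in FACE_RE.findall(css):
        url = URL_RE.search(body)
        rng = RANGE_RE.search(body)
        if url and rng:
            subsets[url.group(1)] = " ".join(rng.group(1).split())
    return subsets


# Shortened sample for illustration only
SAMPLE_CSS = """
@font-face {
  font-family: 'Noto Sans';
  src: url('noto-sans-v26-devanagari-regular.woff2') format('woff2');
  unicode-range: U+0900-097F, U+1CD0-1CF6,
                 U+200C-200D;
}
@font-face {
  font-family: 'Noto Sans';
  src: url('noto-sans-v26-latin-regular.woff2') format('woff2');
  unicode-range: U+0000-00FF, U+0131;
}
"""

for url, value in list_subsets(SAMPLE_CSS).items():
    print(url, "->", value)
```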

Creating Your Own Font Subsets

Using fonttools (Python)

fonttools is the standard library for font manipulation:

pip install fonttools brotli zopfli

from fontTools import subset

# Define which characters to include (by unicode range)
unicodes = subset.parse_unicodes(
    'U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+20AC'
)

# Options takes attribute names, not CLI flag names (hinting, not no_hinting)
options = subset.Options()
options.set(
    flavor='woff2',
    desubroutinize=True,
    hinting=True,  # keep hinting for screen rendering
)

with subset.load_font('MyFont-Regular.ttf', options) as font:
    subsetter = subset.Subsetter(options=options)
    subsetter.populate(unicodes=unicodes)  # expects a list of integer code points
    subsetter.subset(font)
    subset.save_font(font, 'MyFont-Latin.woff2', options)

From the command line:

# Create a Latin subset
pyftsubset MyFont-Regular.ttf \
  --output-file=MyFont-Latin.woff2 \
  --flavor=woff2 \
  --unicodes="U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+20AC" \
  --layout-features='*'  # preserve OpenType features

# Create a symbols-only subset
pyftsubset MyFont-Regular.ttf \
  --output-file=MyFont-Symbols.woff2 \
  --flavor=woff2 \
  --unicodes="U+2600-26FF,U+2700-27BF"
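When you need several subsets of the same font, it can help to generate the commands from one table. The sketch below only builds and prints the argument lists shown above; it does not run pyftsubset. The SUBSETS table, the output naming scheme, and the helper name pyftsubset_command are assumptions for illustration.

```python
import shlex

# Hypothetical subset table: name -> code points (values taken from the commands above)
SUBSETS = {
    "Latin": "U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+20AC",
    "Symbols": "U+2600-26FF,U+2700-27BF",
}


def pyftsubset_command(source: str, name: str, unicodes: str) -> list[str]:
    """Build the argument list for one pyftsubset run (does not execute it)."""
    # Assumed naming scheme: MyFont-Regular.ttf -> MyFont-<name>.woff2
    stem = source.rsplit(".", 1)[0].replace("-Regular", "")
    return [
        "pyftsubset",
        source,
        f"--output-file={stem}-{name}.woff2",
        "--flavor=woff2",
        f"--unicodes={unicodes}",
        "--layout-features=*",
    ]


for name, unicodes in SUBSETS.items():
    # shlex.join quotes arguments such as the '*' so the line is safe to paste into a shell
    print(shlex.join(pyftsubset_command("MyFont-Regular.ttf", name, unicodes)))
```

To execute a command instead of printing it, pass the list to subprocess.run(cmd, check=True).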

Common Subset Ranges

| Subset | Unicode Range | Approx. Glyphs |
|---|---|---|
| Latin Basic | U+0020-007E | 95 |
| Latin-1 Supplement | U+00A0-00FF | 96 |
| Latin Extended | U+0100-024F | ~300 |
| Cyrillic | U+0400-04FF | ~256 |
| Greek | U+0370-03FF | ~144 |
| Arabic | U+0600-06FF | ~256 |
| Devanagari | U+0900-097F | ~128 |
| CJK Unified (core) | U+4E00-9FFF | ~20,000 |
| Hiragana | U+3040-309F | ~96 |
| Katakana | U+30A0-30FF | ~96 |
| Emoji (basic) | U+1F300-1F9FF | ~1,800 |
| Mathematical Operators | U+2200-22FF | ~256 |
| Arrows | U+2190-21FF | ~112 |
| Miscellaneous Symbols | U+2600-26FF | ~256 |

CJK: The Hard Problem

CJK fonts are the most challenging to subset because a single language may require thousands of glyphs. Strategies:

Progressive loading with font-display

@font-face {
  font-family: 'NotoSansCJK';
  src: url('noto-sans-cjk-common.woff2') format('woff2');
  unicode-range: U+4E00-9FFF;   /* full CJK Unified Ideographs block; narrow this to the code points the "common" file actually contains */
  font-display: swap;            /* show fallback immediately, swap when loaded */
}

@font-face {
  font-family: 'NotoSansCJK';
  src: url('noto-sans-cjk-rare.woff2') format('woff2');
  unicode-range: U+3400-4DBF, U+20000-2A6DF;  /* Extension A + B */
  font-display: optional;  /* used only if it loads almost immediately, typically from cache */
}
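For a frequency-based split, the unicode-range of each file has to list the code points that file really contains, which are usually scattered rather than one contiguous block. A minimal sketch of collapsing a set of code points into a compact unicode-range value follows; the helper names are hypothetical, and where the list of frequent characters comes from is left open.

```python
from typing import Iterable


def _token(start: int, end: int) -> str:
    """Format one code point or one inclusive range in unicode-range syntax."""
    return f"U+{start:04X}" if start == end else f"U+{start:04X}-{end:04X}"


def to_unicode_range(codepoints: Iterable[int]) -> str:
    """Collapse code points into a comma-separated unicode-range value."""
    ordered = sorted(set(codepoints))
    if not ordered:
        return ""
    tokens = []
    start = prev = ordered[0]
    for cp in ordered[1:]:
        if cp != prev + 1:
            # Gap found: close the current run and start a new one
            tokens.append(_token(start, prev))
            start = cp
        prev = cp
    tokens.append(_token(start, prev))
    return ", ".join(tokens)


# Example: the distinct characters of a short string
print(to_unicode_range(ord(ch) for ch in "中文字体中"))
```

The same code point list can be passed to the subsetter so that the file contents and the declared range stay in sync.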

Glyphhanger for automatic subsetting

glyphhanger scans your pages and creates subsets containing only the glyphs actually used:

npm install -g glyphhanger

# Scan a URL and generate a subset
glyphhanger https://example.com --subset=MyFont.ttf --formats=woff2

# Scan multiple URLs
glyphhanger https://example.com https://example.com/about --subset=MyFont.ttf

# Use with a local HTML file
glyphhanger ./dist/index.html --subset=MyFont.ttf --formats=woff2,woff

font-display and Loading Strategy

The font-display descriptor controls what the browser shows while a font is loading:

@font-face {
  font-family: 'MyFont';
  src: url('myfont.woff2') format('woff2');
  unicode-range: U+0000-00FF;
  font-display: swap;      /* fallback → custom when loaded (FOUT) */
}

| Value | Block Period | Swap Period | Use Case |
|---|---|---|---|
| auto | Browser default | Browser default | Legacy default |
| block | Short (~3s) | Infinite | Custom icons (invisible fallback better than wrong glyph) |
| swap | Extremely short (0 to ~100ms) | Infinite | Body text (FOUT acceptable) |
| fallback | Very short (~100ms) | Short (~3s) | Best for performance |
| optional | Very short (~100ms) | None | Use only if available almost immediately (typically cached) |

For most text fonts, use font-display: swap or font-display: fallback. For icon fonts, where the fallback character would be meaningless or misleading, use font-display: block.
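The interaction of the two periods can be modelled as a small timeline. The sketch below is a simplification: the millisecond values are the approximate figures from the table (with the swap value's block period rounded down to zero), real browsers choose their own timings, and the rendering function is a hypothetical helper.

```python
import math

# Approximate (block period, swap period) in milliseconds; assumptions, not spec values
PERIODS_MS = {
    "block": (3000, math.inf),
    "swap": (0, math.inf),
    "fallback": (100, 3000),
    "optional": (100, 0),
}


def rendering(display: str, load_ms: float) -> list[str]:
    """List what the user sees, in order, if the font finishes loading at load_ms."""
    block, swap = PERIODS_MS[display]
    if load_ms <= block:
        # Font arrived during the block period: invisible text, then the custom font
        return ["invisible", "custom"] if load_ms > 0 else ["custom"]
    phases = ["invisible"] if block > 0 else []
    phases.append("fallback")
    if load_ms <= block + swap:
        # Still inside the swap period, so the custom font replaces the fallback
        phases.append("custom")
    return phases


for value in PERIODS_MS:
    print(value, rendering(value, load_ms=1000))
```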

Font Fallback and Tofu

When no loaded font has a glyph for a character, the browser falls back through the font stack. If no font in the stack covers the character, the browser renders a tofu box (□ or ▯) — a rectangular placeholder indicating a missing glyph.

/* Comprehensive fallback stack */
body {
  font-family:
    'Inter',              /* primary web font — Latin */
    'Noto Sans CJK SC',   /* CJK supplement */
    'Noto Sans Arabic',   /* Arabic supplement */
    'Noto Sans Devanagari', /* Devanagari supplement */
    system-ui,            /* OS default UI font */
    -apple-system,        /* macOS/iOS San Francisco */
    'Segoe UI',           /* Windows */
    sans-serif;           /* ultimate fallback */
}

The OS system fonts (system-ui, -apple-system, Segoe UI) have broad Unicode coverage. Including them before the bare sans-serif means most characters will render correctly even without a custom web font.

For an application that must display arbitrary Unicode without tofu, Noto (Google's "No Tofu" font family) covers virtually all scripts:

/* Nuclear option: load only what you use */
@font-face {
  font-family: 'Noto Sans';
  src: url('noto-sans-latin.woff2') format('woff2');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+20AC;
}

/* Let system fonts handle everything else */
body {
  font-family: 'Noto Sans', system-ui, sans-serif;
}

Variable Fonts and Unicode Range

Variable fonts (OpenType 1.8+) consolidate multiple weights and styles into a single file. Combined with unicode-range, they offer significant savings:

/* One variable font file covers all weights — subset by script */
@font-face {
  font-family: 'InterVariable';
  src: url('inter-latin.var.woff2') format('woff2-variations');
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC, U+20AC;
  font-weight: 100 900;  /* entire weight axis */
  font-style: oblique 0deg 10deg;
}

@font-face {
  font-family: 'InterVariable';
  src: url('inter-cyrillic.var.woff2') format('woff2-variations');
  unicode-range: U+0400-045F, U+0490-0491, U+04B0-04B1;
  font-weight: 100 900;
}

Variable fonts are not automatically smaller than static fonts. A variable font covering the full 100 to 900 weight range may be larger than the two static weights you actually use. Measure before switching.
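A quick way to measure is to compare the variable file against the static files it would replace. The following sketch uses hypothetical helper names, and the byte counts in the example are made-up figures, not measurements.

```python
import os


def size_delta(variable_bytes: int, static_bytes: list[int]) -> int:
    """Bytes saved by the variable font; negative means the static files are smaller."""
    return sum(static_bytes) - variable_bytes


def size_delta_for_files(variable_path: str, static_paths: list[str]) -> int:
    """Same comparison, reading the sizes from files on disk."""
    return size_delta(
        os.path.getsize(variable_path),
        [os.path.getsize(p) for p in static_paths],
    )


# Illustrative numbers only: one 100 KB variable file vs. two static weights
print(size_delta(100_000, [40_000, 42_000]))
```

Compare the compressed WOFF2 files that are actually served, since that is what users download.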

Icon Fonts vs. SVG Symbols

Icon fonts are a specific use case for unicode-range — they map arbitrary code points (often in the Private Use Area, U+E000–U+F8FF) to icon glyphs:

@font-face {
  font-family: 'MyIcons';
  src: url('icons.woff2') format('woff2');
  unicode-range: U+E000-E0FF;  /* Private Use Area subset for icons */
  font-display: block;         /* avoid showing wrong fallback glyph */
}

.icon-home::before  { font-family: 'MyIcons'; content: "\E001"; }
.icon-user::before  { font-family: 'MyIcons'; content: "\E002"; }
.icon-search::before { font-family: 'MyIcons'; content: "\E003"; }

However, SVG symbols or inline SVG have largely superseded icon fonts for new projects:

- SVG icons are resolution-independent and style with CSS color / fill
- No font loading required, so no FOIT/FOUT risk for icons
- Better accessibility (can have <title> and aria-label)
- No Private Use Area encoding hackery

If you maintain an existing icon font, unicode-range is still valuable to prevent the icon font from loading on pages that do not use any icons.

Measuring Font Subset Impact

Before and after subsetting, measure the actual file sizes and transfer savings:

# Check original font size
ls -lh MyFont-Regular.ttf
# 235K  MyFont-Regular.ttf

# Create Latin subset
pyftsubset MyFont-Regular.ttf \
  --output-file=MyFont-Latin.woff2 \
  --flavor=woff2 \
  --unicodes="U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+20AC" \
  --layout-features='*'

ls -lh MyFont-Latin.woff2
# 18K  MyFont-Latin.woff2  — 92% reduction for Latin-only pages

# Verify the subset contains what you need
python3 -c "
from fontTools.ttLib import TTFont
font = TTFont('MyFont-Latin.woff2')
cmap = font.getBestCmap()
test_chars = 'ABCabc€£áéíóú'
for c in test_chars:
    cp = ord(c)
    status = 'OK' if cp in cmap else 'MISSING'
    print(f'U+{cp:04X} {c}  {status}')
"

Self-Hosting vs. Google Fonts

Google Fonts provides pre-subsetted, pre-compressed WOFF2 files with optimized unicode-range declarations. The trade-offs:

| Factor | Google Fonts | Self-Hosted |
|---|---|---|
| Setup effort | Minimal | Manual subsetting required |
| Subset quality | Excellent (curated) | Depends on your tooling |
| Privacy | Third-party requests | No external requests |
| Cache sharing | None in modern browsers (HTTP caches are partitioned per site) | Per-site cache only |
| Control | None | Full |
| Reliability | Dependent on CDN | Dependent on your CDN |

For privacy-sensitive applications (GDPR compliance) or offline-capable apps, self-hosting is necessary. For most public-facing sites, Google Fonts' subsetting quality is hard to beat without significant tooling investment.

To self-host Google Fonts with their subsetting preserved, use the google-webfonts-helper service to download pre-subsetted files with the corresponding CSS.

Performance Checklist

Before shipping custom web fonts:

  1. Serve WOFF2 format — it has the best compression (20-30% smaller than WOFF)
  2. Use unicode-range to split large fonts into subsets
  3. Preload critical subsets: <link rel="preload" as="font" type="font/woff2" href="/fonts/myfont-latin.woff2" crossorigin>
  4. Set an appropriate font-display: swap for text, block for icon fonts
  5. Verify no tofu with the SymbolFYI Character Counter — paste your page's full text to identify which Unicode blocks are needed
  6. Use glyphhanger or fonttools to remove unused glyphs from subset files
  7. Cache fonts aggressively — they are immutable (Cache-Control: max-age=31536000, immutable)
  8. For icon fonts, consider migrating to inline SVG or SVG sprites
  9. Measure variable font vs. static font trade-offs before assuming variable is smaller
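If your pages are generated by a build script, the preload tags from item 3 can be produced from a list of critical subsets. This is a small sketch under that assumption; preload_tag is a hypothetical helper and the path is the one from the checklist.

```python
from html import escape


def preload_tag(href: str) -> str:
    """Build a font preload tag; crossorigin is required for font preloads."""
    safe_href = escape(href, quote=True)
    return f'<link rel="preload" as="font" type="font/woff2" href="{safe_href}" crossorigin>'


# Preload only the subsets needed for first paint
for href in ["/fonts/myfont-latin.woff2"]:
    print(preload_tag(href))
```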

Next in Series: Character Encoding Detection: How to Identify Unknown Text Encoding — BOM sniffing, statistical detection with chardet, when detection fails, and the hierarchy of signals for determining encoding.
