SymbolFYI

Tofu: Why Characters Show as Empty Rectangles and How to Fix It

How-To · February 11, 2025

You've seen it: a small empty rectangle □, a tall hollow box ▯, or a box with a question mark in it ⍰, appearing in the middle of otherwise normal text. That placeholder is called tofu, after the blank white blocks of bean curd that the rectangles resemble. Tofu appears when a rendering system encounters a Unicode code point but cannot find a glyph for it in any available font.

Understanding tofu — what causes it, how to diagnose it, and how to fix it — is essential for anyone who works with multilingual content, emoji, or technical symbols on the web.

What Tofu Is

Tofu is a rendering placeholder, not an encoding error. The character exists in the document and is correctly encoded. The rendering system knows what code point it's dealing with. It just can't find a way to draw it.

The placeholder shape varies by operating system and rendering engine:

  • □: the classic white rectangle, common on older systems and plain-text terminals
  • ▯: a black outlined rectangle, common in newer system fonts
  • A box with hex digits inside: some fonts show the code point in a small box (e.g., on macOS for some rare code points)
  • ⍰ (a question mark in a box): used by some rendering systems for unknown characters
  • � (U+FFFD, REPLACEMENT CHARACTER): used specifically when encoding errors occur, not when a font is missing a glyph

The last one is important: tofu (a rendering issue) and the replacement character (an encoding issue) look similar but mean different things. Tofu means "this character is valid but I can't draw it." The replacement character means "this byte sequence is invalid in this encoding."

The .notdef Glyph

In font technology, the glyph a font supplies for a character it cannot render is called the .notdef ("not defined") glyph. The OpenType specification requires every font to include a .notdef glyph as its first glyph (glyph ID 0), though it is flexible about what the glyph looks like.

The .notdef glyph is what you see as tofu. Different fonts implement it differently:

  • Many system fonts use a blank rectangle
  • Some use a box with the hex code point
  • Some use nothing (an advance width with no ink) — which is invisible tofu, even harder to notice
  • Google's Noto fonts (the name is short for "No Tofu") take the opposite approach: the project aims to make the .notdef glyph unnecessary by providing font coverage for every Unicode character

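When a rendering system needs to draw a character, it searches the available fonts in order: first the fonts requested by the document or application, then the system fallback fonts. Only if no font in that chain has a glyph for the code point does a placeholder appear, typically the .notdef glyph of the first font in the chain or of a system last-resort font.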
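The search through a fallback chain can be sketched in a few lines of Python. This is a toy model, not how any real text engine is implemented: the font names and the coverage sets below are made up for illustration, whereas a real engine reads coverage from each font's character map.

```python
# Hypothetical coverage tables (assumption): real fonts expose this
# information through their character-to-glyph mapping.
FONT_COVERAGE = {
    "Body Font": set(range(0x20, 0x7F)),         # Basic Latin only
    "CJK Fallback": set(range(0x4E00, 0xA000)),  # CJK Unified Ideographs
    "Emoji Fallback": {0x1F600, 0x1F44D},        # two emoji
}


def pick_font(char, stack):
    """Return the first font in the stack that covers char, or None."""
    code_point = ord(char)
    for name in stack:
        if code_point in FONT_COVERAGE.get(name, set()):
            return name
    return None  # no font has a glyph: the renderer would draw tofu


def render_plan(text, stack):
    """Pair every character with the font that would draw it."""
    return [(ch, pick_font(ch, stack) or ".notdef (tofu)") for ch in text]


STACK = ["Body Font", "CJK Fallback", "Emoji Fallback"]

if __name__ == "__main__":
    # Latin letter, CJK ideograph, emoji, and a Linear B syllable.
    for ch, font in render_plan("A\u6f22\U0001F600\U00010000", STACK):
        print(f"U+{ord(ch):04X} -> {font}")
```

In this model only the Linear B syllable ends up as tofu, because none of the three hypothetical fonts lists its code point.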

Causes of Tofu

Missing Font Glyph (Most Common)

The character exists in Unicode, your document encodes it correctly, but no font currently accessible to the rendering system contains a glyph for it. This is the classic cause.

As of Unicode 16.0, Unicode has over 154,000 assigned characters. No single font covers all of them: an OpenType font can hold at most 65,535 glyphs, and typical system fonts cover a few thousand to tens of thousands of code points. Highly specialized characters like ancient scripts (Linear B, Cuneiform, Old Persian), mathematical symbols in obscure blocks, or newly added emoji may not be present in common system fonts.

Font Loaded But Unicode Range Not Covered

A web font may be loaded and active for the main text, but explicitly limited to a specific Unicode range using the @font-face unicode-range descriptor. Characters outside that range fall through to the next font in the stack. If the fallback font also lacks coverage, tofu appears.

Emoji Version Mismatch

Emoji are added to Unicode in roughly annual releases. An emoji added in Unicode 15.1 (released September 2023) will not render on operating systems whose emoji font predates that release. On older Android versions, Windows 10 without updates, or older macOS, recent emoji and emoji sequences may appear as tofu.

Emoji ZWJ sequences (multiple emoji joined by U+200D) are particularly affected. A ZWJ sequence that appears as a single unified emoji on one platform may render as a series of separate emoji on another, or as a combination of emoji and visible ZWJ boxes on older systems.
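To see why a ZWJ sequence can fall apart, it helps to look at the code points it is made of. The following sketch uses only Python's standard unicodedata module; the helper names are illustrative, and the family emoji is just one example sequence.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def describe_sequence(text):
    """List every code point in text with its Unicode name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for ch in text]


def zwj_components(text):
    """Split a ZWJ sequence into the emoji a platform may show separately."""
    return text.split(ZWJ)


# Family emoji: MAN + ZWJ + WOMAN + ZWJ + GIRL
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

if __name__ == "__main__":
    for code_point, name in describe_sequence(family):
        print(code_point, name)
    print(len(zwj_components(family)), "component emoji")
```

A platform that does not know the combined glyph has to draw the three component emoji one after another, which is the degraded rendering described above.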

Private Use Area Characters

The Private Use Area (PUA) — code points U+E000–U+F8FF in the BMP, plus supplementary PUAs — is reserved for applications to define their own characters. Icon fonts like Font Awesome use PUA code points to assign glyphs to icons. These characters have no standard glyph: without the specific icon font loaded, PUA characters render as tofu.

PUA characters in content that gets separated from its associated icon font — copied text, scraped content, search result snippets — will always produce tofu on systems without that font.
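A quick way to catch stray icon-font characters before they reach users is to scan text for Private Use Area code points. This is a minimal sketch: the function names are illustrative, and the sample string simply uses an arbitrary PUA code point of the kind an icon font might assign.

```python
# The three Private Use Areas defined by Unicode.
PUA_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area (BMP)
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
]


def is_private_use(char):
    """True if char lies in any Private Use Area."""
    code_point = ord(char)
    return any(start <= code_point <= end for start, end in PUA_RANGES)


def find_private_use(text):
    """Return (index, code point) for every PUA character in text."""
    return [(index, f"U+{ord(ch):04X}")
            for index, ch in enumerate(text)
            if is_private_use(ch)]


if __name__ == "__main__":
    sample = "Save \uf0c7 file"  # arbitrary PUA code point for illustration
    print(find_private_use(sample))
```

Running a check like this over copied or scraped content shows where an icon will turn into tofu once the icon font is gone.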

Wrong Encoding (Produces Replacement Character, Not Tofu)

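When bytes are decoded with the wrong character encoding, the result is garbled text rather than tofu. Latin-1 bytes read as UTF-8 usually contain invalid sequences, which the decoder replaces with the Unicode replacement character U+FFFD (�). UTF-8 bytes read as Latin-1 always decode, but each multi-byte character turns into two or more wrong characters (mojibake, such as Ã© for é). Either symptom can be mistaken for tofu, but the cause is different: the byte data is being misinterpreted. The Encoding Converter at /tools/encoding-converter/ helps diagnose these byte-level issues.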
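The difference between the two directions of a Latin-1/UTF-8 mix-up is easy to reproduce. This short sketch uses only built-in Python string methods; the sample word is arbitrary.

```python
text = "caf\u00e9"  # "café"

utf8_bytes = text.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # b'caf\xe9'

# UTF-8 bytes read as Latin-1: every byte is valid, so the result is
# mojibake (wrong characters), not a replacement character.
mojibake = utf8_bytes.decode("latin-1")

# Latin-1 bytes read as UTF-8: 0xE9 is not a valid UTF-8 sequence here,
# so a lenient decoder substitutes U+FFFD.
replaced = latin1_bytes.decode("utf-8", errors="replace")

if __name__ == "__main__":
    print([f"U+{ord(ch):04X}" for ch in mojibake])
    print([f"U+{ord(ch):04X}" for ch in replaced])
```

In both cases the fix is to decode with the right encoding; installing more fonts changes nothing.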

System Font Coverage Differences

Different operating systems ship with different font collections, which means the same HTML can render differently across platforms.

macOS

macOS ships with extensive system fonts and a sophisticated font fallback system. The system automatically selects from fonts including:

  • San Francisco (UI font): Latin, some symbols
  • Helvetica Neue, Arial: broad Latin coverage
  • PingFang SC/TC/HK: CJK (Chinese)
  • Hiragino Sans, Hiragino Kaku Gothic: Japanese
  • Apple SD Gothic Neo: Korean
  • Arial Unicode MS: broad fallback coverage
  • Apple Color Emoji: emoji

macOS typically renders very little tofu for common scripts. It may show tofu for rare historical scripts and some newly added Unicode blocks that haven't been incorporated into system fonts.

Windows

Windows ships with a smaller font selection by default, though it has improved significantly with each release:

  • Segoe UI: primary UI font
  • Segoe UI Emoji: emoji
  • Yu Gothic: Japanese
  • Microsoft JhengHei: Traditional Chinese
  • Microsoft YaHei: Simplified Chinese
  • Malgun Gothic: Korean
  • Nirmala UI: Devanagari, Bengali, Tamil, and other Indic scripts
  • Leelawadee UI: Thai, Lao

Older Windows versions (pre-10 without cumulative updates) have more limited emoji coverage and more tofu for recent emoji additions. Windows generally has less coverage than macOS for non-Latin scripts not in its default font set.

Linux

Linux distributions vary enormously in font coverage depending on which fonts are installed by default. Minimal installations may only have a single fallback font covering Latin characters. The solution is usually installing the relevant script-specific fonts from package repositories (fonts-noto-cjk, fonts-noto-extra, etc.).

Android and iOS

Mobile operating systems update their system emoji periodically but may lag behind the latest Unicode release by a year or more. Apps can bundle custom fonts to avoid this gap. iOS benefits from the same Apple Color Emoji font as macOS; Android uses Noto Color Emoji, which is similarly comprehensive.

Diagnosing Tofu

When you encounter tofu in a browser, the diagnosis process is:

Step 1: Identify the Character

Right-click the text area in Chrome or Firefox and inspect the element. In the DevTools Elements panel, look at the text content of the node. Copy the tofu character from the rendered text — it should copy as the actual Unicode character.

Paste the copied character into the Character Analyzer at /tools/character-counter/. The breakdown table will show you the exact code point, Unicode name, block, and category — even for a character that renders as tofu on your system.
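If you prefer to stay in a terminal, roughly the same breakdown can be produced locally. This is a sketch built on Python's standard unicodedata module, not the Character Analyzer's own implementation; the function name is illustrative, and unicodedata does not report the block name.

```python
import unicodedata


def analyze(text):
    """Return code point, name, and general category for each character."""
    rows = []
    for ch in text:
        rows.append({
            "code_point": f"U+{ord(ch):04X}",
            "name": unicodedata.name(ch, "<no name>"),
            "category": unicodedata.category(ch),
        })
    return rows


if __name__ == "__main__":
    # Paste the copied tofu character between the quotes; this example
    # uses U+10000, a Linear B syllable that many systems cannot draw.
    for row in analyze("\U00010000"):
        print(row["code_point"], row["name"], row["category"])
```

The result depends on the Unicode version bundled with your Python build, so a very recently added character may be reported without a name.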

Step 2: Check Font Coverage

With the code point in hand, check whether the fonts in your CSS font stack include that code point:

  1. Open DevTools, select the element, go to the Computed tab, and look at the font-family value
  2. Check each font in the stack using a font inspector tool or by looking at the font's Unicode coverage tables
  3. To see which font was actually used for the text, open the Rendered Fonts section at the bottom of the Computed tab in Chrome, or the Fonts tab of the Inspector in Firefox

Step 3: Determine If It's a Font Issue or Data Issue

If the character analyzer shows the code point is a valid, assigned Unicode character in a standard block, it's a font coverage issue. If the code point is in a Private Use Area, it's a custom glyph that requires the specific font that defines it. If the character shows as U+FFFD (replacement character), it's an encoding error in the data, not a font issue.
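The decision rule above can be written down as a small classifier. This is a sketch under stated assumptions: the function name and the returned labels are illustrative, and the extra "unassigned" branch is an addition that the steps above do not cover. It relies on the general category reported by Python's unicodedata module, which reflects the Unicode version of your Python build.

```python
import unicodedata

REPLACEMENT_CHARACTER = 0xFFFD


def classify(char):
    """Suggest the likely cause when char shows up as a placeholder."""
    if ord(char) == REPLACEMENT_CHARACTER:
        return "encoding error: fix how the bytes are decoded"
    category = unicodedata.category(char)
    if category == "Co":  # private use
        return "private use: load the font that defines this glyph"
    if category == "Cn":  # unassigned (branch added for illustration)
        return "unassigned: check the data, or use a newer Unicode database"
    return "font coverage: add a font that covers this character"


if __name__ == "__main__":
    for ch in ["\ufffd", "\ue000", "\u0378", "\U00010000"]:
        print(f"U+{ord(ch):04X}", "->", classify(ch))
```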

Building a Robust CSS Font Stack

The primary tool for preventing tofu on the web is a well-constructed CSS font stack with appropriate fallbacks.

The font-family Cascade

CSS font-family accepts a comma-separated list. The browser tries each font in order, selecting the first one available on the system. For any character not covered by the selected font, it continues to the next font in the list:

body {
  font-family:
    "Your Custom Font",   /* 1. Preferred font (may not cover everything) */
    -apple-system,        /* 2. macOS/iOS system font */
    BlinkMacSystemFont,   /* 2. Chrome on macOS */
    "Segoe UI",           /* 2. Windows */
    Roboto,               /* 2. Android */
    "Helvetica Neue",     /* 3. macOS fallback */
    Arial,                /* 4. Universal sans-serif */
    sans-serif;           /* 5. Browser default sans-serif */
}

This stack works well for Latin text. For multilingual content, you need to explicitly include CJK and other script fonts, or rely on system defaults being sufficient.

Including Noto Fonts as Universal Fallback

Google's Noto font family (from "No Tofu") is designed to cover all of Unicode. The family is massive — over 100 individual font files for different scripts — but you can include specific subsets or the full family depending on your needs.

For web use, loading the full Noto family would be impractical due to size. Instead, include Noto fonts for the specific scripts your content uses:

/* Via Google Fonts — Noto Sans for multiple scripts */
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans:wght@400;700&display=swap');
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+JP&display=swap');
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+KR&display=swap');
@import url('https://fonts.googleapis.com/css2?family=Noto+Sans+SC&display=swap');

Then include these in your font stack as fallbacks after your primary font:

body {
  font-family:
    "Source Sans 3",
    "Noto Sans",
    "Noto Sans JP",
    "Noto Sans KR",
    "Noto Sans SC",
    sans-serif;
}

For a single general-purpose fallback, "Noto Sans" alone covers Latin, Cyrillic, and Greek. Script-specific Noto fonts fill in the gaps for CJK, Arabic, and more specialized scripts.

@font-face unicode-range for Targeted Loading

The unicode-range descriptor in @font-face tells the browser to only load a font file when the page contains characters in the specified Unicode ranges. This enables efficient loading of script-specific fonts: users reading English-only content don't download the CJK font; users reading Japanese content get the Japanese font automatically.

@font-face {
  font-family: "Noto Sans CJK";
  src: url("NotoSansJP.woff2") format("woff2");
  unicode-range: U+3000-9FFF, U+F900-FAFF, U+FF00-FFEF;
}

@font-face {
  font-family: "Noto Sans Arabic";
  src: url("NotoSansArabic.woff2") format("woff2");
  unicode-range: U+0600-06FF, U+0750-077F, U+08A0-08FF;
}

The browser only downloads these font files when it encounters characters in the specified ranges. For pages with primarily Latin content, this means zero overhead for the CJK or Arabic fonts. For pages with mixed-script content, the right fonts load automatically.

This technique is exactly how Google Fonts serves its international font subsets — each language's font file is loaded only when needed, based on the characters present on the page.
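When a character unexpectedly falls through to the next font, it is worth checking whether its code point is inside the declared unicode-range at all. The following sketch parses the common forms of a unicode-range value (single code points, ranges, and ? wildcards); it is a simplified helper written for this purpose, not a full CSS parser, and the function names are illustrative.

```python
def parse_unicode_range(value):
    """Parse a CSS unicode-range value into a list of (start, end) pairs."""
    ranges = []
    for part in value.split(","):
        token = part.strip().upper()
        if not token.startswith("U+"):
            raise ValueError(f"not a unicode-range token: {part!r}")
        body = token[2:]
        if "-" in body:    # interval form, e.g. U+3000-9FFF
            start, end = body.split("-", 1)
            ranges.append((int(start, 16), int(end, 16)))
        elif "?" in body:  # wildcard form, e.g. U+4?? means U+400-4FF
            ranges.append((int(body.replace("?", "0"), 16),
                           int(body.replace("?", "F"), 16)))
        else:              # single code point, e.g. U+26
            code_point = int(body, 16)
            ranges.append((code_point, code_point))
    return ranges


def in_range(char, ranges):
    """True if char falls inside any of the parsed ranges."""
    code_point = ord(char)
    return any(start <= code_point <= end for start, end in ranges)


# The range declared for the CJK font in the example above.
CJK = parse_unicode_range("U+3000-9FFF, U+F900-FAFF, U+FF00-FFEF")

if __name__ == "__main__":
    for ch in ["\u6f22", "A", "\U00020000"]:
        print(f"U+{ord(ch):04X}", in_range(ch, CJK))
```

With the range from the example, an ideograph in CJK Extension B such as U+20000 is not covered, so the browser would skip that font for it and move on to the next entry in the stack.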

Debugging Tofu in Browser DevTools

Chrome DevTools offers the most detailed font debugging tools:

  1. Open DevTools (F12)
  2. Select an element that contains tofu
  3. Open the Computed tab and scroll to the Rendered Fonts section at the bottom
  4. The Rendered Fonts section lists the font families actually used for the selected element's text, with a glyph count for each
  5. If tofu appears, a family in that list that is not part of your font-family stack usually means the browser fell back to a system or last-resort font for some characters

In the Elements panel, the Computed tab shows the resolved font-family value after all CSS cascading. Compare this list against the actual fonts available on your test system.

For production debugging, do not rely on automated audits alone. Lighthouse's font-related checks focus on loading behavior such as font-display rather than glyph coverage, and browsers do not reliably log a console warning when a character falls back to a last-resort font. Test pages with representative multilingual content on each target platform instead.

Summary: Eliminating Tofu

Tofu appears when the Unicode code point is valid but no available font has a glyph for it. The solution is always to ensure appropriate fonts are available at rendering time:

  • For web projects: build a comprehensive CSS font stack with script-appropriate Noto font fallbacks; use unicode-range for efficient loading
  • For content with emoji: accept that very new emoji may not render on older OS versions; consider emoji image fallbacks for critical emoji
  • For icon fonts: ensure the icon font is always bundled with the content; avoid copy-pasting icon font characters out of context
  • For applications: embed fonts for the scripts you support; on mobile, test on older OS versions

To identify what character is hiding behind tofu, the Character Analyzer at /tools/character-counter/ will reveal the code point even when your font can't render it. Once you know the code point, the Encoding Converter at /tools/encoding-converter/ shows the full Unicode properties and helps you identify which Unicode block and script the character belongs to — useful information for choosing the right font.
