SymbolFYI

CJK Web Typography: Chinese, Japanese, and Korean Text on the Web

Building a site that reads as naturally in Tokyo or Shanghai as it does in New York requires more than translation. Chinese, Japanese, and Korean (CJK) scripts have fundamentally different typographic rules — their line-breaking behavior, spacing conventions, punctuation handling, and vertical layout support are all distinct from Latin typography. Getting these details right signals respect for the reader's language and noticeably improves readability.

CJK Font Stacks

The first challenge is font selection. Most operating systems ship reasonable CJK system fonts, but the defaults vary considerably in quality and coverage. A well-constructed font stack uses the best available font on each platform:

Chinese (Simplified)

body {
  font-family:
    /* macOS 10.11+ / iOS 9+ */
    "PingFang SC",
    /* Windows 8.1+ */
    "Microsoft YaHei UI",
    "Microsoft YaHei",
    /* Android */
    "Source Han Sans SC",
    "Noto Sans CJK SC",
    /* Generic fallback */
    sans-serif;
}

Chinese (Traditional)

body {
  font-family:
    /* macOS / iOS */
    "PingFang TC",
    /* Windows */
    "Microsoft JhengHei UI",
    "Microsoft JhengHei",
    /* Web font */
    "Noto Sans CJK TC",
    sans-serif;
}

Japanese

body {
  font-family:
    /* macOS 10.9+ / iOS 7+ */
    "Hiragino Kaku Gothic ProN",
    "Hiragino Sans",
    /* Windows */
    "Meiryo UI",
    "Meiryo",
    /* Google Fonts alternative */
    "Noto Sans JP",
    sans-serif;
}

Korean

body {
  font-family:
    /* macOS / iOS */
    "Apple SD Gothic Neo",
    /* Windows */
    "Malgun Gothic",
    /* Google Fonts alternative */
    "Noto Sans KR",
    sans-serif;
}

Using the lang Attribute for Script-Specific Fonts

The cleanest approach is to pair the lang attribute with distinct font declarations, since the same Han code point is drawn with different regional glyph shapes in Chinese, Japanese, and Korean fonts, and the browser needs the language tag to pick the right one:

:lang(zh-hans), :lang(zh-CN) {
  font-family: "PingFang SC", "Microsoft YaHei", "Noto Sans CJK SC", sans-serif;
}

:lang(zh-hant), :lang(zh-TW), :lang(zh-HK) {
  font-family: "PingFang TC", "Microsoft JhengHei", "Noto Sans CJK TC", sans-serif;
}

:lang(ja) {
  font-family: "Hiragino Kaku Gothic ProN", "Hiragino Sans", "Meiryo", "Noto Sans JP", sans-serif;
}

:lang(ko) {
  font-family: "Apple SD Gothic Neo", "Malgun Gothic", "Noto Sans KR", sans-serif;
}

Ensure your HTML elements carry the correct lang attribute — this affects not only font selection but also screen reader pronunciation, spell-check behavior, and quotation mark styling.
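Where CSS selectors are not available (for example, when setting a font on a canvas context), the same language-to-stack mapping can be expressed in script. The following is a minimal sketch: the function name fontStackForLang and the region-to-script table are illustrative assumptions, not part of any library, and the stacks simply mirror the CSS rules above.

```javascript
// Font stacks copied from the :lang() rules above.
const FONT_STACKS = new Map([
  ['zh-hans', '"PingFang SC", "Microsoft YaHei", "Noto Sans CJK SC", sans-serif'],
  ['zh-hant', '"PingFang TC", "Microsoft JhengHei", "Noto Sans CJK TC", sans-serif'],
  ['ja', '"Hiragino Kaku Gothic ProN", "Hiragino Sans", "Meiryo", "Noto Sans JP", sans-serif'],
  ['ko', '"Apple SD Gothic Neo", "Malgun Gothic", "Noto Sans KR", sans-serif'],
]);

// Assumption: region subtags that imply a script when no script subtag is present.
const REGION_TO_SCRIPT = new Map([
  ['cn', 'hans'], ['sg', 'hans'],
  ['tw', 'hant'], ['hk', 'hant'], ['mo', 'hant'],
]);

// Hypothetical helper: resolve a BCP 47 tag such as "zh-TW" or "ja-JP" to a stack.
function fontStackForLang(tag) {
  const [language, ...rest] = tag.toLowerCase().split('-');
  if (language !== 'zh') {
    return FONT_STACKS.get(language) ?? 'sans-serif';
  }
  // Prefer an explicit script subtag, then infer from the region, then default to Simplified.
  const explicit = rest.find((s) => s === 'hans' || s === 'hant');
  const region = rest.find((s) => REGION_TO_SCRIPT.has(s));
  const script = explicit ?? REGION_TO_SCRIPT.get(region) ?? 'hans';
  return FONT_STACKS.get(`zh-${script}`);
}
```

Defaulting bare "zh" to Simplified is a design choice for this sketch; a site aimed at Taiwan or Hong Kong would likely choose the opposite default.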

Line Breaking in CJK Text

How CJK Line Breaking Differs

In Latin text, word spaces (U+0020) are the primary line-break opportunities. CJK text has no word spaces — a line break can occur between any two adjacent CJK characters (with some exceptions for punctuation). This means browser engines must use a different algorithm.

/* Default: CJK chars break between themselves, Latin respects word boundaries */
p:lang(ja) {
  word-break: normal;
  overflow-wrap: normal;
}

For Japanese and Chinese, word-break: normal is almost always what you want. It allows breaks between CJK characters while keeping Latin words intact.

word-break: break-all breaks Latin words at arbitrary points. It is occasionally appropriate for mixed-content contexts where you need to prevent overflow, but it harms Latin readability. Prefer overflow-wrap: break-word, which breaks a Latin word only when it would otherwise overflow the line.

word-break: keep-all prevents line breaks between CJK characters, forcing the engine to treat them as space-delimited words (which they are not, in Chinese/Japanese). It produces very poor line breaking in most Chinese and Japanese prose. Korean is the exception: it does use word spaces, so keep-all is a common choice for Korean body text.
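To make the difference concrete, the behavior of word-break: normal can be modeled roughly as follows. This is a deliberately simplified sketch (real engines follow the Unicode line breaking algorithm, UAX #14, and also handle punctuation); the function name softWrapSegments is hypothetical.

```javascript
// Characters treated as "CJK" for this simplified model (Hangul is left out
// on purpose, because Korean text uses word spaces).
const CJK_CHAR = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u;

// Split text into units between which a line break would be allowed:
// every CJK character stands alone, Latin runs stay together until a space.
function softWrapSegments(text) {
  const segments = [];
  let latinRun = '';
  for (const ch of text) {
    if (CJK_CHAR.test(ch)) {
      if (latinRun) segments.push(latinRun);
      latinRun = '';
      segments.push(ch); // break allowed before and after each CJK character
    } else if (ch === ' ') {
      if (latinRun) segments.push(latinRun); // a space ends the Latin word
      latinRun = '';
    } else {
      latinRun += ch; // Latin letters, digits, etc. accumulate into one word
    }
  }
  if (latinRun) segments.push(latinRun);
  return segments;
}

console.log(softWrapSegments('日本語とCSS Grid'));
```

Under this model, keep-all would correspond to never splitting inside a CJK run, and break-all to splitting the Latin runs character by character as well.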

Prohibited Break Positions

Japanese typography (JIS X 4051) and Chinese (GB/T 15834) define characters that must not appear at the start or end of a line. Leading prohibition characters (those that may not start a line) include most closing punctuation: 」』）】，。！？. Trailing prohibition characters (those that may not end a line) include opening punctuation: 「『（【.

Modern browsers implement these rules via the line-break property:

/* Common rule set (note: the property's initial value is auto, which lets the browser choose) */
p:lang(ja) {
  line-break: normal;
}

/* Stricter: also forbids breaks before small kana and the prolonged sound mark */
p:lang(ja) {
  line-break: strict;
}

/* Looser: permits extra breaks, e.g. before iteration marks such as 々 (for short lines) */
.narrow-column:lang(ja) {
  line-break: loose;
}

/* Most permissive: a break opportunity around every character, including punctuation */
.very-narrow:lang(ja) {
  line-break: anywhere;
}

For Chinese text, line-break: normal applies the browser's corresponding punctuation rules for Chinese. The property is widely supported in current browsers.
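The prohibition rules can be illustrated with a small fixed-width wrapper. This is only a sketch of the idea, not how browsers implement line-break: it uses the short character lists from this section rather than the full JIS X 4051 tables, resolves violations by pushing characters down to the next line, and the name wrapWithKinsoku is hypothetical.

```javascript
// Abbreviated character sets taken from the examples in this section.
const NO_LINE_START = new Set('」』）】、。，！？');
const NO_LINE_END = new Set('「『（【');

// Break text into lines of at most `width` characters without letting a
// prohibited character start or end a line.
function wrapWithKinsoku(text, width) {
  const chars = [...text]; // iterate by code point, not UTF-16 unit
  const lines = [];
  let start = 0;
  while (start < chars.length) {
    let end = Math.min(start + width, chars.length);
    // Move the break earlier while the next line would start with closing
    // punctuation or this line would end with opening punctuation.
    while (
      end < chars.length &&
      end - start > 1 &&
      (NO_LINE_START.has(chars[end]) || NO_LINE_END.has(chars[end - 1]))
    ) {
      end -= 1;
    }
    lines.push(chars.slice(start, end).join(''));
    start = end;
  }
  return lines;
}

console.log(wrapWithKinsoku('彼は「はい」と言った。', 3));
```

With a width of 3, a naive wrapper would leave 「 at the end of the first line and 」 at the start of the third; the sketch shortens those lines instead.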

Ruby Annotation

Ruby text (also called furigana in Japanese) places small phonetic guide characters above or beside CJK base characters. It is essential for educational content, children's books, texts containing rare kanji, and certain bilingual layouts.

<!-- Basic ruby: kanji with hiragana reading -->
<ruby>漢<rt>かん</rt>字<rt>じ</rt></ruby>

<!-- Ruby with fallback for non-supporting browsers (parentheses) -->
<ruby>東京<rp>(</rp><rt>とうきょう</rt><rp>)</rp></ruby>

<!-- Ruby for Chinese: hanzi with pinyin -->
<ruby>汉<rt>hàn</rt>字<rt>zì</rt></ruby>

<!-- Ruby for Korean: hanja with hangul -->
<ruby>韓<rt>한</rt>國<rt>국</rt></ruby>

CSS controls ruby positioning and text size:

ruby {
  ruby-align: center;          /* or space-around, space-between, start */
}

rt {
  font-size: 0.5em;            /* Typically half the base text size */
  text-emphasis: none;         /* Prevent emphasis marks on ruby text */
}

/* Ruby below the base text (used in some horizontal Japanese layouts) */
ruby {
  ruby-position: under;
}

To list the base characters and their readings separately, so that each <rt> pairs with the matching base in order, mark the bases explicitly with <rb> elements:

<ruby>
  <rb>東</rb><rb>京</rb>
  <rt>とう</rt><rt>きょう</rt>
</ruby>

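Basic ruby (<ruby>, <rt>, <rp>) is well supported in Chrome and Safari, and Firefox has supported it since version 38. Support for <rb> and for CSS properties such as ruby-align is less uniform across engines, so test the more advanced features in each target browser.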
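When readings come from a dictionary or CMS rather than hand-written markup, the ruby elements are usually generated. The sketch below builds the <rp> fallback pattern shown earlier; the helper names rubyMarkup and monoRuby are hypothetical, and the code assumes the base text and readings arrive as plain strings.

```javascript
// Escape characters that are significant in HTML text content and attributes.
function escapeHtml(s) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };
  return s.replace(/[&<>"]/g, (c) => entities[c]);
}

// One reading for the whole base run (group ruby), with parentheses fallback.
function rubyMarkup(base, reading) {
  return `<ruby>${escapeHtml(base)}<rp>(</rp><rt>${escapeHtml(reading)}</rt><rp>)</rp></ruby>`;
}

// One reading per base character (mono ruby), given [base, reading] pairs.
function monoRuby(pairs) {
  const body = pairs
    .map(([base, reading]) => `${escapeHtml(base)}<rp>(</rp><rt>${escapeHtml(reading)}</rt><rp>)</rp>`)
    .join('');
  return `<ruby>${body}</ruby>`;
}

console.log(rubyMarkup('東京', 'とうきょう'));
console.log(monoRuby([['東', 'とう'], ['京', 'きょう']]));
```

Interleaving each base character with its own <rt> keeps the pairing explicit without relying on <rb>.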

Vertical Writing

Japanese, traditional Chinese, and classical Korean texts were historically written top-to-bottom in columns running right to left. CSS Writing Modes Level 3 brings this to the web, and current browsers support it well:

/* Vertical text, columns run right to left */
.vertical-text {
  writing-mode: vertical-rl;
}

/* Vertical text, columns run left to right (used in some Mongolian contexts) */
.vertical-text-lr {
  writing-mode: vertical-lr;
}

/* Default orientation within vertical flow: CJK stays upright, Latin is rotated */
.vertical-text {
  text-orientation: mixed;    /* default: CJK upright, Latin rotated 90° */
}

.vertical-text-upright {
  text-orientation: upright;  /* all characters upright */
}

/* Rotate entire text 90° — useful for table headers */
.sideways {
  writing-mode: vertical-rl;
  text-orientation: sideways;
}

When writing-mode: vertical-rl is applied, physical properties map to different logical directions: width now measures the block dimension, margin-top and margin-bottom become inline margins, and padding-left and padding-right control block spacing. Use logical properties to avoid confusion:

.vertical-text {
  writing-mode: vertical-rl;
  /* Use logical properties instead of physical top/right/bottom/left */
  block-size: 100%;           /* height in horizontal mode */
  inline-size: auto;          /* width in horizontal mode */
  padding-block: 1rem;        /* top/bottom in horizontal mode */
  padding-inline: 0.5rem;     /* left/right in horizontal mode */
}
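The physical-to-logical mapping can be written out as a lookup table, which is sometimes handy when converting an existing stylesheet. This is a sketch that assumes direction: ltr; the names logicalSide and logicalSize are hypothetical.

```javascript
// Which logical side each physical side corresponds to, per writing mode
// (assuming direction: ltr).
const SIDE_MAP = {
  'horizontal-tb': { top: 'block-start', bottom: 'block-end', left: 'inline-start', right: 'inline-end' },
  'vertical-rl': { top: 'inline-start', bottom: 'inline-end', right: 'block-start', left: 'block-end' },
  'vertical-lr': { top: 'inline-start', bottom: 'inline-end', left: 'block-start', right: 'block-end' },
};

function logicalSide(writingMode, physicalSide) {
  const sides = SIDE_MAP[writingMode];
  if (!sides || !(physicalSide in sides)) {
    throw new RangeError(`unsupported combination: ${writingMode} / ${physicalSide}`);
  }
  return sides[physicalSide];
}

// 'width' or 'height' expressed as a logical size property.
function logicalSize(writingMode, physicalSize) {
  const vertical = writingMode.startsWith('vertical');
  if (physicalSize === 'width') return vertical ? 'block-size' : 'inline-size';
  return vertical ? 'inline-size' : 'block-size';
}

console.log(logicalSide('vertical-rl', 'right')); // block-start: columns begin at the right
console.log(logicalSize('vertical-rl', 'width')); // block-size
```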

Vertical text also changes how punctuation is drawn: marks such as brackets, commas, and periods are rotated or swapped for vertical-specific glyphs (through the OpenType vert feature) in well-designed CJK fonts.

CSS text-spacing-trim and JIS Punctuation Optimization

Japanese punctuation characters (、。「」！？) are full-width glyphs by design, but when they appear adjacent to other punctuation or at the start/end of a line, the built-in spacing creates excessive gaps. JIS X 4051 specifies kerning rules to collapse this spacing.

The text-spacing-trim property (CSS Text Level 4) automates this:

/* Apply JIS X 4051 spacing for Japanese typography */
p:lang(ja) {
  text-spacing-trim: trim-start;
  /* other values in CSS Text Level 4: normal | space-all | space-first | trim-both | trim-all | auto */
}

text-autospace handles the complementary problem of mixed Latin-CJK spacing:

/* Insert thin space between CJK and Latin/numeric characters */
p:lang(ja) {
  text-autospace: ideograph-alpha ideograph-numeric;
}

These are cutting-edge CSS properties. As of 2024, text-spacing-trim has shipped in Chromium-based browsers, while support in other engines, and for text-autospace, is still arriving, so check current compatibility data. Browsers that do not recognize the properties simply ignore the declarations, so they degrade gracefully, but always test the fallback rendering.
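Support can also be detected at runtime with CSS.supports(), for example to decide whether a script-based spacing fallback is needed. The sketch below is one way to do it; the function name detectCjkSpacingSupport and the class name no-text-autospace are hypothetical, and the supports function is injectable so the logic can run outside a browser.

```javascript
// Report whether the two CJK spacing properties are recognized.
// `supports` defaults to the browser's CSS.supports when it exists.
function detectCjkSpacingSupport(
  supports = globalThis.CSS?.supports?.bind(globalThis.CSS),
) {
  const check = (property, value) =>
    typeof supports === 'function' ? Boolean(supports(property, value)) : false;
  return {
    spacingTrim: check('text-spacing-trim', 'trim-start'),
    autospace: check('text-autospace', 'ideograph-alpha'),
  };
}

// Browser-only usage: flag the document so CSS can apply a fallback.
if (typeof document !== 'undefined') {
  const support = detectCjkSpacingSupport();
  document.documentElement.classList.toggle('no-text-autospace', !support.autospace);
}
```

In pure CSS, an @supports (text-autospace: ideograph-alpha) block achieves the same branching without script.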

Mixed Latin-CJK Spacing

When Latin text (words, numbers, units) appears inline with CJK characters, spacing between scripts should be slightly larger than zero but smaller than a word space. The traditional approach uses manual thin spaces; the modern approach uses text-autospace.

Without text-autospace support, you can use a CSS approach:

/* Detect and add inter-script spacing using adjacent sibling selectors */
/* This requires wrapping script transitions in spans */
.latin + .cjk,
.cjk + .latin {
  margin-inline-start: 0.15em;
}
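The selectors above only work if each script run is already wrapped in a span. A minimal sketch of that wrapping step follows; the function name wrapScriptRuns is hypothetical, and the classification is intentionally crude: everything that is not Han, kana, or Hangul (including CJK punctuation, which Unicode assigns to the Common script) ends up in the latin class.

```javascript
// Runs of Han, Hiragana, Katakana, or Hangul characters.
const CJK_RUN = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]+/gu;

function escapeHtml(s) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };
  return s.replace(/[&<>"]/g, (c) => entities[c]);
}

const span = (cls, text) => `<span class="${cls}">${escapeHtml(text)}</span>`;

// Wrap alternating CJK and non-CJK runs in .cjk / .latin spans.
function wrapScriptRuns(text) {
  let html = '';
  let last = 0;
  for (const match of text.matchAll(CJK_RUN)) {
    if (match.index > last) html += span('latin', text.slice(last, match.index));
    html += span('cjk', match[0]);
    last = match.index + match[0].length;
  }
  if (last < text.length) html += span('latin', text.slice(last));
  return html;
}

console.log(wrapScriptRuns('全3巻'));
```

A production version would need to walk text nodes rather than raw strings so that existing markup is preserved.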

In practice, many Japanese publishers simply accept the default rendering and rely on the font's glyph metrics to provide adequate visual separation. Avoid inserting manual NBSP or thin spaces into CJK content — it creates inconsistent spacing that is impossible to maintain as content is edited.

Use our Character Counter tool to analyze mixed-script text and identify which scripts are present and where script boundaries occur.

Full-Width vs Half-Width Characters

CJK character sets include full-width (全角, zenkaku) and half-width (半角, hankaku) variants of ASCII punctuation, digits, and Latin letters. The full-width variants live in the Halfwidth and Fullwidth Forms block (U+FF00–U+FFEF):

Character | Half-Width | Full-Width
A | A (U+0041) | Ａ (U+FF21)
1 | 1 (U+0031) | １ (U+FF11)
! | ! (U+0021) | ！ (U+FF01)
( | ( (U+0028) | （ (U+FF08)

Avoid full-width variants for Latin content. They exist for legacy JIS encoding compatibility. Modern practice uses standard ASCII for Latin characters and numbers even in CJK text, and relies on font metrics and text-autospace to handle spacing between scripts.

If you receive content that contains full-width Latin characters (common in legacy Japanese databases), normalize them:

// Normalize full-width Latin to standard ASCII
function normalizeFullWidth(str) {
  return str.replace(/[\uFF01-\uFF5E]/g, (char) => {
    return String.fromCodePoint(char.codePointAt(0) - 0xFF01 + 0x21);
  }).replace(/\u3000/g, ' '); // Ideographic space → regular space
}

Font Subsetting for CJK

CJK fonts contain tens of thousands of glyphs. A complete CJK font is typically 5–20 MB — completely impractical to serve as a web font without subsetting.

Strategies:

Unicode-range subsetting: Load only the characters actually used in your content.

@font-face {
  font-family: "MyJapaneseFont";
  src: url("myfont-ja.woff2") format("woff2");
  unicode-range:
    U+3000-303F,   /* CJK Symbols and Punctuation */
    U+3040-309F,   /* Hiragana */
    U+30A0-30FF,   /* Katakana */
    U+4E00-9FFF;   /* CJK Unified Ideographs (main block) */
}

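Google Fonts handles subsetting automatically for CJK: it slices each font into many small files keyed by unicode-range, and the browser downloads only the slices that contain characters actually present on the page. (The separate text= API parameter requests an exact character set and suits short strings such as headings.) For Japanese, font-display: swap with Noto Sans JP is a practical starting point: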

<link rel="preconnect" href="https://fonts.googleapis.com">
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+JP:wght@400;700&display=swap" rel="stylesheet">

Self-hosted subsetting with pyftsubset (from the fonttools package) allows you to generate a minimal font file containing only the glyphs you need:

pyftsubset NotoSansJP-Regular.ttf \
  --text-file=page-content.txt \
  --layout-features='*' \
  --flavor=woff2 \
  --output-file=NotoSansJP-subset.woff2

For sites with dynamic CJK content, Google Fonts' on-demand subsetting is usually the most practical option. For sites with a fixed CJK character set (product catalogs, marketing pages), self-hosted subsets give you full control over file sizes and loading behavior.
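For the self-hosted route, the character inventory has to come from somewhere. The sketch below derives a unicode-range value from a sample of page text; the function name toUnicodeRange is hypothetical, and it assumes the full set of page content is available as a string at build time.

```javascript
// Collapse the distinct code points in `text` into a unicode-range list,
// merging consecutive code points into ranges.
function toUnicodeRange(text) {
  const points = [...new Set([...text].map((ch) => ch.codePointAt(0)))]
    .sort((a, b) => a - b);
  const hex = (n) => n.toString(16).toUpperCase();
  const ranges = [];
  let i = 0;
  while (i < points.length) {
    let j = i;
    // Extend the range while the next code point is adjacent.
    while (j + 1 < points.length && points[j + 1] === points[j] + 1) j += 1;
    ranges.push(i === j ? `U+${hex(points[i])}` : `U+${hex(points[i])}-${hex(points[j])}`);
    i = j + 1;
  }
  return ranges.join(', ');
}

console.log(toUnicodeRange('cabあ')); // U+61-63, U+3042
```

The same deduplicated character list, written to a file, can serve as the input for pyftsubset's --text-file option so that the font file and the @font-face rule stay in sync.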

Practical Checklist for CJK Typography

Item | Recommendation
Font stack | PingFang / Hiragino / Apple SD + Windows fallbacks + Noto
lang attribute | Set on <html> and on any element that changes script
Line breaking | word-break: normal for Japanese/Chinese prose
Punctuation rules | line-break: strict for Japanese body text
Ruby | Use <ruby> with <rp> fallback for furigana
Vertical text | writing-mode: vertical-rl with logical properties
Inter-script spacing | Use text-autospace where supported; otherwise rely on font metrics or span-based margins rather than manual thin spaces
Full-width normalization | Normalize legacy full-width Latin before storage
Web font loading | Unicode-range subsetting or Google Fonts on-demand subsets

Next in Series: Part 4 explores box drawing characters — the Unicode block that lets you build tables, borders, and text-based UIs directly in HTML, with copy-paste reference tables and CSS alignment tips. Box Drawing Characters: Building Text-Based UI with Unicode
