SymbolFYI

Combining Character

Unicode Standard

Definition

A Unicode character that modifies the preceding base character, such as accents and diacritical marks.

What Is a Combining Character?

A combining character is a Unicode character that is not meant to be displayed on its own; instead, it attaches to and modifies the appearance of the preceding base character. When shown in isolation, as in code charts or the examples below, it is conventionally drawn on a dotted circle (◌). Combining characters are used to add diacritical marks (accents, umlauts, cedillas), phonetic annotations, mathematical decorations, and other visual modifications to base glyphs.

For example, the sequence e (U+0065) followed by ◌́ (U+0301, combining acute accent) produces é. This is visually identical and canonically equivalent to the precomposed form é (U+00E9), but it consists of two separate code points instead of one.
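To see the difference at the code point level, the short Python sketch below lists the code points and their official Unicode names for both spellings of é. It assumes only Python 3 and its standard `unicodedata` module; the helper name `describe` is hypothetical and exists just for this illustration.

```python
import unicodedata

def describe(s):
    """Hypothetical helper: return 'U+XXXX NAME' for every code point in s."""
    return [f'U+{ord(c):04X} {unicodedata.name(c)}' for c in s]

precomposed = '\u00E9'   # é as a single code point
decomposed = 'e\u0301'   # e followed by COMBINING ACUTE ACCENT

print(describe(precomposed))  # ['U+00E9 LATIN SMALL LETTER E WITH ACUTE']
print(describe(decomposed))   # ['U+0065 LATIN SMALL LETTER E', 'U+0301 COMBINING ACUTE ACCENT']
```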

Unicode General Category for Combining Characters

Combining characters belong to the Mark (M) general category:

Code	Name	Description
`Mn`	Nonspacing Mark	Positioned relative to base, takes no width (e.g., accents)
`Mc`	Spacing Combining Mark	Takes up space next to the base (common in South Asian scripts)
`Me`	Enclosing Mark	Encloses the base character (e.g., combining enclosing circle)
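As a rough illustration of the three Mark subcategories, the sketch below asks Python's `unicodedata` module for the general category of one sample from each: U+0301 (an accent), U+093E DEVANAGARI VOWEL SIGN AA (a spacing mark from a South Asian script), and U+20DD COMBINING ENCLOSING CIRCLE. The choice of samples and the helper `is_combining_mark` are assumptions made for this example; the check simply tests whether the category starts with "M".

```python
import unicodedata

# One sample character per Mark subcategory (chosen for illustration)
samples = ['\u0301', '\u093E', '\u20DD']

def is_combining_mark(ch):
    """Hypothetical helper: True if ch is in general category Mn, Mc or Me."""
    return unicodedata.category(ch).startswith('M')

categories = {ch: unicodedata.category(ch) for ch in samples}
for ch, cat in categories.items():
    print(f'U+{ord(ch):04X} {unicodedata.name(ch)}: {cat}')
# U+0301 COMBINING ACUTE ACCENT: Mn
# U+093E DEVANAGARI VOWEL SIGN AA: Mc
# U+20DD COMBINING ENCLOSING CIRCLE: Me
```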

Common Combining Characters

Code Point	Name	Example
`U+0301`	Combining Acute Accent	á, é, í, ó, ú
`U+0300`	Combining Grave Accent	à, è, ì
`U+0308`	Combining Diaeresis	ä, ë, ï
`U+0327`	Combining Cedilla	ç, ş
`U+0303`	Combining Tilde	ã, ñ
`U+20D7`	Combining Right Arrow Above	vector notation
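Not every base + mark pair in the table has a precomposed equivalent. The sketch below, assuming Python 3's `unicodedata`, applies NFC to one pair per row: the Latin letters collapse to a single precomposed code point, while v + U+20D7 has no precomposed form and stays two code points. The specific base letters are picked from the table's example column.

```python
import unicodedata

# (base letter, combining mark) pairs taken from the table above
pairs = [('a', '\u0301'), ('e', '\u0300'), ('a', '\u0308'),
         ('c', '\u0327'), ('n', '\u0303'), ('v', '\u20D7')]

composed = {}
for base, mark in pairs:
    nfc = unicodedata.normalize('NFC', base + mark)
    composed[base + mark] = nfc
    # A length of 1 means Unicode has a precomposed character for this pair
    print(f'{base} + U+{ord(mark):04X}: {len(nfc)} code point(s) after NFC')
```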

Normalization: NFC vs. NFD

Unicode defines four normalization forms (NFC, NFD, NFKC, and NFKD). The two canonical forms, NFD and NFC, handle the equivalence between precomposed and decomposed representations:

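NFD (Canonical Decomposition): Decomposes precomposed characters into base + combining sequences and puts the combining marks into canonical order. é → e + ◌́
NFC (Canonical Decomposition, followed by Canonical Composition): Decomposes first, then recombines base + combining sequences into precomposed forms where available. e + ◌́ → é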
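The two definitions can be checked with a round trip: NFD splits the precomposed é in a word into e + U+0301, and NFC puts it back together. This minimal sketch assumes Python 3 and the standard `unicodedata` module; the sample word is arbitrary.

```python
import unicodedata

word = 'caf\u00E9'   # 'café' spelled with precomposed é (U+00E9)

nfd = unicodedata.normalize('NFD', word)
nfc = unicodedata.normalize('NFC', nfd)

print([f'U+{ord(c):04X}' for c in nfd])  # ['U+0063', 'U+0061', 'U+0066', 'U+0065', 'U+0301']
print(nfc == word)                        # True: NFC restores the precomposed form
```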

import unicodedata

eacute_precomposed = '\u00E9'    # é (precomposed)
eacute_decomposed  = 'e\u0301'  # e + combining acute

print(len(eacute_precomposed))    # 1
print(len(eacute_decomposed))     # 2
print(eacute_precomposed == eacute_decomposed)  # False!

# Normalize to NFC for string comparison
nfc_a = unicodedata.normalize('NFC', eacute_precomposed)
nfc_b = unicodedata.normalize('NFC', eacute_decomposed)
print(nfc_a == nfc_b)             # True

# Check if a character is a combining mark
for char in eacute_decomposed:
    cat = unicodedata.category(char)
    print(f'{repr(char)}: {cat}')  # 'e': Ll, '\u0301': Mn

// Normalize strings before comparing
const a = '\u00E9';    // precomposed é
const b = 'e\u0301';  // decomposed e + combining acute
console.log(a === b);                     // false
console.log(a.normalize('NFC') === b.normalize('NFC'));  // true

// Split into grapheme clusters (not just code points)
const segmenter = new Intl.Segmenter();
console.log([...segmenter.segment(b)].length);  // 1 (one grapheme)
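Python's `unicodedata` module has traditionally offered no grapheme segmenter comparable to `Intl.Segmenter`. As a rough approximation for text made only of base letters and combining marks, the sketch below attaches every Mark-category code point to the cluster before it. The function `naive_clusters` is hypothetical, and it is deliberately simplified: full grapheme segmentation (Unicode UAX #29) also covers emoji sequences, Hangul syllables, CR LF, and other cases this code ignores.

```python
import unicodedata

def naive_clusters(s):
    """Hypothetical helper: group each base character with the marks after it.

    Simplified illustration only; it is not a UAX #29 implementation.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.category(ch).startswith('M'):
            clusters[-1] += ch   # attach the mark to the previous cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters

text = 'e\u0301a\u0303\u0308x'   # é (decomposed), a with two marks, plain x
print(len(text))                  # 6 code points
print(len(naive_clusters(text)))  # 3 clusters
```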

Stacking Combining Marks

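Multiple combining characters can be applied to a single base character, stacking above or below it. For example, q̃̈ is q followed by a combining tilde (U+0303) and a combining diaeresis (U+0308), both placed above the letter. The order of the marks can affect rendering, especially when two marks occupy the same position. Each combining character has a Canonical Combining Class (ccc) value from 0 to 254; during normalization, adjacent marks with different nonzero classes are sorted into ascending class order, while marks with the same class keep their original order, because swapping them would change how they stack.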
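The ordering rule can be observed with Python's `unicodedata.combining`, which returns a character's Canonical Combining Class. In this sketch (assuming Python 3), the diaeresis (class 230, above) and the dot below (class 220, below) are reordered by NFD so both input orders become identical, while the tilde and diaeresis share class 230 and keep their order, so the two spellings stay distinct. The constant names are just labels for readability.

```python
import unicodedata

DIAERESIS, DOT_BELOW, TILDE = '\u0308', '\u0323', '\u0303'

# Canonical Combining Class of each mark
for mark in (DIAERESIS, DOT_BELOW, TILDE):
    print(unicodedata.name(mark), unicodedata.combining(mark))
# COMBINING DIAERESIS 230
# COMBINING DOT BELOW 220
# COMBINING TILDE 230

# Different classes: NFD sorts them, so both input orders become the same string
a = unicodedata.normalize('NFD', 'q' + DIAERESIS + DOT_BELOW)
b = unicodedata.normalize('NFD', 'q' + DOT_BELOW + DIAERESIS)
print(a == b)  # True

# Same class: the order is preserved, so the strings remain different
c = unicodedata.normalize('NFD', 'q' + TILDE + DIAERESIS)
d = unicodedata.normalize('NFD', 'q' + DIAERESIS + TILDE)
print(c == d)  # False
```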
