SymbolFYI

Combining Character

Unicode Standard
Definition

A Unicode character, such as an accent or other diacritical mark, that modifies the preceding base character.

What Is a Combining Character?

A combining character is a Unicode character that is not meant to stand alone: it attaches to and modifies the appearance of the preceding base character. Combining characters add diacritical marks (accents, umlauts, cedillas), phonetic annotations, mathematical decorations, and other visual modifications to base glyphs.

For example, the sequence e (U+0065) followed by ◌́ (U+0301, combining acute accent) produces é. This is visually identical to the precomposed form é (U+00E9), but consists of two separate code points.

Unicode General Category for Combining Characters

Combining characters belong to the Mark (M) general category:

| Code | Name | Description |
|------|------|-------------|
| Mn | Nonspacing Mark | Positioned relative to base, takes no width (e.g., accents) |
| Mc | Spacing Combining Mark | Takes up space next to the base (common in South Asian scripts) |
| Me | Enclosing Mark | Encloses the base character (e.g., combining enclosing circle) |
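
As a rough illustration of the three subcategories, the sketch below looks up the general category of a few sample characters with Python's standard `unicodedata` module. The helper name `mark_kind` and the choice of sample code points are assumptions made for this example, not part of any standard API.

```python
import unicodedata
from typing import Optional

# Human-readable labels for the three Mark (M) subcategories.
MARK_CATEGORIES = {
    "Mn": "Nonspacing Mark",
    "Mc": "Spacing Combining Mark",
    "Me": "Enclosing Mark",
}

def mark_kind(char: str) -> Optional[str]:
    """Return the Mark subcategory name for char, or None if it is not a mark.

    mark_kind is a hypothetical helper for this example.
    """
    return MARK_CATEGORIES.get(unicodedata.category(char))

# Sample characters: an accent, a Devanagari vowel sign,
# an enclosing circle, and a plain letter for contrast.
for ch in ("\u0301", "\u093E", "\u20DD", "e"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {mark_kind(ch)}")
```

A plain letter such as `e` has category `Ll`, so the helper returns `None` for it.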

Common Combining Characters

| Code Point | Name | Example |
|------------|------|---------|
| U+0301 | Combining Acute Accent | á, é, í, ó, ú |
| U+0300 | Combining Grave Accent | à, è, ì |
| U+0308 | Combining Diaeresis | ä, ë, ï |
| U+0327 | Combining Cedilla | ç, ş |
| U+0303 | Combining Tilde | ã, ñ |
| U+20D7 | Combining Right Arrow Above | vector notation |
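
The examples in this table can be built by appending a mark to a base letter. The sketch below does that and applies NFC composition to show which pairs have a precomposed form; `compose` is a hypothetical helper name and the base letters are chosen only for illustration.

```python
import unicodedata

def compose(base: str, mark: str) -> str:
    """Append a combining mark to a base character and apply NFC composition.

    compose is a hypothetical helper for this example.
    """
    return unicodedata.normalize("NFC", base + mark)

pairs = [
    ("a", "\u0301"),  # combining acute accent
    ("e", "\u0300"),  # combining grave accent
    ("u", "\u0308"),  # combining diaeresis
    ("c", "\u0327"),  # combining cedilla
    ("n", "\u0303"),  # combining tilde
    ("v", "\u20D7"),  # combining right arrow above
]

for base, mark in pairs:
    result = compose(base, mark)
    print(f"{base} + U+{ord(mark):04X} -> {result} ({len(result)} code point(s))")
```

The last pair stays at two code points because Unicode defines no precomposed "v with arrow above", so NFC leaves the sequence as it is.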

Normalization: NFC vs. NFD

Unicode provides multiple normalization forms to handle the equivalence between precomposed and decomposed representations:

  • NFD (Canonical Decomposition): Decomposes precomposed characters into base + combining sequences. é → e + ◌́
  • NFC (Canonical Composition): Composes base + combining sequences into precomposed forms where available. e + ◌́ → é

```python
import unicodedata

eacute_precomposed = '\u00E9'    # é (precomposed)
eacute_decomposed  = 'e\u0301'  # e + combining acute

print(len(eacute_precomposed))    # 1
print(len(eacute_decomposed))     # 2
print(eacute_precomposed == eacute_decomposed)  # False!

# Normalize to NFC for string comparison
nfc_a = unicodedata.normalize('NFC', eacute_precomposed)
nfc_b = unicodedata.normalize('NFC', eacute_decomposed)
print(nfc_a == nfc_b)             # True

# Check if a character is a combining mark (category starts with 'M')
for char in eacute_decomposed:
    cat = unicodedata.category(char)
    print(f'U+{ord(char):04X}: {cat}')  # U+0065: Ll, then U+0301: Mn
```

```javascript
// Normalize strings before comparing
const a = '\u00E9';    // precomposed é
const b = 'e\u0301';  // decomposed e + combining acute
console.log(a === b);                     // false
console.log(a.normalize('NFC') === b.normalize('NFC'));  // true

// Split into grapheme clusters (not just code points)
const segmenter = new Intl.Segmenter();
console.log([...segmenter.segment(b)].length);  // 1 (one grapheme)
```
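
Python's standard library has no direct counterpart to `Intl.Segmenter`. As a rough approximation, the sketch below attaches every Mark-category character to the character before it. This is a deliberate simplification and not the full Unicode grapheme cluster algorithm: it ignores emoji sequences, Hangul jamo, and other cases. The function name `approx_clusters` is hypothetical.

```python
import unicodedata
from typing import List

def approx_clusters(text: str) -> List[str]:
    """Group each base character with the combining marks that follow it.

    Simplified approximation only: any character whose general category
    starts with 'M' is appended to the previous cluster.
    """
    clusters: List[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch   # mark: attach to the preceding cluster
        else:
            clusters.append(ch)  # base character: start a new cluster
    return clusters

word = "cafe\u0301"                # "café" with a decomposed é
print(len(word))                   # 5 code points
print(len(approx_clusters(word)))  # 4 user-perceived characters
```

For production use, a dedicated segmentation library is likely a better choice than this sketch.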

Stacking Combining Marks

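Multiple combining characters can be applied to a single base character, stacking above or below it. For example, q̃̈ (q + U+0303 + U+0308) carries both a tilde and a diaeresis stacked above q. The order of combining characters can matter: Unicode assigns each mark a Canonical Combining Class value (0–254), and normalization sorts adjacent marks by that value. Marks with different classes (for example, one above and one below the base) end up in the same order regardless of how they were entered, while marks with the same class keep their original order, which affects how they stack.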
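
The effect of Canonical Combining Class values can be observed with `unicodedata.combining`, which returns the class of a character. The sketch below is illustrative only; the sample marks and variable names are chosen for this example.

```python
import unicodedata

# Canonical Combining Class (ccc) of a few marks.
for mark in ("\u0301", "\u0303", "\u0308", "\u0323", "\u0327"):
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}: ccc={unicodedata.combining(mark)}")

# Different classes: acute (230, above) and dot below (220).
# Normalization sorts them by class, so both input orders converge.
above_then_below = "q\u0301\u0323"
below_then_above = "q\u0323\u0301"
print(unicodedata.normalize("NFD", above_then_below) == below_then_above)  # True

# Same class: tilde and diaeresis are both 230, so their order is kept
# and the two sequences remain distinct after normalization.
tilde_first = "q\u0303\u0308"
diaeresis_first = "q\u0308\u0303"
print(unicodedata.normalize("NFD", tilde_first)
      == unicodedata.normalize("NFD", diaeresis_first))  # False
```

Because same-class marks are never reordered, the sequence as typed determines which mark sits closer to the base.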
