What Is a Combining Character?
A combining character is a Unicode character that is not meant to be displayed on its own; instead, it attaches to the preceding base character and modifies its appearance. Combining characters are used to add diacritical marks (accents, umlauts, cedillas), phonetic annotations, mathematical decorations, and other visual modifications to base glyphs.
For example, the sequence e (U+0065) followed by ◌́ (U+0301, combining acute accent) produces é. This is visually identical to the precomposed form é (U+00E9), but consists of two separate code points.
Unicode General Category for Combining Characters
Combining characters belong to the Mark (M) general category:
| Code | Name | Description |
|---|---|---|
| Mn | Nonspacing Mark | Positioned relative to base, takes no width (e.g., accents) |
| Mc | Spacing Combining Mark | Takes up space next to the base (common in South Asian scripts) |
| Me | Enclosing Mark | Encloses the base character (e.g., combining enclosing circle) |
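As a small illustration of the three subcategories, the sketch below looks up the general category of one sample character from each. The helper name `is_combining_mark` and the choice of sample characters are assumptions made for this example, not part of any standard API.

```python
import unicodedata


def is_combining_mark(char: str) -> bool:
    """Return True if char is in general category Mn, Mc, or Me (hypothetical helper)."""
    return unicodedata.category(char).startswith('M')


# Sample characters chosen for illustration: one per Mark subcategory, plus a base letter.
samples = {
    '\u0301': 'combining acute accent',
    '\u093E': 'Devanagari vowel sign AA',
    '\u20DD': 'combining enclosing circle',
    'e': 'plain base letter',
}

for char, label in samples.items():
    category = unicodedata.category(char)
    print(f'U+{ord(char):04X} {label}: {category}, mark={is_combining_mark(char)}')
```

Checking the first letter of the category keeps the test independent of which of the three subcategories a mark belongs to.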
Common Combining Characters
| Code Point | Name | Example |
|---|---|---|
| U+0301 | Combining Acute Accent | á, é, í, ó, ú |
| U+0300 | Combining Grave Accent | à, è, ì |
| U+0308 | Combining Diaeresis | ä, ë, ï |
| U+0327 | Combining Cedilla | ç, ş |
| U+0303 | Combining Tilde | ã, ñ |
| U+20D7 | Combining Right Arrow Above | vector notation |
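To see how these marks behave in practice, the sketch below attaches a few of them to base letters and applies NFC composition. It assumes Python's standard `unicodedata` module; the `compose` helper is a hypothetical name used only here. Note that not every base + mark pair has a precomposed form: a letter with the combining right arrow above stays as two code points.

```python
import unicodedata


def compose(base: str, mark: str) -> str:
    """Append a combining mark to a base letter and compose with NFC (illustrative helper)."""
    return unicodedata.normalize('NFC', base + mark)


c_cedilla = compose('c', '\u0327')  # c + combining cedilla
n_tilde = compose('n', '\u0303')    # n + combining tilde
vector_v = compose('v', '\u20D7')   # v + combining right arrow above

print(len(c_cedilla), c_cedilla == '\u00E7')  # 1 True (precomposed ç exists)
print(len(n_tilde), n_tilde == '\u00F1')      # 1 True (precomposed ñ exists)
print(len(vector_v))                          # 2 (no precomposed form, sequence is kept)
```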
Normalization: NFC vs. NFD
Unicode provides multiple normalization forms to handle the equivalence between precomposed and decomposed representations:
- NFD (Canonical Decomposition): Decomposes precomposed characters into base + combining sequences. Example: é → e + ◌́
- NFC (Canonical Composition): Composes base + combining sequences into precomposed forms where available. Example: e + ◌́ → é
```python
import unicodedata

eacute_precomposed = '\u00E9'  # é (precomposed)
eacute_decomposed = 'e\u0301'  # e + combining acute

print(len(eacute_precomposed))  # 1
print(len(eacute_decomposed))   # 2
print(eacute_precomposed == eacute_decomposed)  # False!

# Normalize to NFC for string comparison
nfc_a = unicodedata.normalize('NFC', eacute_precomposed)
nfc_b = unicodedata.normalize('NFC', eacute_decomposed)
print(nfc_a == nfc_b)  # True

# Check if a character is a combining mark
for char in eacute_decomposed:
    cat = unicodedata.category(char)
    print(f'{repr(char)}: {cat}')  # 'e': Ll, '\u0301': Mn
```
```javascript
// Normalize strings before comparing
const a = '\u00E9';  // precomposed é
const b = 'e\u0301'; // decomposed e + combining acute
console.log(a === b); // false
console.log(a.normalize('NFC') === b.normalize('NFC')); // true

// Split into grapheme clusters (not just code points)
const segmenter = new Intl.Segmenter();
console.log([...segmenter.segment(b)].length); // 1 (one grapheme)
```
Stacking Combining Marks
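Multiple combining characters can be applied to a single base character, stacking on top of or below it. For example, q̃̈ is q followed by a combining tilde (U+0303) and a combining diaeresis (U+0308), both stacked above the base. The order of combining characters can matter: Unicode assigns each mark a Canonical Combining Class value (0–254), and normalization sorts adjacent marks by that value. Marks with different classes (such as one above and one below the base) end up in a fixed canonical order, while marks that share a class keep the order in which they were written, which typically also determines how they stack when rendered.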
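The effect of Canonical Combining Class on ordering can be sketched with Python's `unicodedata.combining()`, which returns a character's class. The marks and constant names below are chosen purely for illustration: dot below has class 220, while dot above, tilde, and diaeresis all have class 230.

```python
import unicodedata

DOT_BELOW, DOT_ABOVE = '\u0323', '\u0307'  # classes 220 and 230
TILDE, DIAERESIS = '\u0303', '\u0308'      # both class 230

for mark in (DOT_BELOW, DOT_ABOVE, TILDE, DIAERESIS):
    print(f'U+{ord(mark):04X} ccc={unicodedata.combining(mark)}')


def nfd(text: str) -> str:
    """Shorthand for canonical decomposition, which also reorders marks by class."""
    return unicodedata.normalize('NFD', text)


# Different classes: normalization sorts the marks, so both input orders agree.
different_classes_equal = nfd('q' + DOT_ABOVE + DOT_BELOW) == nfd('q' + DOT_BELOW + DOT_ABOVE)

# Same class: the written order is preserved, so the two sequences stay distinct.
same_class_equal = nfd('q' + TILDE + DIAERESIS) == nfd('q' + DIAERESIS + TILDE)

print(different_classes_equal)  # True
print(same_class_equal)         # False
```

One practical consequence is that normalizing before comparison makes strings match when marks of different classes were typed in different orders, but not when same-class marks were.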