Glossar

Unicode-, Kodierungs- und Typografie-Begriffe erklärt.

Alle Encoding Unicode Standard Typography Input Methods Web & HTML Accessibility Programming & Dev

Encoding Detection

Techniques for detecting the character encoding of text files, including BOM sniffing, heuristics, and chardet libraries.

A phishing technique using visually similar Unicode characters in domain names to impersonate legitimate sites.

JS String methods for Unicode: codePointAt(), String.fromCodePoint(), and the spread operator for grapheme iteration.

Python standard library module for looking up Unicode character names, categories, and properties.

Why str.length in JavaScript returns UTF-16 code units, not visual characters — and how to count graphemes correctly.

Sorting text according to language-specific rules using the Unicode Collation Algorithm (UCA, UTS #10).

How Unicode characters in URLs are handled: IRI (RFC 3987), percent-encoding of UTF-8 bytes, and browser display.

Regex syntax (\p{Script=Greek}, \p{Letter}) that matches characters by Unicode properties. Supported in JS, Java, Python 3.8+.

A programming best practice: decode bytes → process text as Unicode → encode bytes. Keeps Unicode in the middle.

The Unicode algorithm for splitting text into user-perceived characters, handling emoji sequences, combining marks, etc.

Using Unicode-aware regular expressions with flags like /u in JS and re.UNICODE in Python.

Understanding the three abstraction levels: a code point (number), a character (abstract), and a glyph (visual rendering).