SymbolFYI

Glossary

Explanations of Unicode, encoding, and typography terms.

Code Point vs Character vs Glyph

Understanding the three abstraction levels: a code point (number), a character (abstract), and a glyph (visual rendering).

Programming & Dev
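To make the code point / character / glyph distinction concrete, here is a minimal Python sketch. The sample string is an illustrative choice, not something taken from the entry: it shows one user-visible "é" stored as two code points, each with its own abstract character name, which normalization can compose into a single code point. The glyph itself is left to the font and renderer and is not visible from code.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: usually drawn as one glyph.
s = "e\u0301"

# Code points are just numbers.
code_points = [f"U+{ord(ch):04X}" for ch in s]

# Characters are the abstract entities those numbers identify.
names = [unicodedata.name(ch) for ch in s]

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", s)

print(code_points)    # ['U+0065', 'U+0301']
print(names)          # ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
print(len(composed))  # 1
```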

Encoding Detection

Techniques for detecting the character encoding of text files, including BOM sniffing, statistical heuristics, and detection libraries such as chardet.

Programming & Dev
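BOM sniffing, the simplest of the encoding detection techniques, can be sketched with the standard library alone. The function name `sniff_encoding` and the `cp1252` fallback are assumptions made for illustration. Real detectors such as chardet apply statistical heuristics when there is no BOM.

```python
import codecs

# Check UTF-32 before UTF-16: the UTF-32 LE BOM starts with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Guess an encoding from a BOM, then from UTF-8 validity (hypothetical helper)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        data.decode("utf-8")  # strict decoding doubles as a validity check
        return "utf-8"
    except UnicodeDecodeError:
        return fallback  # an assumed legacy default, not a detected result
```

For example, `sniff_encoding(codecs.BOM_UTF8 + b"hi")` returns `"utf-8-sig"`. One known ambiguity: UTF-16 LE text whose first character is U+0000 is indistinguishable from a UTF-32 LE BOM.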

Grapheme Segmentation (UAX #29)

The Unicode algorithm for splitting text into user-perceived characters, handling emoji sequences, combining marks, etc.

Programming & Dev
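Python's standard library does not implement UAX #29, so the following is only a rough approximation of grapheme segmentation, written to show the idea. The helper `rough_clusters` is hypothetical: it attaches combining marks and zero-width-joiner sequences to the preceding character and ignores other rules in the specification (regional-indicator flags, Hangul jamo, emoji skin-tone modifiers, and more). A maintained library is a better choice for production use.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, used to glue emoji into one sequence


def rough_clusters(text: str) -> list[str]:
    """Very simplified clustering; NOT a full UAX #29 implementation."""
    clusters: list[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")  # Mn, Mc, Me
        joins_previous = bool(clusters) and (
            is_mark or ch == ZWJ or clusters[-1].endswith(ZWJ)
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Woman + ZWJ + woman + ZWJ + girl: five code points, one perceived character.
family = "\U0001F469\u200d\U0001F469\u200d\U0001F467"
print(len(family), len(rough_clusters(family)))  # 5 1
```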

IDN Homograph Attack

A phishing technique using visually similar Unicode characters in domain names to impersonate legitimate sites.

Programming & Dev
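One common defence against homograph attacks is to flag labels that mix scripts. The sketch below approximates a character's script by taking the first word of its Unicode name, which is a heuristic assumption rather than the real Script property (the standard library does not expose that property). The helper names are hypothetical.

```python
import unicodedata


def scripts_in_label(label: str) -> set[str]:
    """Approximate the scripts in a label via the first word of each name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed(label: str) -> bool:
    return len(scripts_in_label(label)) > 1


spoof = "\u0430pple"         # first letter is CYRILLIC SMALL LETTER A
ace = spoof.encode("idna")   # ASCII-compatible form, begins with b"xn--"

print(looks_mixed("apple"))  # False
print(looks_mixed(spoof))    # True
```

Showing the `xn--` form instead of the Unicode form is another mitigation, since the ASCII-compatible encoding makes the substitution visible.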

JavaScript String & Code Points

JS String methods for Unicode: codePointAt(), String.fromCodePoint(), and the spread operator for iterating by code point. Iterating by grapheme requires Intl.Segmenter.

Programming & Dev

Python unicodedata Module

Python standard library module for looking up Unicode character names, categories, and properties.

Programming & Dev
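A few representative `unicodedata` lookups, as a short sketch. The sample characters are arbitrary choices for illustration.

```python
import unicodedata

euro = "\u20ac"
half = "\u00bd"

name = unicodedata.name(euro)          # official character name
category = unicodedata.category(euro)  # two-letter general category
value = unicodedata.numeric(half)      # numeric value as a float
alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")  # name -> character

print(name, category, value, alpha)
```

`unicodedata.name()` raises `ValueError` for unnamed code points unless a default is passed as the second argument.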

Regex Unicode Support

Using Unicode-aware regular expressions: the /u flag in JS, and in Python 3 the default behavior for str patterns (re.UNICODE is redundant there; re.ASCII opts out).

Programming & Dev
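In Python 3 the difference between Unicode-aware and ASCII-only matching is easiest to see with `\w`. This sketch uses an arbitrary sample string. For `str` patterns Unicode matching is already the default, and `re.ASCII` switches it off.

```python
import re

text = "na\u00efve caf\u00e9"  # "naïve café"

# Default for str patterns: \w matches letters from any script.
unicode_words = re.findall(r"\w+", text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r"\w+", text, flags=re.ASCII)

print(unicode_words)  # ['naïve', 'café']
print(ascii_words)    # ['na', 've', 'caf']
```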

String Length vs Character Count

Why str.length in JavaScript returns UTF-16 code units, not visual characters — and how to count graphemes correctly.

Programming & Dev
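The same string has several different "lengths" depending on what is counted. This Python sketch uses an illustrative thumbs-up emoji with a skin-tone modifier. The UTF-16 figure is what JavaScript's `.length` would report for the same text, while Python's `len()` counts code points.

```python
# THUMBS UP SIGN + EMOJI MODIFIER FITZPATRICK TYPE-4: one visible character.
s = "\U0001F44D\U0001F3FD"

code_points = len(s)                           # what Python's len() counts
utf16_units = len(s.encode("utf-16-le")) // 2  # what JS .length would report
utf8_bytes = len(s.encode("utf-8"))            # size on the wire or on disk

print(code_points, utf16_units, utf8_bytes)  # 2 4 8
```

None of these numbers is 1, the count a user would expect. Getting that figure requires grapheme segmentation.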

Unicode Collation

Sorting text according to language-specific rules using the Unicode Collation Algorithm (UCA, UTS #10).

Programming & Dev
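Python's standard library has no full implementation of the Unicode Collation Algorithm, so this sketch only shows why plain code point ordering is not enough. The `crude_key` function is a hypothetical stand-in that strips accents and case; it is not UCA and ignores language-specific tailoring.

```python
import unicodedata


def crude_key(word: str) -> str:
    """Accent- and case-insensitive key; a rough stand-in, NOT the UCA."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


words = ["zebra", "Apple", "\u00e9clair"]  # last word is "éclair"

by_code_point = sorted(words)               # 'é' (U+00E9) sorts after 'z'
by_crude_key = sorted(words, key=crude_key)

print(by_code_point)  # ['Apple', 'zebra', 'éclair']
print(by_crude_key)   # ['Apple', 'éclair', 'zebra']
```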

Unicode Property Escapes (\p{})

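Regex syntax (\p{Script=Greek}, \p{Letter}) that matches characters by Unicode properties. Supported in JS (with the /u flag) and Java; in Python it is available through the third-party regex module, not the standard re module.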

Programming & Dev
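In Python, the standard `re` module rejects `\p{...}` as a bad escape in current versions. The sketch below therefore emulates a general-category test such as `\p{Letter}` with `unicodedata`. The `is_letter` helper is hypothetical; script properties such as `Script=Greek` are not exposed by the standard library at all.

```python
import re
import unicodedata


def is_letter(ch: str) -> bool:
    """Rough equivalent of \\p{Letter}: general category starts with 'L'."""
    return unicodedata.category(ch).startswith("L")


# Probe whether the running interpreter's re module accepts \p{...}.
try:
    re.compile(r"\p{Letter}")
    stdlib_supports_p = True
except re.error:
    stdlib_supports_p = False

sample = "a\u03b21\u0434!"  # Latin a, Greek beta, digit 1, Cyrillic de, '!'
letters = [ch for ch in sample if is_letter(ch)]
print(letters)  # ['a', 'β', 'д']
```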

Unicode Sandwich Pattern

A programming best practice: decode bytes → process text as Unicode → encode bytes. Keeps Unicode in the middle.

Programming & Dev
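The sandwich pattern can be sketched as a function that takes bytes, works on `str` internally, and returns bytes. The function name `shout` and the uppercasing step are placeholders for whatever processing an application actually does.

```python
def shout(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Decode at the input edge, process as text, encode at the output edge."""
    text = raw.decode(encoding)        # bottom slice: bytes -> str
    processed = text.upper()           # filling: pure Unicode text processing
    return processed.encode(encoding)  # top slice: str -> bytes


result = shout("stra\u00dfe".encode("utf-8"))  # "straße"
print(result.decode("utf-8"))  # STRASSE
```

Because the middle step sees text rather than bytes, operations like case mapping stay correct: `"ß".upper()` expands to `"SS"`, which a byte-level transformation could not do.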

Unicode in URLs & IRIs

How Unicode characters in URLs are handled: IRI (RFC 3987), percent-encoding of UTF-8 bytes, and browser display.

Programming & Dev
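The two halves of Unicode handling in URLs can be sketched with the standard library: paths are percent-encoded as UTF-8 bytes, and host names go through an ASCII-compatible encoding. The sample path and host are made up for illustration. Note that Python's built-in `idna` codec implements the older IDNA 2003 rules, which may differ from what current browsers apply.

```python
from urllib.parse import quote, unquote

path = "/wiki/caf\u00e9"  # "/wiki/café"
host = "b\u00fccher.example"  # "bücher.example"

# Path: each non-ASCII character becomes %XX escapes of its UTF-8 bytes.
encoded_path = quote(path)
decoded_path = unquote(encoded_path)

# Host: each non-ASCII label becomes an ASCII "xn--" label.
ascii_host = host.encode("idna").decode("ascii")
unicode_host = ascii_host.encode("ascii").decode("idna")

print(encoded_path)  # /wiki/caf%C3%A9
print(ascii_host)
```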