Guides
The Private Use Area: Custom Characters in Unicode
Explore Unicode's Private Use Areas — how they work, why icon fonts use them, PUA in corporate fonts, and the risks of PUA characters in data exchange.
Punycode and IDN: How Unicode Domain Names Work
How Internationalized Domain Names work — Punycode encoding, IDNA 2003 vs 2008, homograph attacks, and implementing IDN support in your applications.
Legacy Encodings: Latin-1, Windows-1252, Shift-JIS, and When You Still Need Them
A practical guide to legacy character encodings — when you'll encounter Latin-1, Windows-1252, Shift-JIS, EUC-KR, and how to convert them to UTF-8.
UTF-16 and Surrogate Pairs: Why JavaScript Strings Are Complicated
Understand UTF-16 encoding and surrogate pairs — why emoji have .length 2 in JavaScript, how to handle supplementary characters, and when UTF-16 matters.
Character Encoding Detection: How Browsers and Tools Guess Your Encoding
How encoding detection works — the algorithm browsers use, statistical detectors like chardet, BOM sniffing, and why detection is never 100% reliable.
Mojibake: Why Text Turns to Garbage and How to Fix It
Understand mojibake — garbled text from encoding mismatches. Learn to diagnose, fix, and prevent encoding errors in files, databases, and web applications.
UTF-8: The Complete Guide to the Web's Dominant Encoding
Everything about UTF-8 — how it works, why it won, byte patterns, BOM handling, validation, and common pitfalls for developers.
Diacritical Marks: Understanding Accents, Umlauts, and Combining Characters
A complete guide to diacritical marks in Unicode — precomposed vs combining characters, normalization, typing accented letters, and handling diacritics in code.
Mathematical Symbols in Unicode: A Complete Reference
The definitive reference for mathematical symbols in Unicode — operators, Greek letters, set theory, logic, arrows, and where to find them by block.
Bullet (•) vs Middle Dot (·): Small Dots, Big Differences
Compare the bullet (•), middle dot (·), and other dot-like characters — proper usage in lists, navigation separators, and interpuncts.
Space Characters in Unicode: 20+ Invisible Characters Compared
Explore Unicode's space characters — regular space, non-breaking space, zero-width space, em space, thin space, and other invisible formatting characters.
Zero vs Letter O: Unicode Confusables and Homograph Attacks
How 0, O, and О (Cyrillic) create confusion — from font design to IDN homograph attacks, confusable detection, and security implications.
Minus vs Hyphen vs Dash: Five Characters That Look Like a Line
Navigate the confusing world of horizontal line characters — hyphen-minus, en dash, em dash, minus sign, and horizontal bar.
Variation Selectors: How Unicode Controls Text vs Emoji Display
Understand Unicode variation selectors — VS15 for text presentation, VS16 for emoji presentation, and how they control whether a character such as ☺ (U+263A) renders as a text glyph or as a color emoji.
Multiplication Sign (×) vs Letter X: Spot the Difference
Distinguish the multiplication sign (×, U+00D7) from lowercase x and uppercase X — visual comparison, Unicode properties, and proper usage in math.
Ellipsis (…) vs Three Dots (...): One Character or Three?
Compare the Unicode ellipsis character (…) with three period characters (...) — typographic differences, CSS text-overflow, and when each is appropriate.
Curly Quotes vs Straight Quotes: Typography's Most Common Mix-Up
Understand the difference between smart quotes (“ ”) and straight quotes (" ") — when to use each, code vs prose, and auto-conversion pitfalls.
En Dash vs Em Dash: When to Use – and —
Learn the difference between en dash (–) and em dash (—) — usage rules, typing methods, HTML entities, and CSS implementation.
Grapheme Clusters: Why String Length Is More Complicated Than You Think
Understand grapheme clusters — why 'café' can be 4 or 5 code points, why emoji have .length 2+ in JavaScript, and how to count what users actually see.
Code Point vs Character vs Glyph: The Three Levels of Text
Understand the three levels of text representation — code points (numbers), characters (abstract identities), and glyphs (visual shapes in fonts).
What Is a Code Point? Understanding Unicode's U+ Notation
Learn what Unicode code points are — the U+ notation system, how code points differ from characters and glyphs, and how to find any character's code point.