Reference
In-depth reference guides for symbols, encodings, and character comparisons — from confusable character pairs to encoding survival guides.
The Private Use Area: Custom Characters in Unicode
Explore Unicode's Private Use Areas — how they work, why icon fonts use them, PUA in corporate fonts, and the risks of PUA characters in data exchange.
Aug 27, 2024
Punycode and IDN: How Unicode Domain Names Work
How Internationalized Domain Names work — Punycode encoding, IDNA 2003 vs 2008, homograph attacks, and implementing IDN support in your applications.
Aug 20, 2024
Legacy Encodings: Latin-1, Windows-1252, Shift-JIS, and When You Still Need Them
A practical guide to legacy character encodings — when you'll encounter Latin-1, Windows-1252, Shift-JIS, EUC-KR, and how to convert them to UTF-8.
Aug 6, 2024
UTF-16 and Surrogate Pairs: Why JavaScript Strings Are Complicated
Understand UTF-16 encoding and surrogate pairs — why emoji have .length 2 in JavaScript, how to handle supplementary characters, and when UTF-16 matters.
Jul 23, 2024
Character Encoding Detection: How Browsers and Tools Guess Your Encoding
How encoding detection works — the algorithm browsers use, statistical detectors like chardet, BOM sniffing, and why detection is never 100% reliable.
Jul 9, 2024
Mojibake: Why Text Turns to Garbage and How to Fix It
Understand mojibake — garbled text from encoding mismatches. Learn to diagnose, fix, and prevent encoding errors in files, databases, and web applications.
Jun 25, 2024
UTF-8: The Complete Guide to the Web's Dominant Encoding
Everything about UTF-8 — how it works, why it won, byte patterns, BOM handling, validation, and common pitfalls for developers.
Jun 18, 2024
Diacritical Marks: Understanding Accents, Umlauts, and Combining Characters
A complete guide to diacritical marks in Unicode — precomposed vs combining characters, normalization, typing accented letters, and handling diacritics in code.
Mar 12, 2024
Mathematical Symbols in Unicode: A Complete Reference
The definitive reference for mathematical symbols in Unicode — operators, Greek letters, set theory, logic, arrows, and where to find them by block.
Jan 30, 2024
Bullet (•) vs Middle Dot (·): Small Dots, Big Differences
Compare the bullet (•), middle dot (·), and other dot-like characters — proper usage in lists, navigation separators, and interpuncts.
Nov 7, 2023
Space Characters in Unicode: 20+ Invisible Characters Compared
Explore Unicode's space characters — regular space, non-breaking space, zero-width space, em space, thin space, and other invisible formatting characters.
Oct 24, 2023
Zero vs Letter O: Unicode Confusables and Homograph Attacks
How 0, O, and О (Cyrillic) create confusion — from font design to IDN homograph attacks, confusable detection, and security implications.
Oct 10, 2023
Minus vs Hyphen vs Dash: Five Characters That Look Like a Line
Navigate the confusing world of horizontal line characters — hyphen-minus, en dash, em dash, minus sign, and horizontal bar.
Sep 26, 2023
Variation Selectors: How Unicode Controls Text vs Emoji Display
Understand Unicode variation selectors — VS15 for text presentation, VS16 for emoji presentation, and how they control whether a character such as ☺ renders as text or as emoji.
Sep 19, 2023
Multiplication Sign (×) vs Letter X: Spot the Difference
Distinguish the multiplication sign (×, U+00D7) from lowercase x and uppercase X — visual comparison, Unicode properties, and proper usage in math.
Sep 12, 2023
Ellipsis (…) vs Three Dots (...): One Character or Three?
Compare the Unicode ellipsis character (…) with three period characters (...) — typographic differences, CSS text-overflow, and when each is appropriate.
Aug 29, 2023
Curly Quotes vs Straight Quotes: Typography's Most Common Mix-Up
Understand the difference between smart quotes (“ ”) and straight quotes (" ") — when to use each, code vs prose, and auto-conversion pitfalls.
Aug 15, 2023
En Dash vs Em Dash: When to Use – and —
Learn the difference between en dash (–) and em dash (—) — usage rules, typing methods, HTML entities, and CSS implementation.
Aug 1, 2023
Grapheme Clusters: Why String Length Is More Complicated Than You Think
Understand grapheme clusters — why 'café' can be 4 or 5 code points, why emoji have .length 2+ in JavaScript, and how to count what users actually see.
Jun 20, 2023
Code Point vs Character vs Glyph: The Three Levels of Text
Understand the three levels of text representation — code points (numbers), characters (abstract identities), and glyphs (visual shapes in fonts).
May 2, 2023
What Is a Code Point? Understanding Unicode's U+ Notation
Learn what Unicode code points are — the U+ notation system, how code points differ from characters and glyphs, and how to find any character's code point.
Apr 4, 2023