SymbolFYI

Encoding Survival Guide

A 6-part practical series on character encoding — UTF-8 byte structure, mojibake diagnosis, encoding detection, and the Unicode sandwich.

  1.

    UTF-8: The Complete Guide to the Web's Dominant Encoding

    Everything about UTF-8 — how it works, why it won, byte patterns, BOM handling, validation, and common pitfalls for developers.

  2.

    Mojibake: Why Text Turns to Garbage and How to Fix It

    Understand mojibake — garbled text from encoding mismatches. Learn to diagnose, fix, and prevent encoding errors in files, databases, and web applications.

  3.

    Character Encoding Detection: How Browsers and Tools Guess Your Encoding

    How encoding detection works — the algorithm browsers use, statistical detectors like chardet, BOM sniffing, and why detection is never 100% reliable.

  4.

    UTF-16 and Surrogate Pairs: Why JavaScript Strings Are Complicated

    Understand UTF-16 encoding and surrogate pairs: why a single emoji can have a .length of 2 in JavaScript, how to handle supplementary characters, and when UTF-16 matters.

  5.

    Legacy Encodings: Latin-1, Windows-1252, Shift-JIS, and When You Still Need Them

    A practical guide to legacy character encodings — when you'll encounter Latin-1, Windows-1252, Shift-JIS, EUC-KR, and how to convert them to UTF-8.

  6.

    Punycode and IDN: How Unicode Domain Names Work

    How Internationalized Domain Names work — Punycode encoding, IDNA 2003 vs 2008, homograph attacks, and implementing IDN support in your applications.