SymbolFYI

Glossary

Unicode, encoding, and typography terms explained.

ASCII

American Standard Code for Information Interchange: a 7-bit encoding of 128 characters, including English letters, digits, punctuation, and control characters.

Encoding
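As a rough illustration of the 7-bit limit in the ASCII entry above, the following Python sketch checks that ASCII text encodes to byte values below 128 and that a non-ASCII character is rejected. The sample strings are arbitrary choices for the example.

```python
# Sample strings chosen for illustration only.
plain = "Hello, 123\n"
accented = "caf\u00e9"  # "café", contains U+00E9

# Every ASCII character fits in 7 bits, so each byte is below 128.
ascii_bytes = plain.encode("ascii")
all_seven_bit = all(b < 128 for b in ascii_bytes)

# str.isascii() (Python 3.7+) reports whether a string is pure ASCII.
plain_is_ascii = plain.isascii()
accented_is_ascii = accented.isascii()

# Encoding a non-ASCII character with the strict ASCII codec fails.
try:
    accented.encode("ascii")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

print(all_seven_bit, plain_is_ascii, accented_is_ascii, encode_failed)
```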

Byte Order Mark (BOM)

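A special Unicode character (U+FEFF) placed at the start of a file or stream to signal its byte order and, by extension, its encoding. In UTF-8, where byte order is fixed, it acts only as an encoding signature.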

Encoding
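To make the Byte Order Mark entry above more concrete, here is a minimal Python sketch of how a BOM is written and stripped. It relies on the standard `utf-8-sig` and `utf-16` codecs; the sample text is an arbitrary choice.

```python
import codecs

text = "hi"  # arbitrary sample text

# The "utf-8-sig" codec prepends the UTF-8 form of U+FEFF when encoding.
with_bom = text.encode("utf-8-sig")
print(with_bom)  # b'\xef\xbb\xbfhi'

# "utf-16" without an explicit endianness also writes a BOM;
# which of the two forms appears depends on the platform byte order.
utf16 = text.encode("utf-16")
has_utf16_bom = utf16[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# Decoding with "utf-8-sig" strips the BOM; plain "utf-8" keeps it as U+FEFF.
stripped = with_bom.decode("utf-8-sig")
kept = with_bom.decode("utf-8")
```

Reading files of unknown origin with `utf-8-sig` is one common way to tolerate an optional BOM, since that codec also accepts input without one.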

Character Set (Charset)

A defined set of characters recognized by a computing system. Often used interchangeably with 'encoding', though strictly a character set defines which characters exist, while an encoding defines how they map to bytes.

Encoding

Latin-1 (ISO 8859-1)

A single-byte encoding for Western European languages. Its 256 byte values map directly to the first 256 Unicode code points (U+0000–U+00FF).

Encoding

Mojibake

Garbled text that results from decoding data with the wrong character encoding. Common when mixing Latin-1 and UTF-8.

Encoding
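The Latin-1 and UTF-8 mix-up mentioned in the Mojibake entry above can be reproduced in a few lines of Python. This is a sketch with an arbitrary sample word; real-world mojibake may involve other encodings or several rounds of mis-decoding.

```python
original = "caf\u00e9"  # "café"

# UTF-8 stores U+00E9 as two bytes: 0xC3 0xA9.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as Latin-1 turns each byte into its own character.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©

# When no information was lost, reversing the wrong step repairs the text.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```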

Replacement Character

The diamond-with-question-mark character (U+FFFD, �) that a decoder substitutes when it encounters an invalid or unrecognizable byte sequence.

Encoding
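As an illustration of the Replacement Character entry above, this Python sketch decodes a byte string containing one byte that is never valid in UTF-8. The input bytes are made up for the example.

```python
# 0xFF can never appear in well-formed UTF-8.
bad_bytes = b"abc\xffdef"

# errors="replace" substitutes U+FFFD for the invalid byte.
decoded = bad_bytes.decode("utf-8", errors="replace")
print(decoded == "abc\ufffddef")  # True

# The default (strict) mode raises instead of substituting.
try:
    bad_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```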

Surrogate Pair

A pair of 16-bit code units in UTF-16 that together represent a single character outside the Basic Multilingual Plane (above U+FFFF).

Encoding
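The arithmetic behind the Surrogate Pair entry above can be sketched as follows. The helper name `to_surrogate_pair` is hypothetical and exists only for this example; the result is cross-checked against Python's built-in UTF-16 encoder.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Hypothetical helper: split a code point above U+FFFF into UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the Basic Multilingual Plane")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F600)  # U+1F600 GRINNING FACE
print(hex(high), hex(low))  # 0xd83d 0xde00

# Cross-check against the big-endian UTF-16 bytes produced by Python.
encoded = chr(0x1F600).encode("utf-16-be")
```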

UTF-16

A variable-width character encoding that uses one or two 16-bit code units (2 or 4 bytes) per code point. JavaScript and Java strings are sequences of UTF-16 code units.

Encoding

UTF-32

A fixed-width encoding using 4 bytes per code point; simple to index but memory-intensive.

Encoding

UTF-8

A variable-width character encoding that uses 1 to 4 bytes to represent Unicode code points. The dominant encoding on the web.

Encoding
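The byte counts given in the UTF-8, UTF-16, and UTF-32 entries above can be compared side by side with a short Python sketch. The helper `encoded_sizes` and the sample characters are assumptions made for this example; the explicit little-endian codecs are used so that no BOM is counted.

```python
def encoded_sizes(ch: str) -> tuple:
    """Hypothetical helper: byte length of one character in UTF-8, UTF-16, UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # explicit endianness, so no BOM is added
        len(ch.encode("utf-32-le")),
    )


samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, 😀
for ch in samples:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```

For these four characters the UTF-8 length grows from 1 to 4 bytes, UTF-16 uses 2 bytes until the character leaves the Basic Multilingual Plane, and UTF-32 always uses 4.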

Unicode Escape Sequence

A way to represent characters by their code point in programming languages (\u2603 in JS/Java, \u{2603} in ES6+, \U00002603 in Python).

Encoding
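For the Python side of the Unicode Escape Sequence entry above, the sketch below shows several spellings of the same character (U+2603 SNOWMAN) and one way to turn a character back into its escape. It covers only Python string literals; other languages differ as described in the entry.

```python
# Four ways to write U+2603 in Python source.
short_form = "\u2603"        # 4 hex digits
long_form = "\U00002603"     # 8 hex digits
named_form = "\N{SNOWMAN}"   # official Unicode character name
from_number = chr(0x2603)    # built from the numeric code point

all_equal = short_form == long_form == named_form == from_number

# The "unicode_escape" codec produces the escaped spelling as ASCII bytes.
escaped = short_form.encode("unicode_escape").decode("ascii")
print(escaped)  # \u2603
```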

Unicode Normalization

The process of converting Unicode text to a standard form (NFC, NFD, NFKC, NFKD) to ensure consistent comparison and storage.

Encoding
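To show why the Unicode Normalization entry above matters for comparison, this Python sketch uses the standard `unicodedata` module on two spellings of "é" and on a ligature. The sample characters are arbitrary choices.

```python
import unicodedata

composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The two strings render the same but compare unequal.
raw_equal = composed == decomposed

# NFC composes, NFD decomposes; after normalizing, comparison works.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

# The compatibility forms (NFKC/NFKD) also fold characters such as ligatures.
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI
nfkc = unicodedata.normalize("NFKC", ligature)

print(raw_equal, nfc == composed, nfd == decomposed, nfkc)
```

Which form to store is a design choice; NFC is a common default for text storage, while the compatibility forms discard distinctions that some applications need to keep.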

Windows-1252

An extension of Latin-1 used by default in legacy Windows applications. It replaces the C1 control codes in the 0x80–0x9F range with printable characters such as the euro sign and curly quotes.

Encoding
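The difference in the 0x80–0x9F range described in the Windows-1252 entry above can be seen by decoding the same bytes with both codecs. This is a small Python sketch using the standard `cp1252` and `latin-1` codec names; the bytes are picked for illustration.

```python
# 0x80 is the euro sign in Windows-1252 but a C1 control code in Latin-1.
euro_cp1252 = b"\x80".decode("cp1252")
euro_latin1 = b"\x80".decode("latin-1")
print(euro_cp1252 == "\u20ac", euro_latin1 == "\x80")  # True True

# 0x93 and 0x94 are the curly double quotes in Windows-1252.
quotes = b"\x93hi\x94".decode("cp1252")

# Outside 0x80-0x9F the two encodings agree, for example 0xE9 is "é" in both.
same_outside_range = b"\xe9".decode("cp1252") == b"\xe9".decode("latin-1")
```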