Glossary
Explanations of Unicode, encoding, and typography terms.
ASCII
American Standard Code for Information Interchange: a 7-bit encoding of 128 characters (code points 0x00–0x7F), including English letters, digits, punctuation, and control characters.
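As a minimal sketch of the 7-bit limit, the following Python snippet inspects a few code points and shows what happens outside the 128-character range. The helper `is_ascii` and the sample strings are illustrative assumptions, not part of any standard API.

```python
# Every ASCII character has a code point in the range 0-127.
letters = "Az09\n"
code_points = [ord(ch) for ch in letters]  # [65, 122, 48, 57, 10]

# Encoding pure-ASCII text yields exactly one byte per character.
ascii_bytes = letters.encode("ascii")


def is_ascii(text: str) -> bool:
    """Hypothetical helper: True if every character fits in 7 bits."""
    return all(ord(ch) < 128 for ch in text)


# Characters outside the 128-character repertoire cannot be encoded as ASCII.
try:
    "naïve".encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```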
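Byte Order Mark (BOM)
The Unicode character U+FEFF placed at the start of a file or stream to signal its byte order and, in practice, its encoding. In UTF-16 and UTF-32 it distinguishes big-endian from little-endian data; in UTF-8 it serves only as a signature (bytes EF BB BF).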
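One way to see a BOM in practice is to write one with Python's built-in codecs and then detect it from the leading bytes. This is a sketch under stated assumptions: `sniff_bom` is a hypothetical helper, and it only recognizes the BOM constants exposed by the standard `codecs` module.

```python
import codecs
from typing import Optional

text = "hi"

# The "utf-8-sig" codec writes the UTF-8 signature EF BB BF before the data.
utf8_with_bom = text.encode("utf-8-sig")

# The generic "utf-16" codec writes a BOM in the platform's native byte order.
utf16_with_bom = text.encode("utf-16")


def sniff_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: guess a codec name from a leading BOM, else None."""
    # UTF-32 is checked first because the UTF-32 LE BOM (FF FE 00 00)
    # begins with the UTF-16 LE BOM (FF FE).
    candidates = (
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name
    return None


# Decoding with "utf-8-sig" strips the BOM; plain "utf-8" keeps it as U+FEFF.
stripped = utf8_with_bom.decode("utf-8-sig")
kept = utf8_with_bom.decode("utf-8")
```

A file without a BOM returns `None` here, so a caller would still need a fallback encoding.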
Character Set (Charset)
A defined set of characters recognized by a computing system. Often used interchangeably with 'encoding', though strictly a character set defines which characters exist, while an encoding defines how those characters are stored as bytes.
Latin-1 (ISO 8859-1)
A single-byte encoding for Western European languages. Its 256 byte values map one-to-one to the code points U+0000–U+00FF.
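The one-to-one relationship between Latin-1 bytes and the first 256 code points can be checked with a short Python sketch. The variable names are illustrative only.

```python
# Decode every possible byte value as Latin-1.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Each byte value N becomes the code point U+00NN, so decoding never fails.
maps_directly = [ord(ch) for ch in decoded] == list(range(256))

# "é" (U+00E9) fits in a single Latin-1 byte.
e_acute = "é".encode("latin-1")

# The euro sign (U+20AC) lies outside U+0000-U+00FF and cannot be encoded.
try:
    "€".encode("latin-1")
    euro_fits = True
except UnicodeEncodeError:
    euro_fits = False
```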
Mojibake
Garbled text that results from decoding data with the wrong character encoding. Common when mixing Latin-1 and UTF-8.
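The Latin-1 and UTF-8 mix-up can be reproduced in a few lines of Python. This is a minimal sketch with an arbitrary sample word; the repair step only works while the wrongly decoded text has not been altered or lossily re-encoded.

```python
original = "café"

# Correctly encoded as UTF-8: "é" becomes the two bytes C3 A9.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as Latin-1 turns each byte into its own character.
garbled = utf8_bytes.decode("latin-1")

# Reversing the wrong step recovers the original bytes, which can then be
# decoded with the correct encoding.
repaired = garbled.encode("latin-1").decode("utf-8")
```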
Replacement Character
The diamond-question mark character (U+FFFD, �) displayed when a decoder encounters an invalid or unrecognizable byte sequence.
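To illustrate, Python's decoders can either reject invalid input or substitute U+FFFD, depending on the `errors` argument. The sample bytes below are an assumption chosen for the sketch: a Latin-1 encoded "é" that is not valid UTF-8 on its own.

```python
# 0xE9 is "é" in Latin-1, but in UTF-8 it starts a multi-byte sequence
# that is never completed here.
bad = b"caf\xe9"

# Strict decoding (the default) refuses the invalid sequence.
try:
    bad.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# errors="replace" substitutes U+FFFD for the undecodable input.
lenient = bad.decode("utf-8", errors="replace")
has_replacement = "\ufffd" in lenient
```

Once U+FFFD has been written, the original byte is gone, so lenient decoding is best reserved for display rather than storage.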
Surrogate Pair
A pair of 16-bit code units in UTF-16 that together represent a single character outside the Basic Multilingual Plane (above U+FFFF).
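The arithmetic behind a surrogate pair can be sketched as follows. `to_surrogate_pair` is a hypothetical helper written for this example; it follows the standard UTF-16 rule of subtracting 0x10000 and splitting the remaining 20 bits into two 10-bit halves.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Hypothetical helper: split a supplementary code point into UTF-16 units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


# U+1F600 (grinning face) lies outside the Basic Multilingual Plane.
high, low = to_surrogate_pair(0x1F600)

# Cross-check against Python's own big-endian UTF-16 encoder.
encoded_hex = "\U0001F600".encode("utf-16-be").hex()
```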
UTF-16
A variable-width character encoding that uses one or two 16-bit code units (2 or 4 bytes) per code point. Used internally for strings by JavaScript and Java.
UTF-32
A fixed-width encoding using 4 bytes per code point; simple to index but memory-intensive.
UTF-8
A variable-width character encoding that uses 1 to 4 bytes to represent Unicode code points. The dominant encoding on the web.
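As a rough comparison of the three UTF encodings in this glossary, the sketch below measures how many bytes a single character occupies in each. `encoded_sizes` is a hypothetical helper, and the little-endian codec variants are used so that no BOM is counted.

```python
from typing import Tuple


def encoded_sizes(ch: str) -> Tuple[int, int, int]:
    """Hypothetical helper: byte length of ch in UTF-8, UTF-16, and UTF-32."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),  # "-le" variants emit no BOM
        len(ch.encode("utf-32-le")),
    )


ascii_letter = encoded_sizes("A")          # U+0041
accented = encoded_sizes("é")              # U+00E9
euro = encoded_sizes("€")                  # U+20AC
emoji = encoded_sizes("\U0001F600")        # U+1F600, outside the BMP
```

UTF-8 grows from 1 to 4 bytes as the code point increases, UTF-16 jumps from 2 to 4 bytes above U+FFFF, and UTF-32 stays at 4 bytes throughout.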
Unicode Escape Sequence
A way to represent characters by their code point in programming languages (\u2603 in JS/Java, \u{2603} in ES6+, \U00002603 in Python).
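In Python specifically, several escape forms produce the same character. The sketch below uses the snowman (U+2603) from the definition above; the variable names are illustrative.

```python
# Four-digit form, valid for code points up to U+FFFF.
snowman_u4 = "\u2603"

# Eight-digit form, valid for any code point.
snowman_u8 = "\U00002603"

# Named form, using the official Unicode character name.
snowman_named = "\N{SNOWMAN}"

# The same character built from its numeric code point at runtime.
snowman_chr = chr(0x2603)

# Going the other way: produce the escape sequence for a character.
escaped = snowman_u4.encode("unicode_escape")
```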
Unicode Normalization
The process of converting Unicode text to a standard form (NFC, NFD, NFKC, NFKD) to ensure consistent comparison and storage.
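Why normalization matters for comparison can be shown with Python's standard `unicodedata` module. This is a minimal sketch; the sample strings are assumptions chosen to show one canonical and one compatibility case.

```python
import unicodedata

# The same visible "é" written two ways.
composed = "\u00e9"        # single code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT

# Without normalization the two strings compare as different.
equal_raw = composed == decomposed

# NFC composes, NFD decomposes; either makes the two forms comparable.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)

# The "K" forms also fold compatibility characters, such as the "fi" ligature.
ligature = "\ufb01"
nfkc = unicodedata.normalize("NFKC", ligature)
nfc_ligature = unicodedata.normalize("NFC", ligature)
```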
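Windows-1252
A single-byte encoding used by default in legacy Windows applications. It matches Latin-1 except in the 0x80–0x9F range, where it assigns printable characters such as curly quotes and the euro sign in place of control codes.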
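The difference in the 0x80–0x9F range shows up when the same bytes are decoded both ways. The sketch below uses Python's `cp1252` and `latin-1` codecs; the sample bytes are an assumption chosen to include curly quotes and the euro sign.

```python
# Bytes as a legacy Windows application might have written them.
raw = b"\x93quoted\x94 \x80"

# Windows-1252 maps 0x93/0x94 to curly quotes and 0x80 to the euro sign.
as_cp1252 = raw.decode("cp1252")

# Latin-1 maps the same bytes to invisible C1 control characters.
as_latin1 = raw.decode("latin-1")

# Outside 0x80-0x9F the two encodings agree.
same_elsewhere = b"caf\xe9".decode("cp1252") == b"caf\xe9".decode("latin-1")
```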