Glossary

Definitions of Unicode, encoding, and typography terms.


ASCII

American Standard Code for Information Interchange: a 7-bit encoding of 128 characters, including English letters, digits, punctuation, and control characters.
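As a rough illustration of the 7-bit limit, the following Python sketch checks whether a string stays within code points 0 to 127. The helper name `is_ascii` is made up for this example.

```python
def is_ascii(text: str) -> bool:
    """Return True if every character fits in ASCII's 7-bit range (0-127)."""
    return all(ord(ch) < 128 for ch in text)


# Pure ASCII text encodes to exactly one byte per character.
ascii_bytes = "Hi!".encode("ascii")
```

Recent Python versions also provide `str.isascii()`, which performs the same check.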

BOM (Byte Order Mark)

A special Unicode character (U+FEFF) placed at the start of a file or stream to indicate its byte order and, by implication, its encoding form.
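One way BOM detection might look in Python is sketched below, using the BOM constants from the standard `codecs` module. The function name `sniff_bom` and the table `BOMS` are hypothetical, and the check is only a heuristic: UTF-16 LE text that begins with a BOM followed by U+0000 has the same first four bytes as the UTF-32 LE BOM.

```python
import codecs

# Known BOM byte sequences. UTF-32 comes first because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller could then decode `data[bom_length:]` with the returned encoding, or fall back to a default when no BOM is present.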

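Character Set

A defined set of characters recognized by a computing system. The term is often used interchangeably with 'encoding', although strictly a character set is the repertoire of characters, while an encoding is the mapping from those characters to bytes.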

Latin-1 (ISO 8859-1)

A single-byte encoding for Western European languages whose 256 byte values correspond one-to-one to the code points U+0000–U+00FF.
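The one-to-one relationship between byte values and the first 256 code points can be checked with a short Python sketch. It relies on Python's built-in `latin-1` codec, and the variable names are illustrative only.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# Decoding as Latin-1 never fails: byte N becomes code point U+00NN.
decoded = all_bytes.decode("latin-1")
code_points = [ord(ch) for ch in decoded]

# The reverse direction: U+00E9 (e with acute accent) is the single byte 0xE9.
e_acute = "\u00e9".encode("latin-1")
```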

Mojibake

Garbled text that results from decoding data with the wrong character encoding. It is common when Latin-1 and UTF-8 are mixed, for example when UTF-8 bytes are read as Latin-1.
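The Latin-1 and UTF-8 mix-up can be reproduced in a few lines of Python. This sketch assumes the text was only mis-decoded once and that no bytes were lost, in which case reversing the two steps recovers the original.

```python
original = "caf\u00e9"  # "café"

# Encode correctly as UTF-8, then decode with the wrong codec.
garbled = original.encode("utf-8").decode("latin-1")

# Reversing the mistake: re-encode with the wrong codec, decode with the right one.
repaired = garbled.encode("latin-1").decode("utf-8")
```

Here the two UTF-8 bytes of "é" (0xC3 0xA9) are shown as the two Latin-1 characters "Ã©".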

Replacement Character

The character U+FFFD (�), usually drawn as a question mark inside a black diamond, which a decoder substitutes when it encounters an invalid or unrecognizable byte sequence.
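In Python, this substitution can be requested with the `errors="replace"` argument of `bytes.decode`. The sketch below uses a byte (0xFF) that can never appear in valid UTF-8.

```python
# 0xFF is not a legal byte anywhere in UTF-8.
bad = b"abc\xffdef"

# With errors="replace" the decoder emits U+FFFD instead of raising an error.
text = bad.decode("utf-8", errors="replace")

# The default (errors="strict") raises UnicodeDecodeError instead.
try:
    bad.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```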

Surrogate Pair

A pair of 16-bit code units in UTF-16 (a high surrogate in U+D800–U+DBFF followed by a low surrogate in U+DC00–U+DFFF) that together represent a single code point outside the Basic Multilingual Plane (above U+FFFF).
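The arithmetic behind a surrogate pair can be sketched in Python as follows. The function name `to_surrogates` is hypothetical. The calculation subtracts 0x10000 and splits the remaining 20 bits into two 10-bit halves.

```python
def to_surrogates(code_point: int):
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


# U+1F600 (grinning face) as an example input.
high, low = to_surrogates(0x1F600)
```

For U+1F600 this yields 0xD83D and 0xDE00, which matches the bytes Python produces for the `utf-16-be` encoding of the same character.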

Unicode Escape Sequence

A notation for writing a character by its code point in source code: \u2603 in JavaScript and Java, \u{2603} in ES6+ JavaScript, and \u2603 or \U00002603 in Python.
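For the Python forms, a short sketch shows that the 4-digit, 8-digit, and named escapes all produce the same one-character string. The variable names are illustrative.

```python
snowman_short = "\u2603"       # 4 hex digits, BMP only
snowman_long = "\U00002603"    # 8 hex digits, any code point
snowman_named = "\N{SNOWMAN}"  # lookup by Unicode character name
snowman_chr = chr(0x2603)      # built from the integer code point at run time
```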

Unicode Normalization

The process of converting Unicode text to a standard form (NFC, NFD, NFKC, or NFKD) so that equivalent sequences compare equal and are stored consistently.
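Python's standard `unicodedata` module exposes the four forms through `unicodedata.normalize`. The sketch below compares a precomposed "é" with the sequence "e" plus a combining acute accent. The variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"      # single code point: e with acute
decomposed = "e\u0301"   # "e" followed by combining acute accent

# The two strings look identical but differ until normalized.
equal_raw = composed == decomposed
equal_nfc = unicodedata.normalize("NFC", decomposed) == composed
equal_nfd = unicodedata.normalize("NFD", composed) == decomposed

# The compatibility forms (NFKC, NFKD) also fold characters such as the "fi" ligature.
ligature_folded = unicodedata.normalize("NFKC", "\ufb01")
```

Normalizing both sides to the same form before comparing or storing is the usual way to avoid mismatches between visually identical strings.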

UTF-16

A variable-width character encoding that uses one or two 16-bit code units (2 or 4 bytes) per code point. Used internally by JavaScript and Java.
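Because JavaScript and Java strings are measured in UTF-16 code units, a character above U+FFFF counts as two. The Python sketch below estimates that count. The helper name `utf16_length` is made up for this example.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the unit JavaScript's String.length reports."""
    # "utf-16-le" writes no BOM, so every 2 bytes is exactly one code unit.
    return len(text.encode("utf-16-le")) // 2


bmp_units = utf16_length("A")                  # inside the BMP
astral_units = utf16_length("\U0001F600")      # outside the BMP
python_length = len("\U0001F600")              # Python counts code points
```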

UTF-32

A fixed-width encoding that uses 4 bytes per code point. It is simple to index but memory-intensive.
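The trade-off can be seen in a small Python sketch: every code point occupies exactly four bytes, so the n-th code point sits at byte offset 4 × n, at the cost of more storage than UTF-8 for the same text. The sample string is arbitrary.

```python
text = "A\u00e9\u20ac\U0001F600"   # 4 code points: A, é, €, 😀

utf32 = text.encode("utf-32-le")   # "-le" variant writes no BOM
utf8 = text.encode("utf-8")

# Fixed width allows direct indexing: code point i lives at bytes [4*i, 4*i + 4).
i = 3
fourth = int.from_bytes(utf32[4 * i : 4 * i + 4], "little")
```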

UTF-8

A variable-width character encoding that uses 1 to 4 bytes per Unicode code point and is backward compatible with ASCII. The dominant encoding on the web.
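The variable width can be observed by encoding one character from each length class in Python. The sample characters below are arbitrary picks.

```python
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, é, €, 😀

# Number of UTF-8 bytes each character needs.
widths = [len(ch.encode("utf-8")) for ch in samples]

# ASCII text is byte-for-byte identical in UTF-8.
ascii_compatible = "Hi!".encode("utf-8") == "Hi!".encode("ascii")
```

Code points up to U+007F take one byte, up to U+07FF two, up to U+FFFF three, and everything above that four.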

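Windows-1252

A superset of Latin-1's printable characters, used by default in legacy Windows applications. It assigns extra printable characters, such as the euro sign and curly quotes, to the 0x80–0x9F range that Latin-1 reserves for control codes.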
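The difference in the 0x80–0x9F range shows up when the same bytes are decoded with Python's `cp1252` and `latin-1` codecs, as in this sketch. The byte values are chosen for illustration.

```python
raw = b"\x80 \x93ok\x94"

# Windows-1252: 0x80 is the euro sign, 0x93 and 0x94 are curly double quotes.
as_cp1252 = raw.decode("cp1252")

# Latin-1: the same bytes map to invisible C1 control characters.
as_latin1 = raw.decode("latin-1")

# Outside 0x80-0x9F the two encodings agree.
same_elsewhere = b"\xe9".decode("cp1252") == b"\xe9".decode("latin-1")
```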