SymbolFYI

How to Use the SymbolFYI Encoding Converter

Tools Guides June 17, 2025

The Encoding Converter is SymbolFYI's dedicated character conversion tool. Paste a character, enter a code point, or type a hex value — the tool instantly expands it into every encoding format you might need: UTF-8 bytes, UTF-16 code units, HTML entities, CSS escapes, Python string literals, JSON escapes, and more. This guide covers every input method, output format, and practical workflow the tool supports.

What the Encoding Converter Does

Encoding conversion is something developers do constantly, often by memory or with scattered bookmarks. What's the UTF-8 byte sequence for €? What's the CSS escape for the non-breaking space? What HTML entity should I use for ™?

The Encoding Converter answers all of these in a single view. Type once, get every format simultaneously, and copy whichever one your workflow requires. It eliminates the need to manually convert between decimal and hexadecimal, look up named HTML entities, or guess at Python escape syntax.

Beyond single-character lookup, the tool also handles batch conversion — paste a whole string and it breaks it into individual characters, showing the conversion table for each one. This is invaluable for debugging encoding issues in text that's been through multiple systems.

Input Methods

The Encoding Converter accepts input in several formats, and it detects which format you've entered automatically.

Paste the Character Directly

The simplest input: click the character input field and paste (or type) the character you want to convert. Works for any character your keyboard or clipboard can produce — accented letters, emoji, CJK characters, mathematical symbols, everything.

For emoji and other characters outside the Basic Multilingual Plane, the tool correctly handles the fact that these require two UTF-16 code units (a surrogate pair) and four UTF-8 bytes. The output will show the full byte sequences, not just the base plane representation.

Enter a Unicode Code Point

If you know the code point, enter it in the format U+XXXX or just the hex digits XXXX. The tool accepts both uppercase and lowercase hex. For code points above U+FFFF (supplementary characters), use five or six hex digits: U+1F600 for the grinning face emoji.

You can also enter the code point as a decimal number. If the tool sees a number without hex indicators, it interprets it as decimal. So entering 169 gives you U+00A9 (©), and entering 0xA9 or A9 gives you the same character via hex.
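The detection rules described above can be sketched roughly as follows. This is only an illustration: `parse_code_point` is a hypothetical helper, and the rule order (U+ prefix, then 0x prefix, then digits-only as decimal, then bare hex) is an assumption about how such a tool might disambiguate input, not the converter's actual implementation.

```python
import re

MAX_CODE_POINT = 0x10FFFF  # highest valid Unicode code point


def parse_code_point(text: str) -> int:
    """Hypothetical parser for U+XXXX, 0xXX, bare hex, or decimal input."""
    s = text.strip()
    if re.fullmatch(r"[Uu]\+[0-9A-Fa-f]{1,6}", s):
        value = int(s[2:], 16)   # U+00A9 style
    elif re.fullmatch(r"0[xX][0-9A-Fa-f]+", s):
        value = int(s, 16)       # 0xA9 style
    elif re.fullmatch(r"[0-9]+", s):
        value = int(s)           # digits only: read as decimal (assumption)
    elif re.fullmatch(r"[0-9A-Fa-f]+", s):
        value = int(s, 16)       # contains A-F, so it can only be hex
    else:
        raise ValueError(f"not a recognizable code point: {text!r}")
    if value > MAX_CODE_POINT:
        raise ValueError(f"out of Unicode range: {text!r}")
    return value


for sample in ("U+00A9", "169", "0xA9", "A9"):
    print(sample, "->", chr(parse_code_point(sample)))
```

All four sample inputs resolve to the same character, ©.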

Type a Named HTML Entity

Enter an HTML entity name, including the ampersand and semicolon (for example &mdash; or &copy;), and the tool resolves it to the underlying character and expands it into all other formats. This is useful when you have an entity from HTML source and need the equivalent CSS escape or Python literal.

Not all entity names are common knowledge; the tool accepts any named entity from the HTML5 specification, including less-common ones like &laquo; (left-pointing double angle quotation mark, «) and &thinsp; (thin space).
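If you want to resolve an entity name outside the tool, Python's standard `html` module knows the HTML5 entity table. The sketch below is a minimal illustration; `resolve_entity` is a hypothetical wrapper name, not part of any library.

```python
import html


def resolve_entity(entity: str) -> str:
    """Resolve a named HTML entity such as '&copy;' to its character."""
    resolved = html.unescape(entity)
    if resolved == entity:
        # html.unescape leaves unknown entities unchanged
        raise ValueError(f"unknown entity: {entity!r}")
    return resolved


ch = resolve_entity("&copy;")
print(ch, f"U+{ord(ch):04X}")  # © U+00A9
```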

Enter Bytes in Hex

If you're looking at raw byte data — perhaps from a network packet, a database dump, or a hex editor — you can enter the UTF-8 byte sequence directly. Format the bytes as space-separated hex pairs: C2 A9 for ©, or E2 80 94 for the em dash. The tool decodes the byte sequence and shows you the character plus all other representations.
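The same decoding step can be reproduced in a few lines of Python, which is handy for cross-checking a hex dump. `decode_hex_bytes` is a hypothetical helper name used only for this sketch.

```python
def decode_hex_bytes(hex_text: str) -> str:
    """Decode space-separated hex pairs (e.g. 'C2 A9') as UTF-8."""
    return bytes.fromhex(hex_text).decode("utf-8")


ch = decode_hex_bytes("C2 A9")
print(ch, f"U+{ord(ch):04X}")  # © U+00A9
```

Invalid sequences raise `UnicodeDecodeError`, which is itself a useful signal that the bytes are not UTF-8.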

Output Formats

For each character, the Encoding Converter produces a complete conversion table. Every row has a one-click copy button.

UTF-8 Bytes

The UTF-8 byte sequence displayed as both hex octets and as decimal values:

© (U+00A9): C2 A9 (two bytes)
€ (U+20AC): E2 82 AC (three bytes)
😀 (U+1F600): F0 9F 98 80 (four bytes)

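Understanding byte length matters when a system imposes byte limits rather than character limits. A 255-byte budget holds 255 ASCII characters but only 63 four-byte emoji. In MySQL, for example, VARCHAR(255) counts characters, but index key and row-size limits are measured in bytes, so a utf8mb4 column can hit them well before its character limit.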
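To reproduce the byte sequences and the byte-budget arithmetic yourself, a short Python sketch is enough. `utf8_hex` is a hypothetical helper name.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of `ch` as space-separated hex octets."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


print(utf8_hex("\u00a9"))      # C2 A9
print(utf8_hex("\U0001F600"))  # F0 9F 98 80

# How many four-byte emoji fit in a 255-byte budget?
emoji_bytes = len("\U0001F600".encode("utf-8"))
print(255 // emoji_bytes)      # 63
```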

UTF-16 Code Units

UTF-16 represents code points below U+10000 as a single 16-bit code unit, and code points at U+10000 and above as a pair of 16-bit surrogate values. The output shows both the code unit value(s) in hex and the byte representations in both little-endian and big-endian byte order:

© (U+00A9): 00A9 (LE: A9 00, BE: 00 A9)
😀 (U+1F600): D83D DE00 (surrogate pair)

This is particularly relevant in JavaScript, where strings are internally UTF-16, and .length counts code units — so "😀".length === 2 in JavaScript despite being a single emoji.
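The surrogate pair arithmetic is short enough to write out by hand. The following is a minimal sketch of the standard UTF-16 rule; `utf16_code_units` is a hypothetical helper name.

```python
def utf16_code_units(ch: str) -> list[str]:
    """Return the UTF-16 code unit(s) for a single code point, in hex."""
    cp = ord(ch)
    if cp < 0x10000:
        return [f"{cp:04X}"]              # fits in one 16-bit unit
    offset = cp - 0x10000                 # 20-bit value to split
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return [f"{high:04X}", f"{low:04X}"]


print(utf16_code_units("\u00a9"))      # ['00A9']
print(utf16_code_units("\U0001F600"))  # ['D83D', 'DE00']

# Byte order only changes how each 16-bit unit is laid out in memory
print("\u00a9".encode("utf-16-le").hex())  # a900
print("\u00a9".encode("utf-16-be").hex())  # 00a9
```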

HTML Named Entity

The standard named entity from the HTML specification, if one exists. Not every character has a named entity — only a curated subset defined in the HTML5 spec does. When no named entity exists, this field shows "none" and the numeric forms below are your fallback.

Common named entities: &copy; &amp; &lt; &gt; &nbsp; &mdash; &laquo; &trade;

HTML Decimal Entity

The numeric character reference in decimal form: &#169; for ©. Always available regardless of whether a named entity exists. Useful for editors or CMS platforms that sanitize HTML and may strip named entities.

HTML Hex Entity

The numeric character reference in hexadecimal form: &#xA9; for ©. Equivalent to the decimal form; preference is stylistic. Some teams prefer hex because it directly corresponds to the Unicode code point.
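The three HTML forms can be generated together, as in the sketch below. `html_entities` is a hypothetical helper, and note one stated limitation: Python's `codepoint2name` table covers only the older HTML 4 entity names, so it will report no named entity for some characters that do have one in HTML5.

```python
from html.entities import codepoint2name


def html_entities(ch: str) -> dict:
    """Return the named (if known), decimal, and hex HTML forms of `ch`."""
    cp = ord(ch)
    name = codepoint2name.get(cp)  # HTML 4 names only (assumption noted above)
    return {
        "named": f"&{name};" if name else None,
        "decimal": f"&#{cp};",
        "hex": f"&#x{cp:X};",
    }


print(html_entities("\u00a9"))
# {'named': '&copy;', 'decimal': '&#169;', 'hex': '&#xA9;'}
```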

CSS Escape

The CSS escape sequence used in stylesheets, most commonly in the content property for pseudo-elements:

.icon::before {
  content: "\A9"; /* © copyright sign */
}

CSS escapes use a backslash followed by the hex code point (without U+ prefix), optionally followed by a space when the next character could be interpreted as a hex digit. The Encoding Converter outputs the escape with the trailing space already included when necessary, so you can paste it safely.
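A rough sketch of that rule in Python follows. `css_escape` and its `next_char` parameter are hypothetical, and the decision of when to append the space is an assumption about reasonable behavior rather than the tool's documented logic.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def css_escape(ch: str, next_char: str = "") -> str:
    """Return a CSS hex escape for `ch`, adding a terminating space if needed."""
    escape = f"\\{ord(ch):X}"
    # A space ends the escape. It is needed when the next character would
    # otherwise be read as another hex digit, or is itself whitespace
    # (CSS would swallow that whitespace as the terminator).
    if next_char and (next_char in HEX_DIGITS or next_char.isspace()):
        escape += " "
    return escape


print(css_escape("\u00a9"))       # \A9
print(css_escape("\u00a9", "B"))  # \A9 followed by a space
```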

Python String Escape

Two Python escape forms are shown:

  • \uXXXX: four-hex-digit form, valid for code points U+0000–U+FFFF
  • \UXXXXXXXX: eight-hex-digit form, required for supplementary characters above U+FFFF

"\u00A9"      # © (Python 3 str; Python 2 needs a u"" prefix)
"\U0001F600"  # 😀, a supplementary character (emoji)

For characters in the ASCII range, the output also shows the simpler \xNN form.
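Choosing between the two forms is a simple range check. The sketch below uses a hypothetical `python_escape` helper and then verifies each escape by evaluating it back into a string with `ast.literal_eval`.

```python
import ast


def python_escape(ch: str) -> str:
    """Return the \\uXXXX or \\UXXXXXXXX escape for a single character."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return f"\\u{cp:04X}"
    return f"\\U{cp:08X}"


for ch in ("\u00a9", "\U0001F600"):
    escaped = python_escape(ch)
    # Evaluate the escape as a Python string literal and compare
    round_trip = ast.literal_eval(f'"{escaped}"')
    print(escaped, round_trip == ch)
```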

JSON Escape

JSON escape sequences follow the same \uXXXX format as Python for the Basic Multilingual Plane. For supplementary characters (code points above U+FFFF), JSON requires a surrogate pair escape — since JSON itself has no \UXXXXXXXX form:

"\u00A9"           // ©
"\uD83D\uDE00"     // 😀 (surrogate pair in JSON)

The Encoding Converter correctly generates the surrogate pair form for supplementary characters, which is a common source of bugs when developers try to hand-write JSON escapes for emoji.
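For comparison, Python's standard `json` module emits the surrogate pair form by default (its `ensure_ascii` option is on unless you turn it off). Note that it writes the hex digits in lowercase, which JSON treats as equivalent. `json_escape` is a hypothetical wrapper name.

```python
import json


def json_escape(text: str) -> str:
    """Serialize `text` as an ASCII-only JSON string literal."""
    return json.dumps(text)  # ensure_ascii=True by default


print(json_escape("\u00a9"))      # "\u00a9"
print(json_escape("\U0001F600"))  # "\ud83d\ude00"

# Parsing the surrogate pair escape restores the single code point
print(json.loads('"\\uD83D\\uDE00"') == "\U0001F600")  # True
```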

URL Percent Encoding

The percent-encoded form used in URLs, encoding each UTF-8 byte as %XX:

© → %C2%A9
€ → %E2%82%AC

Both the encoded form (safe for embedding in a URL) and the URL-decoded form are shown. Spaces appear as %20 (never + in path components, though + appears in query strings in some frameworks).
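Python's `urllib.parse` shows the same behavior, including the difference between %20 and + for spaces. This is only a sketch for cross-checking; `percent_encode` is a hypothetical helper name.

```python
from urllib.parse import quote, quote_plus, unquote


def percent_encode(text: str) -> str:
    """Percent-encode every UTF-8 byte that is not an unreserved character."""
    return quote(text, safe="")


print(percent_encode("\u00a9"))  # %C2%A9
print(percent_encode("a b"))     # a%20b  (path-style encoding)
print(quote_plus("a b"))         # a+b    (form/query-style encoding)
print(unquote("%C2%A9"))         # ©
```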

Batch Conversion

For longer strings, switch from Single Character mode to Text mode using the toggle at the top of the tool. In Text mode:

  1. Paste or type a string of any length into the input area
  2. The tool processes the string as a sequence of Unicode code points (properly handling multi-byte characters and emoji)
  3. A scrollable table shows each character as a row with its code point, UTF-8 bytes, HTML entity, and one-click copy
  4. Summary statistics appear above the table: total characters, total code points, total UTF-8 bytes, and character count by Unicode plane

Batch mode is most useful for auditing text that has passed through multiple systems — a string that came from a PDF, was stored in a database, emailed, and pasted into a web form may contain invisible characters, unexpected whitespace types, or encoding artifacts. Expanding each character individually makes these visible.
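A similar per-character audit can be approximated in Python when you need it in a script. The sketch below is an illustration only; `audit` and `summarize` are hypothetical names, and the columns are a subset of what the tool shows.

```python
import unicodedata


def audit(text: str) -> list:
    """Return one row per code point: character, code point, UTF-8 bytes, name."""
    rows = []
    for ch in text:  # Python iterates str by code point
        rows.append({
            "char": ch,
            "code_point": f"U+{ord(ch):04X}",
            "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
            "name": unicodedata.name(ch, "<unnamed>"),
        })
    return rows


def summarize(text: str) -> dict:
    """Return simple totals for the whole string."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "supplementary": sum(ord(ch) > 0xFFFF for ch in text),
    }


# A non-breaking space hiding between two letters becomes visible
for row in audit("a\u00a0b"):
    print(row["code_point"], row["utf8"], row["name"])
```

Printing the Unicode name is what exposes look-alike whitespace: the middle row reads NO-BREAK SPACE rather than SPACE.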

Common Workflows

Finding the HTML Entity for a Symbol

You need to put a trademark symbol (™) in HTML. Open the Encoding Converter, paste ™ into the input field, and the output immediately shows &trade; in the named entity row. If you'd prefer numeric: &#8482; (decimal) or &#x2122; (hex). Copy whichever your template uses.

Getting the CSS Escape for a content Property

You're adding a decorative bullet to a list using CSS pseudo-elements and want to use the black right-pointing triangle (▶). Paste the character into the Encoding Converter; the CSS escape row shows \25B6. Use it:

li::before {
  content: "\25B6 ";
  margin-right: 0.4em;
}

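The trailing space terminates the escape: CSS consumes one whitespace character after a hex escape, so whatever follows cannot be read as additional hex digits. The space itself does not appear in the rendered content, and directly before the closing quote it is optional but harmless.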

Debugging Mojibake

Mojibake (garbled text from an encoding mismatch) produces characteristic character sequences when bytes are decoded with the wrong encoding. If you see Ã© where you expected é, that's UTF-8 bytes being decoded as Latin-1. Use the Encoding Converter's byte input: enter C3 A9 (the UTF-8 encoding of é) and confirm that those bytes, decoded correctly as UTF-8, produce é. Then look at Ã©: Ã (C3) and © (A9) are the Latin-1 characters for those same two bytes. This helps you identify and fix the mismatch.
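The mismatch is easy to reproduce and reverse in Python, which can help confirm a diagnosis. This sketch assumes the wrong decoder was strictly Latin-1; in practice it is often Windows-1252, which differs for bytes 80 to 9F. Both function names are hypothetical.

```python
def simulate_mojibake(text: str) -> str:
    """Encode as UTF-8, then decode those bytes as Latin-1 (the bug)."""
    return text.encode("utf-8").decode("latin-1")


def repair_mojibake(garbled: str) -> str:
    """Reverse the bug: recover the original bytes, then decode as UTF-8."""
    return garbled.encode("latin-1").decode("utf-8")


garbled = simulate_mojibake("\u00e9")  # é
print(garbled)                         # Ã©
print(repair_mojibake(garbled))        # é
```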

Preparing Database-Safe Text

You want to store emoji in a MySQL database. MySQL's utf8 charset (confusingly) only supports up to three-byte UTF-8 characters — emoji require four bytes and will be silently truncated or rejected. Use the Encoding Converter to check: enter the emoji, look at the UTF-8 bytes row, and count the bytes. Four bytes means you need utf8mb4 charset. The tool makes this check instant.
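The same byte-count check can be scripted for a whole string before it reaches the database. `needs_utf8mb4` is a hypothetical helper name.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character needs more than three UTF-8 bytes."""
    return any(len(ch.encode("utf-8")) > 3 for ch in text)


print(needs_utf8mb4("caf\u00e9 \u20ac"))   # False: at most three bytes each
print(needs_utf8mb4("hi \U0001F600"))      # True: the emoji is four bytes
```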

Verifying JSON Serialization

You're generating JSON programmatically and want to confirm your string escaping is correct before deploying. Enter each character of concern into the converter and compare the JSON escape shown to what your serialization library produces. If they match, you're fine. If they don't — especially for emoji — you may have a library that doesn't properly handle surrogate pairs.

Understanding Why Formats Differ

A character has one identity — its Unicode code point — but many representations. The Encoding Converter makes this concrete:

  • UTF-8 and UTF-16 are storage encodings: byte-level representations optimized for different contexts (UTF-8 for web and files, UTF-16 for internal use in Windows and JavaScript engines)
  • HTML entities are text-safe representations: ways to include characters in HTML without the parser misinterpreting them as markup
  • CSS and Python escapes are syntax-safe representations: ways to include characters in source code without the language parser misinterpreting them
  • URL percent encoding is transport-safe: ways to include arbitrary bytes in URL strings without confusing parsers

Every format represents the same character. The Encoding Converter lets you see all representations at once so you can choose the right one for each context without guessing.

For deeper background on encoding concepts, the SymbolFYI glossary covers encoding, UTF-8, and HTML entities in detail. The Character Analyzer complements this tool when you need per-character analysis of a multi-character string.
