SymbolFYI

URL Encoding (Percent-Encoding)

Definition

A method of encoding special characters in URLs by replacing each byte of their UTF-8 representation with % followed by two hexadecimal digits.

Percent-Encoding (URL Encoding)

Percent-encoding (also called URL encoding) is a method for representing arbitrary bytes in a URI by replacing each byte with a percent sign followed by two hexadecimal digits representing the byte value. It is defined by RFC 3986 as the standard mechanism for including characters that are not allowed in URIs or that would be ambiguous in URI syntax.

Basic Mechanism

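Each byte is encoded as %XX, where XX is the two-digit hexadecimal value of the byte. RFC 3986 recommends uppercase hex digits when producing URIs, but lowercase digits (for example %2f) are equivalent when decoding: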

space   → %20
!       → %21
#       → %23
$       → %24
%       → %25
&       → %26
+       → %2B
/       → %2F
?       → %3F
@       → %40
[       → %5B
]       → %5D
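
The mapping above can be sketched in a few lines of Python. This is a minimal illustration rather than a replacement for a library encoder: the helper name percent_encode is hypothetical, and it assumes that every byte outside the RFC 3986 unreserved set (letters, digits, -, _, ., ~) should be escaped.

```python
# Bytes that RFC 3986 calls "unreserved"; this sketch escapes everything else.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Hypothetical helper: escape every byte outside the unreserved set."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))        # keep the character as-is
        else:
            out.append(f"%{byte:02X}")   # '%' plus two uppercase hex digits
    return "".join(out)


print(percent_encode("a b"))     # a%20b
print(percent_encode("100%"))    # 100%25
print(percent_encode("x+y/z"))   # x%2By%2Fz
```

In real code, urllib.parse.quote does the same job and lets the caller choose which characters to leave untouched.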

Unicode Characters in URLs

For non-ASCII characters, percent-encoding applies to the UTF-8 byte sequence, not the Unicode code point directly:

# é (U+00E9) in UTF-8 is the two bytes 0xC3 0xA9
'é' → '%C3%A9'

# ☃ (U+2603 SNOWMAN) in UTF-8 is three bytes: 0xE2 0x98 0x83
'☃' → '%E2%98%83'

# 😀 (U+1F600) in UTF-8 is four bytes: 0xF0 0x9F 0x98 0x80
'😀' → '%F0%9F%98%80'
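
To make the two-step process explicit (code point to UTF-8 bytes, then bytes to %XX), here is a small sketch. The helper name utf8_escape is hypothetical; it escapes every byte unconditionally, so it is only meant for non-ASCII input like the examples above.

```python
from urllib.parse import quote


def utf8_escape(ch: str) -> str:
    """Hypothetical helper: percent-encode every UTF-8 byte of ch."""
    return "".join(f"%{b:02X}" for b in ch.encode("utf-8"))


for ch in ("é", "☃", "😀"):
    utf8 = ch.encode("utf-8")                      # code point -> UTF-8 bytes
    hex_bytes = " ".join(f"{b:02X}" for b in utf8)
    print(f"U+{ord(ch):04X}  bytes: {hex_bytes}  encoded: {utf8_escape(ch)}")
    # The standard library produces the same result for these characters.
    assert utf8_escape(ch) == quote(ch)
```

Note that the code point number (for example 2603 for the snowman) never appears in the output; only the UTF-8 bytes do.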

Safe vs. Reserved Characters

RFC 3986 defines:

Unreserved characters (never need encoding): A-Z, a-z, 0-9, -, _, ., ~
Reserved characters (have special URI meaning): :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, =
Everything else (spaces, control characters, non-ASCII characters, and a literal %): must be percent-encoded

Reserved characters may or may not be encoded depending on their position in the URI and whether they are being used for their reserved purpose or as data.
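
As an illustration of the data-versus-delimiter distinction, the sketch below places a value containing &, = and ? into a query string. The URL and variable names are made up for the example; passing safe="" to urllib.parse.quote asks it to escape every reserved character, including /.

```python
from urllib.parse import quote

# Data that happens to contain reserved characters.
value = "rock&roll=fun?"

# safe="" escapes all reserved characters so they are read as data.
encoded_value = quote(value, safe="")

# The "?" and "=" written here are delimiters, so they stay unencoded.
url = "https://example.com/search?q=" + encoded_value
print(url)  # https://example.com/search?q=rock%26roll%3Dfun%3F

# A slash inside a single path segment is data too.
print(quote("a/b", safe=""))  # a%2Fb
print(quote("a/b"))           # a/b  ("/" is treated as a delimiter by default)
```

Without the encoding step, a server would typically parse &roll=fun? as the start of a second parameter rather than as part of q.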

Application/x-www-form-urlencoded

HTML form data uses a variant of percent-encoding defined by the application/x-www-form-urlencoded content type. Its most visible difference from RFC 3986 is that spaces are encoded as + rather than %20, which means a literal + in the data must itself be encoded as %2B:

hello world → hello+world (form encoding)
hello world → hello%20world (RFC 3986 percent-encoding)
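
Because + means different things in the two schemes, the decoder has to match the encoder. The following sketch uses Python's urllib.parse functions to show what happens when they do not match; the sample string is arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "1 + 1 = 2"

form = quote_plus(text)       # form encoding: space -> "+", "+" -> "%2B"
uri = quote(text, safe="")    # plain percent-encoding: space -> "%20"

print(form)  # 1+%2B+1+%3D+2
print(uri)   # 1%20%2B%201%20%3D%202

# Matching decoders recover the original text.
assert unquote_plus(form) == text
assert unquote(uri) == text

# A mismatched decoder leaves "+" in place of the spaces.
print(unquote(form))  # 1+++1+=+2
```

As a rule of thumb, use the form-aware functions for query strings and request bodies produced by HTML forms, and plain percent-encoding elsewhere in a URI.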

Encoding in Practice

Python

from urllib.parse import quote, quote_plus, urlencode, unquote

# Percent-encode a string (RFC 3986)
quote('hello world')        # 'hello%20world'
quote('café')               # 'caf%C3%A9'
quote('/path/file', safe='/')  # '/path/file' ('/' is safe by default)
quote('/path/file', safe='')   # '%2Fpath%2Ffile' (nothing reserved is left unencoded)

# Form encoding (space → +)
quote_plus('hello world')   # 'hello+world'

# Encode query parameters
urlencode({'q': 'snow ☃', 'lang': 'en'})
# 'q=snow+%E2%98%83&lang=en'

# Decode
unquote('%C3%A9')           # 'é'

JavaScript

// Encode a URI component (escapes everything except A-Z a-z 0-9 - _ . ! ~ * ' ( ))
encodeURIComponent('hello world')  // 'hello%20world'
encodeURIComponent('café')         // 'caf%C3%A9'
encodeURIComponent('snow ☃')       // 'snow%20%E2%98%83'

// Encode a complete URI (preserves :, /, ?, # etc.)
encodeURI('https://example.com/café?q=snow ☃')
// 'https://example.com/caf%C3%A9?q=snow%20%E2%98%83'

// Decode
decodeURIComponent('%C3%A9')  // 'é'

Double Encoding Pitfall

A common mistake is encoding an already-encoded string, resulting in %25-encoded percent signs:

// First encoding
encodeURIComponent('hello world')  // 'hello%20world'

// Accidental double encoding
encodeURIComponent('hello%20world')  // 'hello%2520world' (wrong: the % was re-encoded as %25)

Always encode raw data exactly once before including it in a URI.
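
The same pitfall can be reproduced in Python. This short sketch, using urllib.parse.quote and unquote, shows that each decode pass removes only one layer of encoding, which is why a double-encoded value shows up on the receiving side with a stray %20 in it.

```python
from urllib.parse import quote, unquote

raw = "hello world"

once = quote(raw, safe="")    # correct: encoded exactly once
twice = quote(once, safe="")  # accidental second pass: "%" becomes "%25"

print(once)   # hello%20world
print(twice)  # hello%2520world

# One decode undoes one layer only.
assert unquote(once) == raw
assert unquote(twice) == once
assert unquote(unquote(twice)) == raw
```

Decoding repeatedly until the string stops changing is not a reliable fix, since raw data may legitimately contain sequences such as %20. Keeping track of whether a value is raw or already encoded avoids the problem at its source.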
