The charset parameter is a declaration in HTTP headers and HTML markup that tells a browser or parser which character encoding to use when converting raw bytes into text. Without this declaration, software must guess the encoding, which often leads to garbled text (mojibake). Explicit charset declarations are a foundational part of correct web content delivery.
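Here is a small example of mojibake. The Python sketch below encodes a string as UTF-8 and then decodes the bytes as Windows-1252, which is the kind of wrong guess software can make when no charset is declared. The sample word is arbitrary.

```python
# Encode text as UTF-8, then decode it with the wrong charset to produce mojibake.
original = "café"
raw_bytes = original.encode("utf-8")        # b'caf\xc3\xa9': 'é' becomes two bytes

garbled = raw_bytes.decode("windows-1252")  # each of those two bytes is read as its own character
print(garbled)                              # cafÃ©

# Decoding with the correct charset restores the text.
print(raw_bytes.decode("utf-8"))            # café
```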
Where charset Appears
HTTP Content-Type Header
The most authoritative place to declare encoding is the HTTP response header:
```http
Content-Type: text/html; charset=utf-8
Content-Type: text/plain; charset=windows-1252
Content-Type: application/json; charset=utf-8
```
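When present, the HTTP header charset takes precedence over any <meta> declaration in an HTML document. Only a byte order mark (BOM) at the start of the document overrides it. For application/json, RFC 8259 defines no charset parameter because JSON exchanged between systems must be UTF-8. The parameter is therefore harmless, but compliant recipients ignore it.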
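On the server side, the header is set when the response is built. The sketch below uses a bare WSGI application from Python's standard library as an example, but any framework works the same way. The names `app` and `fake_start_response` are illustrative. The charset named in the header must be the same one used to encode the body.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """Minimal WSGI app that declares its charset in the Content-Type header."""
    text = "<!DOCTYPE html><title>Café</title><p>Grüße</p>"
    charset = "utf-8"
    body = text.encode(charset)  # the bytes must actually match the declared charset
    start_response("200 OK", [
        ("Content-Type", f"text/html; charset={charset}"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Exercise the app without a real server by capturing what it would send.
captured = {}

def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)

environ = {}
setup_testing_defaults(environ)  # fills in the minimal WSGI environ keys
body = b"".join(app(environ, fake_start_response))
print(captured["headers"]["Content-Type"])  # text/html; charset=utf-8
```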
HTML <meta> Tag
In HTML5, the <meta charset> tag provides an in-document fallback:
```html
<!DOCTYPE html>
<html lang='en'>
<head>
<meta charset='utf-8'>
<title>Page Title</title>
</head>
```
This declaration must appear within the first 1,024 bytes of the document so that the browser can determine the encoding before parsing any further content. The current HTML Standard requires documents to be encoded in UTF-8, so charset='utf-8' is the only conforming value.
The older HTML4 syntax is still valid but verbose:
```html
<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
```
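The following Python sketch shows how a parser might find either form of declaration within the first 1,024 bytes. The regular expression and the helper name `find_meta_charset` are assumptions made for illustration. The real WHATWG prescan algorithm tokenizes attributes and skips comments, so this is only an approximation, not a conforming implementation.

```python
import re
from typing import Optional

# Simplified pattern: matches <meta charset=...> and the "charset=" inside an
# http-equiv tag's content attribute. Not the full WHATWG prescan algorithm.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""",
    re.IGNORECASE,
)

PRESCAN_LIMIT = 1024  # a <meta> declaration past this point is not found

def find_meta_charset(document: bytes) -> Optional[str]:
    """Return the charset declared by a <meta> tag in the first 1,024 bytes, or None."""
    match = META_CHARSET.search(document[:PRESCAN_LIMIT])
    return match.group(1).decode("ascii").lower() if match else None

head = b"<!DOCTYPE html><html lang='en'><head><meta charset='utf-8'>"
late = b"<!DOCTYPE html>" + b" " * 2000 + b"<meta charset='utf-8'>"
print(find_meta_charset(head))  # utf-8
print(find_meta_charset(late))  # None: declared too late to be seen
```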
XML Declaration
XML documents can declare encoding in the processing instruction:
```xml
<?xml version='1.0' encoding='utf-8'?>
```
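Documents encoded in UTF-8, or in UTF-16 with a BOM, can omit the encoding declaration because parsers detect these encodings on their own. Documents in any other encoding must declare it, unless an external protocol such as HTTP supplies the encoding. The XML declaration itself is optional but recommended.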
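Python's `xml.etree.ElementTree`, which is built on the expat parser, can show these rules in action. The sample documents are made up. The sketch parses three cases: a Latin-1 document that relies on its declaration, a UTF-8 document with no declaration, and a BOM-prefixed UTF-16 document with no declaration.

```python
import xml.etree.ElementTree as ET

# 1. A non-UTF encoding: the declaration tells the parser how to read the bytes.
latin1_doc = "<?xml version='1.0' encoding='iso-8859-1'?><p>café</p>".encode("iso-8859-1")
print(ET.fromstring(latin1_doc).text)  # café

# 2. UTF-8 with no declaration at all: UTF-8 is the default.
utf8_doc = "<p>café</p>".encode("utf-8")
print(ET.fromstring(utf8_doc).text)    # café

# 3. UTF-16 with no declaration: Python's "utf-16" codec prepends a BOM,
#    and the BOM identifies the encoding.
utf16_doc = "<p>café</p>".encode("utf-16")
print(ET.fromstring(utf16_doc).text)   # café
```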
MIME Charset Names
Charset names used in HTTP and HTML are defined by the IANA Character Sets registry. Common names:
| MIME Name | Encoding |
|---|---|
| utf-8 | UTF-8 |
| utf-16 | UTF-16 with BOM |
| iso-8859-1 | Latin-1 (treated as Windows-1252 by browsers) |
| windows-1252 | Windows-1252 |
| euc-kr | Korean (EUC-KR) |
| shift_jis | Japanese (Shift-JIS) |
Names are case-insensitive: UTF-8, utf-8, and Utf-8 are equivalent.
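Python's `codecs` registry can be used to check both the case-insensitivity of names and the Latin-1 caveat in the table. Python follows the registered definition of iso-8859-1 rather than browser behavior, which makes the difference visible: byte 0x93 is a curly quotation mark in Windows-1252 but a C1 control character in ISO-8859-1.

```python
import codecs

# Name lookup ignores case, so all three spellings resolve to the same codec.
for label in ("UTF-8", "utf-8", "Utf-8"):
    print(label, "->", codecs.lookup(label).name)  # each line ends with utf-8

# The same byte means different things in Windows-1252 and strict ISO-8859-1.
byte = b"\x93"
print(repr(byte.decode("windows-1252")))  # '“' (left double quotation mark)
print(repr(byte.decode("iso-8859-1")))    # '\x93' (a C1 control character)
```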
Priority Order for HTML Encoding Detection
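Browsers follow the encoding sniffing algorithm in the WHATWG HTML Standard, which checks these sources in order:
- Byte Order Mark (BOM) at the start of the document (highest priority; it overrides even the HTTP header)
- HTTP Content-Type: charset
- <meta charset> or <meta http-equiv='Content-Type'> pragma, found by prescanning the first 1,024 bytes
- Browser sniffing heuristics or a locale-dependent default (lowest priority)
The standard also allows, but does not require, a manual user override, which is checked immediately after the BOM.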
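The order above can be sketched in Python as follows. `detect_html_encoding` is a hypothetical helper, and the `windows-1252` fallback assumes an English-locale default. The sketch leaves out several real browser steps, including label normalization (for example, mapping `iso-8859-1` to `windows-1252`), the optional user override, and content heuristics.

```python
import codecs
import re
from email.message import Message
from typing import Optional

BOMS = [
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
]
# Simplified <meta> matcher; not the full WHATWG prescan.
META_CHARSET = re.compile(rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""", re.IGNORECASE)

def detect_html_encoding(body: bytes, content_type: Optional[str], default: str = "windows-1252") -> str:
    """Simplified sketch of the browser priority order for HTML."""
    # 1. A byte order mark wins over everything else.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 2. The charset parameter of the HTTP Content-Type header.
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            return charset
    # 3. A <meta> declaration within the first 1,024 bytes.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # 4. Fallback default (real browsers may run sniffing heuristics here).
    return default

page = codecs.BOM_UTF8 + b"<meta charset='windows-1252'><p>hi</p>"
print(detect_html_encoding(page, "text/html; charset=iso-8859-1"))  # utf-8 (BOM wins)
```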
Accessing charset in Code
```python
import urllib.request

with urllib.request.urlopen('https://example.com') as response:
    # get_content_charset() returns the charset parameter lowercased, or None
    charset = response.headers.get_content_charset()
    print(charset)  # e.g. utf-8 (or None if not declared)
    html = response.read().decode(charset or 'utf-8')
```
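```javascript
// fetch API: the charset is embedded in the Content-Type header
// (top-level await requires an ES module)
const response = await fetch('https://example.com');
const contentType = response.headers.get('content-type') ?? '';
console.log(contentType); // e.g. 'text/html; charset=utf-8'
// The header may be missing, quote the value, or carry further parameters
const match = /charset\s*=\s*"?([^";\s]+)/i.exec(contentType);
const charset = match ? match[1].toLowerCase() : null;
```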
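Header values in the wild may quote the charset, add further parameters, or name an encoding the client does not support. Below is a minimal Python sketch of a more defensive decode that assumes UTF-8 as the fallback. `decode_response_body` is a hypothetical name, while `Message.get_content_charset()` and `codecs.lookup()` are standard-library calls.

```python
import codecs
from email.message import Message
from typing import Optional

def decode_response_body(body: bytes, content_type: Optional[str], fallback: str = "utf-8") -> str:
    """Decode an HTTP body with its declared charset, falling back if it is missing or unknown."""
    charset = None
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()  # strips quotes, lowercases, ignores other params
    if charset:
        try:
            codecs.lookup(charset)  # raises LookupError for names Python does not know
        except LookupError:
            charset = None
    # errors="replace" substitutes U+FFFD for malformed bytes instead of raising
    return body.decode(charset or fallback, errors="replace")

print(decode_response_body("café".encode("windows-1252"), 'text/html; charset="Windows-1252"'))  # café
print(decode_response_body("café".encode("utf-8"), "text/html; charset=no-such-charset"))        # café
```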
Best Practice
Always declare charset=utf-8 explicitly in both the HTTP header and the <meta charset> tag. Relying on browser sniffing is unreliable and can introduce security vulnerabilities. Specially crafted content can lead some sniffing heuristics to choose the wrong encoding, which can enable cross-site scripting attacks (the UTF-7 XSS vector is a historical example).
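The classic UTF-7 payload shows why a wrong guess can be a security problem. The byte string below is a textbook example. It contains no literal `<` or `>`, so a naive filter that looks for those characters lets it through, yet it decodes to a script tag when read as UTF-7. Modern browsers no longer support UTF-7 at all, which closes this particular vector. The general lesson still holds: a filter and a browser must agree on the encoding.

```python
# Pure-ASCII bytes that are harmless as UTF-8 but a script tag as UTF-7.
# "+ADw-" and "+AD4-" are the UTF-7 base64 forms of "<" and ">".
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

print(b"<" in payload)          # False: a filter scanning for '<' finds nothing
print(payload.decode("utf-8"))  # +ADw-script+AD4-alert(1)+ADw-/script+AD4-
print(payload.decode("utf-7"))  # <script>alert(1)</script>
```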