Internationalized Domain Names (IDN)
Internationalized Domain Names (IDNs) extend the Domain Name System (DNS) to support domain names containing non-ASCII characters, enabling web addresses in Arabic, Chinese, Cyrillic, Devanagari, and dozens of other scripts. The current standard, IDNA2008, is defined in RFCs 5890 through 5894.
The Problem IDN Solves
DNS was designed in the 1980s when only ASCII characters were considered. Domain labels (the parts between dots) were restricted to letters a-z, digits 0-9, and hyphens. This excluded the vast majority of the world's writing systems from being used in domain names.
IDN allows domains such as:
- münchen.de (German, with umlaut)
- 中文.com (Chinese)
- الإمارات.ae (Arabic, registered under .ae, the ccTLD for the UAE)
- भारत.भारत (Hindi)
Punycode Encoding
Because DNS infrastructure only handles ASCII, IDN labels are converted to an ASCII-compatible encoding called Punycode (RFC 3492). Punycode-encoded labels begin with the ACE prefix xn--.
Examples:
| Unicode Domain | Punycode Equivalent |
|---|---|
| münchen.de | xn--mnchen-3ya.de |
| 中文.com | xn--fiq228c.com |
| مثال.com | xn--mgbh0fb.com |
# Python: encode/decode IDN
domain = 'münchen.de'
encoded = domain.encode('idna').decode('ascii')
print(encoded) # xn--mnchen-3ya.de
decoded = encoded.encode('ascii').decode('idna')
print(decoded) # münchen.de
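To make the role of the ACE prefix concrete, here is a minimal sketch that builds the ASCII form one label at a time using Python's built-in `punycode` codec. The helper names `to_ace_label` and `to_ace_domain` are hypothetical, and the sketch deliberately skips the normalization and case folding that a real IDNA implementation performs before encoding.

```python
ACE_PREFIX = "xn--"

def to_ace_label(label: str) -> str:
    """Return the ASCII-compatible form of a single label (illustrative only)."""
    if label.isascii():
        # Plain ASCII labels pass through unchanged.
        return label
    # Raw Punycode output carries no prefix; IDNA puts "xn--" in front of it.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def to_ace_domain(domain: str) -> str:
    """Encode each dot-separated label independently and rejoin them."""
    return ".".join(to_ace_label(label) for label in domain.split("."))

print(to_ace_domain("münchen.de"))  # xn--mnchen-3ya.de
print(to_ace_domain("中文.com"))     # xn--fiq228c.com
```

Each label is encoded separately, which is why the `.de` and `.com` parts stay untouched.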
How It Works End-to-End
- User types münchen.de in their browser
- Browser converts it to xn--mnchen-3ya.de (Punycode)
- DNS lookup proceeds using the ASCII form
- Browser displays the Unicode form in the address bar
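The steps above can be sketched roughly as follows. This is an illustration rather than how any particular browser is implemented: the function name `resolve_idn` and its injectable `resolver` parameter are assumptions made for the example, and the standard-library `idna` codec used here follows IDNA2003.

```python
import socket

def resolve_idn(hostname, resolver=socket.gethostbyname):
    """Convert a Unicode hostname to ASCII, look it up, and return display data."""
    # Step 2: convert the typed name to its ASCII-compatible (Punycode) form.
    ascii_name = hostname.encode("idna").decode("ascii")
    # Step 3: the DNS lookup only ever sees the ASCII form.
    address = resolver(ascii_name)
    # Step 4: convert back to Unicode for display.
    display_name = ascii_name.encode("ascii").decode("idna")
    return ascii_name, address, display_name

# Usage with a stub resolver so the example runs without network access.
# 192.0.2.1 is an address reserved for documentation.
print(resolve_idn("münchen.de", resolver=lambda name: "192.0.2.1"))
```

Passing the resolver as a parameter keeps the conversion logic testable offline; calling `resolve_idn("münchen.de")` without it would perform a real lookup.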
Internationalized Top-Level Domains
IDN support extends to top-level domains (TLDs) themselves. ICANN approved Internationalized TLDs starting in 2010:
- .中国 (China) → xn--fiqs8s
- .рф (Russia, transliterated "rf") → xn--p1ai
- .भारत (India) → xn--h2brj9c
- .مصر (Egypt) → xn--wgbh1c
Security Concerns: Homograph Attacks
IDN introduces a significant security risk: homograph attacks, where malicious domains use characters from different scripts that look visually identical to Latin letters.
For example, Cyrillic а (U+0430) looks identical to Latin a (U+0061). A phishing domain could use аpple.com (Cyrillic а) to impersonate apple.com.
Browsers and registries defend against this by:
- Displaying Punycode for suspicious mixed-script domains
- Enforcing registry policies that restrict mixed-script registrations
- Showing warnings for confusable characters
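A mixed-script check of the kind described above might look like the sketch below. It is a rough heuristic, not any browser's actual policy: Python's standard library does not expose the Unicode Script property, so the script is approximated here by the first word of each character's Unicode name. The function names are hypothetical.

```python
import unicodedata

def scripts_in_label(label):
    """Approximate the set of scripts used by the letters in a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # First word of the character name, e.g. "LATIN" or "CYRILLIC".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_mixed_script(label):
    return len(scripts_in_label(label)) > 1

def display_form(domain):
    """Show the Punycode form if any label mixes scripts, else the Unicode form."""
    if any(is_mixed_script(label) for label in domain.split(".")):
        return domain.encode("idna").decode("ascii")
    return domain

spoofed = "\u0430pple.com"  # first letter is Cyrillic а (U+0430), not Latin a
print(scripts_in_label("\u0430pple"))
print(display_form(spoofed))
print(display_form("münchen.de"))
```

A check like this only catches labels that mix scripts. A label written entirely in one script can still resemble a Latin name, which is why registry policies and confusable-character tables are used as well.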
IDNA Versions
Two versions of the IDN standard exist:
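- IDNA2003 (RFC 3490): Original standard, uses the Nameprep profile of Stringprep and is tied to Unicode 3.2
- IDNA2008 (RFCs 5890–5894, with the protocol in RFC 5891): Updated standard with stricter rules, defined in terms of Unicode character properties rather than a single Unicode version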
Python's encodings.idna module implements IDNA2003. For IDNA2008, use the idna package:
import idna # pip install idna
print(idna.encode('münchen.de')) # b'xn--mnchen-3ya.de'
print(idna.decode('xn--mnchen-3ya.de')) # münchen.de
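The two versions can produce different results for the same input. A commonly cited case is the German sharp s (ß), which IDNA2003 maps to "ss" but IDNA2008 keeps as a valid character. The sketch below compares the two; the wrapper names `idna2003` and `idna2008` are hypothetical, and the second returns `None` if the third-party `idna` package is not installed.

```python
try:
    import idna  # third-party package: pip install idna
except ImportError:
    idna = None

def idna2003(domain):
    """Encode with the standard-library codec (IDNA2003)."""
    return domain.encode("idna").decode("ascii")

def idna2008(domain):
    """Encode with the idna package (IDNA2008), or None if it is unavailable."""
    if idna is None:
        return None
    return idna.encode(domain).decode("ascii")

print(idna2003("straße.de"))  # strasse.de
print(idna2008("straße.de"))  # xn--strae-oqa.de (None without the idna package)
```

Because the two outputs name different domains, mixing IDNA versions between components of one application can send users to an unintended host.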