
Internationalized Domain Name (IDN)

Definition

A domain name containing non-ASCII characters, encoded via Punycode for DNS compatibility (e.g., münchen.de → xn--mnchen-3ya.de).

Internationalized Domain Names (IDN)

Internationalized Domain Names (IDN) extend the Domain Name System (DNS) to support domain names containing non-ASCII characters, enabling web addresses in Arabic, Chinese, Cyrillic, Devanagari, and dozens of other scripts. The current version of the standard, IDNA2008, is defined in RFCs 5890–5894.

The Problem IDN Solves

DNS was designed in the 1980s when only ASCII characters were considered. Domain labels (the parts between dots) were restricted to letters a-z, digits 0-9, and hyphens. This excluded the vast majority of the world's writing systems from being used in domain names.
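The letters-digits-hyphens restriction can be expressed as a short check. This is a minimal sketch: the helper name is_ldh_label is hypothetical, and the 63-character limit and the ban on leading or trailing hyphens come from the classic hostname rules rather than from the text above.

```python
import re

# Letters, digits, and hyphens only; 1 to 63 characters;
# no hyphen at the start or end of the label.
_LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """Return True if a single domain label fits the ASCII-only rule."""
    return _LDH_LABEL.fullmatch(label) is not None


print(is_ldh_label("example"))   # True
print(is_ldh_label("münchen"))   # False: 'ü' is outside the allowed set
```

A label that fails this check is exactly the kind that needs the ASCII-compatible encoding described in the next section.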

IDN allows domains such as:

  • münchen.de (German, with umlaut)
  • 中文.com (Chinese)
  • الإمارات.ae (Arabic label under .ae, the ccTLD for the UAE)
  • भारत.भारत (Hindi)

Punycode Encoding

Because DNS infrastructure only handles ASCII, IDN labels are converted to an ASCII-compatible encoding called Punycode (RFC 3492). Punycode-encoded labels begin with the ACE prefix xn--.
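To see where the xn-- prefix comes from, the sketch below applies Python's built-in punycode codec to a single label and adds the prefix by hand. The helper names to_a_label and to_u_label are illustrative, and the code deliberately skips the normalization and validation steps that a full IDNA implementation performs.

```python
ACE_PREFIX = "xn--"


def to_a_label(label: str) -> str:
    """Encode one label to its ASCII form (no normalization or validation)."""
    if label.isascii():
        return label  # plain ASCII labels are passed through unchanged
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def to_u_label(label: str) -> str:
    """Decode one xn-- label back to Unicode."""
    if not label.startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


print(to_a_label("münchen"))         # xn--mnchen-3ya
print(to_u_label("xn--mnchen-3ya"))  # münchen
```

Encoding works label by label, which is why only the non-ASCII parts of a domain name change.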

Examples:

Unicode Domain      Punycode Equivalent
münchen.de          xn--mnchen-3ya.de
中文.com            xn--fiq228c.com
مثال.com            xn--mgbh0fb.com

# Python: encode/decode IDN with the built-in 'idna' codec
domain = 'münchen.de'
encoded = domain.encode('idna').decode('ascii')
print(encoded)  # xn--mnchen-3ya.de

decoded = encoded.encode('ascii').decode('idna')
print(decoded)  # münchen.de

How It Works End-to-End

  1. User types münchen.de in their browser
  2. Browser converts it to xn--mnchen-3ya.de (Punycode)
  3. DNS lookup proceeds using the ASCII form
  4. Browser displays the Unicode form in the address bar (or the Punycode form if the name looks suspicious)
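The flow above can be traced in a few lines. This is only a sketch: browse and fake_resolve are hypothetical names, the resolver is a stub that returns a documentation-range address instead of querying DNS, and a real browser applies extra display checks before showing the Unicode form.

```python
from typing import Callable, Dict


def browse(typed: str, resolve: Callable[[str], str]) -> Dict[str, str]:
    """Trace a typed hostname through conversion, lookup, and display."""
    lookup_name = typed.encode("idna").decode("ascii")      # step 2: to Punycode
    address = resolve(lookup_name)                          # step 3: ASCII-only lookup
    display = lookup_name.encode("ascii").decode("idna")    # step 4: back to Unicode
    return {"lookup": lookup_name, "address": address, "display": display}


def fake_resolve(name: str) -> str:
    """Stand-in for a DNS query; it only ever sees ASCII."""
    assert name.isascii()
    return "192.0.2.1"  # placeholder address from the documentation range


result = browse("www.münchen.de", fake_resolve)
print(result["lookup"])   # www.xn--mnchen-3ya.de
print(result["display"])  # www.münchen.de
```

Passing the resolver in as a parameter keeps the example runnable offline; swapping in a real lookup would not change the conversion steps.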

Internationalized Top-Level Domains

IDN support extends to top-level domains (TLDs) themselves. ICANN approved Internationalized TLDs starting in 2010:

  • .中国 (China) → xn--fiqs8s
  • .рф (Russia, .rf) → xn--p1ai
  • .भारत (India) → xn--h2brj9c
  • .مصر (Egypt) → xn--wgbh1c

Security Concerns: Homograph Attacks

IDN introduces a significant security risk: homograph attacks, where malicious domains use characters from different scripts that look visually identical to Latin letters.

For example, Cyrillic а (U+0430) looks identical to Latin a (U+0061). A phishing domain could use аpple.com (Cyrillic а) to impersonate apple.com.

Browsers defend against this by:

  • Displaying Punycode for suspicious mixed-script domains
  • Showing warnings for confusable characters

Domain registries add a further layer of defense with policies that restrict mixed-script registrations.
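A mixed-script check can be roughly approximated with the standard library. The sketch below is a crude heuristic, not how any particular browser works: it assumes the first word of a character's Unicode name (LATIN, CYRILLIC, and so on) is a good enough stand-in for its script, and the function names are made up for illustration.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Collect a rough script tag for every letter in the label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():  # ignore digits and hyphens
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def looks_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in(label)) > 1


print(looks_mixed_script("apple"))        # False: all Latin
print(looks_mixed_script("\u0430pple"))   # True: Cyrillic а followed by Latin pple
```

A label written entirely in one confusable script would pass this check, which is one reason real defenses combine several signals.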

IDNA Versions

Two versions of the IDN standard exist:

  • IDNA2003 (RFC 3490): Original standard, uses NAMEPREP profile of stringprep
  • IDNA2008 (RFC 5891): Updated standard, stricter rules, better Unicode alignment

Python's encodings.idna module implements IDNA2003. For IDNA2008, use the idna package:

import idna  # pip install idna

print(idna.encode('münchen.de'))     # b'xn--mnchen-3ya.de'
print(idna.decode('xn--mnchen-3ya.de'))  # münchen.de
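One visible consequence of the version difference is how the German ß is handled. The sketch below uses only the built-in codec, so it shows the IDNA2003 side: the character is mapped to ss before encoding. Under IDNA2008 ß is a valid character in its own right, so an IDNA2008 library is expected to return a distinct xn-- label instead; that side is not run here.

```python
name = "straße.de"

# The built-in codec follows IDNA2003, whose preparation step
# folds 'ß' to 'ss', leaving a plain ASCII name with no xn-- prefix.
idna2003 = name.encode("idna").decode("ascii")
print(idna2003)  # strasse.de
```

Because the two versions can turn the same input into different lookup names, it matters which one an application uses.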
