SymbolFYI

Punycode

Web & HTML
Definition

An encoding syntax for representing Unicode strings with ASCII characters, used in Internationalized Domain Names.

Punycode and Internationalized Domain Names

Punycode is an encoding algorithm (RFC 3492) that converts Unicode strings into ASCII-compatible encoding (ACE), enabling the use of non-ASCII characters in domain names. It is the foundation of Internationalized Domain Names in Applications (IDNA), the system that allows domain names like münchen.de or 日本語.jp to function on the internet.

Why Punycode Exists

The Domain Name System (DNS) was designed when ASCII was the dominant character encoding. By convention, hostname labels (the dot-separated parts of a domain name) are restricted to ASCII letters, digits, and hyphens (LDH). To accommodate non-ASCII scripts while remaining compatible with existing DNS infrastructure, IDNA encodes Unicode domain labels into ASCII strings that DNS can handle.
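
The LDH rule can be sketched as a small check. This is a minimal illustration, not a full hostname validator. The helper name is_ldh_label is hypothetical, and the 63-character limit and the ban on leading or trailing hyphens are assumptions taken from the usual hostname conventions rather than from the text above.

```python
import re

# Letters, digits, and hyphens only; 1 to 63 characters;
# no hyphen at the start or the end of the label.
_LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_label(label: str) -> bool:
    """Return True if the label uses only LDH characters (hypothetical helper)."""
    return _LDH_LABEL.fullmatch(label) is not None


print(is_ldh_label("xn--mnchen-3ya"))  # True: the ACE form is plain LDH
print(is_ldh_label("münchen"))         # False: ü is outside ASCII
```

The check shows why the ACE form is needed: the Unicode label fails it, while its xn-- form passes.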

Encoding Format

Punycode-encoded domain labels start with the prefix xn-- followed by the encoded string:

münchen.de    →  xn--mnchen-3ya.de
日本語.jp      →  xn--wgv71a119e.jp
püré.com      →  xn--pr-cja8e.com
москва.рф     →  xn--80adxhks.xn--p1ai

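The encoding copies the ASCII characters of a label unchanged, appends a hyphen as a delimiter if any ASCII characters were present, and then encodes each non-ASCII character together with its insertion position as variable-length integers written in base 36 (digits a–z and 0–9). The xn-- prefix is added by IDNA, not by Punycode itself.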
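
The layout of an encoded label can be inspected with Python's built-in punycode codec. The sketch below is only illustrative. The function name split_ace_label is hypothetical, and splitting on the last hyphen assumes the label went through the standard encoder.

```python
def split_ace_label(label: str):
    """Encode one label and return (ace_label, ascii_part, encoded_part).

    Hypothetical helper: it only shows the structure of the output.
    """
    encoded = label.encode("punycode").decode("ascii")
    # The last hyphen, if any, separates the literal ASCII characters
    # from the base-36 digits describing the non-ASCII characters.
    ascii_part, _, encoded_part = encoded.rpartition("-")
    return "xn--" + encoded, ascii_part, encoded_part


print(split_ace_label("münchen"))  # ('xn--mnchen-3ya', 'mnchen', '3ya')
print(split_ace_label("bücher"))   # ('xn--bcher-kva', 'bcher', 'kva')
```

A label with no ASCII characters at all, such as 日本語, has no delimiter, so the ASCII part comes back empty.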

How IDNA Works in Practice

When a user types münchen.de in a browser:

  1. The browser's URL parser detects non-ASCII characters in the hostname
  2. It normalizes each non-ASCII label and encodes it with Punycode, adding the xn-- prefix: münchen → xn--mnchen-3ya
  3. The DNS query is sent using the Punycode form: xn--mnchen-3ya.de
  4. The address bar displays the original Unicode form (IDN display)
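
The steps above can be sketched per label in Python. This is a simplified illustration under stated assumptions, not what a browser actually runs: lowercasing stands in for the full IDNA mapping and validation rules, and the function names to_ascii_hostname and to_unicode_hostname are hypothetical.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Convert a Unicode hostname to its ASCII (xn--) form, label by label."""
    labels = []
    for label in hostname.split("."):
        if label.isascii():
            # Pure ASCII labels are sent to DNS unchanged.
            labels.append(label)
        else:
            # Assumption: lowercasing replaces the real IDNA mapping step.
            ace = label.lower().encode("punycode").decode("ascii")
            labels.append("xn--" + ace)
    return ".".join(labels)


def to_unicode_hostname(hostname: str) -> str:
    """Convert xn-- labels back to Unicode for display."""
    labels = []
    for label in hostname.split("."):
        if label.lower().startswith("xn--"):
            labels.append(label[4:].encode("ascii").decode("punycode"))
        else:
            labels.append(label)
    return ".".join(labels)


print(to_ascii_hostname("münchen.de"))           # xn--mnchen-3ya.de
print(to_unicode_hostname("xn--mnchen-3ya.de"))  # münchen.de
```

Production code should rely on a full IDNA implementation instead, because the mapping and validation steps skipped here are where most of the rules live.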

IDNA 2003 vs. IDNA 2008

Two versions of the IDNA standard exist with different character validity rules:

  • IDNA 2003 (RFC 3490): permits more characters and maps input through the Nameprep profile (RFC 3491)
  • IDNA 2008 (RFC 5890–5894, protocol in RFC 5891): stricter and more precisely defined rules, used by most modern registrars

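Some edge cases behave differently between versions. For example, IDNA 2003 maps ß to ss before encoding, while IDNA 2008 treats ß as a valid character in its own right, so the same typed name can resolve to two different domains.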
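
The ß difference can be observed with the Python standard library alone. The sketch below assumes that the built-in idna codec follows IDNA 2003 (as its documentation states) and uses the raw punycode codec only to show what an IDNA 2008 encoder would emit for this one label. It is not an IDNA 2008 implementation.

```python
label = "straße"

# Built-in 'idna' codec (IDNA 2003): Nameprep maps ß to ss,
# so the result is plain ASCII and gets no xn-- prefix.
idna2003 = label.encode("idna")

# Raw Punycode of the unmapped label, prefixed by hand. This mirrors
# the IDNA 2008 result for this label, where ß is kept as-is.
idna2008_style = b"xn--" + label.encode("punycode")

print(idna2003)        # b'strasse'
print(idna2008_style)  # b'xn--strae-oqa'
```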

Punycode in Python

# The 'punycode' and 'idna' codecs ship with Python; no import is needed

# Encode to Punycode
'münchen'.encode('punycode').decode('ascii')
# → 'mnchen-3ya'

# IDNA encoding (adds xn-- prefix; the built-in codec implements IDNA 2003)
'münchen.de'.encode('idna').decode('ascii')
# → 'xn--mnchen-3ya.de'

# Decode from IDNA
'xn--mnchen-3ya.de'.encode('ascii').decode('idna')
# → 'münchen.de'

# Using the idna library (pip install idna) for IDNA 2008
import idna
idna.encode('münchen.de')
# → b'xn--mnchen-3ya.de'

Punycode in JavaScript

// Node.js: the built-in 'punycode' module is deprecated; the trailing slash
// below loads the userland npm package instead (npm install punycode)
const punycode = require('punycode/');
punycode.toASCII('münchen.de');   // 'xn--mnchen-3ya.de'
punycode.toUnicode('xn--mnchen-3ya.de');  // 'münchen.de'

// URL API handles IDNA automatically
new URL('https://münchen.de/').hostname;  // 'xn--mnchen-3ya.de'

Security: IDN Homograph Attacks

Punycode and IDNA enable a class of attacks in which visually similar characters from different scripts are used to spoof domain names. For example, аpple.com with a Cyrillic а (U+0430) looks identical to apple.com with a Latin a (U+0061), but it is a different domain whose ASCII form is xn--pple-43d.com. Browsers mitigate this by displaying the Punycode form for mixed-script or otherwise potentially deceptive domains; see the IDN homograph entry for details.
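
A rough mixed-script check can be sketched with the standard library. This is a heuristic only, not the policy any browser implements. It assumes that the first word of a character's Unicode name (for example LATIN or CYRILLIC) is a usable stand-in for its script, and the function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts used by the letters of a label."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # Assumption: the first word of the character name names the script.
            name = unicodedata.name(ch, "UNKNOWN")
            scripts.add(name.split()[0])
    return scripts


def is_mixed_script(label: str) -> bool:
    """Flag labels whose letters come from more than one script."""
    return len(scripts_in_label(label)) > 1


spoof = "\u0430pple"  # Cyrillic а followed by Latin "pple"
print(is_mixed_script("apple"))  # False
print(is_mixed_script(spoof))    # True
print("xn--" + spoof.encode("punycode").decode("ascii"))  # xn--pple-43d
```

A label written entirely in one non-Latin script passes this check even when it imitates a Latin name, which is why real mitigations combine several rules.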
