SymbolFYI

IDN Homograph Attack

Definition

A phishing technique that uses visually similar Unicode characters in a domain name to impersonate a legitimate site.

IDN Homograph Attacks

An IDN (Internationalized Domain Name) homograph attack exploits the visual similarity between Unicode characters from different scripts to register domain names that appear identical to legitimate ones. IDNs make this possible: domain labels may contain Unicode characters, which are encoded for DNS in an ASCII form called Punycode (labels prefixed with xn--). Because the spoofed name can be indistinguishable from the real one in many fonts, these attacks are difficult to spot by eye.

How Homograph Attacks Work

Many Unicode characters look visually identical or nearly identical to common Latin characters:

Latin a  (U+0061)  vs.  Cyrillic а  (U+0430)  → identical in many fonts
Latin e  (U+0065)  vs.  Cyrillic е  (U+0435)  → identical in many fonts
Latin o  (U+006F)  vs.  Cyrillic о  (U+043E)  → identical in many fonts
Latin p  (U+0070)  vs.  Cyrillic р  (U+0440)  → identical in many fonts
Latin c  (U+0063)  vs.  Cyrillic с  (U+0441)  → identical in many fonts
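One way to confirm that these pairs are distinct code points is to ask Python's standard unicodedata module for each character's official name. This is a minimal sketch; PAIRS and describe are illustrative names chosen here, not part of any library.

```python
import unicodedata

# Look-alike pairs from the table above: (Latin, Cyrillic)
PAIRS = [("a", "\u0430"), ("e", "\u0435"), ("o", "\u043e"),
         ("p", "\u0440"), ("c", "\u0441")]

def describe(char: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"

for latin, cyrillic in PAIRS:
    # The glyphs may render identically, but the code points and names differ
    print(f"{describe(latin):<30} vs. {describe(cyrillic)}")
```

The first line printed pairs U+0061 LATIN SMALL LETTER A with U+0430 CYRILLIC SMALL LETTER A, which is the difference a reader cannot see in the rendered glyphs.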

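An attacker can register аррӏе.com, whose first label is written entirely in Cyrillic (with the palochka ӏ, U+04CF, standing in for Latin l). It looks identical to apple.com in many fonts but is a different domain that can resolve to an attacker-controlled server: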

# Python's built-in 'idna' codec (IDNA 2003) needs no explicit import

legitimate = 'apple.com'
attack = '\u0430\u0440\u0440\u04CF\u0435.com'  # Cyrillic look-alike of "apple"

# They render alike but are different strings
print(legitimate == attack)  # False

# The ASCII (Punycode) form reveals the difference
print(attack.encode('idna'))      # b'xn--80ak6aa92e.com'
print(legitimate.encode('idna'))  # b'apple.com'

Browser Defenses

Browsers have implemented various strategies to mitigate homograph attacks:

Chrome

Chrome displays a domain in Punycode form (xn--...) when it contains:

  • Characters from scripts not used in the user's top preferred languages
  • Mixed-script labels (e.g., Latin + Cyrillic in the same label)
  • Characters that are confusable with ASCII according to the Unicode confusables data (UTS #39)

Firefox

Firefox uses an IDN display algorithm that compares domains against a whitelist of TLDs with strong registry policies and shows Punycode for suspicious combinations.

Safari

Safari displays Punycode for domains with mixed-script characters.

ICANN Policies

The Internet Corporation for Assigned Names and Numbers (ICANN) has established rules for IDN registration:

  • Registry operators must implement "bundle" policies grouping visually similar characters
  • Many registries prohibit mixed-script registrations entirely
  • The ICANN IDN Guidelines prohibit registration of domain names that are confusable with existing delegated TLDs

Detecting Confusable Characters

Unicode provides a confusables database (Unicode Security Mechanisms, UTS #39) listing character pairs that are visually similar:

# Using the 'confusable-homoglyphs' package (pip install confusable-homoglyphs)
from confusable_homoglyphs import confusables

# is_dangerous() flags strings that are mixed-script AND contain confusables.
# Here the Cyrillic label plus the Latin "com" makes the whole string mixed-script.
bool(confusables.is_dangerous('аррӏе.com', preferred_aliases=['latin']))
# True: contains characters confusable with Latin

# List a character's confusable look-alikes
confusables.is_confusable('а', preferred_aliases=['latin'])
# Returns a list of dicts showing Cyrillic а is confusable with Latin a
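The idea behind the UTS #39 data can be illustrated without any third-party package: replace each look-alike with a representative character (UTS #39 calls the result a "skeleton") and compare the results. The sketch below is a toy under stated assumptions: TOY_CONFUSABLES, skeleton, and looks_like are hypothetical names, the six-entry table stands in for the much larger real data set, and the Unicode normalization steps of the real algorithm are omitted.

```python
# Hand-picked Cyrillic -> Latin look-alikes (illustrative, NOT the UTS #39 data)
TOY_CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u04cf": "l",  # CYRILLIC SMALL LETTER PALOCHKA
}

def skeleton(text: str) -> str:
    """Replace each known look-alike with its Latin representative."""
    return "".join(TOY_CONFUSABLES.get(ch, ch) for ch in text)

def looks_like(candidate: str, trusted: str) -> bool:
    """True if the strings differ but collapse to the same skeleton."""
    return candidate != trusted and skeleton(candidate) == skeleton(trusted)

print(looks_like("\u0430\u0440\u0440\u04cf\u0435.com", "apple.com"))  # True
print(looks_like("apple.com", "apple.com"))                           # False
print(looks_like("example.com", "apple.com"))                         # False
```

Comparing skeletons against a list of trusted names also catches labels written entirely in one non-Latin script, which a mixed-script test on the label alone would miss.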

Mitigation for Developers

Domain Validation

import unicodedata
import idna  # third-party package: pip install idna

def validate_domain(domain: str) -> bool:
    try:
        # Attempt IDNA 2008 encoding; uts46=True applies UTS #46 mapping
        # (e.g. lowercasing) first. Invalid input raises idna.IDNAError.
        idna.encode(domain, uts46=True)
    except (idna.IDNAError, UnicodeError):
        return False

    # Check for mixed scripts in each label
    for label in domain.rstrip('.').split('.'):
        scripts = set()
        for char in label:
            # Approximate the script from the Unicode character name
            name = unicodedata.name(char, '')
            if 'LATIN' in name:
                scripts.add('latin')
            elif 'CYRILLIC' in name:
                scripts.add('cyrillic')
            elif 'GREEK' in name:
                scripts.add('greek')

        if len(scripts) > 1:
            return False  # Mixed script: suspicious

    # Note: a single-script look-alike such as аррӏе.com still passes this check
    return True

Display in Applications

When displaying user-supplied URLs, always show both the Unicode and Punycode forms for non-ASCII domains, or consistently show the Punycode form to prevent confusion.
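A minimal sketch of the "show both forms" option, assuming Python 3.7+ and the built-in idna codec (IDNA 2003; the third-party idna package could be substituted for IDNA 2008 behaviour). The function name describe_domain is hypothetical.

```python
def describe_domain(domain: str) -> str:
    """Return the domain, plus its Punycode form if it is not pure ASCII."""
    if domain.isascii():
        return domain  # Nothing to disambiguate
    try:
        # The built-in codec converts each non-ASCII label to its xn-- form
        ascii_form = domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Invalid labels (empty, too long, prohibited characters)
        return f"{domain!r} (not a valid IDN)"
    return f"{domain} ({ascii_form})"

print(describe_domain("apple.com"))
print(describe_domain("\u0430\u0440\u0440\u04cf\u0435.com"))
```

The first call prints apple.com unchanged; the second prints the Cyrillic look-alike followed by (xn--80ak6aa92e.com), so the two names can no longer be mistaken for each other in a log or UI.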
