SymbolFYI

IDN Homograph Attack

Definition

A phishing technique using visually similar Unicode characters in domain names to impersonate legitimate sites.

IDN Homograph Attacks

An IDN (Internationalized Domain Name) homograph attack exploits the visual similarity between Unicode characters from different scripts to register domain names that appear identical to legitimate ones. The attack is possible because IDNs allow Unicode characters in domain names, which are converted to an ASCII form for DNS using Punycode. It is hard to spot because the spoofed name can be visually indistinguishable from the real one.

How Homograph Attacks Work

Many Unicode characters are identical or nearly identical in appearance to common Latin characters:

Latin a  (U+0061)  vs.  Cyrillic а  (U+0430)  → identical in many fonts
Latin e  (U+0065)  vs.  Cyrillic е  (U+0435)  → identical in many fonts
Latin o  (U+006F)  vs.  Cyrillic о  (U+043E)  → identical in many fonts
Latin p  (U+0070)  vs.  Cyrillic р  (U+0440)  → identical in many fonts
Latin c  (U+0063)  vs.  Cyrillic с  (U+0441)  → identical in many fonts
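
Because the glyphs match, the only reliable way to tell these characters apart is to inspect their code points. The following is a minimal sketch using Python's standard `unicodedata` module; the `PAIRS` list and the `describe` helper are illustrative names, not part of any library.

```python
import unicodedata

# Latin letters paired with the Cyrillic look-alikes listed above.
PAIRS = [
    ("a", "\u0430"),
    ("e", "\u0435"),
    ("o", "\u043e"),
    ("p", "\u0440"),
    ("c", "\u0441"),
]

def describe(char: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"

for latin, cyrillic in PAIRS:
    # Each row shows two characters that render alike but have different identities.
    print(f"{describe(latin):32} | {describe(cyrillic)}")
```

The first row prints `U+0061 LATIN SMALL LETTER A` next to `U+0430 CYRILLIC SMALL LETTER A`, which makes the difference visible even when the rendered glyphs are not.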

An attacker can register аррӏе.com, spelled entirely with Cyrillic letters, which looks identical to apple.com in many fonts but is a different domain that resolves to a server the attacker controls:

# The built-in 'idna' codec (IDNA 2003, RFC 3490) needs no explicit import
legitimate = 'apple.com'
attack = '\u0430\u0440\u0440\u04CF\u0435.com'  # Cyrillic а, р, р, ӏ, е

# The strings render alike but compare as different
print(legitimate == attack)  # False

# Punycode (the ASCII form sent to DNS) reveals the difference
print(attack.encode('idna'))  # b'xn--80ak6aa92e.com'
print(legitimate.encode('idna'))  # b'apple.com'

Browser Defenses

Browsers have implemented various strategies to mitigate homograph attacks:

Chrome

Chrome displays a domain in Punycode form (xn--...) when it contains any of the following:

Characters from scripts not used in the user's top preferred languages
Mixed-script labels (e.g., Latin + Cyrillic in the same label)
Characters that are confusable with ASCII according to the Unicode confusables data (UTS #39)

Firefox

Firefox uses an IDN display algorithm that compares domains against a whitelist of TLDs with strong registry policies and shows Punycode for suspicious combinations.

Safari

Safari displays Punycode for domains with mixed-script characters.

ICANN Policies

The Internet Corporation for Assigned Names and Numbers (ICANN) has established rules for IDN registration:

Registry operators must implement "bundle" policies grouping visually similar characters
Many registries prohibit mixed-script registrations entirely
The ICANN IDN Guidelines prohibit registration of domain names that are confusable with existing delegated TLDs
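
One way to picture a bundle policy: when a name is registered, the registry also reserves every variant that can be formed by swapping look-alike characters, so nobody else can claim them. The sketch below assumes a tiny hand-written variant table (`VARIANT_GROUPS`) and a hypothetical `bundle` helper. Real registries publish their own, far larger tables, and their rules may differ from this simplification.

```python
from itertools import product

# Hypothetical variant table: each group holds characters treated as equivalent.
VARIANT_GROUPS = [
    {"a", "\u0430"},  # Latin a / Cyrillic а
    {"e", "\u0435"},  # Latin e / Cyrillic е
    {"o", "\u043e"},  # Latin o / Cyrillic о
    {"p", "\u0440"},  # Latin p / Cyrillic р
    {"c", "\u0441"},  # Latin c / Cyrillic с
]

# Map every character to the sorted list of characters in its group.
VARIANTS = {char: sorted(group) for group in VARIANT_GROUPS for char in group}

def bundle(label: str) -> set[str]:
    """Return every label formed by swapping characters within their variant groups."""
    choices = [VARIANTS.get(char, [char]) for char in label]
    return {"".join(combo) for combo in product(*choices)}

reserved = bundle("ace")
print(len(reserved))                     # 8 (two choices for each of three letters)
print("\u0430\u0441\u0435" in reserved)  # True: the all-Cyrillic spelling is reserved too
```

The bundle grows exponentially with the number of confusable characters in a label, which is one reason a registry might store a single canonical key per bundle instead of enumerating variants as this sketch does.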

Detecting Confusable Characters

Unicode provides a confusables database (Unicode Security Mechanisms, UTS #39) listing character pairs that are visually similar:

# Using the third-party 'confusable-homoglyphs' package
from confusable_homoglyphs import confusables

# Flag strings that mix scripts and contain characters confusable with Latin
confusables.is_dangerous('аррӏе.com', preferred_aliases=['latin'])
# True: a Cyrillic label next to a Latin TLD, with characters confusable with Latin

# List the look-alikes of a single character
confusables.is_confusable('а', preferred_aliases=['latin'])
# Returns a list of dicts giving the character, its script alias ('CYRILLIC'),
# and its homoglyphs, which include Latin a
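
UTS #39 compares strings by reducing each one to a "skeleton": two different strings with the same skeleton are treated as confusable. The following standard-library sketch illustrates the idea with a small hand-picked mapping (`TO_LATIN`) rather than the full Unicode data file. The mapping and the `skeleton` and `is_confusable_with` helpers are assumptions made for illustration and may not match the official data entry for entry.

```python
import unicodedata

# Hand-picked look-alike mappings chosen to match the apple.com example.
# This is NOT the official UTS #39 data, which contains thousands of entries.
TO_LATIN = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u04cf": "l",  # Cyrillic ӏ (palochka), treated here as a look-alike of l
}

def skeleton(text: str) -> str:
    """Simplified skeleton: NFD-normalize, then map known look-alikes to Latin."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(TO_LATIN.get(char, char) for char in decomposed)

def is_confusable_with(candidate: str, protected: str) -> bool:
    """True if the two strings differ but reduce to the same skeleton."""
    return candidate != protected and skeleton(candidate) == skeleton(protected)

print(is_confusable_with("\u0430\u0440\u0440\u04cf\u0435.com", "apple.com"))  # True
print(is_confusable_with("apple.com", "apple.com"))                           # False
```

Unlike a mixed-script check, skeleton comparison also catches labels written entirely in one foreign script, provided there is a list of protected names to compare against.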

Mitigation for Developers

Domain Validation

import unicodedata
import idna

def validate_domain(domain: str) -> bool:
    """Return False for domains that fail IDNA encoding or mix scripts within a label."""
    try:
        # Reject names that cannot be IDNA-encoded (UTS #46 mapping is applied first)
        idna.encode(domain, uts46=True)
    except (idna.IDNAError, UnicodeError):
        return False

    # Check for mixed scripts in each label, using character names as a rough script guess
    for label in domain.rstrip('.').split('.'):
        scripts = set()
        for char in label:
            name = unicodedata.name(char, '')
            if 'LATIN' in name: scripts.add('latin')
            elif 'CYRILLIC' in name: scripts.add('cyrillic')
            elif 'GREEK' in name: scripts.add('greek')

        if len(scripts) > 1:
            return False  # Mixed script: suspicious

    # Note: a look-alike label written entirely in one script still passes this check
    return True

Display in Applications

When displaying user-supplied URLs, always show both the Unicode and Punycode forms for non-ASCII domains, or consistently show the Punycode form to prevent confusion.
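
A minimal sketch of the "show both forms" option, using only Python's built-in `idna` codec. The `display_host` helper is a hypothetical name. The built-in codec implements the older IDNA 2003 rules, so an application that needs IDNA 2008 behavior would likely use a dedicated library instead.

```python
def display_host(host: str) -> str:
    """Return the host, with its Punycode form appended when it is not pure ASCII."""
    if host.isascii():
        return host
    try:
        ascii_form = host.encode("idna").decode("ascii")
    except UnicodeError:
        # The codec rejected the name; flag it rather than guess at a safe rendering.
        return f"{host!r} (not a valid IDN)"
    return f"{host} ({ascii_form})"

print(display_host("example.com"))          # example.com
print(display_host("b\u00fccher.example"))  # bücher.example (xn--bcher-kva.example)
```

Showing the ASCII form next to the Unicode form gives the reader a cue that the name contains non-ASCII characters, even when the rendered glyphs look familiar.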