Latin-1, formally defined as ISO 8859-1, is an 8-bit character encoding for Western European languages. In the form software actually uses (the IANA charset ISO-8859-1, which fills the positions the standard leaves unassigned with the C0 and C1 control codes), every one of the 256 byte values (0x00-0xFF) maps to a character. It was the dominant encoding for Western European content on the web before UTF-8 displaced it. Latin-1 also holds a unique place in Unicode history: its 256 characters map exactly to the first 256 Unicode code points (U+0000-U+00FF).
Character Layout
Latin-1 divides its 256 code points into four groups:
| Group | Hex | Decimal | Content |
|---|---|---|---|
| C0 controls | 0x00-0x1F | 0-31 | Same as ASCII control characters |
| Basic Latin | 0x20-0x7F | 32-127 | Identical to ASCII (0x7F is the DEL control) |
| C1 controls | 0x80-0x9F | 128-159 | Non-printable control characters |
| Latin-1 supplement | 0xA0-0xFF | 160-255 | Accented letters and symbols |
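The grouping in the table can be expressed as a simple range check. The sketch below uses a hypothetical helper, `latin1_block`, that follows the table's boundaries, and cross-checks a few bytes against Unicode's general category from the standard `unicodedata` module, where `Cc` marks control characters.

```python
import unicodedata

def latin1_block(byte_value: int) -> str:
    """Hypothetical helper: name the Latin-1 group for a byte value 0-255."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("Latin-1 only defines byte values 0x00-0xFF")
    if byte_value <= 0x1F:
        return "C0 controls"
    if byte_value <= 0x7F:
        return "Basic Latin"  # includes 0x7F (DEL), as in the table
    if byte_value <= 0x9F:
        return "C1 controls"
    return "Latin-1 supplement"

for b in (0x0A, 0x41, 0x85, 0xE9):
    ch = bytes([b]).decode("latin-1")
    # Prints the group and the Unicode general category ('Cc' = control)
    print(f"0x{b:02X}", latin1_block(b), unicodedata.category(ch))
```

Both control groups come out as 32 bytes each, and the two printable groups as 96 bytes each.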
The Latin-1 supplement (0xA0-0xFF) adds characters needed for Western European languages:
| Hex | Character | Name |
|---|---|---|
| 0xA0 | (space) | No-break space |
| 0xA9 | © | Copyright sign |
| 0xAE | ® | Registered sign |
| 0xC0-0xD6 | À to Ö | Uppercase accented letters |
| 0xE0-0xF6 | à to ö | Lowercase accented letters |
| 0xFF | ÿ | Latin small letter y with diaeresis |
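The names in the table can be checked against the Unicode Character Database with the standard `unicodedata` module. This sketch decodes each byte as Latin-1 and looks up its official name. It also includes 0xD7 and 0xF7, which explains why the letter ranges above stop at 0xD6 and 0xF6.

```python
import unicodedata

# Byte values from the table, plus 0xD7 and 0xF7, which interrupt
# the runs of accented letters
BYTES = (0xA0, 0xA9, 0xAE, 0xC0, 0xD6, 0xD7, 0xE0, 0xF6, 0xF7, 0xFF)

names = {b: unicodedata.name(bytes([b]).decode("latin-1")) for b in BYTES}
for b, name in names.items():
    print(f"0x{b:02X}  {name}")
```

Among other lines, this prints `0xA9  COPYRIGHT SIGN`, `0xD7  MULTIPLICATION SIGN`, and `0xF7  DIVISION SIGN`.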
Latin-1 and Unicode
The first 256 Unicode code points are identical to Latin-1: U+00A9 is the copyright sign and U+00E9 is e-acute (é), the same assignments as Latin-1 bytes 0xA9 and 0xE9. Because of this deliberate alignment, decoding Latin-1 needs no lookup table: each byte's value is its Unicode code point.
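The byte-to-code-point identity can be verified exhaustively in a few lines of Python. This sketch compares the `latin-1` codec's output with `chr()` for every byte value.

```python
# Decoding byte b as Latin-1 yields the character whose code point equals b
identity = all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))
print(identity)  # True

# The reverse direction: ord() of the decoded character is the byte value
print(ord(b"\xe9".decode("latin-1")) == 0xE9)  # True
print(hex(ord("\u00a9")))  # 0xa9
```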
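Latin-1 bytes and UTF-8 bytes differ, however, in the 0x80-0xFF range. In UTF-8, a byte in that range never stands for a character on its own: 0x80-0xBF are continuation bytes, 0xC2-0xF4 start multi-byte sequences, and 0xC0, 0xC1, and 0xF5-0xFF never appear in valid UTF-8. The byte 0xE9 is e-acute in Latin-1, but in UTF-8 it is the lead byte of a 3-byte sequence, so on its own it fails to decode: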
```python
# Latin-1 vs UTF-8 for e-acute (U+00E9)
print(b'\xe9'.decode('latin-1'))  # é (correct single-byte decode)
try:
    b'\xe9'.decode('utf-8')  # raises UnicodeDecodeError
except UnicodeDecodeError as e:
    print(e)  # ...can't decode byte 0xe9 in position 0: unexpected end of data

# e-acute in UTF-8 requires two bytes
print('\u00e9'.encode('utf-8').hex())    # c3a9
print('\u00e9'.encode('latin-1').hex())  # e9
```
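To see which role a given byte plays in UTF-8, the rules above can be turned into a classifier. This is a simplified sketch with a hypothetical helper, `utf8_byte_role`: it looks at one byte at a time and ignores the extra second-byte constraints that UTF-8 places after lead bytes such as 0xE0, 0xED, 0xF0, and 0xF4.

```python
def utf8_byte_role(b: int) -> str:
    """Hypothetical helper: classify one byte by its possible role in UTF-8."""
    if b <= 0x7F:
        return "ascii"          # single-byte character, same as Latin-1
    if b <= 0xBF:
        return "continuation"   # 10xxxxxx: only valid after a lead byte
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "invalid"        # never appears in well-formed UTF-8
    if b <= 0xDF:
        return "lead-2"         # starts a 2-byte sequence
    if b <= 0xEF:
        return "lead-3"         # starts a 3-byte sequence
    return "lead-4"             # 0xF0-0xF4 start a 4-byte sequence

print(utf8_byte_role(0xE9))  # lead-3
print([utf8_byte_role(b) for b in "\u00e9".encode("utf-8")])  # ['lead-2', 'continuation']
```

Every byte that Latin-1 uses for the supplement (0xA0-0xFF) lands in a non-ASCII role here, which is why Latin-1 text with accented letters is almost never valid UTF-8.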
Decoding Any Byte Sequence
Because Latin-1 maps every possible byte value to a character, it never raises a decoding error. This makes it a useful 'lossless' encoding for manipulating arbitrary binary data as text:
```python
# Read arbitrary bytes as Latin-1 without errors
binary_data = bytes(range(256))
text = binary_data.decode('latin-1')  # always succeeds
back = text.encode('latin-1')         # lossless round-trip
print(back == binary_data)            # True
```
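This property is exploited when HTTP data has to pass through a text interface. Under Python's WSGI specification (PEP 3333), header values are handed to applications as strings restricted to the Latin-1 range, so each character stands for exactly one byte, and the standard library's http.client encodes outgoing header values as Latin-1.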
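The same lossless property is what makes a common kind of mojibake reversible. In this sketch, UTF-8 bytes are wrongly decoded as Latin-1, producing the familiar "Ã©" pattern, and then repaired by reversing the mistaken step. It assumes the garbling went through Latin-1 exactly; text that passed through a different legacy codec may not round-trip this way.

```python
# UTF-8 bytes for "café", mistakenly decoded as Latin-1
utf8_bytes = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9'
garbled = utf8_bytes.decode("latin-1")    # 'cafÃ©': one character per byte
print(garbled)

# Latin-1 lost nothing, so re-encoding recovers the original UTF-8 bytes
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == "caf\u00e9")  # True
```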
Comparing with UTF-8
```python
# Same visible character, different bytes
char = '\u00e9'  # e with acute accent
utf8_bytes = char.encode('utf-8')
latin1_bytes = char.encode('latin-1')
print(utf8_bytes.hex())    # c3a9 (2 bytes)
print(latin1_bytes.hex())  # e9 (1 byte)
print(len(utf8_bytes))     # 2
print(len(latin1_bytes))   # 1
```
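For whole strings, the size difference follows directly from the per-character rule: Latin-1 always uses one byte per character, while UTF-8 uses one byte for ASCII and two bytes for every code point from U+0080 to U+07FF, which covers all of the Latin-1 supplement. The helper `encoded_sizes` below is hypothetical, written only to illustrate the comparison.

```python
def encoded_sizes(text: str) -> dict:
    """Hypothetical helper: byte length of text under Latin-1 and UTF-8."""
    return {
        "latin-1": len(text.encode("latin-1")),  # always 1 byte per character
        "utf-8": len(text.encode("utf-8")),      # 2 bytes for each non-ASCII char here
    }

print(encoded_sizes("Cr\u00e8me br\u00fbl\u00e9e"))  # {'latin-1': 12, 'utf-8': 15}
print(encoded_sizes("plain ASCII"))                  # {'latin-1': 11, 'utf-8': 11}
```

For pure ASCII the two encodings are byte-for-byte identical; each accented letter costs one extra byte in UTF-8.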
Limitations
Latin-1 cannot represent any character outside its 256-character range. Polish, Czech, and Romanian, for example, need letters such as ł, ř, and ș that are not in the set. Even for Western European languages, the Euro sign (U+20AC) is missing: the symbol was introduced in the late 1990s, about a decade after ISO 8859-1 was published in 1987. Windows-1252 places the Euro sign at byte 0x80, a position Latin-1 assigns to a C1 control. For any new project, use UTF-8; Latin-1 survives today mainly in legacy systems, old email messages, and HTTP responses that declare charset=iso-8859-1 (browsers following the WHATWG Encoding Standard actually decode that label as Windows-1252).
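A quick way to test whether a string fits in Latin-1 is to attempt the encoding and catch the failure. The helper `fits_latin1` below is a hypothetical sketch; the last lines use Python's `cp1252` codec to show where Windows-1252 puts the Euro sign.

```python
def fits_latin1(text: str) -> bool:
    """Hypothetical helper: True if every code point in text is below 256."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True

print(fits_latin1("na\u00efve"))           # True  (ï is U+00EF)
print(fits_latin1("\u0142\u00f3d\u017a"))  # False (Polish ł is U+0142)
print(fits_latin1("\u20ac"))               # False (Euro sign)

# Windows-1252 puts the Euro sign at 0x80, where Latin-1 has a C1 control
print("\u20ac".encode("cp1252"))               # b'\x80'
print(b"\x80".decode("latin-1") == "\u0080")   # True
```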