SymbolFYI

The History of ASCII: How 128 Characters Shaped Computing

Few decisions in computing history have proven as durable as the choices made by a small committee in the early 1960s. The American Standard Code for Information Interchange — ASCII — was designed to help different machines talk to each other. It ended up shaping the architecture of computing itself, and its influence persists in every text file, web page, and programming language in use today.

Before ASCII: The Chaos of Competing Codes

To understand why ASCII mattered, you need to understand the world it replaced.

By the late 1950s, every computer manufacturer had invented its own system for representing text. IBM used several incompatible schemes across its product lines. The Remington Rand UNIVAC had its own character set. The Burroughs Corporation used another. If you wanted to send data between machines — or even between different models from the same company — you often needed custom translation software, or you simply reprinted and retyped the data by hand.

This was not a theoretical problem. Banks, airlines, and the US government were increasingly trying to share information between computers, and the incompatibility was costing real money. The American Standards Association (ASA, later ANSI) convened a committee in 1960 to solve it.

The Telegraph Heritage

The committee did not start from scratch. They had a century of accumulated wisdom to draw from: the telegraph industry.

Samuel Morse's code from the 1840s was the first serious attempt to encode text as a sequence of signals. But the most influential predecessor to ASCII was the Baudot code, developed by French telegraph engineer Émile Baudot around 1870. Baudot used a 5-bit code, giving 32 possible values per table; a shift mechanism switched between two separate tables of characters (letters and figures), which was enough to cover the alphabet, the digits, and a handful of control functions.

Baudot's code was standardized internationally in the early 20th century and became the basis for the teletype (TTY) machines that dominated data communication well into the computer era. When the ASA committee sat down to design ASCII, they were acutely aware that their new standard needed to remain compatible with the installed base of teletype equipment and its operators.

The Murray variant of the Baudot code — widely used in teleprinters — influenced several of ASCII's control character definitions, including the Carriage Return (CR) and Line Feed (LF) characters that would become one of ASCII's most enduring, and most confusing, legacies.

The 1963 Standard: Design Decisions That Lasted Decades

ASCII was first published in 1963, with a significant revision in 1967 and a final stable form in 1968. The committee made several key architectural choices that would echo for sixty years.

Why 7 Bits?

The most fundamental decision was to use 7 bits, giving 128 possible code points (0–127). This was a deliberate compromise.

Five bits (the Baudot standard) was too few — the committee wanted to include both uppercase and lowercase letters, all ten digits, punctuation, and control characters without requiring a shift mechanism. Six bits gave 64 values — still not enough. Eight bits would have been more future-proof, but in 1963, a single bit was not trivial: transmission bandwidth was expensive, and punched paper tape (a dominant storage medium) used 7 or 8 tracks. Using 7 bits left the 8th bit available as a parity bit for error checking, which was considered a practical necessity.
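The parity scheme the 8th bit enabled can be sketched in a few lines. This is an illustrative Python sketch of even parity (real equipment computed it in hardware; the function name is hypothetical):

```python
# Even parity: set the spare 8th bit so every transmitted byte
# contains an even number of 1 bits, letting the receiver detect
# any single-bit transmission error.

def add_even_parity(code_point: int) -> int:
    """Place an even-parity bit in bit 7 of a 7-bit ASCII value."""
    assert 0 <= code_point < 128
    parity = bin(code_point).count("1") % 2
    return code_point | (parity << 7)

# 'A' = 0b1000001 has two 1 bits (even), so the parity bit stays 0.
assert add_even_parity(ord("A")) == ord("A")

# 'C' = 0b1000011 has three 1 bits (odd), so the parity bit is set.
assert add_even_parity(ord("C")) == ord("C") | 0x80
```

A receiver simply re-counts the 1 bits: an odd total means the byte was corrupted in transit.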

The 7-bit choice was, in retrospect, the decision that would force the invention of the competing 8-bit extensions that followed — from the ISO 8859 family of code pages to, eventually, Unicode.

The Structure of 128 Characters

The 128 code points were organized with deliberate logic:

Code points 0–31 and 127: Control characters. These 33 positions encode commands rather than printable symbols. NUL (0), SOH (1), STX (2), ETX (3) — these were designed for the precise control of teletype machines and data transmission protocols. Some, like BEL (7, which literally rang a bell on teletype machines), feel archaic. Others, like HT (9, horizontal tab), DEL (127), and ESC (27), remain in active use.

Code points 32–126: Printable characters. Space (32), the digits 0–9 (48–57), uppercase A–Z (65–90), lowercase a–z (97–122), and a carefully chosen set of punctuation marks fill this range.

The placement was not accidental. Note that uppercase 'A' is 65 (binary 1000001) and lowercase 'a' is 97 (binary 1100001). They differ by exactly one bit — bit 5. This means case conversion in ASCII is a single bitwise operation: set bit 5 to get lowercase, clear it for uppercase. Similarly, the digit '0' is at 48, so converting a digit character to its numeric value requires subtracting 48 (or clearing the upper bits).

These structural regularities were explicitly designed in to make ASCII easy to process on the hardware of the era.
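These regularities can be demonstrated directly. A short Python sketch (the helper names are illustrative, and the tricks apply only to plain ASCII letters and digits):

```python
# ASCII's deliberate layout makes case conversion and digit parsing
# single bitwise or arithmetic operations.

def to_lower(c: str) -> str:
    """Set bit 5 (value 32): 'A' (65) becomes 'a' (97)."""
    return chr(ord(c) | 0b100000)

def to_upper(c: str) -> str:
    """Clear bit 5: 'a' (97) becomes 'A' (65)."""
    return chr(ord(c) & ~0b100000)

def digit_value(c: str) -> int:
    """Subtract 48, the code point of '0', to get the numeric value."""
    return ord(c) - ord("0")

print(to_lower("A"))     # a
print(to_upper("z"))     # Z
print(digit_value("7"))  # 7  (55 - 48)
```

On 1960s hardware, where every instruction counted, this meant uppercasing a character cost one AND operation instead of a table lookup.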

The CR+LF Problem

One of ASCII's most persistent legacies — and most common sources of cross-platform frustration — is the handling of line endings.

In the teletype world, moving to the next line required two distinct physical operations: the Carriage Return (moving the print head back to the left margin) and the Line Feed (advancing the paper by one line). Both operations were slow enough on mechanical equipment that a convention emerged: always send CR before LF to give the machine time to execute the carriage return before the next character arrived.

ASCII formalized this as two separate control characters: CR (code point 13) and LF (code point 10). Different operating systems later made different choices about which to use for line endings:

- MS-DOS and Windows adopted the full CR+LF (0x0D 0x0A) from the teletype tradition
- Unix and Linux used LF alone (0x0A)
- Early classic Mac OS used CR alone (0x0D)
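The three conventions, and the normalization step many cross-platform tools perform, can be sketched in Python (the function name is illustrative):

```python
# The same two lines of text under the three historical conventions.
windows_text = "line one\r\nline two\r\n"  # CR+LF (0x0D 0x0A)
unix_text    = "line one\nline two\n"      # LF alone (0x0A)
old_mac_text = "line one\rline two\r"      # CR alone (0x0D)

def normalize_to_lf(text: str) -> str:
    """Replace CR+LF first, then any remaining lone CR,
    so Windows endings are not converted to two newlines."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

assert normalize_to_lf(windows_text) == unix_text
assert normalize_to_lf(old_mac_text) == unix_text
```

The order of the two replacements matters: doing the lone-CR substitution first would turn every CR+LF pair into LF+LF, doubling the line breaks.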

This divergence, rooted in a convention adopted for mechanical teleprinters over a century ago, is still causing compatibility headaches in text files and shell scripts today.

The Characters That Didn't Make the Cut

The 128-character constraint forced painful omissions. ASCII has no accented characters — no é, ü, ñ, or ç. It has no currency symbols beyond the dollar sign. It has no fraction characters, no mathematical operators beyond the basics, and no characters from any writing system other than the Latin alphabet.

These omissions were not oversights. The committee was explicitly designing for American English business computing. International use was acknowledged but deferred — the assumption was that national standards bodies would create regional variants, which is exactly what happened through the ISO 646 standard (which specified that certain ASCII positions could be replaced by national characters, leading to the long history of \ appearing as ¥ in Japanese systems).

The dollar sign is at position 36. There is no pound sterling, no yen, no deutsche mark. The at-sign (@) is at position 64. The number sign (#) is at 35. The section sign (§) and paragraph mark (¶) do not appear at all.

ASCII Art: Creativity Within Constraints

The severe limitations of ASCII — specifically the absence of graphics capabilities on most terminals — gave birth to an entire creative tradition.

ASCII art emerged in the 1960s and 1970s as programmers and terminal users discovered that the printable characters could be arranged spatially to suggest images. The relative density of characters like @, #, W, M, ., and space allowed for the illusion of shading and form. Early examples were purely functional — flowcharts and diagrams in documentation that could only be transmitted as text. But by the 1980s, ASCII art had become a genuine folk art form, practiced in bulletin board systems (BBS), email signatures, and Usenet posts.

The tradition produced iconic works: banners, signature art, and early internet imagery were rendered in ASCII before image formats were widespread. ANSI art extended the tradition by using the color capabilities of ANSI escape codes, creating elaborate full-color images that could be displayed on text terminals.

This culture foreshadowed the emoji era — the human desire to embed expressive images in text predates Unicode by decades.

ASCII's Displacement — and Survival

By the 1980s, ASCII's 128-character limit was visibly insufficient for global computing. IBM, for its part, had never adopted ASCII on its mainframes at all: EBCDIC (Extended Binary Coded Decimal Interchange Code), the competing 8-bit scheme IBM introduced with System/360 in 1964, remains in use on IBM mainframes today — a compatibility break that still complicates mainframe data exchange.

More consequentially, the ISO 8859 series of standards created a family of 8-bit extensions to ASCII, each covering a different regional script: ISO 8859-1 (Latin-1) for Western European languages, ISO 8859-5 for Cyrillic, ISO 8859-6 for Arabic, and so on. These preserved the lower 128 ASCII positions exactly, which meant any ASCII-compatible software could handle ASCII content from any ISO 8859 document — but mixing encodings in a single document was impossible.
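The incompatibility above 0x7F is easy to demonstrate. A Python sketch, using the standard library's codec names for two ISO 8859 parts:

```python
# The lower 128 positions are plain ASCII in every ISO 8859 part...
ascii_range = bytes(range(128))
assert ascii_range.decode("iso-8859-1") == ascii_range.decode("ascii")

# ...but the same byte above 0x7F means a different character in each part,
# which is why a single 8-bit document could not mix regional scripts.
b = bytes([0xE9])
print(b.decode("iso-8859-1"))  # é (Latin-1: Western European)
print(b.decode("iso-8859-5"))  # a Cyrillic letter at the same position
assert b.decode("iso-8859-1") != b.decode("iso-8859-5")
```

A reader had to know, out of band, which encoding a document used — there was no way to mark it in the bytes themselves.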

Microsoft Windows added its own "Windows code pages" (like Windows-1252, often mislabeled as Latin-1 but subtly different) and the fragmentation deepened.

Unicode, first published in 1991, was the decisive response to this chaos. And critically, Unicode made a choice that guaranteed ASCII's permanence: Unicode code points U+0000 through U+007F are identical to ASCII. Every ASCII character maps to the same code point in Unicode. The most common Unicode encoding, UTF-8, was specifically designed so that any pure ASCII file is also a valid UTF-8 file, byte-for-byte identical.
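That byte-for-byte compatibility is directly observable. A minimal Python sketch:

```python
# A pure-ASCII byte sequence decodes identically as ASCII and as UTF-8,
# because UTF-8 encodes U+0000..U+007F as the single bytes 0x00..0x7F.
ascii_bytes = "Hello, ASCII!".encode("ascii")
assert ascii_bytes.decode("utf-8") == ascii_bytes.decode("ascii")

# Encoding the same text as UTF-8 yields byte-identical output.
assert "Hello, ASCII!".encode("utf-8") == ascii_bytes

# Characters outside ASCII need multi-byte sequences above 0x7F.
assert "é".encode("utf-8") == b"\xc3\xa9"
```

Every byte of a multi-byte UTF-8 sequence has its high bit set, so no ASCII byte can ever be mistaken for part of a non-ASCII character.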

This backward compatibility was not accidental — it was the political and technical key to Unicode adoption. The installed base of ASCII text, source code, and configuration files could be incorporated into the Unicode world without modification.

ASCII's Enduring Legacy

Today, ASCII is invisible in the sense that nobody thinks about it consciously — and visible in the sense that it structures everything we type.

Every Python or JavaScript source file is essentially ASCII text with Unicode extensions. The HTTP protocol communicates in ASCII. Email headers are ASCII. Domain names were historically ASCII (with internationalized domain names as a later addition). The commands you type in a terminal are ASCII.

The 128-character standard designed for teletype machines in 1963 became, through Unicode's deliberate backward compatibility, the permanent foundation of the world's largest character encoding system. The original committee could not have imagined text messages, web pages, or emoji — but the code points they assigned to 'A', 'a', '0', and '@' are still exactly where they put them.

Explore the full ASCII character table, including every control character and printable symbol, in our Symbol Table tool.

Key Dates

1870: Baudot 5-bit telegraph code developed
1960: ASA X3.2 committee formed to standardize character codes
1963: ASCII first published (ASA X3.4-1963)
1967: Major revision adds lowercase letters; final form established
1968: ASCII mandated for US federal government computers
1981: IBM PC ships with ASCII-compatible character set
1991: Unicode 1.0 published, preserving ASCII as U+0000–U+007F
1993: UTF-8 designed with ASCII backward compatibility
2008: UTF-8 surpasses ASCII as most common web encoding

Next in Series: The fragmentation that ASCII's limitations caused — dozens of incompatible regional encodings — eventually forced a reckoning. Read how a small group at Xerox and Apple set out to fix it in The History of Unicode: From Babel to a Universal Character Set.
