SymbolFYI

Input Method Editors (IME): How CJK Text Input Works

How-To Keyboard Mastery October 15, 2024

Japanese, Chinese, and Korean writing systems contain thousands of characters — far more than any keyboard could represent with dedicated keys. Input Method Editors (IMEs) solve this problem through a conversion workflow: you type phonetic or structural codes using a standard keyboard, and the IME converts your input into the correct characters. Understanding how this works makes you a better developer and a more effective user.

Why IME Exists

The fundamental problem is one of scale. A typical Japanese text requires access to:

  - 46 hiragana characters
  - 46 katakana characters
  - ~2,136 everyday kanji (joyo kanji)
  - Thousands more kanji in specialized use
  - Latin characters, numbers, punctuation

A Chinese input user needs access to the 6,763 characters in the GB 2312 standard, or to more than 90,000 in full Unicode CJK coverage. Korean hangul has only 40 letters (19 consonants and 21 vowels, counting doubled and compound forms), yet they combine into 11,172 possible syllable blocks (19 initials × 21 vowels × 28 final options, including no final).

Standard 104-key keyboards simply cannot have a key for each character. An IME inserts a conversion layer between the keyboard and the text field.

The IME Workflow

Every IME follows the same basic pipeline, regardless of language:

Keyboard input → Pre-edit buffer → Candidate selection → Committed text
  1. Pre-edit phase: You type phonetic or structural keys. The input appears in an "underlined" or highlighted state — it has not been committed to the document yet. This is called the composition string or pre-edit string.

  2. Candidate selection: When you pause or press a trigger key, the IME shows a list of candidate characters or words that match your input. You select from the list.

  3. Commit: Once you confirm a selection (usually with Enter or by selecting a candidate number), the pre-edit text is replaced by the final character(s) and inserted into the document.

The visual appearance of the pre-edit string and of the candidate list UI is largely determined by the IME and the platform, not by the application.
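The three phases above can be modeled as a very small state machine. The following is a minimal sketch, not how any real IME is implemented: the class name ToyIme and the dictionary passed to it are hypothetical stand-ins for a real conversion engine.

```javascript
// Toy model of the IME pipeline: pre-edit -> candidate selection -> commit.
// The dictionary is a hypothetical stand-in for a real conversion engine.
class ToyIme {
  constructor(dictionary) {
    this.dictionary = dictionary; // Map from pre-edit string to candidate array
    this.preedit = '';            // uncommitted composition string
    this.candidates = [];         // filled when conversion is triggered
    this.committed = '';          // text already inserted into the "document"
  }

  // Pre-edit phase: each key extends the composition string.
  type(key) {
    this.preedit += key;
    this.candidates = [];
  }

  // Candidate selection: look up conversions for the current pre-edit string.
  convert() {
    const found = this.dictionary.get(this.preedit) ?? [];
    // The raw pre-edit string is always offered as the last candidate.
    this.candidates = [...found, this.preedit];
    return this.candidates;
  }

  // Commit: replace the pre-edit string with the chosen candidate.
  commit(index = 0) {
    const chosen = this.candidates[index] ?? this.preedit;
    this.committed += chosen;
    this.preedit = '';
    this.candidates = [];
    return chosen;
  }
}

const ime = new ToyIme(new Map([['nihon', ['日本', '二本']]]));
for (const key of 'nihon') ime.type(key);
ime.convert(); // ['日本', '二本', 'nihon']
ime.commit(0); // '日本'
```

Until commit() runs, nothing reaches the committed text, which is the property that web applications need to respect when they react to keystrokes.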

Japanese IME

Japanese has three writing systems used simultaneously: hiragana (phonetic, for native words), katakana (phonetic, for foreign loanwords), and kanji (Chinese-derived logographic characters). Japanese IME handles the conversion between them.

The Romaji-to-Kana-to-Kanji Pipeline

The standard Japanese input method uses romaji (Romanized Japanese) as the entry format:

  1. Type romaji: Type nihon on the keyboard
  2. Automatic kana conversion: The IME converts to にほん (hiragana) as you type
  3. Kanji conversion: Press Space. The IME shows candidates such as 日本 and 二本.
  4. Select candidate: Press Space again to cycle, or press the candidate number
  5. Commit: Press Enter to confirm and insert the text

Romaji Conversion Rules

The IME follows specific romaji-to-kana mappings:

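| Romaji | Kana |
| --- | --- |
| ka | か |
| ki | き |
| ku | く |
| ke | け |
| ko | こ |
| sha | しゃ |
| chi | ち |
| tsu | つ |
| nn | ん (end of syllable) |
| tt | っ followed by a t-row kana (for example, tte → って) |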

Double letters indicate geminate consonants: kk → っk, ss → っs.
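As a rough illustration of these rules, here is a minimal sketch of a romaji-to-hiragana converter. The function name romajiToHiragana and the KANA table are assumptions made for this example; the table is a small subset, and real IMEs use much larger mappings and convert incrementally as you type.

```javascript
// Minimal romaji-to-hiragana sketch. The table is a small illustrative
// subset, not the full mapping a real IME ships with.
const KANA = {
  a: 'あ', i: 'い', u: 'う', e: 'え', o: 'お',
  ka: 'か', ki: 'き', ku: 'く', ke: 'け', ko: 'こ',
  sa: 'さ', shi: 'し', su: 'す', se: 'せ', so: 'そ',
  ta: 'た', chi: 'ち', tsu: 'つ', te: 'て', to: 'と',
  na: 'な', ni: 'に', nu: 'ぬ', ne: 'ね', no: 'の',
  ha: 'は', hi: 'ひ', fu: 'ふ', he: 'へ', ho: 'ほ',
  sha: 'しゃ', shu: 'しゅ', sho: 'しょ',
};

function romajiToHiragana(input) {
  let out = '';
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    const next = input[i + 1];
    // "nn", or "n" before a consonant or at the end, becomes ん.
    if (ch === 'n' && (next === undefined || !'aiueoy'.includes(next))) {
      out += 'ん';
      i += next === 'n' ? 2 : 1;
      continue;
    }
    // A doubled consonant (other than n) becomes a small っ.
    if (ch === next && !'aiueon'.includes(ch)) {
      out += 'っ';
      i += 1;
      continue;
    }
    // Longest match first: try 3-letter, then 2-letter, then 1-letter keys.
    let matched = false;
    for (const len of [3, 2, 1]) {
      const kana = KANA[input.slice(i, i + len)];
      if (kana) {
        out += kana;
        i += len;
        matched = true;
        break;
      }
    }
    // Unknown input is passed through unchanged, as in a pre-edit buffer.
    if (!matched) {
      out += ch;
      i += 1;
    }
  }
  return out;
}

romajiToHiragana('nihon'); // 'にほん'
romajiToHiragana('kitte'); // 'きって'
```

The longest-match loop is what lets sha win over a shorter prefix, and the two early checks implement the ん and っ rules from the table.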

Input Modes

A Japanese IME typically offers several modes, selectable via a key or the toolbar:

  - Hiragana mode: romaji converts to hiragana (default)
  - Katakana mode: romaji converts to katakana (for loanwords: ko-hi- → コーヒー)
  - Alphanumeric mode: type regular Latin characters
  - Full-width alphanumeric: Latin characters at full CJK character width (ＡＢＣＤ)

Windows and macOS Japanese IME

On Windows, the Japanese IME is built in. It appears in the system tray as 「A」 (alphanumeric) or 「あ」 (hiragana). Toggle between modes with Alt+` or the 半角/全角 (hankaku/zenkaku) key.

On macOS, the built-in Hiragana keyboard follows the same romaji workflow. Press Enter to commit, Escape to cancel and revert to the romaji you typed.

Chinese IME

Chinese input methods vary with the language variant and region:

Mandarin (Pinyin Input)

Pinyin input is the dominant method for Simplified Chinese (Mainland China). You type standard pinyin romanization and the IME converts to Chinese characters.

  1. Type pinyin: Type zhongwen (中文)
  2. Candidates appear: The IME shows a candidate list of characters and words matching this pronunciation
  3. Select: Press the number key (1–9) or navigate with arrow keys
  4. Commit: Press Space to confirm the highlighted candidate (in most pinyin IMEs, Enter commits the raw Latin letters instead)

Modern Chinese IMEs rely heavily on prediction. They use statistical language models to rank the most likely characters based on context, so common words like 中文 (Chinese language) or 你好 (hello) appear as the first candidate without any selection effort: just press Space.

Word-by-word vs. sentence-level input: Modern IMEs like Microsoft Pinyin and Google Pinyin support sentence-level input — type an entire sentence in pinyin and the IME converts the whole sentence at once, dramatically reducing selection effort.
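The candidate ordering described above can be sketched with a toy lookup. This is only an illustration under loud assumptions: the LEXICON entries, their weights, and the function name candidatesFor are made up for this example, and a real IME ranks candidates with context-aware language models rather than a static weight.

```javascript
// Toy candidate lookup for pinyin input. The entries and frequency
// weights below are made up for illustration only.
const LEXICON = [
  { pinyin: 'zhongwen', word: '中文', weight: 90 },
  { pinyin: 'shi', word: '时', weight: 60 },
  { pinyin: 'shi', word: '是', weight: 95 },
  { pinyin: 'shi', word: '十', weight: 40 },
  { pinyin: 'shi', word: '事', weight: 55 },
];

// Return the words matching a pinyin string, most likely first.
function candidatesFor(pinyin, lexicon = LEXICON) {
  return lexicon
    .filter((entry) => entry.pinyin === pinyin) // filter() copies, so sort() is safe
    .sort((a, b) => b.weight - a.weight)
    .map((entry) => entry.word);
}

candidatesFor('shi'); // ['是', '时', '事', '十']
```

Because the highest-weighted word comes first, the user can accept it with a single key press, which is the effect the prediction in real IMEs aims for.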

Cantonese and Jyutping

Cantonese input uses Jyutping romanization (similar to Pinyin but for Cantonese phonology). The workflow is identical: type jyutping → candidate list → select → commit.

Traditional Chinese (Bopomofo / Zhuyin)

Taiwan's traditional input method uses Bopomofo (注音符號), a phonetic script typed with a special key layout. On standard keyboards:

| Bopomofo symbol | Key |
| --- | --- |
| ㄅ (b) | 1 |
| ㄆ (p) | q |
| ㄇ (m) | a |
| ㄈ (f) | z |
| ㄉ (d) | 2 |

Some users in Taiwan instead use Cangjie (倉頡), a structural input method based on character components, or simply pinyin with traditional output.

Stroke Input

For users who don't know the pronunciation of a character, stroke input lets you specify characters by their strokes: horizontal (横), vertical (竖), left-falling (撇), right-falling (捺), and turning (折). This is common on mobile but also available on desktop.

Korean Hangul Composition

Korean uses a fundamentally different composition model from Japanese and Chinese IMEs. Hangul characters are composed on the fly from individual letters (jamo) using combinatorial rules, so no candidate list is needed for most text.

How Hangul Syllable Blocks Form

Each Korean syllable is a block combining:

  - An initial consonant (초성, choseong)
  - A vowel (중성, jungseong)
  - An optional final consonant (종성, jongseong)

When you type, the IME builds syllable blocks in real time:

| You type | Result displayed |
| --- | --- |
| r | ㄱ |
| r, k | 가 (ㄱ + ㅏ) |
| r, k, r | 각 (ㄱ + ㅏ + ㄱ) |
| r, k, r, p | 가게 (ㄱ + ㅏ, ㄱ + ㅔ): the final ㄱ becomes the initial of the next syllable |

The keyboard layout matters. The standard Dubeolsik (두벌식) layout, typed on a QWERTY keyboard, maps ㄱ to r, ㄴ to s, ㄷ to e, ㅏ to k, ㅔ to p, and so on. Consonants sit mostly under the left hand and vowels under the right.
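The block-building behavior can be sketched in a few lines, because Unicode arranges precomposed syllables so that a block's code point is 0xAC00 + (initial × 21 + vowel) × 28 + final. The sketch below is a simplification under stated assumptions: the names composeSyllable, typeHangul, and KEY_TO_JAMO are invented for this example, and it handles only unshifted Dubeolsik keys, with no compound vowels (such as ㅘ) or double finals (such as ㄳ).

```javascript
// Jamo in the order Unicode uses for precomposed syllables (U+AC00..U+D7A3).
const INITIALS = [...'ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ'];               // 19
const VOWELS = [...'ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ'];             // 21
const FINALS = ['', ...'ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ']; // 28, '' = no final

// Combine jamo into one syllable block using the Unicode arithmetic.
function composeSyllable(initial, vowel, final = '') {
  const i = INITIALS.indexOf(initial);
  const v = VOWELS.indexOf(vowel);
  const f = FINALS.indexOf(final);
  if (i < 0 || v < 0 || f < 0) throw new RangeError('not a composable jamo combination');
  return String.fromCharCode(0xac00 + (i * 21 + v) * 28 + f);
}

// Unshifted Dubeolsik keys only (a subset chosen for this sketch).
const KEY_TO_JAMO = {
  r: 'ㄱ', s: 'ㄴ', e: 'ㄷ', f: 'ㄹ', a: 'ㅁ', q: 'ㅂ', t: 'ㅅ', d: 'ㅇ',
  w: 'ㅈ', c: 'ㅊ', z: 'ㅋ', x: 'ㅌ', v: 'ㅍ', g: 'ㅎ',
  k: 'ㅏ', o: 'ㅐ', i: 'ㅑ', j: 'ㅓ', p: 'ㅔ', u: 'ㅕ', h: 'ㅗ', y: 'ㅛ',
  n: 'ㅜ', b: 'ㅠ', m: 'ㅡ', l: 'ㅣ',
};

function typeHangul(keys) {
  const blocks = []; // each block: { initial, vowel, final }
  for (const key of keys) {
    const jamo = KEY_TO_JAMO[key];
    if (!jamo) continue; // ignore keys outside this sketch's subset
    const last = blocks[blocks.length - 1];
    if (VOWELS.includes(jamo)) {
      if (last && last.initial && !last.vowel) {
        last.vowel = jamo; // ㄱ + ㅏ -> 가
      } else if (last && last.final) {
        // A following vowel takes the previous final as its initial.
        blocks.push({ initial: last.final, vowel: jamo, final: '' });
        last.final = '';
      } else {
        blocks.push({ initial: '', vowel: jamo, final: '' }); // lone vowel
      }
    } else if (last && last.initial && last.vowel && !last.final) {
      last.final = jamo; // 가 + ㄱ -> 각
    } else {
      blocks.push({ initial: jamo, vowel: '', final: '' });
    }
  }
  return blocks
    .map((b) => (b.initial && b.vowel
      ? composeSyllable(b.initial, b.vowel, b.final)
      : b.initial + b.vowel)) // incomplete blocks show as bare jamo
    .join('');
}

typeHangul('rkr');  // '각'
typeHangul('rkrp'); // '가게'
```

The second call shows the re-segmentation from the table: the ㄱ that briefly served as a final moves to the next block as soon as a vowel follows it.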

No Candidate List for Most Text

Because hangul syllable composition is deterministic (a given sequence of jamo keystrokes always produces the same syllable blocks), there's no ambiguity requiring a candidate selection step. You type and characters appear immediately.

Hanja (Chinese-derived characters occasionally used in Korean) still requires candidate selection. Press the Hanja key (or F9) while a syllable is selected to see Hanja candidates.

IME on the Web

IME integration with web applications is a critical topic for front-end developers. The challenge: during the pre-edit phase, the composition string already appears in the element's value and triggers keydown and input events, even though the user has not committed it yet. Handling this incorrectly causes bugs in real-time features.

The compositionstart / compositionend Events

Browsers fire three special events during IME composition:

element.addEventListener('compositionstart', (e) => {
  // User started composing (pressed first key in IME mode)
  // e.data: initial composition string (often empty)
  console.log('Composition started:', e.data);
});

element.addEventListener('compositionupdate', (e) => {
  // Composition string changed (each keystroke during pre-edit)
  console.log('Composition update:', e.data);
});

element.addEventListener('compositionend', (e) => {
  // User committed text (pressed Enter or selected candidate)
  // e.data: the final committed text
  console.log('Composition ended:', e.data);
});

The isComposing Guard Pattern

The most important defensive pattern for IME-aware input handling is checking event.isComposing:

element.addEventListener('keydown', (e) => {
  // WRONG: fires during pre-edit, causing double-submission or premature action
  if (e.key === 'Enter') {
    handleSubmit();
  }
});

element.addEventListener('keydown', (e) => {
  // CORRECT: ignore keydown events during IME composition
  if (e.isComposing || e.keyCode === 229) return;

  if (e.key === 'Enter') {
    handleSubmit();
  }
});

The keyCode === 229 check is a legacy fallback: older browsers used keyCode 229 for all keydown events during composition (the "Process" key code). Modern browsers set isComposing reliably, but the legacy check provides backwards compatibility.

Bug: Enter key triggers form submission during candidate selection

Japanese users frequently press Enter to confirm kanji selection — but if the form listens to Enter on keydown without an isComposing check, it submits the form before the user has finished composing their text.

Fix: Guard all Enter key handlers with if (e.isComposing || e.keyCode === 229) return;

Bug: Real-time search fires for each composition keystroke

A search-as-you-type feature that fires on every input event will send a request for に, then にほ, then にほん before the user has even confirmed they want the kanji form. This wastes requests and causes confusing behavior.

Fix: Only process the input value after compositionend, or debounce heavily and ignore events where isComposing is true.
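One way to apply this fix is sketched below. It is a minimal example, not a complete search widget: attachComposedSearch and the onQuery callback are hypothetical names, debouncing is left out for brevity, and the duplicate check exists because the order of the final input event and compositionend has differed between browsers.

```javascript
// Wire a search box so that queries fire only for committed text.
// `input` is any EventTarget with a `value` property (such as an <input>);
// `onQuery` is a hypothetical callback that would start the search.
function attachComposedSearch(input, onQuery) {
  let composing = false;
  let lastQuery = null;

  const fire = () => {
    if (input.value === lastQuery) return; // avoid duplicate requests
    lastQuery = input.value;
    onQuery(lastQuery);
  };

  input.addEventListener('compositionstart', () => {
    composing = true;
  });
  input.addEventListener('compositionend', () => {
    composing = false;
    fire(); // the final input event may already have been skipped
  });
  input.addEventListener('input', () => {
    if (!composing) fire();
  });
}
```

Plain typing without an IME still triggers a query on every input event, so a debounce around onQuery remains worthwhile in a real application.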

Bug: Character counter counts pre-edit characters

A character counter that reads element.value directly during composition will count the uncommitted text in the pre-edit buffer, giving a wrong count until composition ends.

Fix: Listen for compositionend in addition to input, and/or check isComposing before updating the counter.
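A counter built along these lines might look like the following sketch. The names attachCharCounter, countCharacters, and render are assumptions for this example. Counting by code point rather than by string length is an extra choice made here so that ideographs outside the Basic Multilingual Plane count once.

```javascript
// Count by Unicode code point, so that characters outside the BMP
// (such as some rare CJK ideographs) count once instead of twice.
function countCharacters(text) {
  return [...text].length;
}

// Attach a counter that keeps its last value while a composition is active.
// `render` is a hypothetical callback that updates the visible counter.
function attachCharCounter(input, render) {
  let composing = false;
  const update = () => {
    if (!composing) render(countCharacters(input.value));
  };

  input.addEventListener('compositionstart', () => {
    composing = true;
  });
  input.addEventListener('compositionend', () => {
    composing = false;
    update(); // recount once the text is committed
  });
  input.addEventListener('input', update);
  update(); // show the initial count
}
```

While the user is composing, the displayed count simply stays at its previous value and jumps to the correct number when the composition ends.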

React and IME

React's synthetic event system adds complexity. The onChange event in React fires on every input event, including during composition. For proper IME handling:

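import { useRef, useState } from 'react';

// handleChange and handleSubmit are supplied by the parent component.
function IMEAwareInput({ handleChange, handleSubmit }) {
  const [value, setValue] = useState('');
  // A ref (not state) so toggling the flag does not trigger a re-render.
  const isComposingRef = useRef(false);

  return (
    <input
      value={value}
      onCompositionStart={() => { isComposingRef.current = true; }}
      onCompositionEnd={(e) => {
        isComposingRef.current = false;
        // The last change event may fire before compositionend, while the
        // flag is still set, so report the committed value here.
        handleChange(e.currentTarget.value);
      }}
      onChange={(e) => {
        setValue(e.target.value);
        // Only trigger real-time actions if not composing
        if (!isComposingRef.current) {
          handleChange(e.target.value);
        }
      }}
      onKeyDown={(e) => {
        // Skip keys that belong to an active composition (229 is the legacy fallback)
        if (e.nativeEvent.isComposing || e.keyCode === 229) return;
        if (e.key === 'Enter') handleSubmit();
      }}
    />
  );
}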

Note that e.isComposing in React's synthetic event may not always be reliable — use e.nativeEvent.isComposing for the underlying DOM event.

IME Software Overview

Windows

| IME | Languages | Notes |
| --- | --- | --- |
| Microsoft IME | Japanese, Chinese (Simplified/Traditional), Korean | Built-in, high quality |
| Google Japanese Input | Japanese | High accuracy, large vocabulary |
| ATOK | Japanese | Professional-grade, subscription |
| Sogou Pinyin | Chinese Simplified | Popular in China, advanced prediction |

macOS

| IME | Languages | Notes |
| --- | --- | --- |
| Built-in Hiragana | Japanese | Good quality, seamlessly integrated |
| Built-in Pinyin | Chinese Simplified | Decent basic input |
| Built-in Bopomofo | Chinese Traditional | Full Zhuyin support |
| Built-in Korean | Korean | Supports Dubeolsik and Sebeolsik |

Linux

| IME Framework | Popular Engines | Notes |
| --- | --- | --- |
| IBus | ibus-anthy (Japanese), ibus-rime (Chinese), ibus-hangul (Korean) | Default on GNOME |
| Fcitx5 | fcitx5-mozc (Japanese), fcitx5-chinese-addons, fcitx5-hangul | Often preferred on KDE |

Testing IME Compatibility

If you build web applications that may be used by CJK audiences, test with an actual IME:

  1. Enable a Japanese or Chinese IME on your test machine
  2. Find a text input in your application
  3. Type some romaji and observe the pre-edit state
  4. Press Space to open the candidate list
  5. Select a candidate with a number key
  6. Check that your app behaved correctly — no duplicate submissions, no premature processing, correct character counting

This workflow reveals the majority of IME-related bugs within minutes.
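For automated regression tests, the event sequence of one composition can be approximated with synthetic events. The helper below is a sketch under clear assumptions: simulateComposition is a hypothetical name, the events are plain Event objects with a data property attached (real browsers dispatch CompositionEvent and InputEvent), and setting target.value replaces the whole field. Synthetic events only exercise your handlers; they do not drive a real IME, so they complement the manual check rather than replace it.

```javascript
// Dispatch roughly the event sequence a browser produces for one composition.
// `steps` are the intermediate pre-edit strings; `committed` is the final text.
function simulateComposition(target, steps, committed) {
  const fire = (type, data) => {
    const event = new Event(type, { bubbles: true });
    // Assumption: the handlers under test only read `data` from the event.
    Object.defineProperty(event, 'data', { value: data });
    target.dispatchEvent(event);
  };

  fire('compositionstart', '');
  for (const step of [...steps, committed]) {
    target.value = step; // simplification: the composition is the whole value
    fire('compositionupdate', step);
    fire('input', step);
  }
  fire('compositionend', committed);
}
```

A test could call simulateComposition(input, ['に', 'にほ', 'にほん'], '日本') and then assert that a search or submit handler ran exactly once.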


Next in Series: Dead keys are the mechanism behind accent input on many keyboards — and they work differently from IME composition. See Dead Keys: How to Type Accented Characters Without a Special Keyboard for a complete explanation.
