SymbolFYI

Common Locale Data Repository (CLDR)

Unicode Standard
Definition

A project providing locale-specific formatting rules for dates, currencies, and language names used worldwide.

What Is CLDR?

The Common Locale Data Repository (CLDR) is a project maintained by the Unicode Consortium that provides the world's largest and most widely used collection of locale-specific data for software internationalization (i18n). CLDR supplies the data that enables applications to correctly format dates, times, numbers, currencies, sort orders, and measurement units for users in any language, region, and cultural context.

CLDR data underpins the internationalization infrastructure of essentially every major platform. iOS and macOS, Android, Windows, Google services, ICU (International Components for Unicode, originally developed at IBM), and most modern programming-language i18n libraries all draw on CLDR.

What CLDR Contains

Number Formatting

How numbers are formatted varies dramatically by locale:

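Locale   Input         Formatted output
en-US    1234567.89    1,234,567.89
de-DE    1234567.89    1.234.567,89
fr-FR    1234567.89    1 234 567,89
hi-IN    1234567.89    12,34,567.89

In fr-FR the group separator is U+202F NARROW NO-BREAK SPACE, not an ordinary space. hi-IN groups digits in pairs after the first three, following the Indian lakh/crore system.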
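The table can be reproduced with JavaScript's built-in Intl.NumberFormat. This is a minimal sketch that assumes a runtime shipping full ICU/CLDR locale data (Node.js has done so by default since version 13). The helper name formatIn is hypothetical.

```javascript
// Hypothetical helper: format one value in each of the given locales.
function formatIn(locales, value) {
  const results = {};
  for (const locale of locales) {
    // No options: plain decimal formatting with the locale's default
    // separators and grouping sizes (up to 3 fraction digits by default).
    results[locale] = new Intl.NumberFormat(locale).format(value);
  }
  return results;
}

const table = formatIn(['en-US', 'de-DE', 'fr-FR', 'hi-IN'], 1234567.89);
for (const [locale, text] of Object.entries(table)) {
  // Escape non-ASCII characters so invisible separators show up as \uXXXX.
  const visible = text.replace(/[^\x20-\x7e]/g,
    (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
  console.log(locale.padEnd(6), visible);
}
```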

Date and Time Formatting

CLDR supplies the order of date fields, month and weekday names, era names, time separator characters, and whether a locale uses a 12-hour or 24-hour clock.
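Two of these conventions can be inspected from JavaScript. This sketch assumes a runtime with full locale data. It uses formatToParts to read the year/month/day order of each locale's short date format and resolvedOptions().hourCycle to read the default hour cycle. The names dateFieldOrder and defaultHourCycle are hypothetical helpers.

```javascript
// A fixed instant: 1 September 2024, 15:30 UTC.
const sample = new Date(Date.UTC(2024, 8, 1, 15, 30));

// Return the order of the year/month/day fields in the locale's short date.
function dateFieldOrder(locale) {
  const parts = new Intl.DateTimeFormat(locale, {
    dateStyle: 'short',
    timeZone: 'UTC', // fixed zone so the calendar date never shifts
  }).formatToParts(sample);
  return parts
    .filter((p) => p.type === 'year' || p.type === 'month' || p.type === 'day')
    .map((p) => p.type)
    .join('-');
}

// Return the locale's default hour cycle, e.g. 'h12' or 'h23'.
function defaultHourCycle(locale) {
  return new Intl.DateTimeFormat(locale, { hour: 'numeric' })
    .resolvedOptions().hourCycle;
}

for (const locale of ['en-US', 'en-GB', 'de-DE', 'ja-JP']) {
  const timeText = new Intl.DateTimeFormat(locale, {
    hour: 'numeric', minute: '2-digit', timeZone: 'UTC',
  }).format(sample);
  console.log(locale, dateFieldOrder(locale), defaultHourCycle(locale), timeText);
}
```

With current CLDR data, en-US typically reports month-day-year with h12, while de-DE reports day-month-year with h23.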

Currency Formatting

CLDR specifies where the currency symbol goes, how it is spaced, and how many decimal digits each currency code uses (e.g., JPY uses 0 decimal places, KWD uses 3).
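Both the digit counts and the symbol placement show up in Intl.NumberFormat. This is a minimal sketch that assumes the runtime's locale data follows CLDR, as ICU-based engines do. The names currencyDigits and formatMoney are hypothetical helpers.

```javascript
// Number of decimal digits the runtime uses for a currency code.
function currencyDigits(currency) {
  return new Intl.NumberFormat('en-US', { style: 'currency', currency })
    .resolvedOptions().maximumFractionDigits;
}

// Format an amount in a given locale and currency.
function formatMoney(locale, currency, amount) {
  return new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);
}

for (const code of ['EUR', 'JPY', 'KWD']) {
  console.log(code, currencyDigits(code)); // EUR 2, JPY 0, KWD 3
}
console.log(formatMoney('en-US', 'EUR', 1234.5)); // symbol before the number
console.log(formatMoney('de-DE', 'EUR', 1234.5)); // symbol after, with a no-break space
```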

Plural Rules

Languages differ in how they group numbers into plural categories. For cardinal numbers, English has two categories (one/other), Arabic has six (zero/one/two/few/many/other), and Russian has four (one/few/many/other), where other covers fractional values such as 1.5. CLDR's plural rules data is essential for grammatically correct message formatting.
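Plural categories matter when choosing message text. This sketch picks a Russian word form by category with Intl.PluralRules. The formatFileCount helper and the hard-coded word forms are illustrative assumptions, standing in for a real message catalog or a MessageFormat library.

```javascript
// Hypothetical message forms for the Russian word "file", keyed by category.
const ruForms = {
  one: 'файл',    // 1, 21, 31, ...
  few: 'файла',   // 2-4, 22-24, ...
  many: 'файлов', // 0, 5-20, 25-30, ...
  other: 'файла', // fractions such as 1,5
};

const ruPlural = new Intl.PluralRules('ru');
const ruNumber = new Intl.NumberFormat('ru');

// Hypothetical helper: number formatted for Russian plus the matching word form.
function formatFileCount(n) {
  const category = ruPlural.select(n); // 'one' | 'few' | 'many' | 'other'
  return `${ruNumber.format(n)} ${ruForms[category]}`;
}

for (const n of [1, 3, 5, 11, 21, 1.5]) {
  console.log(formatFileCount(n));
}
```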

Collation (Sort Order)

Alphabetical sort order varies by language. Swedish sorts Å after Z rather than next to A. Chinese can be sorted by pinyin, stroke count, or radical-stroke order. German has a special phonebook collation that sorts Ä as if it were AE.
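Intl.Collator exposes these CLDR collation tailorings. This is a minimal sketch that assumes a runtime with full ICU data. The phonebook variant is requested with the Unicode extension tag de-u-co-phonebk, and sortWith is a hypothetical helper.

```javascript
// Hypothetical helper: return a sorted copy of words using a locale's collation.
function sortWith(locale, words) {
  return [...words].sort(new Intl.Collator(locale).compare);
}

const names = ['Åsa', 'Zorn', 'Anna'];
console.log(sortWith('sv', names)); // Swedish: Å comes after Z
console.log(sortWith('de', names)); // German: Å sorts with A

const words = ['Äpfel', 'Affe', 'Adler'];
console.log(sortWith('de', words));              // standard: Ä compared like A
console.log(sortWith('de-u-co-phonebk', words)); // phonebook: Ä compared like AE
```

Swedish places Åsa after Zorn, while standard German sorts it next to Anna. German phonebook order moves Äpfel ahead of Affe because Ä is compared as AE.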

Transliterations

CLDR includes transform rules for converting text between scripts, such as converting Cyrillic to Latin, Devanagari to Latin (romanization), or Japanese kana to romaji.

Using CLDR Data in Code

```javascript
// The Intl API in JavaScript is backed by CLDR data

// Number formatting
const formatter = new Intl.NumberFormat('de-DE', {
  style: 'currency',
  currency: 'EUR'
});
console.log(formatter.format(1234567.89));  // '1.234.567,89 €' (the space is U+00A0)

// Date formatting
// new Date('2024-09-01') is midnight UTC, so format in UTC as well;
// otherwise time zones west of UTC would print August 31.
const dateFormatter = new Intl.DateTimeFormat('ja-JP', {
  year: 'numeric',
  month: 'long',
  day: 'numeric',
  timeZone: 'UTC'
});
console.log(dateFormatter.format(new Date('2024-09-01'))); // '2024年9月1日'

// Plural rules
const plural = new Intl.PluralRules('ar');  // Arabic
console.log(plural.select(0));   // 'zero'
console.log(plural.select(1));   // 'one'
console.log(plural.select(2));   // 'two'
console.log(plural.select(5));   // 'few'
console.log(plural.select(11));  // 'many'
console.log(plural.select(100)); // 'other'
```

```python
# Python: babel library uses CLDR data
# pip install babel
from babel.numbers import format_currency
from babel.dates import format_date
from datetime import date

print(format_currency(1234567.89, 'EUR', locale='de_DE'))  # '1.234.567,89\xa0€'
print(format_currency(1234567.89, 'USD', locale='en_US'))  # '$1,234,567.89'
print(format_date(date(2024, 9, 1), format='long', locale='ja_JP'))  # '2024年9月1日'
```

CLDR Release Cycle

CLDR is released twice a year, typically in spring and autumn, roughly in step with Unicode Standard releases. Each CLDR release contains data for 500+ locales. Community contributions of locale data are accepted through the CLDR Survey Tool (see cldr.unicode.org), where native speakers can review and correct data for their language.

CLDR Locale Identifiers

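CLDR identifies locales with Unicode locale identifiers (specified in UTS #35), which are based on BCP 47 language tags. Examples include en-US (English, United States), zh-Hant-TW (Traditional Chinese, Taiwan), and sr-Latn (Serbian in Latin script). These tags combine ISO 639 language codes, ISO 15924 script codes, and ISO 3166 region codes (or UN M.49 area codes such as 419 for Latin America) in a fixed language-script-region order.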
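Intl.Locale parses these identifiers into their subtags, and maximize() fills in missing subtags from CLDR's likely-subtags data. This is a sketch that assumes an engine with Intl.Locale support, such as current Node.js.

```javascript
// Parse each tag into subtags, then maximize it. maximize() applies the
// "Add Likely Subtags" algorithm from UTS #35, e.g. inferring a script from a region.
for (const tag of ['en-US', 'zh-Hant-TW', 'sr-Latn', 'zh-TW']) {
  const loc = new Intl.Locale(tag);
  console.log(tag, '->', {
    language: loc.language,
    script: loc.script, // undefined when the tag has no script subtag
    region: loc.region, // undefined when the tag has no region subtag
    maximized: loc.maximize().toString(),
  });
}

// Tags are case-insensitive; canonicalization normalizes their casing.
console.log(Intl.getCanonicalLocales('ZH-hant-tw')); // [ 'zh-Hant-TW' ]
```

For example, zh-TW maximizes to zh-Hant-TW and sr-Latn maximizes to sr-Latn-RS.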
