SymbolFYI

Bidirectional Text (Bidi)

Unicode Standard
Definition

Text that mixes left-to-right and right-to-left writing directions, requiring the Unicode Bidirectional Algorithm for proper display.

What Is Bidirectional Text?

Bidirectional text (often abbreviated as bidi) is text that mixes left-to-right (LTR) and right-to-left (RTL) content. Arabic, Hebrew, Persian, and Urdu are written right-to-left, while most European languages and CJK scripts are written left-to-right. Text is stored in logical order, the order in which it is typed and read. When both directions appear in a single paragraph (for example, an English sentence mentioning an Arabic name, or a Hebrew web page containing a URL), the rendering system must work out the visual position of each character.

Unicode addresses this with the Unicode Bidirectional Algorithm (UBA), defined in Unicode Standard Annex #9.

The Bidirectional Algorithm

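The UBA reads the Bidi_Class property (the bidirectional character type) of each character and applies a sequence of rules to resolve an embedding level for every character. Characters are then reordered for display, line by line, according to those levels. Key Bidi_Class values include: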

| Category | Code | Example |
| --- | --- | --- |
| Strong Left | L | Latin letters, CJK |
| Strong Right | R | Hebrew letters |
| Arabic Letter | AL | Arabic letters |
| Weak | EN, AN, ET | Digits, currency signs |
| Neutral | WS, ON | Spaces, punctuation |
| Explicit | LRE, RLE, PDF | Formatting characters |
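
The categories in the table can be inspected directly. The following is a minimal sketch that uses Python's standard `unicodedata.bidirectional()` to look up the class of a few sample characters. The `SAMPLES` dictionary and the `bidi_class` helper are illustrative names chosen here, not part of any standard.

```python
import unicodedata

# Sample characters, written as escapes so the source itself stays LTR.
SAMPLES = {
    "Latin a": "a",
    "Hebrew alef": "\u05D0",
    "Arabic beh": "\u0628",
    "European digit 1": "1",
    "Arabic-Indic digit 3": "\u0663",
    "Dollar sign": "$",
    "Space": " ",
    "Exclamation mark": "!",
    "LRE control": "\u202A",
}


def bidi_class(ch: str) -> str:
    """Return the Bidi_Class short code (e.g. 'L', 'R', 'AL') for one character."""
    return unicodedata.bidirectional(ch)


if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        print(f"{label:22} U+{ord(ch):04X}  {bidi_class(ch)}")
```

The values come from the Unicode Character Database bundled with the Python interpreter, so newly encoded characters may report an empty string on older versions.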

Bidi Control Characters

Unicode provides explicit formatting characters to override or control bidi behavior:

| Character | Code Point | Purpose |
| --- | --- | --- |
| LRM | U+200E | Left-to-Right Mark |
| RLM | U+200F | Right-to-Left Mark |
| LRE | U+202A | Left-to-Right Embedding |
| RLE | U+202B | Right-to-Left Embedding |
| LRO | U+202D | Left-to-Right Override |
| RLO | U+202E | Right-to-Left Override |
| PDF | U+202C | Pop Directional Formatting |
| LRI | U+2066 | Left-to-Right Isolate |
| RLI | U+2067 | Right-to-Left Isolate |
| FSI | U+2068 | First Strong Isolate |
| PDI | U+2069 | Pop Directional Isolate |
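
In plain text, where no markup is available, the isolate characters can be inserted by hand. The sketch below wraps a user-supplied string in FSI and PDI so that its direction is resolved on its own and does not disturb the surrounding sentence. The `isolate` and `posted_by` functions are hypothetical helpers written for this illustration.

```python
FSI = "\u2068"  # First Strong Isolate: direction taken from the wrapped text
PDI = "\u2069"  # Pop Directional Isolate: closes the most recent isolate


def isolate(text: str) -> str:
    """Wrap text in FSI ... PDI so it is laid out independently of its context."""
    return f"{FSI}{text}{PDI}"


def posted_by(username: str) -> str:
    """Build a mixed-direction message with the user name safely isolated."""
    return f"Posted by {isolate(username)}: Great article!"


if __name__ == "__main__":
    # An Arabic user name, written as escapes to keep this source LTR.
    name = "\u0627\u0644\u0642\u0627\u0631\u0626"
    print(repr(posted_by(name)))
```

This assumes the text really is untrusted or of unknown direction. If the input may already contain unbalanced bidi controls, it would need to be cleaned before wrapping.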

HTML and CSS Bidi

<!-- HTML dir attribute -->
<p dir="rtl">مرحبا بالعالم</p>
<p dir="ltr">Hello, world</p>
<p dir="auto">Content with auto-detected direction</p>

<!-- CSS direction and unicode-bidi -->
<style>
.rtl-block {
  direction: rtl;
  unicode-bidi: embed;
}
</style>

<!-- bdi element for user-generated content -->
<p>Posted by <bdi>القارئ العربي</bdi>: Great article!</p>
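
The `dir="auto"` value chooses a direction from the first strongly directional character in the content. A rough Python approximation of that first-strong heuristic is sketched below. It is a simplification: the `guess_direction` name is invented here, and the sketch does not skip text inside isolates or handle markup the way a browser does.

```python
import unicodedata


def guess_direction(text: str, default: str = "ltr") -> str:
    """Return 'ltr' or 'rtl' based on the first strong character in text.

    Weak and neutral characters (digits, spaces, punctuation) are skipped.
    If no strong character is found, the default is returned.
    """
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return default


if __name__ == "__main__":
    print(guess_direction("Hello, world"))
    print(guess_direction("\u0645\u0631\u062D\u0628\u0627"))  # Arabic word
    print(guess_direction("123 !?"))  # no strong characters, falls back
```

A server-side guess like this can be useful for choosing an explicit `dir` value when storing content, though letting the browser resolve `dir="auto"` is usually simpler.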

The Bidi Spoofing Attack

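Bidi control characters enable a class of source code spoofing known as the Trojan Source attack (CVE-2021-42574). An attacker embeds overrides or isolates such as RLO (U+202E) inside a comment or string literal so that the code a reviewer sees differs from the code the compiler processes. For example:

/* A[U+202E]ecafrus[U+202C] */ access_level = "user";

A bidi-aware renderer displays this comment as /* Asurface */ even though the stored characters spell "Aecafrus". The PDF (U+202C) matters: without it, the override would continue to the end of the line and reverse the code after the comment as well. This toy example only demonstrates the reordering. In a real attack, the reordering makes comment or string delimiters appear to sit somewhere else, so code that looks active is actually commented out, or the reverse. Code review tools, editors, and some compilers now flag bidi control characters in source. Developers should strip or reject unexpected bidi control characters in user-submitted code and file names.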
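
As one way to act on that advice, the sketch below scans source text for the bidi control characters listed earlier and reports where they occur. The names `BIDI_CONTROLS`, `find_bidi_controls`, and `strip_bidi_controls` are made up for this example, and the sample line is modeled on the comment shown above.

```python
# Bidi control characters and their abbreviations.
BIDI_CONTROLS = {
    "\u200E": "LRM", "\u200F": "RLM",
    "\u202A": "LRE", "\u202B": "RLE", "\u202C": "PDF",
    "\u202D": "LRO", "\u202E": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}

# Sample input: a comment hiding an RLO ... PDF pair (escapes keep this file clean).
SAMPLE = '/* A\u202Eecafrus\u202C */ access_level = "user";'


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, abbreviation) for every bidi control, 1-based."""
    findings = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            name = BIDI_CONTROLS.get(ch)
            if name is not None:
                findings.append((line_no, col, name))
    return findings


def strip_bidi_controls(source: str) -> str:
    """Return source with every bidi control character removed."""
    return "".join(ch for ch in source if ch not in BIDI_CONTROLS)


if __name__ == "__main__":
    for line_no, col, name in find_bidi_controls(SAMPLE):
        print(f"line {line_no}, column {col}: {name}")
    print(strip_bidi_controls(SAMPLE))
```

Blanket stripping is reasonable for source code and file names, but legitimate right-to-left prose sometimes relies on LRM and RLM, so a check for ordinary text would likely report these characters for review instead of removing them.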
