What Is Bidirectional Text?
Bidirectional text (often abbreviated as bidi) refers to text that contains a mixture of left-to-right (LTR) and right-to-left (RTL) content. Languages such as Arabic, Hebrew, Persian, and Urdu are written right-to-left, while most European languages and CJK scripts are written left-to-right. When these appear together in a single paragraph — for example, an English sentence mentioning an Arabic name, or a Hebrew web page containing a URL — the rendering system must determine the correct visual order for each character.
Unicode addresses this with the Unicode Bidirectional Algorithm (UBA), defined in Unicode Standard Annex #9.
The Bidirectional Algorithm
The UBA reads the Bidi_Class property (the bidirectional character type) of each character and applies a sequence of rules to resolve embedding levels, which in turn determine the visual order of the characters on each line. Key bidi classes include:
| Category | Code | Example |
|---|---|---|
| Strong Left | L | Latin letters, CJK |
| Strong Right | R | Hebrew letters |
| Arabic Letter | AL | Arabic letters |
| Weak | EN, AN, ET | Digits, currency signs |
| Neutral | WS, ON | Spaces, punctuation |
| Explicit | LRE, RLE, PDF | Formatting characters |
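The classes in the table can be inspected directly. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `bidi_classes` and the sample characters are chosen here for illustration and are not part of any standard.

```python
import unicodedata

def bidi_classes(text):
    """Return a list of (character, Bidi_Class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Sample characters, written as escapes so the listing itself stays left-to-right.
samples = {
    "A": "Latin letter",
    "\u05d0": "Hebrew letter alef",
    "\u0639": "Arabic letter ain",
    "7": "European digit",
    "\u0663": "Arabic-Indic digit three",
    "$": "currency sign",
    " ": "space",
    "!": "punctuation",
    "\u202b": "RLE formatting character",
}

for ch, label in samples.items():
    # Print the code point, a description, and the class the UBA would see.
    print(f"U+{ord(ch):04X} {label}: {unicodedata.bidirectional(ch)}")
```

The values come from the Unicode Character Database bundled with the Python build, so they reflect whichever Unicode version that build ships.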
Bidi Control Characters
Unicode provides explicit formatting characters to override or control bidi behavior:
| Character | Code Point | Purpose |
|---|---|---|
| LRM | U+200E | Left-to-Right Mark |
| RLM | U+200F | Right-to-Left Mark |
| LRE | U+202A | Left-to-Right Embedding |
| RLE | U+202B | Right-to-Left Embedding |
| LRO | U+202D | Left-to-Right Override |
| RLO | U+202E | Right-to-Left Override |
| PDF | U+202C | Pop Directional Formatting |
| LRI | U+2066 | Left-to-Right Isolate |
| RLI | U+2067 | Right-to-Left Isolate |
| FSI | U+2068 | First Strong Isolate |
| PDI | U+2069 | Pop Directional Isolate |
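In plain-text contexts where markup is unavailable, one common use of these characters is to wrap inserted text in an isolate pair so that it cannot disturb the ordering of the surrounding text. The sketch below assumes a hypothetical helper named `isolate`; the code points are the ones listed in the table.

```python
# Isolate control characters from the table above.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"

def isolate(text, direction=None):
    """Wrap text in a directional isolate.

    direction may be "ltr", "rtl", or None. None uses FSI, which lets the
    first strong character of the text decide its direction.
    """
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[direction]
    return f"{opener}{text}{PDI}"

# Hypothetical user name in Hebrew, written as escapes.
username = "\u05d3\u05e0\u05d9"
message = f"Posted by {isolate(username)}: 5 comments"
print(repr(message))
```

Every opener is paired with a PDI here so the isolate is closed explicitly instead of running to the end of the paragraph.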
HTML and CSS Bidi
<!-- HTML dir attribute -->
<p dir="rtl">مرحبا بالعالم</p>
<p dir="ltr">Hello, world</p>
<p dir="auto">Content with auto-detected direction</p>
<!-- CSS direction and unicode-bidi -->
<style>
  .rtl-block {
    direction: rtl;
    unicode-bidi: embed;
  }
</style>
<!-- bdi element for user-generated content -->
<p>Posted by <bdi>القارئ العربي</bdi>: Great article!</p>
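Roughly speaking, `dir="auto"` takes the direction from the first strongly directional character in the content. The following is a simplified sketch of that heuristic, not the full HTML algorithm: the function name `first_strong_direction` and the `"ltr"` fallback are assumptions made for this example, and the sketch does not handle nested elements or isolates.

```python
import unicodedata

def first_strong_direction(text, default="ltr"):
    """Guess a direction from the first strong (L, R, or AL) character.

    Weak and neutral characters such as digits, spaces, and punctuation
    are skipped. If no strong character exists, default is returned.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default

print(first_strong_direction("Hello, world"))
# Digits are weak, so the Arabic letters that follow decide the direction.
print(first_strong_direction("123 \u0645\u0631\u062d\u0628\u0627"))
```

A server-side check like this can be useful when the direction has to be stored or logged, but in a browser the `dir="auto"` attribute itself is normally the simpler choice.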
The Bidi Spoofing Attack
Bidi control characters enable a class of source code spoofing known as the Trojan Source attack (CVE-2021-42574). An attacker can embed a control such as RLO (U+202E) inside a source code comment or string literal so that the code a reviewer sees on screen differs from the character sequence the compiler actually processes. For example:
/* A[U+202E]ecafrus */ access_level = "user";
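A renderer that honors the override displays the characters after U+202E in reverse order, so a reviewer may read "surface" where the file actually contains "ecafrus". Because nothing closes the override in this example, it extends to the end of the line. The compiler ignores display order and processes the characters in their stored (logical) order. Code review tools, editors, and some compilers now flag or highlight suspicious bidi characters. Developers should strip or reject unexpected bidi control characters in user-submitted code and file names.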
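A basic check for such characters can be sketched in a few lines. The function names `find_bidi_controls` and `strip_bidi_controls` are hypothetical, and the set below covers only the embedding, override, and isolate controls; leaving out LRM and RLM is an assumption of this sketch, since those marks often appear legitimately in RTL text.

```python
# Embedding, override, and isolate controls (assumed scope for this sketch).
BIDI_CONTROLS = {
    "\u202a": "LRE", "\u202b": "RLE", "\u202c": "PDF",
    "\u202d": "LRO", "\u202e": "RLO",
    "\u2066": "LRI", "\u2067": "RLI", "\u2068": "FSI", "\u2069": "PDI",
}

def find_bidi_controls(source):
    """Return (line, column, name) for each bidi control, both 1-based."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, BIDI_CONTROLS[ch]))
    return findings

def strip_bidi_controls(text):
    """Remove the listed bidi controls, for example from a file name."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)

sample = '/* A\u202eecafrus */ access_level = "user";'
print(find_bidi_controls(sample))
print(strip_bidi_controls(sample))
```

Whether to reject the input or silently strip the characters depends on the context: rejecting is usually safer for source code, where a silent change could hide the attempt.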