Urdu Punctuation and Special Characters Explained


Urdu punctuation looks deceptively similar to English punctuation at first glance: several marks are visually mirrored or rotated versions of their Latin counterparts. Treating them as interchangeable, however, causes real problems in both typography and code. This reference covers the punctuation marks specific to Urdu and Arabic-script writing, why they exist, and where they trip people up.

The Arabic Comma (،)

The Urdu comma, ، (Unicode U+060C, ARABIC COMMA), looks like the Latin comma (U+002C) turned upside down, so that its tail points the way a right-to-left reader expects. Using a Latin comma in Urdu text is a common mistake. It doesn't break anything functionally, since fonts will still render the Latin comma, but it looks subtly wrong to native readers. Beyond the visual mismatch, mixing comma types can cause text-processing bugs: the two are different code points, so code that searches or splits on one will not match the other.
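The splitting problem described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete tokenizer; the function name split_on_commas is made up for this example, and it assumes you simply want to treat both comma code points as equivalent separators.

```python
import re

# Match either the Arabic comma (U+060C) or the Latin comma (U+002C),
# together with any whitespace around it.
COMMA_PATTERN = re.compile(r"\s*[\u060C,]\s*")


def split_on_commas(text: str) -> list[str]:
    """Split text on Arabic or Latin commas, dropping empty pieces."""
    return [part for part in COMMA_PATTERN.split(text.strip()) if part]


# A list typed with one Arabic comma and one Latin comma.
items = split_on_commas("سیب، کیلا, آم")
print(len(items))  # 3
```

A naive text.split(",") on the same input would return only two pieces, because the first separator is U+060C rather than U+002C.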

The Arabic Question Mark (؟)

Similarly, ؟ (U+061F, ARABIC QUESTION MARK) is a horizontally mirrored version of the Latin question mark (U+003F), designed to sit naturally at the end of a right-to-left sentence. As with the comma, a standard "?" in otherwise correct Urdu text will be understood by readers but looks inconsistent. It is one of the more common small errors in bilingual documents and apps, and it usually creeps in when developers copy-paste English punctuation into Urdu strings without checking for the Arabic-script equivalent.

The Arabic Semicolon (؛)

The Arabic semicolon, ؛ (U+061B, ARABIC SEMICOLON), follows the same pattern: its comma-shaped part is turned upside down and sits above the dot. It's used less frequently in everyday Urdu writing than the comma or question mark, appearing mostly in formal or literary contexts. It is nonetheless part of the Unicode Arabic block (U+0600 to U+06FF), alongside the comma and question mark, and should be preferred over the Latin semicolon in correctly typeset Urdu.

The Urdu Full Stop

Like the comma, semicolon, and question mark, the Urdu full stop has its own character. Standard Urdu ends sentences with ۔ (U+06D4, ARABIC FULL STOP), a short horizontal dash-like mark, rather than with the Latin period (.). This is what Urdu keyboard layouts produce and what printed Urdu uses. It is also a point where Urdu differs from Arabic, which generally uses the ordinary period. The Latin period still shows up in Urdu text typed on an English keyboard layout or in informal digital content, so code should expect both, but in Urdu it is the substitute rather than the norm.
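Because either ۔ (U+06D4) or the Latin period can end a sentence in real Urdu input, a sentence splitter has to accept both. The following is a rough sketch under that assumption; split_sentences is a hypothetical helper, and it deliberately ignores harder cases such as abbreviations.

```python
import re

# Split on whitespace that follows a sentence terminator:
# Arabic full stop (U+06D4), Arabic question mark (U+061F),
# or the Latin ".", "?" and "!".
_SENTENCE_BREAK = re.compile(r"(?<=[\u06D4\u061F.?!])\s+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping each terminator attached."""
    return [s for s in _SENTENCE_BREAK.split(text.strip()) if s]


# Two Urdu sentences (ending in U+06D4 and U+061F) plus one English one.
sentences = split_sentences("یہ کتاب ہے۔ آپ کیسے ہیں؟ Fine.")
print(len(sentences))  # 3
```

Requiring whitespace after the terminator keeps decimals such as 3.5 in one piece, at the cost of missing sentences that are run together without a space.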

Urdu-Indic Digits

Urdu uses the Extended Arabic-Indic digits, ۰ through ۹ (U+06F0 to U+06F9), a set it shares with Persian. These are separate code points from both the Western digits (0-9) and the Arabic-Indic digits (U+0660 to U+0669) used in many Arabic-speaking countries, even though only some of the glyphs, notably four, five, and six, look different between the two Arabic-script sets. In practice, much everyday Urdu writing, especially in digital interfaces, uses Western digits rather than the Urdu forms, which are more often seen in formal typesetting, religious texts, or stylistic design choices. This is worth knowing if you're building a system that needs to recognize numbers within Urdu text: don't assume only Western digits will appear, but also don't assume Urdu-Indic digits are common in casual content.
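One way to handle all three digit sets is to map them onto Western digits before parsing. The sketch below assumes that approach; normalize_digits and find_numbers are illustrative names, not part of any library.

```python
import re

WESTERN = "0123456789"
# Arabic-Indic digits, U+0660..U+0669.
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
# Extended Arabic-Indic digits (used for Urdu and Persian), U+06F0..U+06F9.
EXTENDED_ARABIC_INDIC = "".join(chr(0x06F0 + i) for i in range(10))

# Translation table mapping both Arabic-script digit sets to 0-9.
_TO_WESTERN = str.maketrans(ARABIC_INDIC + EXTENDED_ARABIC_INDIC, WESTERN * 2)

_NUMBER = re.compile(r"[0-9]+")


def normalize_digits(text: str) -> str:
    """Replace Arabic-Indic and Extended Arabic-Indic digits with 0-9."""
    return text.translate(_TO_WESTERN)


def find_numbers(text: str) -> list[int]:
    """Return every run of digits in text as an int, whatever the digit set."""
    return [int(match) for match in _NUMBER.findall(normalize_digits(text))]


print(find_numbers("قیمت ۴۵ روپے اور 10"))  # [45, 10]
```

Python's built-in int() already accepts Unicode decimal digits, so int("۴۵") returns 45 on its own. The explicit table is mainly useful when the normalized string itself needs to be stored, compared, or searched.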

Why This Matters for Developers

If you're building a form validator, a text parser, or a search feature that needs to handle Urdu input, hardcoding only the Latin punctuation set (comma, question mark, semicolon, period) will silently fail to match correctly typed Urdu text that uses its proper Arabic-script punctuation. The safer approach is to recognize both the Latin and Arabic-script equivalents wherever your code splits, validates, or searches text. Real-world Urdu input will likely contain a mix of both, depending on which keyboard or input method the user typed with. Our Unicode Inspector is a quick way to check exactly which punctuation variant appears in any given piece of text.
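As a starting point for that kind of check, the sketch below reports which punctuation variants a string contains. It is an illustrative example only: PUNCTUATION_PAIRS, punctuation_report, and uses_both_sets are names invented here, and the pairing covers just the four marks discussed in this reference.

```python
import unicodedata

# Latin mark -> Arabic-script equivalent used in Urdu.
PUNCTUATION_PAIRS = {
    ",": "\u060C",  # ARABIC COMMA
    ";": "\u061B",  # ARABIC SEMICOLON
    "?": "\u061F",  # ARABIC QUESTION MARK
    ".": "\u06D4",  # ARABIC FULL STOP
}
LATIN_MARKS = set(PUNCTUATION_PAIRS)
ARABIC_MARKS = set(PUNCTUATION_PAIRS.values())


def punctuation_report(text: str) -> list[tuple[int, str, str, str]]:
    """List (index, character, code point, Unicode name) for each tracked mark."""
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in LATIN_MARKS or ch in ARABIC_MARKS
    ]


def uses_both_sets(text: str) -> bool:
    """True if text contains both Latin and Arabic-script tracked marks."""
    chars = set(text)
    return bool(chars & LATIN_MARKS) and bool(chars & ARABIC_MARKS)


for entry in punctuation_report("سیب، کیلا, آم"):
    print(entry[2], entry[3])
# U+060C ARABIC COMMA
# U+002C COMMA
```

A report like this is useful in tests or import pipelines for flagging strings where uses_both_sets returns True, which usually indicates text assembled from more than one keyboard or source.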