[xep-support] Re: RTL script, Arabic-Indic figures and hyphen

From: Alexey Gagarinov <agagarinov@renderx.com>
Date: Sun Aug 19 2012 - 22:27:58 PDT

Hi Benoit,

> The text inside the fo:inline element is not displayed correctly in a PDF rendered by XEP 4.19.
> * I expect, reading right to left: "Arabic-word space Arabic-0 hyphen Arabic-1"
> * but I see in the PDF: "Arabic-word space Arabic-1 hyphen Arabic-0"
> So, 0 and 1 are switched. Other Unicode-compliant software displays the text on screen as I expect it.

It was a bug in XEP's BIDI algorithm -- FIXED.

> If I replace the hyphen with an Arabic letter, then the figures are in the expected order, but a space or
> a comma also gives an incorrect order.

I believe you're wrong about the comma.
XEP (both versions 4.19 and 4.20) displays it in the correct order.

The comma (&#x002C;) belongs to the CS (Common Number Separator) class of the BIDI algorithm.
According to rule W4 of the BIDI algorithm (UAX #9):
"A single common separator between two numbers of the same type changes to that type."

In other words, a single common separator between two Arabic (or two European) numbers should be treated as
part of that one number.
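To make rule W4 concrete, here is a minimal Python sketch for illustration only. It assumes rules W1-W3 have already run and ignores isolating run sequences, embeddings and every other UAX #9 rule. The helper names `bidi_classes` and `apply_w4` are hypothetical (this is not XEP code); the per-character classes come from the standard `unicodedata.bidirectional` function. W4 also has a clause, not quoted above, for a single European separator (ES) between two European numbers, so the sketch includes it.

```python
import unicodedata

def bidi_classes(text):
    """Look up the Unicode BIDI class of each character (e.g. 'AN', 'CS', 'ES')."""
    return [unicodedata.bidirectional(ch) for ch in text]

def apply_w4(classes):
    """Simplified UAX #9 rule W4 over a list of BIDI class names.

    - A single ES between two EN becomes EN.
    - A single CS between two numbers of the same type (EN or AN) becomes that type.
    Neighbours are read from the input list, so only a separator whose immediate
    neighbours are both numbers qualifies: "1,,2" keeps both separators.
    """
    result = list(classes)
    for i in range(1, len(classes) - 1):
        prev, cur, nxt = classes[i - 1], classes[i], classes[i + 1]
        if cur == "ES" and prev == "EN" and nxt == "EN":
            result[i] = "EN"
        elif cur == "CS" and prev == nxt and prev in ("EN", "AN"):
            result[i] = prev
    return result

# Arabic-Indic zero, comma, Arabic-Indic one: the comma joins the number.
print(apply_w4(bidi_classes("\u0660,\u0661")))  # ['AN', 'AN', 'AN']
# Same digits around a hyphen-minus (ES): W4 leaves the separator alone.
print(apply_w4(bidi_classes("\u0660-\u0661")))  # ['AN', 'ES', 'AN']
```

The hyphen-minus (&#x002D;) is ES rather than CS, so W4 does not absorb it between two Arabic numbers; that is one reason the hyphen and comma cases resolve differently.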
I guess the rule is more obvious for European numbers (1,000,000 is a single number), but the same applies to
Arabic numbers.
Note: '.' is also a CS class character, so 0.1 and 1,000,000 are both single numbers. According to the Unicode
BIDI algorithm, 0,1 is a single number as well.
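The examples above can be checked with a second small sketch. It assumes the string stands alone (no surrounding text, so rule W2 and the other rules never come into play) and consists only of digits and number separators. `is_single_number` is a hypothetical helper written for this illustration, not part of XEP or of `unicodedata`.

```python
import unicodedata

def is_single_number(text):
    """Sketch: does `text` form one number run after UAX #9 rule W4?

    Only strings built from digits of one type (EN or AN) and number
    separators are accepted; anything else returns False.
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    digit_types = {c for c in classes if c in ("EN", "AN")}
    if len(digit_types) != 1:
        return False  # empty, no digits, or mixed European/Arabic digits
    digit = digit_types.pop()
    # CS joins numbers of either type; ES joins European numbers only.
    separators = {"CS", "ES"} if digit == "EN" else {"CS"}
    if any(c != digit and c not in separators for c in classes):
        return False
    if classes[0] != digit or classes[-1] != digit:
        return False  # a leading or trailing separator is not absorbed
    # W4 absorbs only a *single* separator, so two in a row split the number.
    return all(a == digit or b == digit for a, b in zip(classes, classes[1:]))

for sample in ["0.1", "1,000,000", "0,1", "\u0660,\u0661", "1,,0"]:
    print(repr(sample), is_single_number(sample))
```

Every sample except "1,,0" prints True: two separators in a row are not "a single common separator", so W4 leaves them as separators.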

   Alexey Gagarinov

Received on Sun Aug 19 22:18:13 2012

This archive was generated by hypermail 2.1.8 : Sun Aug 19 2012 - 22:18:18 PDT