Re: [xep-support] unexpected bidi behaviour with english text in right-to-left mode

From: G. Ken Holman <gkholman@CraneSoftwrights.com>
Date: Wed May 12 2010 - 05:27:27 PDT

At 2010-05-12 10:10 +0200, Gerald Wiesinger wrote:
>we are using a common stylesheet to create documents in various languages,
>one of them Hebrew.
>
>the data is within the XML file and can be in any language: pure
>left-to-right, pure right-to-left, or mixed with numbers and text.
>
>the only control we use is the writing mode, which we set at the highest
>possible container level to rl-tb for Hebrew documents and to lr-tb for the
>others.
>however, the data fed into the rendering process can be in any language, and
>we don't know and should not need to know or understand the content.
>
>the problem now is that when we receive e.g. an English address for a
>Hebrew (right-to-left) document, we find some (not all) special characters,
>such as commas and full stops, in the wrong position.
>
>example:
>
>the address:
>19 LEONARDO DE-VINZI ST.
>TEL AVIV.
>
>gets printed as
>.19 LEONARDO DE-VINZI ST
>.TEL AVIV
>
>I understand that changing to the lr-tb writing mode would solve this

I don't believe that is correct. I believe all the specification
requires is that you embed the text from the different sources using the
following:

    ...content from source A...
    <fo:bidi-override unicode-bidi="embed">
      ...content from source B...
    </fo:bidi-override>
    ...content from source A...

The embedding protects Unicode characters from being influenced by
surrounding characters.
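
A minimal sketch of how the address from the question might be wrapped,
offered as an illustration rather than a tested fix. The fo: prefix bound to
the XSL-FO namespace and the direction="ltr" attribute are assumptions:
unicode-bidi="embed" opens a new embedding level whose direction is given by
the direction property, so "ltr" only fits when the wrapped run is known to
be left-to-right.

```xml
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Source A: text supplied by the stylesheet -->
  ...Hebrew label text from the stylesheet...
  <!-- Source B: text taken from the XML data, kept as one embedded run.
       direction="ltr" is an assumption that this run is Latin-script. -->
  <fo:bidi-override unicode-bidi="embed" direction="ltr">19 LEONARDO DE-VINZI ST.</fo:bidi-override>
</fo:block>
```

With the run embedded at a left-to-right level, the trailing full stop
should be resolved together with the Latin text instead of against the
right-to-left paragraph.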

There is a section of my XSL-FO book and class that discusses this.

>but we cannot modify it, as the language of the content is unknown.

That's okay ... XSL-FO processors are supposed to recognize Unicode
characters and follow the Unicode bi-directional algorithm which
accommodates embedding.

When a string of Unicode characters from a single source is used,
there is typically no need at all to think of this. But when mixing
characters from different sources (say from the input file and the
stylesheet, or from two places in the input files) embedding solves
many problems simply.
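
One way this might look in a stylesheet, sketched under assumptions: the
element names "address" and "line" are hypothetical stand-ins for the input
data, and the label text is a placeholder. The idea is only that text coming
from the stylesheet and text coming from the input file each end up in their
own run.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">

  <!-- "address" and "line" are hypothetical element names for the input data -->
  <xsl:template match="address/line">
    <fo:block>
      <!-- Source A: text from the stylesheet -->
      <xsl:text>...label text from the stylesheet... </xsl:text>
      <!-- Source B: text from the input file, embedded as its own run -->
      <fo:bidi-override unicode-bidi="embed">
        <xsl:value-of select="."/>
      </fo:bidi-override>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
```

Wrapping at the point where the data is copied into the result keeps the
rest of the stylesheet unaware of what the content actually is.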

>this occurs in particular for full stops after Latin characters or
>numbers.

I hope this helps.

. . . . . . . . . . . Ken
