Re: [xep-support] unexpected bidi behaviour with english text in right-to-left mode

From: Gerald Wiesinger <Gerald_Wiesinger@at.ibm.com>
Date: Mon May 17 2010 - 07:02:06 PDT

i think you wrote what best describes what's happening:

> a string of Unicode characters known to work in other tools simply
> doesn't work properly in XEP

we use one and the same stylesheet for latin-script and hebrew-script
documents, but for hebrew we use a switch at the flow level to change the
writing mode.
this makes e.g. the 1st column of a table the rightmost one for hebrew
and the leftmost for the others, and turns right alignments into left
alignments and vice versa.

information like the addresses is always supplied in the same XML nodes,
and from the stylesheet perspective we cannot tell the languages apart. we
simply rely on the renderer.

a misunderstanding was not my intention ;-)

____________________________________
Best Regards
Gerald Wiesinger

From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
Sent by: owner-xep-support@renderx.com
To: xep-support@renderx.com
Date: 12.05.2010 17:08
Subject: Re: [xep-support] unexpected bidi behaviour with english text in right-to-left mode
Please respond to: xep-support@renderx.com

At 2010-05-12 16:17 +0200, Gerald Wiesinger wrote:
>the data comes from one and the same data source and it's either lr, rl or
>mixed.
>thus, we can not implement embedding levels. this would require analyzing
>the content and splitting it into fragments, which is almost the last thing
>on the list from a software design point of view.

Unless there is a bug in XEP, I am not successfully communicating my
thoughts to you. Note that in my message I said that the "two sources"
can both be in the same input file, just in different places.

>the same data on the database can be accessed using WinSQL or QMF for
>Windows (which has some other bugs) and there the data is shown correctly,
>even notepad does it right.

Fine ... I know these tools implement the Unicode bi-directional algorithm.

It wasn't clear to me if you are using your stylesheet to mix your
input data into your result (which is a typical source of *exactly*
the problem you described), or if you are simply reporting that a
string of Unicode characters known to work in other tools simply
doesn't work properly in XEP (which is typical of a rendering bug and
not a stylesheet problem).

I got the impression you were injecting English addresses into the
Hebrew result. My mistake.

>another problem occurs when the english phrase ends with a closing
>bracket, which becomes an opening bracket and is moved to the leftmost
>position.
>
>e.g.
>first part of the text (embraced text)
>
>gets printed as
>
>(first part of the text (embraced text
>
>because there are unicode-based tools and applications out there which treat
>this information correctly, i am tempted to consider this a bug.

If you are not using your stylesheet to combine different parts of
your source, and you are simply streaming the content of your source
onto the page, then I agree it is a rendering bug.

You'll note in my message I said "When a string of Unicode characters
from a single source is used, there is typically no need at all to
think of this." and I stand by that. Thus, if your string of
characters in XSL-FO was the same string of characters in your input,
you should be comfortable in reporting what you see as a rendering problem.

Full stops and parentheses are examples of "weak-direction"
characters and as such are influenced by their neighbouring
characters according to the Unicode Bidirectional Algorithm, which
has to be implemented by the XSL-FO formatter.
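
As a rough illustration of the point about direction-less characters (not part of the original exchange, and not a model of XEP's internals), Python's standard unicodedata module can report the bidi class that the Unicode Character Database assigns to each character. The helper name bidi_classes is made up for this sketch.

```python
import unicodedata


def bidi_classes(text):
    """Return (character, Unicode bidi class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Latin letters are strong left-to-right, Hebrew letters strong right-to-left.
print(unicodedata.bidirectional("A"))       # L
print(unicodedata.bidirectional("\u05d0"))  # R (HEBREW LETTER ALEF)

# Full stops, commas and parentheses have no strong direction of their own.
# The last column shows whether the character is mirrored in right-to-left runs.
for ch in ".,()":
    print(repr(ch), unicodedata.bidirectional(ch), unicodedata.mirrored(ch))

# The end of the address line from this thread: letters, then a full stop.
print(bidi_classes("ST."))
```

Classes CS and ON carry no strong direction, so the algorithm resolves them from the surrounding strong characters or, failing that, from the embedding direction. The sketch only inspects character properties; it does not run the full algorithm.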

I'm sure the XEP developers would welcome a concise example of this
for their review.

. . . . . . . . . . . Ken

>____________________________________
>Best Regards
>Gerald Wiesinger
>
> From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
> Sent by: owner-xep-support@renderx.com
> To: xep-support@renderx.com
> Date: 12.05.2010 14:27
> Subject: Re: [xep-support] unexpected bidi behaviour with english text in right-to-left mode
> Please respond to: xep-support@renderx.com
>
>At 2010-05-12 10:10 +0200, Gerald Wiesinger wrote:
> >we are using a common stylesheet to create documents in various
> >languages, one of them: hebrew.
> >
> >the data is within the XML file and can be any language, pure
> >left-to-right, pure right-to-left and mixed with numbers and text.
> >
> >the only control we use is the writing mode, which we set on the highest
> >possible container level to rl-tb for hebrew documents and to lr-tb for
> >the others.
> >however, the data which gets fed into the rendering process can be
> >anything, and we don't know and should not know or understand the content.
> >
> >the problem now is that in case we receive e.g. an english address for a
> >hebrew (right-to-left) document we find some (not all) special characters
> >like commas and full stops in the wrong sequence.
> >
> >example:
> >
> >the address:
> >19 LEONARDO DE-VINZI ST.
> >TEL AVIV.
> >
> >gets printed as
> >.19 LEONARDO DE-VINZI ST
> >.TEL AVIV
> >
> >i understand that the change to the lr writing mode would solve this
>
>I don't believe that is correct. I believe all the specification
>requires is that you embed the text of different sources using the
>following:
>
> ...content from source A...
> <fo:bidi-override unicode-bidi="embed">
> ...content from source B...
> </fo:bidi-override>
> ...content from source A...
>
>The embedding protects Unicode characters from being influenced by
>surrounding characters.
>
>There is a section of my XSL-FO book and class that discusses this.
>
> >but we can not modify it as the language of the content is unknown.
>
>That's okay ... XSL-FO processors are supposed to recognize Unicode
>characters and follow the Unicode bi-directional algorithm which
>accommodates embedding.
>
>When a string of Unicode characters from a single source is used,
>there is typically no need at all to think of this. But when mixing
>characters from different sources (say from the input file and the
>stylesheet, or from two places in the input files) embedding solves
>many problems simply.
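
For plain-text contexts, the embedding described above can be sketched with Unicode's explicit embedding controls. The CSS2 definition of unicode-bidi, which XSL-FO adopts, describes "embed" as corresponding to LRE (U+202A) or RLE (U+202B) at the start of the element and PDF (U+202C) at the end. The embed helper below is hypothetical, and whether a given formatter honours these control characters inside text is an assumption this thread does not confirm.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def embed(text, direction="ltr"):
    """Wrap text in an explicit directional embedding (hypothetical helper)."""
    opener = LRE if direction == "ltr" else RLE
    return opener + text + PDF


# The English address from this thread, isolated from its Hebrew surroundings.
address = "19 LEONARDO DE-VINZI ST."
wrapped = embed(address)

# Show the control characters that now bracket the address.
print([f"U+{ord(c):04X}" for c in (wrapped[0], wrapped[-1])])
```

This still requires knowing which span to wrap and in which direction, so it does not remove the content-analysis burden raised earlier in the thread.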
>
> >the occurrence of this is imminent for full-stops after latin characters
> >or numbers.
>
>I hope this helps.
>
>. . . . . . . . . . . Ken

--
XSLT/XQuery training:   after http://XMLPrague.cz 2011-03-28/04-01
Vote for your XML training:   http://www.CraneSoftwrights.com/f/i/
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/f/
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/f/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal
-------------------
(*) To unsubscribe, send a message with words 'unsubscribe xep-support'
in the body of the message to majordomo@renderx.com from the address
you are subscribed from.
(*) By using the Service, you expressly agree to these Terms of Service
http://www.renderx.com/terms-of-service.html
-------------------
Received on Mon May 17 07:27:01 2010

This archive was generated by hypermail 2.1.8 : Mon May 17 2010 - 07:27:08 PDT