Re: [xep-support] unexpected bidi behaviour with english text in right-to-left mode

From: G. Ken Holman <gkholman@CraneSoftwrights.com>
Date: Wed May 12 2010 - 08:08:40 PDT

At 2010-05-12 16:17 +0200, Gerald Wiesinger wrote:
>the data comes from one and the same data source and it's either lr or rl or
>mixed.
>thus, we cannot implement embedding levels. this would require analyzing
>the content and splitting it into fragments, which is almost the last thing
>on the list from a software design point of view.

Either there is a bug in XEP, or I am not successfully
communicating my thoughts to you. Note that in my message I said
the "two sources" can both be in the same input file, just
in different places.

>the same data on the database can be accessed using WinSQL or QMF for
>Windows (which has some other bugs) and there the data is shown correctly,
>even notepad does it right.

Fine ... I know these tools implement the Unicode bi-directional algorithm.

It wasn't clear to me if you are using your stylesheet to mix your
input data into your result (which is a typical source of *exactly*
the problem you described), or if you are simply reporting that a
string of Unicode characters known to work in other tools simply
doesn't work properly in XEP (which is typical of a rendering bug and
not a stylesheet problem).

I got the impression you were injecting English addresses into the
Hebrew result. My mistake.

>another problem occurs when the english phrase is closed with a closing
>bracket, which becomes an opening bracket and is moved to the leftmost
>position.
>
>e.g.
>first part of the text (embraced text)
>
>gets printed as
>
>(first part of the text (embraced text
>
>because there are Unicode-based tools and applications out there which
>treat this information correctly, i am tempted to consider this a bug.

If you are not using your stylesheet to combine different parts of
your source, and you are simply streaming the content of your source
onto the page, then I agree it is a rendering bug.

You'll note in my message I said "When a string of Unicode characters
from a single source is used, there is typically no need at all to
think of this." and I stand by that. Thus, if your string of
characters in XSL-FO was the same string of characters in your input,
you should be comfortable in reporting what you see as a rendering problem.

Full stops and parentheses are examples of characters with no strong
direction of their own (full stops are "weak", parentheses are
"neutral"), and as such are influenced by their neighbouring
characters and by the embedding direction according to the Unicode
Bidirectional Algorithm, which has to be implemented by the XSL-FO
formatter.

I'm sure the XEP developers would welcome a concise example of this
for their review.
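
Such an example might look like the following sketch. It is only an
illustration, not a tested XEP reproduction: the page geometry and the
master name "test-page" are arbitrary, placing writing-mode="rl-tb" on
an fo:block-container is an assumption about how the stylesheet sets
it, and direction="ltr" on the comparison block is likewise an
assumption (the fragment quoted further down does not specify a
direction). The three plain blocks stream the reported strings
unchanged; the last block wraps one of them in an explicit embedding
so the two renderings can be compared side by side.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical test case: geometry and names are arbitrary. -->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="test-page"
                           page-width="210mm" page-height="297mm"
                           margin="20mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="test-page">
    <fo:flow flow-name="xsl-region-body">
      <!-- Right-to-left writing mode set on an outer container
           (assumed to mirror the reported setup). -->
      <fo:block-container writing-mode="rl-tb">
        <!-- Strings copied unchanged from the report. -->
        <fo:block>19 LEONARDO DE-VINZI ST.</fo:block>
        <fo:block>TEL AVIV.</fo:block>
        <fo:block>first part of the text (embraced text)</fo:block>
        <!-- Comparison only: the same string inside an explicit
             left-to-right embedding (direction is an assumption). -->
        <fo:block>
          <fo:bidi-override unicode-bidi="embed"
                            direction="ltr">19 LEONARDO DE-VINZI ST.</fo:bidi-override>
        </fo:block>
      </fo:block-container>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Sending a file of this size together with the rendered page, and with
a screenshot of the same strings in a tool that displays them as
expected, should give the developers something concrete to compare.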

. . . . . . . . . . . Ken

>____________________________________
>Best Regards
>Gerald Wiesinger
>
>From: "G. Ken Holman" <gkholman@CraneSoftwrights.com>
>Sent by: owner-xep-support@renderx.com
>Date: 12.05.2010 14:27
>To: xep-support@renderx.com
>Subject: Re: [xep-support] unexpected bidi behaviour with english text in
>right-to-left mode
>Please respond to: xep-support@renderx.com
>
>At 2010-05-12 10:10 +0200, Gerald Wiesinger wrote:
> >we are using a common stylesheet to create documents in various languages,
> >one of them: hebrew.
> >
> >the data is within the XML file and can be any language, pure
> >left-to-right, pure right-to-left and mixed with numbers and text.
> >
> >the only control we use is the writing mode which we set on the highest
> >possible container level for hebrew documents to rl-tb and for the others
> >lr-tb.
> >however, the data which gets fed into the rendering process can be any and
> >we don't know and should not know or understand the content.
> >
> >the problem now is that in case we receive e.g. an english address for a
> >hebrew (right-to-left) document we find some (not all) special characters
> >like commas and full stops in the wrong sequence.
> >
> >example:
> >
> >the address:
> >19 LEONARDO DE-VINZI ST.
> >TEL AVIV.
> >
> >gets printed as
> >.19 LEONARDO DE-VINZI ST
> >.TEL AVIV
> >
> >i understand that the change to the lr writing mode would solve this
>
>I don't believe that is correct. I believe all the specification
>requires is that you embed the text of different sources using the
>following:
>
> ...content from source A...
> <bidi-override unicode-bidi="embed">
> ...content from source B...
> </bidi-override>
> ...content from source A...
>
>The embedding protects Unicode characters from being influenced by
>surrounding characters.
>
>There is a section of my XSL-FO book and class that discusses this.
>
> >but we can not modify it as the language of the content is unknown.
>
>That's okay ... XSL-FO processors are supposed to recognize Unicode
>characters and follow the Unicode bi-directional algorithm which
>accommodates embedding.
>
>When a string of Unicode characters from a single source is used,
>there is typically no need at all to think of this. But when mixing
>characters from different sources (say from the input file and the
>stylesheet, or from two places in the input files) embedding solves
>many problems simply.
>
> >the occurrence of this is imminent for full-stops after latin characters
> >or numbers.
>
>I hope this helps.
>
>. . . . . . . . . . . Ken

--
XSLT/XQuery training:   after http://XMLPrague.cz 2011-03-28/04-01
Vote for your XML training:   http://www.CraneSoftwrights.com/f/i/
Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/f/
G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
Male Cancer Awareness Nov'07  http://www.CraneSoftwrights.com/f/bc
Legal business disclaimers:  http://www.CraneSoftwrights.com/legal