RE: [xep-support] linefeed normalization

From: Victor Mote (vic@portagepub.com)
Date: Mon Apr 05 2004 - 09:32:21 PDT


     Nikolai Grigoriev wrote:

    > the question of U+2028 is a complicated one. The XSL-FO spec
    > does not constrain its processing in any way. There is no
    > mention that it is subject to normalization, but equally no
    > indication that it is expected to produce a line break at
    > all. The effects of this character are therefore not
    > well-defined: I doubt whether it can be considered a valid linefeed.

    I agree that it is not well-defined.

    > In XEP, we treat U+000A, U+000D, and U+2028 as complete
    > equivalents. (This refers to the data that come to the
    > formatter, after linefeed normalization in the processor.)
    > The logic is straightforward: a character is either a
    > linefeed or not; linefeeds terminate lines and are subject to
    > the effects of linefeed-treatment; non-linefeeds do neither of these.

    Ken's distinction between a linefeed and a LINE SEPARATOR is relevant here, I
    think. They are different concepts. The fact that linefeed-treatment is
    supposed to affect *only* U+000A, yet affects U+2028 in XEP, indicates some
    misunderstanding.
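To make the disagreement concrete, here is a hypothetical Python sketch (not XEP code) of applying a linefeed-treatment value under two readings: a strict one in which only U+000A counts as a linefeed, and the equivalence Nikolai describes for XEP, in which U+000D and U+2028 count too. The function name and the two character sets are illustrative assumptions; the four treatment values are the ones XSL 1.0 defines for the linefeed-treatment property.

```python
# Hypothetical sketch: two readings of which characters linefeed-treatment
# applies to. This is not XEP's implementation.

# Strict reading: only U+000A is a linefeed.
SPEC_LINEFEEDS = frozenset({"\u000A"})
# XEP's stated behaviour: U+000A, U+000D and U+2028 are equivalent.
XEP_LINEFEEDS = frozenset({"\u000A", "\u000D", "\u2028"})

# XSL 1.0 linefeed-treatment values; "preserve" is modelled here as
# emitting U+000A, i.e. a forced line break.
_TREATMENTS = {
    "ignore": "",
    "preserve": "\n",
    "treat-as-space": " ",
    "treat-as-zero-width-space": "\u200B",
}

def apply_linefeed_treatment(text, treatment, linefeeds):
    """Replace every character in `linefeeds` according to `treatment`.
    Characters outside `linefeeds` pass through untouched."""
    try:
        replacement = _TREATMENTS[treatment]
    except KeyError:
        raise ValueError(f"unknown linefeed-treatment: {treatment!r}") from None
    return "".join(replacement if ch in linefeeds else ch for ch in text)
```

With treat-as-space, "first\u2028second" comes back unchanged under the strict set but as "first second" under the XEP set, which is exactly the discrepancy described above.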

    > One can argue if this is a correct behaviour. However, I
    > believe that it is inherently unsafe to rely on Unicode text
    > flow control characters in systems that have their own markup
    > to express the same semantics. There is no reason to use

    I mostly agree with this. However, there really is no XSL-FO construct that
    means "force a line break here". It is true that you can say "start a
    new block here", but that really is a different concept.

    > U+2028 or U+2029 if you have explicit paragraph structure set
    > by <fo:block>s; it is risky to mix LRO/RLO/LRE/RLE with

    I think U+2029 really is the same as saying "start a new block", and I agree
    that there is no good reason to use it in XSL-FO.

    > fo:bidi-override. If you need explicit line breaks inside
    > non-preformatted text, set a <br/> element in the input XML
    > vocabulary and match it to <fo:block/> in the stylesheet. In
    > this way, your intent is clear to everybody.

    There is a good reason not to take this approach unless it is necessary.
    Simply inserting an </fo:block><fo:block> combination does not do the job.
    The new block created this way may not have the same properties: things like
    space-before, keeps, etc. are very likely to differ. I acknowledge that this
    can be worked around in the stylesheet, but doing so adds an order of
    magnitude of complexity.
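A rough Python sketch of the kind of workaround meant here: splitting one paragraph into several fo:block elements forces a decision, property by property, about which piece gets what. The function name split_block and the FIRST_ONLY/LAST_ONLY policy are hypothetical and deliberately incomplete; a real stylesheet would have to consider more properties than these.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical, incomplete policy: properties that belong to the outer
# edges of the original paragraph rather than to every piece of it.
FIRST_ONLY = {"space-before", "keep-with-previous", "break-before", "text-indent"}
LAST_ONLY = {"space-after", "keep-with-next", "break-after"}

def split_block(lines, attrs):
    """Return one fo:block per entry in `lines`, copying `attrs` onto each
    piece except that edge-only properties go to the first or last piece."""
    last = len(lines) - 1
    blocks = []
    for i, line in enumerate(lines):
        piece = {
            name: value
            for name, value in attrs.items()
            if not (name in FIRST_ONLY and i != 0)
            and not (name in LAST_ONLY and i != last)
        }
        block = ET.Element(f"{{{FO_NS}}}block", piece)
        block.text = line
        blocks.append(block)
    return blocks
```

Even this toy version must know which properties are edge-only, and a keep-together on the original paragraph is simply copied onto each piece, which no longer keeps the pieces together. That is the extra complexity in question.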

    > One additional consideration: in XML 1.1, U+2028 will be
    > subject to parser-side linefeed normalization. It implies
    > that you never get it from user text; and if you generate an
    > entity just to make it appear after the normalization, why
    > not generate a piece of markup instead?
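For reference, a minimal Python sketch of the end-of-line handling that XML 1.0 and XML 1.1 (section 2.11 in each) require a parser to apply to its input. It operates on the raw text before parsing, which is why a character reference such as &#x2028; still yields U+2028 afterwards. The function name is illustrative; a real parser does this internally.

```python
import re

# XML 1.0, section 2.11: CR LF and any lone CR become LF.
_EOL_XML10 = re.compile("\r\n|\r")
# XML 1.1, section 2.11: additionally CR NEL, NEL (U+0085) and
# LINE SEPARATOR (U+2028) become LF.
_EOL_XML11 = re.compile("\r\n|\r\u0085|\r|\u0085|\u2028")

def normalize_line_ends(raw, version="1.0"):
    """Apply XML end-of-line handling to raw input text (before parsing)."""
    pattern = _EOL_XML11 if version == "1.1" else _EOL_XML10
    return pattern.sub("\n", raw)
```

Under the 1.1 rules a literal U+2028 in the file reaches the application as U+000A, so it can only survive if it is written as a character reference.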

    OK. I find this persuasive, and it means, ultimately, that either I or
    the authors of the XML 1.1 standard have misunderstood what the Unicode
    standard intended U+2028 to do.

    This leaves only the issue of documentation. I would simply suggest that
    Section 7.1 of the document "XSL Formatting Objects in XEP 3.7" be modified
    to state, as you note above, that XEP always treats U+2028 as a linefeed
    character.

    Thanks again to both Nikolai and Ken for your explanations. This is not an
    issue I feel strongly about, and I didn't mean for it to turn into a big
    deal.

    Victor Mote




    This archive was generated by hypermail 2.1.5 : Mon Apr 05 2004 - 09:41:16 PDT