Re: [xep-support] linefeed normalization

From: G. Ken Holman (gkholman@CraneSoftwrights.com)
Date: Sun Apr 04 2004 - 06:36:52 PDT

    At 2004-04-04 00:53 -0700, Victor Mote wrote:
    >Section 7.1 of the document "XSL Formatting Objects in XEP 3.7" says:
    >
    >"Line break is forced by explicit linefeed characters: U+000A, U+000D,
    >U+2028, U+2029, unless they are suppressed by linefeed normalization;"
    >
    >By "linefeed normalization", I assume you mean the linefeed-treatment
    >attribute. However, linefeed-treatment seems to only apply to U+000A
    >(although I suppose that they mean U+000D and combinations of the two also).

    Linefeed normalization is not linefeed treatment. The XML processor inside
    the XSL-FO processor normalizes "natural" end-of-line sequences (naked
    LF, CR, CR+LF) into single linefeed (U+000A) characters.

    An XML instance can bypass XML processor normalization by using numeric
    character references for these characters, at which point they are not naked
    and are not normalized.

    So:

        <name>abc
        def</name>

    is different from:

        <name>abc&#xd;&#xa;def</name>

    to an application reading these with an XML processor.
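
    The difference can be observed from any application sitting on top of an
    XML processor. The following is a minimal sketch in Python, assuming the
    standard-library ElementTree parser (not part of the original example);
    the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Literal CR+LF in the source text: the XML processor normalizes it to LF.
literal = ET.fromstring("<name>abc\r\ndef</name>")

# Numeric character references: CR and LF are delivered as written.
referenced = ET.fromstring("<name>abc&#xd;&#xa;def</name>")

literal_text = literal.text
referenced_text = referenced.text

print(repr(literal_text))     # 'abc\ndef'
print(repr(referenced_text))  # 'abc\r\ndef'
```

    The first element reaches the application with a single linefeed, while
    the second keeps both the carriage return and the linefeed.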

    >Section 13.2 of the Unicode 3.0 standard describes U+2028 as an "unambiguous
    >character ... line separator...". Is there a mechanism in XEP that will
    >allow one to do *both* of the following: 1) treat U+000A (and U+000D) as
    >spaces (the default behavior), and 2) force a line break at U+2028?

    Do you have the opportunity to do some translation in your stylesheet? An
    example is below of playing with linefeed preservation. I acknowledge that
    doing the transformation may mess up the "spaces adjacent to linefeed"
    processing in XSL-FO.
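
    As a rough illustration of the character mapping involved, here is a
    Python sketch that mimics an XPath translate() call turning LF and CR
    into spaces and U+2028 into LF. The function name translate_linefeeds
    is hypothetical, and the sample strings are assumed to be the text
    values an XML processor would deliver.

```python
# Stand-in for XPath translate(., '&#xa;&#x2028;&#xd;', ' &#xa; '):
# LF -> space, U+2028 -> LF, CR -> space (positional, one-to-one mapping).
MAPPING = str.maketrans("\n\u2028\r", " \n ")


def translate_linefeeds(text):
    """Return text with ordinary line ends blanked and U+2028 promoted to LF."""
    return text.translate(MAPPING)


samples = ["abc\ndef", "ghi\r\njkl", "mno\u2028pqr"]
results = [translate_linefeeds(s) for s in samples]

for original, mapped in zip(samples, results):
    print(repr(original), "->", repr(mapped))
```

    Only the U+2028 case still contains a linefeed afterwards, so only it
    would force a break under linefeed-treatment="preserve". Note that a
    CR+LF pair becomes two spaces, which is where the "spaces adjacent to
    linefeed" caveat comes in.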

    You could go to the extent of a recursive call processing the text nodes
    doing a replacement of U+2028 characters with empty blocks. I've included
    that in the example as well.
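
    The recursion itself follows the usual substring-before / substring-after
    pattern. A minimal Python sketch of that pattern is shown here; the name
    split_on_line_separator is hypothetical, and it returns the text
    segments rather than emitting formatting objects.

```python
LINE_SEPARATOR = "\u2028"


def split_on_line_separator(text):
    """Split text at each U+2028, one recursive step per separator."""
    if LINE_SEPARATOR not in text:
        # Base case: no separator left, emit the remaining text as-is.
        return [text]
    # partition() yields the text before and after the first separator,
    # much like substring-before() and substring-after() in XPath.
    before, _, after = text.partition(LINE_SEPARATOR)
    return [before] + split_on_line_separator(after)


segments = split_on_line_separator("mno\u2028pqr")
print(segments)  # ['mno', 'pqr']
```

    In the stylesheet, an empty block would be emitted at each boundary
    between two segments.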

    Both techniques give the desired result.

    I hope this helps.

    .......................... Ken

    T:\ftemp>type mote.xml
    <names>
    <name>abc
    def</name>
    <name>ghi&#xd;&#xa;jkl</name>
    <name>mno&#x2028;pqr</name>
    </names>

    T:\ftemp>type mote.xsl
    <?xml version="1.0" encoding="utf-8"?><!--mote.xsl-->
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                     xmlns="http://www.w3.org/1999/XSL/Format"
                     version="1.0">

    <xsl:template match="/names">
       <root>
         <layout-master-set>
           <simple-page-master master-name="frame"
                               page-height="297mm" page-width="210mm"
                               margin-top="15mm" margin-bottom="15mm"
                               margin-left="15mm" margin-right="15mm">
             <region-body region-name="frame-body"/>
           </simple-page-master>
           <page-sequence-master master-name="frame-pages">
             <single-page-master-reference master-reference="frame"/>
           </page-sequence-master>
         </layout-master-set>

         <page-sequence master-reference="frame-pages">
           <flow flow-name="frame-body">
             <block>Translate:</block>
             <xsl:apply-templates select="name" mode="translate"/>
             <block>Recurse:</block>
             <xsl:apply-templates select="name" mode="recurse"/>
           </flow>
         </page-sequence>
       </root>
    </xsl:template>

    <!--preservation through translation and setting of properties-->
    <xsl:template match="name" mode="translate">
       <block linefeed-treatment="preserve">
         <xsl:apply-templates mode="translate"/>
       </block>
    </xsl:template>

    <xsl:template match="text()" mode="translate">
       <xsl:value-of select="translate(.,'&#xa;&#x2028;&#xd;',' &#xa; ')"/>
    </xsl:template>

    <!--preservation solely through recognition of special character-->
    <xsl:template match="name" mode="recurse">
       <block>
         <xsl:apply-templates mode="recurse"/>
       </block>
    </xsl:template>

    <xsl:template match="text()" mode="recurse" name="recurse">
       <xsl:param name="text" select="string(.)"/>
       <xsl:choose>
         <xsl:when test="contains($text,'&#x2028;')">
           <xsl:value-of select="substring-before($text,'&#x2028;')"/>
           <block/>
           <xsl:call-template name="recurse">
             <xsl:with-param name="text"
                             select="substring-after($text,'&#x2028;')"/>
           </xsl:call-template>
         </xsl:when>
         <xsl:otherwise>
           <xsl:value-of select="$text"/>
         </xsl:otherwise>
       </xsl:choose>
    </xsl:template>

    </xsl:stylesheet>

    T:\ftemp>

    --
    Public courses: Spring 2004 world tour of hands-on XSL instruction
    Each week:   Monday-Wednesday: XSLT/XPath; Thursday-Friday: XSL-FO
    Hong Kong May 17-21; Bremen Germany May 24-28; Helsinki June 14-18
    World-wide on-site corporate, govt. & user group XML/XSL training.
    G. Ken Holman                 mailto:gkholman@CraneSoftwrights.com
    Crane Softwrights Ltd.          http://www.CraneSoftwrights.com/f/
    Box 266, Kars, Ontario CANADA K0A-2E0    +1(613)489-0999 (F:-0995)
    Male Breast Cancer Awareness  http://www.CraneSoftwrights.com/f/bc