RE: [xep-support] linefeed normalization

From: Victor Mote (vic@portagepub.com)
Date: Sun Apr 04 2004 - 11:32:15 PDT


    Ken:

    Thanks for your response, especially the explanation of linefeed
    normalization, which makes sense. I am unclear though whether the linefeed
    normalization process is normalizing U+2028 and U+2029 characters. If my
    understanding of the purpose of those characters is correct, it seems that
    they should *not* be normalized, but should pass through to the XSL-FO
    processor intact. To test all of this, I dropped some &#x2028; character
    references directly into my XSL-FO input and fed it directly into XEP. Even
    with them not naked, they are being converted or normalized to spaces by
    either the parser used by XEP or by XEP itself. So, whether the U+2028
    character is naked or not, it is getting eaten. If the parser is doing it, I
    would think it could be turned off (at least optionally). In any case, I
    think there is either a bug in XEP or a problem in the doc section already
    cited.
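
    One way to separate the parser's behaviour from the formatter's is a
    check like the one below. It is only a sketch, and it rests on an
    assumption: it uses Python's standard-library ElementTree (backed by
    expat), which is not the parser XEP uses. It therefore shows what an
    XML 1.0 parser is expected to deliver, not what XEP does. The helper
    name parsed_text is made up for illustration.

```python
import xml.etree.ElementTree as ET


def parsed_text(xml_source: bytes) -> str:
    """Return the root element's character data as the parser delivers it."""
    return ET.fromstring(xml_source).text


# Naked CR+LF in the source: end-of-line handling turns the pair into one LF.
naked_crlf = parsed_text(b"<name>abc\r\ndef</name>")

# The same characters written as character references are delivered as-is.
referenced_crlf = parsed_text(b"<name>ghi&#xd;&#xa;jkl</name>")

# U+2028 as a literal (UTF-8 encoded) character and as a character reference.
naked_ls = parsed_text("<name>mno\u2028pqr</name>".encode("utf-8"))
referenced_ls = parsed_text(b"<name>mno&#x2028;pqr</name>")

for label, value in [
    ("naked CR+LF", naked_crlf),
    ("&#xd;&#xa;", referenced_crlf),
    ("naked U+2028", naked_ls),
    ("&#x2028;", referenced_ls),
]:
    # repr() makes the control and separator characters visible.
    print(f"{label:14} {value!r}")
```

    If a parser used this way hands U+2028 through in both forms, then any
    conversion to a space must be happening later, in the formatter.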

    I understand both of the workarounds that you have proposed, but, since I
    have control of the input, I will probably use a more straightforward
    approach to breaking the text into fo:block elements.
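
    As a rough illustration of that kind of preprocessing (the details here
    are assumptions, not a description of any actual tool): the sketch below
    splits text at U+2028 before it reaches the formatter and wraps each
    piece in its own fo:block. The function name to_fo_blocks is
    hypothetical, and the fo prefix is assumed to be bound to the XSL-FO
    namespace in the enclosing document.

```python
from xml.sax.saxutils import escape

LINE_SEPARATOR = "\u2028"


def to_fo_blocks(text: str) -> str:
    """Split text at U+2028 and wrap each piece in its own fo:block."""
    pieces = text.split(LINE_SEPARATOR)
    # escape() protects &, < and > so each piece is safe as element content.
    return "\n".join(f"<fo:block>{escape(piece)}</fo:block>" for piece in pieces)


result = to_fo_blocks("mno\u2028pqr & stu")
print(result)
```

    Because the line separator never reaches the formatter, the result does
    not depend on how the parser or XEP treats U+2028.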

    Thanks again for your helpful response.

    Victor Mote

    > -----Original Message-----
    > From: owner-xep-support@renderx.com
    > [mailto:owner-xep-support@renderx.com] On Behalf Of G. Ken Holman
    > Sent: Sunday, April 04, 2004 7:37 AM
    > To: xep-support@renderx.com
    > Subject: Re: [xep-support] linefeed normalization
    >
    > At 2004-04-04 00:53 -0700, Victor Mote wrote:
    > >Section 7.1 of the document "XSL Formatting Objects in XEP 3.7" says:
    > >
    > >"Line break is forced by explicit linefeed characters: U+000A, U+000D,
    > >U+2028, U+2029, unless they are suppressed by linefeed normalization;"
    > >
    > >By "linefeed normalization", I assume you mean the linefeed-treatment
    > >attribute. However, linefeed-treatment seems to only apply to U+000A
    > >(although I suppose that they mean U+000D and combinations of the two
    > >also).
    >
    > Linefeed normalization is not linefeed treatment. The XML
    > processor inside of the XSL-FO processor normalizes "natural"
    > end-of-line sequences (naked LF, CR, CR+LF) into linefeed characters.
    >
    > An XML instance can bypass XML processor normalization by
    > using numeric character references for these characters, at
    > which point they are not naked and are not normalized.
    >
    > So:
    >
    > <name>abc
    > def</name>
    >
    > is different than:
    >
    > <name>abc&#xd;&#xa;def</name>
    >
    > to an application reading these with an XML processor.
    >
    > >Section 13.2 of the Unicode 3.0 standard describes U+2028 as an
    > >"unambiguous character ... line separator...". Is there a mechanism in
    > >XEP that will allow one to do *both* of the following: 1) treat U+000A
    > >(and U+000D) as spaces (the default behavior), and 2) force a line break
    > >at U+2028?
    >
    > Do you have the opportunity to do some translation in your
    > stylesheet? An example is below of playing with linefeed
    > preservation. I acknowledge that doing the transformation
    > may mess up the "spaces adjacent to linefeed"
    > processing in XSL-FO.
    >
    > You could go to the extent of a recursive call processing the
    > text nodes doing a replacement of U+2028 characters with
    > empty blocks. I've included that in the example as well.
    >
    > Both techniques give the desired result.
    >
    > I hope this helps.
    >
    > ......................... Ken
    >
    > T:\ftemp>type mote.xml
    > <names>
    > <name>abc
    > def</name>
    > <name>ghi&#xd;&#xa;jkl</name>
    > <name>mno&#x2028;pqr</name>
    > </names>
    >
    > T:\ftemp>type mote.xsl
    > <?xml version="1.0" encoding="utf-8"?><!--mote.xsl-->
    > <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    > xmlns="http://www.w3.org/1999/XSL/Format"
    > version="1.0">
    >
    > <xsl:template match="/names">
    > <root>
    > <layout-master-set>
    > <simple-page-master master-name="frame"
    > page-height="297mm" page-width="210mm"
    > margin-top="15mm" margin-bottom="15mm"
    > margin-left="15mm" margin-right="15mm">
    > <region-body region-name="frame-body"/>
    > </simple-page-master>
    > <page-sequence-master master-name="frame-pages">
    > <single-page-master-reference master-reference="frame"/>
    > </page-sequence-master>
    > </layout-master-set>
    >
    > <page-sequence master-reference="frame-pages">
    > <flow flow-name="frame-body">
    > <block>Translate:</block>
    > <xsl:apply-templates select="name" mode="translate"/>
    > <block>Recurse:</block>
    > <xsl:apply-templates select="name" mode="recurse"/>
    > </flow>
    > </page-sequence>
    > </root>
    > </xsl:template>
    >
    > <!--preservation through translation and setting of properties-->
    > <xsl:template match="name" mode="translate">
    > <block linefeed-treatment="preserve">
    > <xsl:apply-templates mode="translate"/>
    > </block>
    > </xsl:template>
    >
    > <xsl:template match="text()" mode="translate">
    > <xsl:value-of select="translate(.,'&#xa;&#x2028;&#xd;',' &#xa; ')"/>
    > </xsl:template>
    >
    > <!--preservation solely through recognition of special character-->
    > <xsl:template match="name" mode="recurse">
    > <block>
    > <xsl:apply-templates mode="recurse"/>
    > </block>
    > </xsl:template>
    >
    > <xsl:template match="text()" mode="recurse" name="recurse">
    > <xsl:param name="text" select="string(.)"/>
    > <xsl:choose>
    > <xsl:when test="contains($text,'&#x2028;')">
    > <xsl:value-of select="substring-before($text,'&#x2028;')"/>
    > <block/>
    > <xsl:call-template name="recurse">
    > <xsl:with-param name="text"
    > select="substring-after($text,'&#x2028;')"/>
    > </xsl:call-template>
    > </xsl:when>
    > <xsl:otherwise>
    > <xsl:value-of select="$text"/>
    > </xsl:otherwise>
    > </xsl:choose>
    > </xsl:template>
    >
    > </xsl:stylesheet>
    >
    > T:\ftemp>
    >
    > --
    > Public courses: Spring 2004 world tour of hands-on XSL instruction
    > Each week: Monday-Wednesday: XSLT/XPath; Thursday-Friday: XSL-FO
    > Hong Kong May 17-21; Bremen Germany May 24-28; Helsinki June 14-18
    >
    > World-wide on-site corporate, govt. & user group XML/XSL training.
    > G. Ken Holman mailto:gkholman@CraneSoftwrights.com
    > Crane Softwrights Ltd. http://www.CraneSoftwrights.com/f/
    > Box 266, Kars, Ontario CANADA K0A-2E0 +1(613)489-0999 (F:-0995)
    > Male Breast Cancer Awareness http://www.CraneSoftwrights.com/f/bc
    >
    > -------------------
    > (*) To unsubscribe, send a message with words 'unsubscribe
    > xep-support'
    > in the body of the message to majordomo@renderx.com from the
    > address you are subscribed from.
    > (*) By using the Service, you expressly agree to these Terms
    > of Service http://www.renderx.com/tos.html
    >




    This archive was generated by hypermail 2.1.5 : Sun Apr 04 2004 - 11:42:10 PDT