Re: [xep-support] Operating on all of the pcdata of an XML file: Considered harmful?

From: Martin Holmes <mholmes@uvic.ca>
Date: Fri Apr 01 2005 - 09:29:52 PST

Hi there,

Here are two solutions I use for inserting zero-width spaces, one
specific to URLs, and one more general:

<!-- This template written by Martin Holmes. It inserts zero-width
spaces before and after each slash in a url, enabling the url to be
broken appropriately at line ends. -->
        <xsl:template name="BreakOnSlashes">
                <xsl:param name="InString"/>
                <xsl:choose>
                        <!-- At least one slash left: emit the text before it, then the
                        slash wrapped in zero-width spaces, then recurse on the rest. -->
                        <xsl:when test="contains($InString, '/')">
                                <xsl:value-of select="substring-before($InString, '/')"/>
                                <xsl:text>&#x200b;&#x002f;&#x200b;</xsl:text>
                                <xsl:call-template name="BreakOnSlashes">
                                        <xsl:with-param name="InString" select="substring-after($InString, '/')"/>
                                </xsl:call-template>
                        </xsl:when>
                        <!-- No slash left: emit the remainder unchanged, so a string
                        without any slash passes through as-is. -->
                        <xsl:otherwise>
                                <xsl:value-of select="$InString"/>
                        </xsl:otherwise>
                </xsl:choose>
        </xsl:template>
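
A minimal sketch of how BreakOnSlashes might be called. The ulink element
and its url attribute are assumptions (DocBook-style markup), not something
the template itself requires; any string containing slashes can be passed
as InString.

<!-- Hypothetical caller: "ulink" and "@url" are illustrative names only. -->
        <xsl:template match="ulink">
                <xsl:call-template name="BreakOnSlashes">
                        <xsl:with-param name="InString" select="@url"/>
                </xsl:call-template>
        </xsl:template>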
        
<!-- This recursive template found on the Web, attributed to Michael
Smith, Julien Letessier and Nikolai Grigoriev, found here:
http://www.dpawson.co.uk/docbook/styling/fo.html#d2635e175 -->

     <xsl:template name="intersperse-with-zero-spaces">
         <xsl:param name="str"/>
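         <!-- Characters next to which no zero-width space is needed
         (ordinary space, tab, newline, and the U+2000 to U+200B spaces). -->
         <xsl:variable name="spacechars">&#x20;&#x9;&#xA;&#x2000;&#x2001;&#x2002;&#x2003;&#x2004;&#x2005;&#x2006;&#x2007;&#x2008;&#x2009;&#x200A;&#x200B;</xsl:variable>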

         <xsl:if test="string-length($str) &gt; 0">
             <xsl:variable name="c1"
                     select="substring($str, 1, 1)"/>
             <xsl:variable name="c2"
                     select="substring($str, 2, 1)"/>

             <xsl:value-of select="$c1"/>
             <xsl:if test="$c2 != '' and
                     not(contains($spacechars, $c1) or
                     contains($spacechars, $c2))"><xsl:text>&#x200b;</xsl:text></xsl:if>

             <xsl:call-template name="intersperse-with-zero-spaces">
                 <xsl:with-param name="str" select="substring($str, 2)"/>
             </xsl:call-template>
         </xsl:if>
     </xsl:template>
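
A minimal sketch of applying the general template only to long strings, along
the lines of the threshold test mentioned in the quoted message below. The
filename element and the 40-character threshold are assumptions chosen for
illustration; adjust both to the markup and page width in question.

<!-- Hypothetical caller: "filename" and the threshold of 40 are illustrative. -->
     <xsl:template match="filename/text()">
         <xsl:choose>
             <!-- Long strings get a zero-width space between characters. -->
             <xsl:when test="string-length(.) &gt; 40">
                 <xsl:call-template name="intersperse-with-zero-spaces">
                     <xsl:with-param name="str" select="."/>
                 </xsl:call-template>
             </xsl:when>
             <!-- Short strings are copied through untouched. -->
             <xsl:otherwise>
                 <xsl:value-of select="."/>
             </xsl:otherwise>
         </xsl:choose>
     </xsl:template>

Because intersperse-with-zero-spaces recurses once per character, limiting it
to the strings that actually need it also keeps the recursion depth down.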

Cheers,
Martin

Louis Amdur wrote:
> I know this issue is something of a chestnut on this list, but I'd like to
> solicit some feedback to see how other folks are handling the issue.
>
> My understanding: When XEP encounters a long string with no "natural" line
> break point (e.g., a programlisting or URL without spaces), it squeezes
> the characters of the string together if the string cannot fit in the
> given space. By RenderX's lights, this is a feature rather than a bug, as
> it exposes weaknesses in stylesheets. One solution is to insert zero-width
> spaces to allow a long string to break when necessary--this can be
> accomplished manually in the markup, or automatically through a
> preprocessing step. For us, the manual option is a non-starter, as we
> translate our content to up to thirty target languages, and translation
> vendors never see the markup itself, just the pcdata (our translation
> memory protects the markup). So we're looking at automating this, knowing
> that an automated solution won't always create graceful line breaks in all
> contexts. I've seen some XSL code fragments on how to test for string
> length and then insert ZWS code points between the characters of strings
> that exceed a given threshold length--the same could be accomplished,
> perhaps more efficiently, through Python or Perl during a pre-processing
> step. The person who is responsible for implementing and maintaining our
> XSL tool chain is, however, resistant to such an approach, claiming that
> he has "philosophical objections" to performing an operation on all of the
> pcdata of an XML file. Other than lacking elegance, I don't really
> understand the foundation of his objection to this sort of solution--all
> sorts of organizations pre-process pcdata as a matter of course (not to
> mention all sorts of non-XML text streams, as well). I'm not really
> interested in forcing any solution down his throat--I just have an
> immediate need to create a bulletproof method for allowing long strings to
> break.
>
> Ideally, I think RenderX should provide a configuration option that would
> allow text to break rather than squeeze, with the caveat that such breaks
> may often not be very pretty. Lacking that, I am gunning for a simple
> pre-processing step that will do the same.
>
>
> ____________
> Lou Amdur
> Senior Principal Information Developer
> Symantec
> (310) 449-7005
> -------------------
> (*) To unsubscribe, send a message with words 'unsubscribe xep-support'
> in the body of the message to majordomo@renderx.com from the address
> you are subscribed from.
> (*) By using the Service, you expressly agree to these Terms of Service http://www.renderx.com/tos.html
Received on Fri Apr 1 10:08:17 2005
