[xep-support] Operating on all of the pcdata of an XML file: Considered harmful?

From: Louis Amdur <LAmdur@symantec.com>
Date: Thu Mar 31 2005 - 21:00:06 PST

I know this issue is something of a chestnut on this list, but I'd like to
solicit some feedback to see how other folks are handling it.

My understanding: When XEP encounters a long string with no "natural" line
break point (e.g., a programlisting or URL without spaces), it squeezes
the characters of the string together if the string cannot fit in the
given space. By RenderX's lights, this is a feature rather than a bug, as
it exposes weaknesses in stylesheets. One solution is to insert zero-width
spaces to allow a long string to break when necessary--this can be
accomplished manually in the markup, or automatically through a
pre-processing step.

For us, the manual option is a non-starter, as we
translate our content to up to thirty target languages, and translation
vendors never see the markup itself, just the pcdata (our translation
memory protects the markup). So we're looking at automating this, knowing
that an automated solution won't always create graceful line breaks in all
contexts. I've seen some XSL code fragments that test for string length
and then insert zero-width space (U+200B) code points between the characters
of strings that exceed a given threshold length--the same could be accomplished,
perhaps more efficiently, through Python or Perl during a pre-processing
step.

The person who is responsible for implementing and maintaining our
XSL tool chain is, however, resistant to such an approach, claiming that
he has "philosophical objections" to performing an operation on all of the
pcdata of an XML file. Apart from its lack of elegance, I don't really
understand the basis of his objection to this sort of solution--all
sorts of organizations pre-process pcdata as a matter of course (not to
mention all sorts of non-XML text streams). I'm not really
interested in forcing any solution down his throat--I just have an
immediate need to create a bulletproof method for allowing long strings to
break.

Ideally, I think RenderX should provide a configuration option that would
allow text to break rather than squeeze, with the caveat that such breaks
may often not be very pretty. Lacking that, I am gunning for a simple
pre-processing step that will do the same.
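
To make the idea concrete, here is a minimal Python sketch of the kind of
pre-processing step I have in mind. It is only an illustration under stated
assumptions: the function names and the 20-character threshold are made up
for the example, and it uses the standard library's xml.etree.ElementTree,
which by default drops comments and the DOCTYPE declaration, so a real tool
chain would probably want a different parser.

```python
import re
import xml.etree.ElementTree as ET

ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but allows a line break


def add_break_points(text, threshold=20):
    """Insert ZWSP between the characters of every unbroken run of
    `threshold` or more characters.

    The default threshold is an arbitrary, hypothetical value.
    Caveat: this splits between every code point, so it can separate
    a base character from a following combining mark.
    """
    if not text:
        return text
    # Match runs that contain neither whitespace nor an existing ZWSP,
    # so running the step twice does not double up the break points.
    long_run = re.compile(r"[^\s\u200b]{%d,}" % threshold)
    return long_run.sub(lambda m: ZWSP.join(m.group(0)), text)


def process_tree(root, threshold=20):
    """Apply add_break_points to all character data under `root`."""
    for elem in root.iter():
        elem.text = add_break_points(elem.text, threshold)
        elem.tail = add_break_points(elem.tail, threshold)
    return root


# Hypothetical usage (file names are placeholders):
#   tree = ET.parse("in.xml")
#   process_tree(tree.getroot())
#   tree.write("out.xml", encoding="utf-8", xml_declaration=True)
```

Breaking between every character is crude. Restricting the insertion points
to positions after characters such as "/" or "." would likely give prettier
breaks, at the cost of more rules to maintain.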

Lou Amdur
Senior Principal Information Developer
(310) 449-7005
Received on Thu Mar 31 21:27:34 2005
