From: Jim Melton (jim.melton@acm.org)
Date: Tue Apr 06 2004 - 09:13:33 PDT
Alexander,
At 05:09 AM 4/5/2004 Monday, Alexander Peshkov wrote:
>Hello Jim,
>
>1. Character '/' is not a letter and therefore is not considered to be
> part of the hyphenation rule (our hyphenator is aimed to hyphenate
> words of natural languages, not arbitrary strings).
Of course that is a true statement. However, "soft-hyphen" is not a
letter, but it is taken into account during the hyphenation algorithm. I
have worked with several hyphenation systems in the past (among other
previous lives, I once wrote newspaper typesetting systems) and some of
them had pre-defined lists of characters that could be used to break
"words" (strings of characters) while others allowed the user or the system
configurator to specify such characters. (Common characters for this
purpose were "/", ".", "-", and "_", all of which were extremely useful for
typesetting, or formatting, technical works such as those about programming
languages.)
I don't think that it would be a violation of your principles to allow
users to specify a list of "additional characters" at which words/strings
could be broken. It is *extremely* inconvenient for document authors to
have to manually insert zero-width spaces (no such thing on *my* keyboard,
so I'd have to use a character entity reference) or soft hyphens (same
problem), especially when there might be literally scores or hundreds of
places in a large, dynamic document.
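To make the idea concrete, here is a minimal sketch (in Python, and in no way a
description of how XEP works internally) of what a user-configurable list of
"additional characters" could amount to: a zero-width space (U+200B) is
inserted after each listed character so that a formatter may break there. The
function name and the default character list are my own assumptions, taken
from the characters mentioned above.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE (U+200B)


def add_break_opportunities(text, break_chars="/.-_"):
    """Return text with a zero-width space after each character in break_chars.

    break_chars stands in for a hypothetical user-configurable list; the
    default is only the set of characters mentioned in this message.
    """
    out = []
    for ch in text:
        out.append(ch)
        if ch in break_chars:
            out.append(ZWSP)  # break opportunity right after the character
    return "".join(out)
```

Something like add_break_opportunities("usr/local/lib") would then let the
string break after either slash, with nothing visible added to the output.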
> I recommend you to add zero-width spaces (U+200B) after and/or
> before slash character so that string could be broken apart at
> these points.
>2. Again, you have to add zero-width spaces (or soft hyphens if you
> want to have visible hyphenation characters) after every character
> (or after every character triad in your case).
>
>Note that you can add those symbols automatically using XSLT-preprocessing.
Interesting observation, but I think I'm feeling particularly dense this
morning. I started to say that I am uncertain how I can efficiently
examine every ordinary character string in a very large document to see if
it has a "/" in it and then replace that string with "&zwsp;/&zwsp;" (or
the equivalent). However, I saw Ken Holman's message that described just
how to do that sort of thing and I realized that I already (sort of) knew
how. (Thanks, Ken!)
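For anyone who would rather not do this in XSLT, the same preprocessing can be
sketched in Python with the standard library. This is only an illustration
under my own assumptions (the function names are hypothetical, and it is not
Ken's stylesheet): it walks every element, puts U+200B on both sides of each
"/" in character data, and leaves attribute values alone.

```python
import xml.etree.ElementTree as ET

ZWSP = "\u200b"  # ZERO WIDTH SPACE (U+200B)


def pad_slashes(s):
    """Surround every '/' in s with zero-width spaces; pass None through."""
    return s.replace("/", ZWSP + "/" + ZWSP) if s else s


def preprocess(xml_source):
    """Return the document with break opportunities around '/' in text nodes."""
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        # Character data lives in .text (before the first child element)
        # and in .tail (after the element's end tag).
        elem.text = pad_slashes(elem.text)
        elem.tail = pad_slashes(elem.tail)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree drops comments and processing instructions when parsing
this way, so a real pipeline might need something more careful.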
But Jim Quest also asked about hyphenating (or at least breaking) strings like
"RT54XIOP", which is a separate problem that is of genuine interest to
others, such as myself. Again, I've worked with a number of systems whose
algorithms permit such strings to be arbitrarily broken so column widths
(for example) are not violated and so character "scrunching" does not have
to be performed. Again, I know that such strings are not "words of natural
languages", but many technical subjects, especially in the computer field,
have such non-words liberally sprinkled throughout books, etc. that are
about those subjects.
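As a workaround in the meantime, the "zero-width space after every character
triad" suggestion quoted above can be sketched as follows. This is a sketch
only: the function name and the default chunk size are my assumptions, and it
should be applied only to strings already known to be codes or identifiers,
since it would happily chop up ordinary words as well.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE (U+200B)


def break_token(token, chunk=3):
    """Insert a zero-width space after every `chunk` characters of token."""
    pieces = [token[i:i + chunk] for i in range(0, len(token), chunk)]
    # Joining (rather than appending) avoids a trailing zero-width space.
    return ZWSP.join(pieces)
```

With chunk=3, "RT54XIOP" is split into "RT5", "4XI" and "OP", so a formatter
may break the string after the third or sixth character.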
I'd like to add my voice to Jim Quest's in asking that RenderX reconsider
its decisions regarding (1) hyphenating/breaking character sequences at
non-letter characters, whether predefined or specified by the person
installing/configuring XEP, and (2) arbitrarily breaking character
sequences when no such "break character" can be found and no hyphenation
rule can be applied.
Thanks,
Jim
========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL) Phone: +1.801.942.0144
Oracle Corporation Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 Personal email: jim at melton dot name
USA Fax : +1.801.942.3345
========================================================================
= Facts are facts. However, any opinions expressed are the opinions =
= only of myself and may or may not reflect the opinions of anybody =
= else with whom I may or may not have discussed the issues at hand. =
========================================================================