Re[2]: [xep-support] Hyphenation at /

From: Jim Melton (jim.melton@acm.org)
Date: Tue Apr 06 2004 - 09:13:33 PDT

    Alexander,

    At 05:09 AM 4/5/2004 Monday, Alexander Peshkov wrote:
    >Hello Jim,
    >
    >1. Character '/' is not a letter and therefore is not considered to be
    > part of the hyphenation rule (our hyphenator is aimed to hyphenate
    > words of natural languages, not arbitrary strings).

    Of course that is a true statement. However, "soft-hyphen" is not a
    letter, yet the hyphenation algorithm takes it into account. I
    have worked with several hyphenation systems in the past (in one of my
    previous lives, I wrote newspaper typesetting systems), and some of
    them had pre-defined lists of characters that could be used to break
    "words" (strings of characters) while others allowed the user or the system
    configurator to specify such characters. (Common characters for this
    purpose were "/", ".", "-", and "_", all of which were extremely useful for
    typesetting, or formatting, technical works such as those about programming
    languages.)

    I don't think that it would be a violation of your principles to allow
    users to specify a list of "additional characters" at which words/strings
    could be broken. It is *extremely* inconvenient for document authors to
    have to manually insert zero-width spaces (no such thing on *my* keyboard,
    so I'd have to use a character entity reference) or soft hyphens (same
    problem), especially when there might be literally scores or hundreds of
    places in a large, dynamic document.

    > I recommend you to add zero-width spaces (U+200B) after and/or
    > before slash character so that string could be broken apart at
    > these points.
    >2. Again, you have to add zero-width spaces (or soft hyphens if you
    > want to have visible hyphenation characters) after every character
    > (or after every character triad in your case).
    >
    >Note that you can add those symbols automatically using XSLT-preprocessing.

    Interesting observation, but I think I'm feeling particularly dense this
    morning. I started to say that I am uncertain how I can efficiently
    examine every ordinary character string in a very large document to see if
    it has a "/" in it and then replace that string with "&zwsp;/&zwsp;" (or
    the equivalent). However, I saw Ken Holman's message that described just
    how to do that sort of thing and I realized that I already (sort of) knew
    how. (Thanks, Ken!)
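
    As an illustration of the replacement described above, here is a minimal
    sketch in Python (used for brevity; it is not the XSLT that Ken described,
    and the function name and parameter are hypothetical). It wraps each
    chosen break character in zero-width spaces (U+200B) so that a formatter
    may break the line before or after it.

```python
ZWSP = "\u200b"  # ZERO WIDTH SPACE, U+200B


def add_breaks_around(text, break_chars="/"):
    """Return text with a zero-width space on both sides of each break character.

    break_chars is a string of single characters (an assumption of this
    sketch), for example "/._-".
    """
    pieces = []
    for ch in text:
        if ch in break_chars:
            pieces.append(ZWSP + ch + ZWSP)
        else:
            pieces.append(ch)
    return "".join(pieces)


if __name__ == "__main__":
    # repr() makes the otherwise invisible U+200B characters visible.
    print(repr(add_breaks_around("usr/local/bin")))
```

    In a real pipeline this step would run over text nodes only, before the
    document reaches the formatter, so that markup and attribute values are
    left untouched.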

    But Jim Quest also asked about hyphenating (or at least breaking) strings like
    "RT54XIOP", which is a separate problem that is of genuine interest to
    others, such as myself. Again, I've worked with a number of systems whose
    algorithms permit such strings to be arbitrarily broken so column widths
    (for example) are not violated and so character "scrunching" does not have
    to be performed. Again, I know that such strings are not "words of natural
    languages", but many technical subjects, especially in the computer field,
    have such non-words liberally sprinkled throughout books, etc. that are
    about those subjects.
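
    Until a formatter offers such arbitrary breaking itself, the workaround
    quoted above (a zero-width space after every character triad) can be
    automated. The following Python sketch is only an illustration under
    stated assumptions: the function name, the rule for what counts as a
    "code-like" token (upper-case letters and digits with at least one
    digit), and the default lengths are all hypothetical choices.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE, U+200B

# Assumption: a code-like token is a run of upper-case letters and digits.
TOKEN = re.compile(r"\b[A-Z0-9]+\b")


def break_long_tokens(text, every=3, min_len=6):
    """Insert a zero-width space after every `every` characters of long code-like tokens."""

    def split_token(match):
        token = match.group(0)
        # Leave short tokens and plain upper-case words (no digit) alone.
        if len(token) < min_len or not any(c.isdigit() for c in token):
            return token
        chunks = [token[i:i + every] for i in range(0, len(token), every)]
        return ZWSP.join(chunks)

    return TOKEN.sub(split_token, text)


if __name__ == "__main__":
    # repr() makes the otherwise invisible U+200B characters visible.
    print(repr(break_long_tokens("part RT54XIOP here")))
```

    With soft hyphens (U+00AD) in place of zero-width spaces, the same
    approach would produce visible hyphenation characters at the breaks.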

    I'd like to add my voice to Jim Quest's in asking that RenderX reconsider
    the decisions regarding hyphenating/breaking character sequences based on
    either predefined non-letter characters or non-letter characters specified
    by the person installing/configuring XEP, and the decision to arbitrarily
    break character sequences when no such "break character" can be found and
    no hyphenation rule can be applied.

    Thanks,
        Jim

    ========================================================================
    Jim Melton --- Editor of ISO/IEC 9075-* (SQL) Phone: +1.801.942.0144
    Oracle Corporation Oracle Email: jim dot melton at oracle dot com
    1930 Viscounti Drive Standards email: jim dot melton at acm dot org
    Sandy, UT 84093-1063 Personal email: jim at melton dot name
    USA Fax : +1.801.942.3345
    ========================================================================
    = Facts are facts. However, any opinions expressed are the opinions =
    = only of myself and may or may not reflect the opinions of anybody =
    = else with whom I may or may not have discussed the issues at hand. =
    ========================================================================
