Re: [xep-support] Improving hyphenation algorithm

From: David Tolpin (dvd@davidashen.net)
Date: Wed Aug 25 2004 - 08:27:28 PDT


    Jirka Kosek:
    > Hi,
    >
    > Over the last few months I have used XEP for longer documents containing
    > a lot of free-flowing text. I found a lot of words with bad hyphenation.
    > I was very surprised, as I use the same hyphenation files as for TeX,
    > which hyphenates almost all words correctly. Then I found the following
    > statement in the XEP documentation:

    There is a bug both in the original Czech hyphenation table for TeX
    and in the modified one for XEP (the modification is not necessary,
    since XEP understands TeX accents).

    u ring (a Czech character, I believe) is written as '\r u' instead
    of \ru, and thus it is translated into ^^b0u rather than ^^f5.

    Both TeX and XEP hyphenate Czech incorrectly. The results differ only
    because of differences in error handling (natural, since TeX accepts
    8-bit characters, while XEP treats all non-alphanumeric characters
    as separators): TeX gives too few hyphenation points, while XEP
    gives too many.

    produktu hyphenates as pro-duktu in TeX, pro-du-k-tu in XEP.

    Both are wrong, but the bug is in the TeX hyphenation table. When
    all occurrences of ^^b0u are replaced with ^^f5 (or '\r u' with \ru),
    both TeX and XEP give the correct hyphenation:

    pro-duk-tu

    And all other hyphenation results are exactly the same.
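
    The replacement described above could be scripted roughly as follows.
    This is only a sketch: the function name fix_u_ring and the sample
    pattern string are made up for illustration and are not taken from the
    actual Czech pattern file.

```python
def fix_u_ring(pattern_text):
    """Rewrite the mis-encoded u-ring in hyphenation pattern text.

    Replaces '^^b0u' with '^^f5' and the accent form '\\r u' with '\\ru',
    as described in the message. Everything else is left untouched.
    """
    fixed = pattern_text.replace("^^b0u", "^^f5")
    fixed = fixed.replace("\\r u", "\\ru")
    return fixed


# Hypothetical pattern fragments, for illustration only.
sample = "d^^b0u1m \\r u2k"
print(fix_u_ring(sample))
```

    Only one of the two forms is likely to appear in a given file; handling
    both just keeps the sketch usable for either version of the table.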

    Neither TeX nor XEP uses the absolute values in hyphenation patterns,
    simply because that would make no sense: the patterns are not designed
    for it. The values serve only to compute the right hyphenation points,
    by taking the largest value at each candidate position and checking
    whether it is odd.
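
    To make the "largest value, then odd or even" rule concrete, here is a
    minimal sketch of pattern-based hyphenation. The patterns are toy
    examples invented for this illustration (they are not the real Czech
    patterns), and the names parse_pattern, hyphenate, left_min and
    right_min are assumptions of the sketch, not TeX or XEP internals.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'o1d' into its letters and digit values.

    values[i] is the value at the position just before letters[i];
    positions without a digit get 0.
    """
    letters = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Hyphenate word using the largest pattern value at each position."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."   # '.' marks the word boundaries
    scores = [0] * (len(padded) + 1)    # scores[i]: value before padded[i]

    # Every matching pattern contributes; keep the maximum at each position.
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            values = table.get(padded[start:end])
            if values:
                for offset, value in enumerate(values):
                    pos = start + offset
                    scores[pos] = max(scores[pos], value)

    # An odd maximum allows a break; an even one forbids it.
    pieces, last = [], 0
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:      # position before word[k]
            pieces.append(word[last:k])
            last = k
    pieces.append(word[last:])
    return "-".join(pieces)


# Toy patterns: 'u2kt' outvotes 'u1k', so no break is made before 'k'.
print(hyphenate("produktu", ["o1d", "k1t", "u1k", "u2kt"]))
```

    With these toy patterns the result is pro-duk-tu; dropping 'u2kt'
    leaves the odd value from 'u1k' in place and yields pro-du-k-tu. The
    actual magnitude of a value never matters beyond which one is largest
    and whether it is odd.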

    Hyphenation algorithms in TeX and XEP are identical.

    David Tolpin
    RenderX



    This archive was generated by hypermail 2.1.5 : Wed Aug 25 2004 - 08:45:38 PDT