Re: [xep-support] Inter-document references

From: Jim Melton (jim.melton@acm.org)
Date: Sat Apr 19 2003 - 11:16:42 PDT

    Nikolai,

    Your proposed strategy works wonderfully! I am not posting my actual XSLT
    code because it is so specialized to my requirements that it probably
    wouldn't help anybody else. However, if anybody reading this list wishes
    to see it, I have no objection to sending it to them.

    Your note outlining the strategy mentioned that you wondered why I need a
    separate "symbol" file; one possible reason has come to light. The
    documents in my suite of parts add up to about 300,000 lines of source text
    (several megabytes of XML). An XSLT transformation from XML to XSL FO that
    formerly took under 10 seconds now takes about 45 seconds. That is
    roughly a fivefold slowdown. I do not currently believe that
    this will be a significant problem for me (because these conversions are
    done at most several times a month, not several times a day). However, I
    think that the strategy of using a separate "symbol" file would cause a
    much smaller loss of speed.
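
    For readers who want to try the separate "symbol" file approach, a
    minimal, untested sketch of an extraction stylesheet follows. It assumes
    the same chapter, section, title and id names used in Nikolai's sample
    quoted below. The output element names (symbols, symbol) and the "part"
    parameter are hypothetical, chosen only for illustration. Whether this
    actually runs faster than calling document() on the full sources would
    have to be measured.

    ```xml
    <?xml version="1.0"?>
    <!-- Sketch only: writes one <symbol> per chapter or section that has an id.
         The names "symbols", "symbol" and the "part" parameter are hypothetical. -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:output method="xml" indent="yes"/>

      <!-- Part number, supplied by whoever runs the transformation -->
      <xsl:param name="part"/>

      <xsl:template match="/">
        <symbols part="{$part}">
          <xsl:for-each select="//*[@id][self::chapter or self::section]">
            <symbol id="{@id}">
              <!-- Hierarchical number such as 4.2, computed once here -->
              <xsl:attribute name="number">
                <xsl:number format="1.1.1.1.1.1" level="multiple"
                            count="chapter | section"/>
              </xsl:attribute>
              <xsl:attribute name="title">
                <xsl:value-of select="normalize-space(title)"/>
              </xsl:attribute>
            </symbol>
          </xsl:for-each>
        </symbols>
      </xsl:template>

    </xsl:stylesheet>
    ```

    The main stylesheet could then load the much smaller result with
    document() and look up symbol[@id = $ref], instead of parsing and
    renumbering every full part on each run.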

    Now, I have a few follow-on questions for both Nikolai and David: The need
    to create "named destinations" in my PDF files in order to have my links
    open the PDF file at the correct place suggests that additional PDF code
    will be generated by XEP at certain places in the PDF file.

    1) Will *every* place in my source XML file that has an ID attribute result
    in one of those "named destinations" in the corresponding PDF file? Or
    will it be only those places for which I explicitly generate some XSL FO
    code (e.g., an extension in the rx: namespace)? Or will it be only those
    places for which there is actually some reference? (The last choice seems
    unlikely, because the references will be in a different document entirely.)

    2) How "big" (e.g., how many bytes) will the PDF code for a single "named
    destination" be? As David may recall, I have already encountered a problem of
    extreme PDF file sizes caused by vast numbers of internal references, which
    I have currently disabled (but would like to re-enable when possible). If
    every element with an ID attribute results in adding 50 or 100 bytes (never
    mind 200 or 300 bytes!) to the PDF file, this is likely to be prohibitive!

    3) Have you guys made any progress in reducing the size of the PDF code
    corresponding to a "hot link" within a single PDF file? When I last used
    that so prolifically, it was something like 250 to 300 bytes per
    reference. You told me then that you might be able to reduce it to as
    little as 150 bytes, which would make a very significant difference to me. Can
    you update me on this situation?
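
    A note on question 1, offered as an assumption rather than a statement
    of XEP behaviour: Nikolai's message quoted below says that named
    destinations will be created by id attributes in the source FO. If so,
    the stylesheet decides how many destinations exist, because it decides
    which source IDs are copied into the FO. The fragment below is an
    untested sketch of that idea; "para" is a hypothetical element standing
    in for anything that has an ID but is never a link target.

    ```xml
    <?xml version="1.0"?>
    <!-- Fragment only: shows which elements carry their id into the FO.
         It does not produce a complete fo:root document by itself. -->
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:fo="http://www.w3.org/1999/XSL/Format">

      <!-- Possible link targets: copy the source id when there is one -->
      <xsl:template match="chapter | section">
        <fo:block>
          <xsl:if test="@id">
            <xsl:attribute name="id">
              <xsl:value-of select="@id"/>
            </xsl:attribute>
          </xsl:if>
          <xsl:apply-templates/>
        </fo:block>
      </xsl:template>

      <!-- Hypothetical non-target element: its id is deliberately not copied -->
      <xsl:template match="para">
        <fo:block>
          <xsl:apply-templates/>
        </fo:block>
      </xsl:template>

    </xsl:stylesheet>
    ```

    How many bytes each resulting destination costs in the PDF is still
    something only RenderX can answer.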

    Again, MANY THANKS!
        Jim

    At 10:47 2003-04-18 +0400 Friday, you wrote:
    >Jim,
    >
    > > The only idea I've had so far is to write a new XSLT stylesheet that
    > > processes each of my documents (one at a time, manually --- since I
    > > haven't gotten around to worrying about scripting issues) and outputs
    > > some sort of file containing nothing beyond the "values" of every
    > > symbol in that document. Then, when I'm processing document 5 through
    > > my "real" stylesheet, it would somehow access all of the "symbol"
    > > files and pick up the chapter/section number and title.
    > >
    > > Question 1) Does that seem like a reasonable approach? Are there better
    > > approaches that I have managed to overlook or suppress?
    >
    >I wonder why you need a separate "symbol" file. If all parts are similar
    >in their structure, you can set up a generic template to extract
    >section/chapter references from both local and remote documents. To
    >automate the thing, you will need an index of all parts to map numbers
    >to file names, conveniently specified in a separate XML file. More or
    >less like this (untested, just to show the idea):
    >
    >parts-index.xml:
    >
    ><parts>
    > <part number="1" file="SQL-20030418-part1-rev3"/>
    > <part number="2" file="SQL-20030310-part2-rev12"/>
    > ...
    ></parts>
    >
    >
    >Stylesheet (very roughly):
    >
    ><!-- Global variable to store an array of part names -->
    ><xsl:variable name="parts"
    > select="document('parts-index.xml')/parts/part"/>
    >
    ><!-- Template to build section name -->
    ><xsl:template name="get-section-name">
    > <xsl:param name="root" select="/"/>
    > <xsl:param name="ref"/>
    >
    > <!-- for-each moves the context into $root's document, so xsl:number
    >      counts chapters/sections there; a "from" pattern cannot be a
    >      variable reference, so it is omitted -->
    > <xsl:for-each
    >  select="$root//*[@id=$ref][self::chapter or self::section]">
    > <xsl:number format="1.1.1.1.1.1. " level="multiple"
    > count="chapter | section"/>
    > <xsl:value-of select="title"/>
    > </xsl:for-each>
    ></xsl:template>
    >
    ><!-- Local link: root is /, internal destination
    >     (assumes a docref without a part attribute is local) -->
    ><xsl:template match="docref[not(@part)]">
    > <fo:basic-link color="blue"
    > internal-destination="{@ref}">
    > <xsl:call-template name="get-section-name">
    > <xsl:with-param name="ref" select="@ref"/>
    > </xsl:call-template>
    > </fo:basic-link>
    ></xsl:template>
    >
    ><!-- Remote link: root is retrieved by document(), external destination
    >     (assumes docref carries a part attribute naming the target part) -->
    ><xsl:template match="docref[@part]">
    > <xsl:variable name="filename"
    > select="$parts[@number = current()/@part]/@file"/>
    > <fo:basic-link color="blue"
    > external-destination="url(file://{$filename}.pdf#{@ref})">
    > <xsl:call-template name="get-section-name">
    > <xsl:with-param name="root"
    > select="document(concat($filename, '.xml'))"/>
    > <xsl:with-param name="ref" select="@ref"/>
    > </xsl:call-template>
    > </fo:basic-link>
    ></xsl:template>
    >
    > > Question 2) Would it be better if that "symbol" file were a plain text
    > > file (how would I access such information in XSLT?) or an XML file
    > > (something like <part2References><reference number="4"
    > > title="Concepts"/><reference.../></part2References>)? I assume that the
    > > information contained in an XML file would be more easily accessed in my
    > > XSLT.
    >
    >That's true. But my suggestion is to work directly from source files,
    >without intermediate data. Using the document() function, you can work
    >conveniently with multiple source files.
    >
    > > Question 3) Once I have that information, how can I turn the text (e.g.,
    > > Section 4.2, "Data types", in Part 2) into a "hot link" that will cause
    > > Acrobat to open the file containing part 2 and position it at the start of
    > > Section 4.2 --- or at least on the same page? (I'm aware of the
    > > external-destination attribute, but have not been successful at making
    > > referenced PDF documents open at the right place.)
    >
    >This requires support for named destinations in PDF files produced
    >by XEP. It is not yet available in the current version, but it is already
    >implemented and under final testing; so in a few weeks at most it will
    >be delivered to XEP users. The syntax will be as follows:
    >
    >external-destination="url(file://somefile.pdf#someplace)"
    >
    >will open PDF file 'somefile.pdf' in the same Acrobat window (without
    >going to a browser) and jump to a named destination 'someplace' inside it.
    >Named destinations will be created by @id attributes in the source FO;
    >so the above URL would bring you to the same place as
    >internal-destination="someplace" in somefile.fo. That's what I tried
    >to show in the above sample code.
    >
    >(I can unveil a secret: the above syntax for file links works even in the
    >current version. Named destinations are not created, however,
    >so referenced PDF documents always open at the first page.)
    >
    >Best regards,
    >Nikolai Grigoriev
    >RenderX
    >
    >
    >-------------------
    >(*) To unsubscribe, send a message with words 'unsubscribe xep-support'
    >in the body of the message to majordomo@renderx.com from the address
    >you are subscribed from.
    >(*) By using the Service, you expressly agree to these Terms of Service
    >http://www.renderx.com/tos.html

    ========================================================================
    Jim Melton --- Editor of ISO/IEC 9075-* (SQL) Phone: +1.801.942.0144
    Oracle Corporation Oracle Email: mailto:jim.melton@oracle.com
    1930 Viscounti Drive Standards email: mailto:jim.melton@acm.org
    Sandy, UT 84093-1063 Personal email: mailto:jim@melton.name
    USA Fax : +1.801.942.3345
    ========================================================================
    = Facts are facts. However, any opinions expressed are the opinions =
    = only of myself and may or may not reflect the opinions of anybody =
    = else with whom I may or may not have discussed the issues at hand. =
    ========================================================================




    This archive was generated by hypermail 2.1.5 : Sat Apr 19 2003 - 11:38:39 PDT