Re: [xep-support] Splitting of large document

From: Brian J. Butler <bjbutler@bjbsoftware.com>
Date: Fri Jun 03 2005 - 13:40:27 PDT

I have dealt with that problem before. Usually an iterative approach
works. Render once without links to get pagination, insert links,
render again and see if pagination changed, repeat until pagination does
not change.
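The iterative approach above is a search for a fixed point: keep re-rendering until the page numbers stop moving. Below is a minimal Python sketch. It assumes a hypothetical `render` callback that stands in for one full XEP run: it takes an anchor-to-page map, fills those page numbers into the links, lays the document out, and returns the new map. This is not an XEP API. The `max_passes` guard is also an added assumption. A reference sitting near a page break could, in principle, flip between two layouts and never settle.

```python
from typing import Callable, Dict

PageMap = Dict[str, int]  # anchor id -> page number


def paginate_until_stable(
    render: Callable[[PageMap], PageMap],
    max_passes: int = 10,
) -> PageMap:
    """Re-render until the anchor-to-page map stops changing."""
    # First pass: no page numbers filled in yet, just to get pagination.
    pages = render({})
    for _ in range(max_passes):
        # Insert the page numbers from the previous pass and render again.
        new_pages = render(pages)
        if new_pages == pages:
            return pages  # pagination did not change: done
        pages = new_pages
    raise RuntimeError(f"pagination did not settle after {max_passes} passes")
```

Here is an example using a toy `render`. In it, filling in any page number pushes anchor `b` from page 9 to page 10. The loop settles on `{"a": 1, "b": 10}` after three renders.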

I'm not sure XEP completely solves this problem, even with a non-split
document. I noticed that one of my page-citations was placed very far
into the right margin, as if XEP was trying to avoid changing
pagination when the citations are expanded.

BJB

Jost Klopfstein wrote:

> Brian,
>
> You can extract almost everything from the XEP intermediate format
> (SVG drawings seem to be encoded somehow, though I am not sure how).
>
> It is a flat XML format, organized in pages, with x and y coordinate
> for every element.
> For details, see http://www.renderx.com/reference.html#appendix_C
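Here is a minimal Python sketch of pulling pinpoint positions out of one intermediate-format file. The element and attribute names are assumptions and are not taken from the reference. The sketch treats any element whose local name is `page` as a page. It treats any element whose local name is `pinpoint` and that carries a `value` attribute as the output of an `rx:pinpoint`. It records the 1-based position of the page within the file instead of reading any page-number attribute. Check these names against Appendix C before relying on the sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so elements can be matched by local name."""
    return tag.rsplit("}", 1)[-1]


def read_pinpoints(xml_text: str) -> Tuple[int, Dict[str, int]]:
    """Return (page count, {pinpoint value: 1-based page position in this file}).

    ASSUMPTION: pages are elements with local name 'page', and rx:pinpoint
    output appears as elements with local name 'pinpoint' carrying a 'value'
    attribute. Verify against the intermediate-format reference.
    """
    root = ET.fromstring(xml_text)
    page_count = 0
    pinpoints: Dict[str, int] = {}
    for page in (el for el in root.iter() if local_name(el.tag) == "page"):
        page_count += 1
        for el in page.iter():
            if local_name(el.tag) == "pinpoint" and "value" in el.attrib:
                # Keep the first page on which a given value occurs.
                pinpoints.setdefault(el.attrib["value"], page_count)
    return page_count, pinpoints
```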
>
> This approach works fine if you have a linear workflow:
> 1) process section 1
> 2) process section 2 with starting page information from previous step
> 3) process next sections
> ...
> n) extract necessary information for indexes and TOC from the XEP
> intermediate format files (based on rx:pinpoint data)
> n+1) build TOC
> n+2) build indexes
> n+3) assemble book
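The arithmetic behind step 2 and steps n+1/n+2 can be sketched as follows. The sketch assumes each section's run has already been reduced to a `(page_count, {pinpoint value: page within that section})` pair. The function name and data layout are hypothetical. The returned start pages are what step 2 would feed into the next run, for example through the XSL-FO `initial-page-number` property on the first `fo:page-sequence`. The book-wide map is what a TOC or index builder would look up.

```python
from typing import Dict, List, Tuple


def assemble_page_map(
    sections: List[Tuple[int, Dict[str, int]]],
    first_page: int = 1,
) -> Tuple[List[int], Dict[str, int]]:
    """Turn per-section results into book-wide page numbers.

    Each entry of `sections` is (page_count, {pinpoint: local page}), where
    the local page counts from 1 within that section's output.
    Returns (starting page of each section, book-wide pinpoint map).
    """
    starts: List[int] = []
    book: Dict[str, int] = {}
    next_start = first_page
    for page_count, local_map in sections:
        starts.append(next_start)
        for key, local_page in local_map.items():
            # A key already seen in an earlier section keeps its first page.
            book.setdefault(key, next_start + local_page - 1)
        next_start += page_count
    return starts, book
```

For example, `assemble_page_map([(12, {"ch1": 1, "fig1": 5}), (30, {"ch2": 1})])` returns the start pages `[1, 13]` and the map `{"ch1": 1, "fig1": 5, "ch2": 13}`.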
>
>
> This concept fails with pointers (references/links) between sections,
> as you may have to change the pagination of a previous section while
> rendering a new section.
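The failure case can be made concrete. A reference breaks the linear workflow when its target lies in a section that is rendered later than the section containing the reference. At the time the citing section is laid out, the target's page number is not yet known. Below is a small Python sketch that lists such forward references. It assumes, hypothetically, that anchors and references have already been mapped to section indices in rendering order. An unknown anchor raises `KeyError`.

```python
from typing import Dict, List, Tuple


def forward_references(
    anchors: Dict[str, int],      # anchor id -> index of the defining section
    refs: List[Tuple[int, str]],  # (index of the citing section, anchor id)
) -> List[Tuple[int, str, int]]:
    """List (citing section, anchor, target section) for forward references.

    In a strictly linear, section-by-section workflow these page numbers are
    unknown when the citing section is rendered, so that section may need
    another pass once the target section has been paginated.
    """
    result: List[Tuple[int, str, int]] = []
    for src, anchor in refs:
        dst = anchors[anchor]
        if dst > src:
            result.append((src, anchor, dst))
    return result
```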
>
> Cheers, Jost
>
> Jost Klopfstein
> *Axos Technologies Inc.*
> OnDemand & Transactional Document Solutions, powered by XML
> IT Consulting
>
> *604 628-2248 Phone*
> 604-324-2380 Fax
> jost (at) axostech.com
> http://www.axostech.com
>
>
> ----- Original Message -----
> From: Brian J. Butler <mailto:bjbutler@bjbsoftware.com>
> To: xep-support@renderx.com <mailto:xep-support@renderx.com>
> Sent: Friday, June 03, 2005 10:53 AM
> Subject: Re: [xep-support] Splitting of large document
>
> The biggest memory consumer seems to be the process that
> compresses the PDF document. I find that our document requires
> more than 3GB to process with this option enabled and less than
> 1.5GB without it.
>
> Try turning this option off in your xep.xml file as shown below:
>
> <!-- Backend options -->
> <generator-options format="PDF">
> <option name="COMPRESS" value="false"/>
> <!-- <option name="PDF_VERSION" value="1.3"/> -->
> </generator-options>
>
> By the way, I have a problem similar to yours. Our 2200 page book
> is only one part of a larger catalog. Unfortunately, the other
> sections contain references to products in the main section. I
> had posted some messages here earlier to see if there is a way to
> write page numbers and other data into the XEP log. I thought I
> would extract the data with an editor and put it in a database,
> using it later to look up page references for the other sections.
> I don't think this is possible, but someone suggested looking at
> the XEP intermediate format. I have not done this yet. If you
> look at it and can distill the information I would like to know
> how to do it. If I try it first, I will post here.
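Brian's database idea can be sketched with Python's built-in `sqlite3` module. The sketch assumes the page numbers have already been extracted into a mapping from product ID to page, for instance from `rx:pinpoint` data in the intermediate format. The table and column names are hypothetical. The sketch also assumes the main section does not itself depend on page numbers from the other sections, so its pagination can be fixed first.

```python
import sqlite3
from typing import Dict, Optional


def store_page_refs(conn: sqlite3.Connection, refs: Dict[str, int]) -> None:
    """Record product-id -> page pairs produced by the main-section run."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_refs ("
        "ref_id TEXT PRIMARY KEY, page INTEGER NOT NULL)"
    )
    # INSERT OR REPLACE lets a later run overwrite stale page numbers.
    conn.executemany(
        "INSERT OR REPLACE INTO page_refs (ref_id, page) VALUES (?, ?)",
        refs.items(),
    )
    conn.commit()


def lookup_page(conn: sqlite3.Connection, ref_id: str) -> Optional[int]:
    """Return the stored page for ref_id, or None if it is unknown."""
    row = conn.execute(
        "SELECT page FROM page_refs WHERE ref_id = ?", (ref_id,)
    ).fetchone()
    return row[0] if row else None
```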
>
> BJB
>
> Jost Klopfstein wrote:
>
>>Mike, Brian,
>>
>>Thanks for your hints.
>>
>>The document is in the 1000+ pages class with plenty of large SVG drawings
>>(>500 Kbytes).
>>
>>I will check with my customer if I can drop the references between chapters.
>>If so, then I can use the split technology and build the TOC and Indexes
>>from the separate sections in XEP intermediate format.
>>Otherwise they have to buy a bigger server...
>>
>>Does anyone know if there is an option to prevent XEP's in-memory
>>processing?
>>
>>Cheers, Jost
>>
>>----- Original Message -----
>>From: "Mike Trotman" <mike.trotman@datalucid.com>
>>To: <xep-support@renderx.com>
>>Sent: Friday, June 03, 2005 3:05 AM
>>Subject: Re: [xep-support] Splitting of large document
>>
>>>I have successfully processed 100 MB+ documents of 1000+ pages - mainly
>>>consisting of heavily formatted tables with 15 x 20 cells per page,
>>>multiple pages per table, lots of data per cell, footnotes etc.
>>>This included bookmarks and a simple Table Of Contents with internal
>>>links to individual tables.
>>>
>>>By placing each table / document chunk within a separate
>>><fo:page-sequence> I was able to keep the memory requirements very low
>>>(not much more than the default).
>>>I'm now also using XSLT pre-processing where I produce each
>>><fo:page-sequence> in a separate XSL-FO file and generate a master
>>>processing document which sets up regions and page masters
>>>and contains a list of the separate <fo:page-sequence> files to include.
>>>I then process this master list with a simple XSLT to produce the final
>>>FO for output to PDF.
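Mike builds the final FO with XSLT. To illustrate the same assembly step, here is a Python sketch that uses `xml.etree.ElementTree` instead. This is a swapped-in technique, not his stylesheet. It appends separately generated `fo:page-sequence` chunks to a master `fo:root` that already holds the `fo:layout-master-set`. The caller is responsible for reading the chunk files, and the function name is hypothetical. This sketch loads every chunk into memory at once, so for very large inputs a streaming copy may be preferable.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize as fo:... instead of ns0:...


def assemble_fo(master_xml: str, sequence_xmls: Iterable[str]) -> str:
    """Append separately generated fo:page-sequence chunks to a master fo:root.

    `master_xml` is an fo:root holding the layout-master-set (regions and
    page masters); each item of `sequence_xmls` is one chunk whose root
    element is a single fo:page-sequence.
    """
    root = ET.fromstring(master_xml)
    for text in sequence_xmls:
        seq = ET.fromstring(text)
        if seq.tag != f"{{{FO_NS}}}page-sequence":
            raise ValueError(f"expected fo:page-sequence, got {seq.tag}")
        root.append(seq)
    return ET.tostring(root, encoding="unicode")
```

Usage might look like `assemble_fo(Path("master.fo").read_text(), (p.read_text() for p in sorted(Path("chunks").glob("*.fo"))))`, where the paths are hypothetical.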
>>>
>>>I haven't used indexes (the TOC references etc. are constructed by the
>>>XSLT) - so don't know what sort of overhead this produces.
>>>
>>>
>>>Mike
>>>
>>>Brian J. Butler wrote:
>>>
>>>>I have also been working on a very large document (88MB FO file, 2200
>>>>pages of technical text and drawings). I can offer the following three
>>>>suggestions:
>>>>
>>>>1. Make sure your Java -Xmx size is as large as possible. On
>>>>Windows this will be approximately -Xmx1600m.
>>>>2. Use the XEP flag to turn off PDF compression (in xep.xml or command
>>>>line). This will result in a very large PDF, but you can compress it
>>>>after rendering by opening it in Adobe Acrobat and then saving.
>>>>3. Switch to a 64-bit Solaris platform (Opteron processors). We
>>>>benchmarked one of these machines and found that we can set -Xmx to
>>>>almost unlimited memory. It is also very fast.
>>>>
>>>>BJB
>>>>
>>>>Jost Klopfstein wrote:
>>>>
>>>>>Hi,
>>>>>
>>>>>I ran into memory problems while rendering a large book with TOC,
>>>>>indexes and references between sections.
>>>>>I first thought I could just render section by section into XEP
>>>>>intermediate format and then assemble the pieces with some custom
>>>>>code into a large PDF using the PDF output generator.
>>>>>However, I would lose the TOC, the indexes and the references between
>>>>>sections.
>>>>>
>>>>>Any ideas?
>>>>>
>>>>>Thanks,
>>>>>Jost
>>>>>
>>>>--
>>>>Brian J. Butler
>>>>BJB Software, Inc.
>>>>76 Bayberry Lane
>>>>Holliston, MA 01746
>>>>
>>>>E-mail: bjbutler@bjbsoftware.com
>>>>Web: http://www.bjbsoftware.com
>>>>Phone: 508-429-1441
>>>>Fax: 419-710-1867
>>>>
>>>-------------------
>>>(*) To unsubscribe, send a message with words 'unsubscribe xep-support'
>>>in the body of the message to majordomo@renderx.com from the address
>>>you are subscribed from.
>>>(*) By using the Service, you expressly agree to these Terms of Service http://www.renderx.com/terms-of-service.html
>>

Received on Fri Jun 3 14:02:26 2005

This archive was generated by hypermail 2.1.8 : Fri Jun 03 2005 - 14:02:27 PDT