[xep-support] Re: Creating embedded index in PDF for faster searching?

From: David Clunie <dclunie@dclunie.com>
Date: Mon Jun 02 2014 - 06:54:34 PDT

"Embedded Index" as far as I know.



I took a quick look at the PDF 1.7 Reference (ISO 32000) as well as
the Adobe extensions, and it was not obvious what encoding mechanism
this application feature maps to.

For the search execution it is "Search > All PDF Documents in ...".
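
As an aside on why an embedded index would help at all: the sketch below
is a toy word-to-page index in Python, not Acrobat's actual embedded index
format (which, as noted above, I could not identify in the specification).
The function names and the sample page texts are made up for illustration.
It only shows the general idea that a prebuilt index replaces a scan of all
the text with a single lookup.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to the set of page numbers containing it."""
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for word in text.lower().split():
            index[word].add(page_no)
    return index


def linear_find(pages, word):
    """Unindexed search: scan every page, so cost grows with total text size."""
    word = word.lower()
    return [n for n, text in enumerate(pages, start=1)
            if word in text.lower().split()]


def indexed_find(index, word):
    """Indexed search: one dictionary lookup, roughly independent of size."""
    return sorted(index.get(word.lower(), ()))


# Hypothetical page texts standing in for the extracted text of a PDF.
pages = [
    "Tagged PDF output from XEP",
    "Search within large PDF files",
    "Embedded index speeds up search",
]
index = build_index(pages)
```

Both functions return the same page numbers for a given word; the index
is built once up front, which matches the earlier point in this thread
that the step is best run after the whole document exists.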


On 5/7/14 10:02 PM, Mark Giffin wrote:
> Sounds interesting! Do you have any idea what the official Adobe name
> for this feature is?
> Mark
> On 5/7/14 9:57 AM, David Clunie wrote:
>> Hi Mark
>> The feature I am describing is quite distinct from the separate
>> "Catalog" feature that I think you are referring to, which produces
>> separate index files, and is not what I want at all.
>> Rather, I am referring to an embedded index within each PDF file.
>> This greatly accelerates using the Find function when an individual
>> PDF file is opened, as well as greatly accelerating the Search All
>> PDF Documents in (folder) function when a bunch of files need to
>> be searched, which allows the user to quickly find stuff without
>> having to mess with configuration of separate catalogs.
>> David
>> On 4/30/14 8:56 PM, Mark Giffin wrote:
>>> I don't think Word can do this. Adobe Acrobat Professional can do this
>>> and I agree, the index it produces is vastly faster, and it will also
>>> index a whole bunch of separate PDF files in one index. It's an old
>>> feature (used to be called "Catalog") that Adobe doesn't seem to talk
>>> about anymore. If you want to automate it you might look at Adobe
>>> ExtendScript for Acrobat. ExtendScript is Adobe's JavaScript-based
>>> scripting language for products like Photoshop, FrameMaker etc. but I
>>> don't know if Acrobat supports it. But if it does you could probably
>>> write a small script to kick off this Catalog indexing, and if you're
>>> really lucky there may be a way to kick it off from the command line, so
>>> you could incorporate it into your PDF build process.
>>> Mark Giffin
>>> http://markgiffin.com/
>>> On 4/30/14 5:13 PM, David Clunie wrote:
>>>> That's a bit disappointing. If Word can do it, it would be nice
>>>> if RenderX could too (as a post-processing step if necessary),
>>>> since doing it manually in Acrobat afterwards is painful, and
>>>> I couldn't find a command line tool to do it.
>>>> David
>>>> On 4/20/14 5:45 PM, Kevin Brown wrote:
>>>>> This is not supported by RenderX and there are no plans to add it. This is
>>>>> an operation best performed after the entire document is created and not
>>>>> "as" it is being created.
>>>>> Kevin Brown
>>>>> (650) 327-1000 Direct
>>>>> (650) 328-8008 Fax
>>>>> (925) 395-1772 Mobile
>>>>> skype:kbrown01
>>>>> kevin@renderx.com
>>>>> sales@renderx.com
>>>>> http://www.renderx.com
>>>>> -----Original Message-----
>>>>> From: xep-support-bounces@renderx.com
>>>>> [mailto:xep-support-bounces@renderx.com] On Behalf Of David Clunie
>>>>> Sent: Wednesday, April 16, 2014 6:10 AM
>>>>> To: xep-support@renderx.com
>>>>> Subject: [xep-support] Creating embedded index in PDF for faster
>>>>> searching?
>>>>> Hi
>>>>> I am creating quite large PDF files that users frequently search within,
>>>>> and the searches are relatively slow.
>>>>> I am using the ENABLE_ACCESSIBILITY option in xep.xml to create tagged PDF.
>>>>> If I load these into Acrobat and then use Advanced > Document Processing >
>>>>> Manage Embedded Index > Create Index, then the result is a MUCH faster
>>>>> search.
>>>>> However, I would rather generate these in the pipeline with XEP (or an
>>>>> additional pass with some other command line tool if anyone knows of one).
>>>>> I couldn't find anything in the manual about this, or any obvious option.
>>>>> David

