RE: [xep-support] Preserving whitespace in programlisting output

From: Jim Melton <jim.melton@acm.org>
Date: Mon Jul 11 2005 - 07:38:00 PDT

Kenneth,

The problem is caused by the very nature of PDF and its ancestor
PostScript. There are no "margins" on a PDF page. Each character is
placed at a specific absolute point on the page (well, more precisely, the
first character of each little group of sequential contiguous characters is
placed at such a point and the subsequent characters in the group are
placed at that same point plus a horizontal offset determined by the
"width" of the preceding character(s)).

Thus, if you have the following text:
=====
This is some sample text.
   This indents 3 spaces.
This is more text.
=====
the second line does not "start" at the same place as the first and third
lines. It starts at its own left-most point without any preceding space
characters at all. The fact that the starting point for the second line is
offset to the right from the starting point of the first and third lines
by three times the width of a space character is not captured in the PDF
data at all.
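
To make that concrete, here is a rough Python sketch of how a producer
might place those three lines. It is only an illustration, not what XEP or
any other particular tool actually emits. The font resource name /F1, the
page coordinates, and the function names are made up for the example; the
6 pt space width assumes 10 pt Courier, whose glyphs are all 600/1000 em
wide. BT, ET, Tf, Tm and Tj are the standard PDF text operators.

```python
def pdf_escape(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def content_stream(lines, left=72.0, top=720.0, leading=12.0, space_width=6.0):
    """Emit PDF-style text operators, one positioned string per line.

    Leading spaces are not written as characters; they only shift the
    x coordinate at which the line's first visible character is placed.
    All default values are assumptions chosen for this example.
    """
    ops = ["BT", "/F1 10 Tf"]  # /F1 is a hypothetical font resource name
    for i, line in enumerate(lines):
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        x = left + indent * space_width   # indentation becomes an offset
        y = top - i * leading             # PDF y coordinates grow upward
        ops.append(f"1 0 0 1 {x:g} {y:g} Tm")
        ops.append(f"({pdf_escape(stripped)}) Tj")
    ops.append("ET")
    return "\n".join(ops)


sample = [
    "This is some sample text.",
    "   This indents 3 spaces.",
    "This is more text.",
]
print(content_stream(sample))
```

With these assumed numbers the second line is placed at x = 90 instead of
x = 72, and the string handed to Tj begins with "This", not with a space.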

Therefore, when you copy text from the PDF file, there *are no spaces* at
the start of indented lines to be copied. Period.
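
The only way to get the indentation back is to infer it from the
coordinates. The following Python sketch shows the idea under some loud
assumptions: the fragments have already been extracted as (x, y, text)
tuples, one per line, the font is monospaced, and the space width is known
(6 pt here, as for 10 pt Courier). The function name and the input format
are hypothetical; I am not claiming that any PDF viewer works this way.

```python
def restore_indent(fragments, space_width=6.0):
    """Rebuild leading spaces from the x position of each line's first fragment.

    fragments: list of (x, y, text) tuples, one per line (assumed input).
    The smallest x is taken as the left edge; every other line gets
    round((x - left) / space_width) spaces in front of it.
    """
    if not fragments:
        return ""
    left = min(x for x, _, _ in fragments)
    # PDF y coordinates grow upward, so sort by descending y for reading order.
    ordered = sorted(fragments, key=lambda frag: -frag[1])
    lines = []
    for x, _, text in ordered:
        spaces = round((x - left) / space_width)
        lines.append(" " * spaces + text)
    return "\n".join(lines)


fragments = [
    (72.0, 720.0, "This is some sample text."),
    (90.0, 708.0, "This indents 3 spaces."),
    (72.0, 696.0, "This is more text."),
]
print(restore_indent(fragments))
```

This guess breaks down with proportional fonts or when the space width is
not known, which is part of why copied text so often comes out flush left.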

Try generating some PDF (in non-compressed mode) from any application,
Acrobat as well as XEP, and look at the internal structure of such lines.

Hope this helps,
    Jim

At 7/11/2005 02:49 AM, Kenneth Johansson wrote:
>Hi David,
>
>In general I agree with you that HTML might be a better choice for online
>reading, but the readers require PDFs in this case, so creating another
>format is not an option.
>
>We use CHM and PDF for User guides and PDF for sysadm, installation and
>upgrade guides.
>
>The installation engineers use the PDF both online and in binders. Mostly
>they copy commands but occasionally they copy chunks of code, like this:
>
>WISE =
>(DESCRIPTION =
>(ADDRESS_LIST =
>(ADDRESS =
>(PROTOCOL = TCP)(HOST = <WISE_HOST>)(PORT = 1521)
>)
>)
>(CONNECT_DATA = (SERVICE_NAME = WISE)
>)
>)
>
>which was copied from a PDF, losing all the indentation.
>
>Btw, I don't have a problem with tabs since we don't use tabs in our
>programlistings, but rather with whitespaces which I'd expect would be
>available in the PDF.
>
>/Kenneth

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
   Co-Chair, W3C XML Query WG; F&O (etc.) editor  Fax  : +1.801.942.3345
Oracle Corporation        Oracle Email: jim dot melton at oracle dot com
1930 Viscounti Drive      Standards email: jim dot melton at acm dot org
Sandy, UT 84093-1063 USA  Personal email: jim at melton dot name
========================================================================
= Facts are facts. But any opinions expressed are the opinions =
= only of myself and may or may not reflect the opinions of anybody =
= else with whom I may or may not have discussed the issues at hand. =
========================================================================

Received on Mon Jul 11 08:15:59 2005