[xep-support] Encoding spaces at line ends and between words of different style

From: Armin Günther <guenther_at_ADDRESS_REMOVED>
Date: Tue May 03 2016 - 08:28:51 PDT

Hi all,

When text is copied, for example by an annotation tool like Hypothes.is,
from a PDF generated by XEP (or when we simply want to copy text from a
PDF into another document), spaces between words on adjacent lines or
between words of different font styles (e.g. italics/normal) get lost.

Example 1:

PDF (Source):
word1 word2
word3 word4

Copied text:
word1 word2word3 word4

Example 2:

PDF (Source):
*word1* word2 word3

Copied text:
word1word2 word3

Is this problem related to the way spaces are encoded within PDFs, and is
there a way to generate PDFs with XEP that avoids it? I found a post on
an Adobe forum saying that the problem is related to the way the
rendering engine generates strings:

> The problem is that the PDF may or may not have the apparent spaces
> encoded as space characters, particularly at line ends, but also
> between words and perhaps even between characters. The rendering
> engine (Ps or PDF driver) may have chosen to break "/word1 word2/"
> into two strings with two starting coordinates and no U+0020 space
> character (or alternative space characters) at all.
Source: https://forums.adobe.com/thread/1367541

Is this correct? Are the spaces lost because they are not encoded at
all and the words are separated only by different starting coordinates?
If so, can this be avoided by encoding spaces when the PDF is generated?
My impression is, however, that the PDF viewer might be the source of
the problem (rather than the PDF itself).
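
As an illustration of the quoted explanation, here is a minimal Python
sketch of the two examples above. It does not describe how XEP or any
particular PDF viewer works. The names Run, naive_join and join_runs, the
coordinates, and the half-space-width threshold are all hypothetical and
chosen only for this example. It models a page as positioned strings with
no space character between them, and shows how a copy tool could either
lose the space or infer it from the layout.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One positioned string (hypothetical model of a PDF text run)."""
    text: str
    x: float      # left edge of the run
    y: float      # baseline position
    width: float  # horizontal advance of the run


def naive_join(runs):
    """Concatenate runs exactly as stored: no space is added between them."""
    return "".join(run.text for run in runs)


def join_runs(runs, space_width=2.5, line_tolerance=1.0):
    """Concatenate runs (already in reading order), inferring missing spaces.

    A space is inserted when the next run starts on a different line, or
    when the horizontal gap to the previous run is at least half of the
    assumed space width.
    """
    parts = []
    prev = None
    for run in runs:
        if prev is not None:
            new_line = abs(run.y - prev.y) > line_tolerance
            gap = run.x - (prev.x + prev.width)
            already_spaced = prev.text.endswith(" ") or run.text.startswith(" ")
            if not already_spaced and (new_line or gap >= 0.5 * space_width):
                parts.append(" ")
        parts.append(run.text)
        prev = run
    return "".join(parts)


# Example 1: two lines, no space character stored at the line end.
example1 = [Run("word1 word2", 72, 700, 55), Run("word3 word4", 72, 688, 55)]

# Example 2: an italic run followed by a normal run, separated only by position.
example2 = [Run("word1", 72, 700, 25), Run("word2 word3", 99.5, 700, 55)]

print(naive_join(example1))  # word1 word2word3 word4
print(join_runs(example1))   # word1 word2 word3 word4
print(naive_join(example2))  # word1word2 word3
print(join_runs(example2))   # word1 word2 word3
```

If this model is right, the spaces could be preserved either by the
generator writing an explicit space character into the strings, or by the
copying tool applying a heuristic of this kind.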


Dr. Armin Günther
Information Technology
Leibniz Institute for Psychology Information (ZPID)
54286 Trier, Germany
Fon: +49(0)651-201-2055
Fax: +49(0)651-201-2604
E-Mail: guenther@zpid.de

Received on Tue May 3 08:28:27 2016
