[xep-support] Re: Mixed languages in a single PDF

From: Bisswanger, Ramon <ramon.bisswanger@hp.com>
Date: Tue Dec 04 2012 - 00:13:07 PST

Hi Darren,

The font selection can be configured in your XSL-FO: http://www.w3.org/TR/xsl/#font-selection-strategy
If you set the value to "character-by-character", then only the Chinese characters will be rendered in Sim Sum and the Latin/Western ones will all stay in Arial.
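
A minimal sketch of what that could look like in the FO, assuming both fonts are mapped under exactly these family names in your xep.xml (the sample text is made up for illustration):

```xml
<!-- Sketch only: the family names must match the ones configured in xep.xml. -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
          font-family="Arial, 'Sim Sum'"
          font-selection-strategy="character-by-character">
  <!-- Hypothetical mixed-script content: Latin text with a Chinese name. -->
  Account manager: 王小明 (Sydney office)
</fo:block>
```

Both properties are inherited, so they could also be set once on fo:root or fo:page-sequence rather than on every block.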

Best Regards,

From: xep-support-bounces@renderx.com [mailto:xep-support-bounces@renderx.com] On Behalf Of Darren Munt
Sent: Dienstag, 4. Dezember 2012 04:49
To: 'RenderX Community Support List'
Subject: [xep-support] Re: Mixed languages in a single PDF

Thanks for this, Kevin. We're producing and downloading these on the fly, so we need to be mindful of processing time and file size.

The most recent example of the issue was when a Chinese client set up the records from which the reports are derived and entered people's names using Chinese characters. They then wanted to produce the reports in English for their head office in Australia. Because we don't know that they've entered text that's not displayable in our default font for the English report, they got the reports in English but without the names. The names could just as easily have been in English or Greek or any other language, so it is difficult to plan a font stack for the documents in advance.

With the method you suggest, does the font selection take place at document, block or character level?

For example if I specified something like font-family="Arial, Sim Sum", would XEP be able to display just the Chinese characters in Sim Sum, or would it select Sim Sum for the entire document?

From: xep-support-bounces@renderx.com [mailto:xep-support-bounces@renderx.com] On Behalf Of Kevin Brown
Sent: Tuesday, 4 December 2012 2:18 PM
To: 'RenderX Community Support List'
Subject: [xep-support] Re: Mixed languages in a single PDF

You could do something like this:

font-family="Helvetica, Arial, Arial Unicode"

Make sure all of these are mapped in your xep.xml. It's a font selection strategy that selects the appropriate font for a piece of text based on whether a font in the list contains the character. This would choose a font for the text in order: Helvetica first (no impact), Arial second (would be embedded and hopefully subsetted) and Arial Unicode last (you of course have to have the font and a license to use it; it is huge and has many, many glyphs).
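
As a rough sketch of the mapping, the entries in xep.xml might look something like the following. The directory, label and file names here are assumptions for a typical Windows install, and the family name "Arial Unicode" is chosen only to match the font-family list above, so check all of them against the xep.xml shipped with your copy of XEP:

```xml
<!-- Sketch only: xml:base, label and the .ttf file names are assumed, not taken from a real setup. -->
<font-group xml:base="file:/C:/Windows/Fonts/" label="Windows TrueType"
            embed="true" subset="true">
  <font-family name="Arial">
    <font><font-data ttf="arial.ttf"/></font>
    <font weight="bold"><font-data ttf="arialbd.ttf"/></font>
  </font-family>
  <!-- The name attribute must match what the FO uses in font-family. -->
  <font-family name="Arial Unicode">
    <font><font-data ttf="arialuni.ttf"/></font>
  </font-family>
</font-group>
```

With embed and subset both enabled on the group, only the glyphs actually used should end up in the PDF, which matters a lot for a font of this size.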

Now, I would give this as a solution if you are formatting a few documents, occasionally. If you are looking for high performance formatting then no way would I do this - Arial Unicode is a 14MB-27MB font (depends on which version you have) and the prolog for the font itself which must be embedded in the PDF is 512KB (we honor the wishes of copyright holders and embed their copyrights into documents when their fonts are used unlike other formatting engines which ignore them). If you are looking for performance then you need to plan your system accordingly and select fonts or a list of fonts that have the glyphs you need.

Kevin Brown

From: xep-support-bounces@renderx.com [mailto:xep-support-bounces@renderx.com] On Behalf Of Darren Munt
Sent: Monday, December 03, 2012 6:14 PM
To: 'xep-support@renderx.com'
Subject: [xep-support] Mixed languages in a single PDF

We have a particular problem with some of our documents, which combine system-generated text with user input. We produce the same report in many different languages and sometimes we have an issue whereby user input is in a different character set to the main document language. For example we produce a report in English but some of the user text has been entered in Chinese. There is no way of telling what language the user text might be in, apart from either asking them when they enter it or doing some sort of language-detection (which is how I understand the web browser does it for example).

When the report is generated in English, we use the Arial font to display text and it does not have character mapping for the Chinese characters, so they do not appear. When we generate the report in Chinese, we use the Sim Sum font, which does display Chinese, but any English text appears in the same font and it's not a great looking font for Latin characters.

Using Sim Sum by default is not an option because of the appearance of the Latin text and also we support many other languages, including Greek and Arabic, so we really need to be able to specify the font based on the language selected for the report. It's quite possible in our system for a report to contain any number of different languages in a multi-national or multicultural scenario.

I've told the client I don't think there is a way of being able to support ad hoc language changes within the document this way, but I thought I would throw it out there in case there's something XEP can do that I don't know about. Any suggestions gratefully accepted.


Received on Tue Dec 4 00:01:39 2012

This archive was generated by hypermail 2.1.8 : Tue Dec 04 2012 - 00:01:54 PST