[xep-support] Re: Fonts and all

From: Kevin Brown <kevin@renderx.com>
Date: Thu Mar 05 2015 - 11:56:04 PST

Here's some more information on this subject for those using (or thinking of
using) those wonderful Noto fonts from Google.
If you have one of these files around and examine the properties, you would
see that the fonts are marked with an Embeddability flag of "Installable".
What does this mean?
Well ... it means that if you are going to use the font in a document, you
have only two choices:

(A) You can embed the *entire* font in the document, because subsetting the
font is not allowed, or
(B) You can choose not to embed anything, in which case the recipient of the
PDF *must* have the font installed on their system in order to actually view
the content.

And note that this is not a bug: RenderX respects the embeddability flag. I
created a document using Microsoft Word and exported it to PDF.
I got a very small PDF, *but* when I opened it, no fonts were embedded in the
document and all the text had been rendered as bitmaps.
Word has a setting, "Bitmap text when fonts may not be embedded." So it too
respects the flag and would not subset the font.

Well, it's not clear to me who actually thought this would be a great idea.
Neither of these options is actually a viable solution.

For (A), just one Google Noto font like "NotoSansCJKsc-Bold.otf" is 16.1MB.
Yes, that is right: 16.1MB. It means that if you created a document, inserted
one character using this font and chose option (A), your (uncompressed) PDF
would be at least 16.1MB. With PDF compression it comes down to about 4MB.

For (B), you would have to have a very controlled distribution of those PDFs
because everyone who wanted to look at it would have to have the font
installed and accessible by the OS.
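
For option (B), the font group would simply be configured without embedding,
in place of the embedded group. A minimal sketch, assuming the "embed"
attribute used in the "xep.xml" snippet from the original message below also
accepts "false"; the label "Noto-NoEmbed" is a hypothetical name:

```xml
<!-- Option (B) sketch (assumption: embed="false" disables embedding).
     The Noto font is referenced but not embedded, so every reader of
     the PDF needs NotoSansCJKsc installed locally. -->
<font-group xml:base="Noto/" label="Noto-NoEmbed" embed="false"
            initial-encoding="standard">
  <font-family name="NotoSansCJKsc">
    <font><font-data otf="NotoSansCJKsc-Regular.otf"/></font>
  </font-family>
</font-group>
```

This keeps the PDFs small, but it is only usable with the kind of controlled
distribution described above.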

So, I guess the directions in this thread are still great instructions for
managing fonts; just don't use the Google Noto fonts (or their Adobe
equivalents), or at least not the NotoSansCJKsc ones, as those are the only
ones I examined.

Kevin Brown
RenderX

-----Original Message-----
From: Xep-support [mailto:xep-support-bounces@renderx.com] On Behalf Of
Kevin Brown
Sent: Wednesday, March 04, 2015 12:47 PM
To: 'RenderX Community Support List'
Subject: [xep-support] Re: Fonts and all

More on this to come for sure.
I did some additional testing and found the reason why the resulting PDFs
are so large when using the Noto fonts.

We are now checking some things out ... it could be a bug in OTF font
embedding or it could be that (some of) the fonts have an improper flag
set.
The resulting PDFs have the entire font embedded.
Once I know more I will post back.

The directions are good; it's just the Noto fonts I used that seem to be an
issue.

Kevin Brown
RenderX

-----Original Message-----
From: Xep-support [mailto:xep-support-bounces@renderx.com] On Behalf Of
Kevin Brown
Sent: Tuesday, March 03, 2015 2:52 PM
To: 'RenderX Community Support List'
Subject: [xep-support] Fonts and all

A few questions about fonts have been asked recently on Stack Overflow and
in our own support groups, so I thought a nice sample here would be
appropriate.

1) First, get access to a Unicode font that contains the characters you
want.

Well, there are many places, and we are not a font repository. However, one
great place to look is Google, specifically Google’s fonts. They are trying
to organize a complete set of fonts for almost all of the world’s languages.
There are certainly many more places, and the group is welcome to chime in.
I happen to have a nice setup with many of the “Noto” fonts from Google.
See https://www.google.com/get/noto/#/

2) Next, add those fonts to your configuration.

Again, there are many font types and many places to put them. Here I am
only going to describe how I do it to stay organized. I prefer to organize
all my fonts under the RenderX installation: upgrades will preserve them, and
it makes it easier to deploy to other machines. You can certainly use your
own method and organize them into your own areas.
The "xep.xml" file contains all the information about fonts. There are many,
many different settings and you can find them all in our documentation. I
will just cover adding a complete family of the Noto fonts here for
Simplified Chinese.

First I downloaded the ZIP archive of Simplified Chinese Noto fonts from
here: https://www.google.com/get/noto/#/family/noto-sans-hans
In the RenderX installation directory, there is already a directory called
"fonts/".
That directory contains the PDF Base 14 fonts that all installations should
come with (the 4 variations each of Helvetica, Times and Courier, plus Symbol
and ZapfDingbats).
I created a subdirectory under this directory called "Noto/" to hold all
my Google Noto fonts.
I unzipped the Simplified Chinese fonts into this directory.
The archive contains 7 fonts, all starting with "NotoSansCJKsc", in various
weights (Thin through Black).
So we need to add all of these (or as many as we need) to "xep.xml".
Here's a snippet from my "xep.xml" showing how they are added:

  <fonts xmlns="http://www.renderx.com/XEP/config" xml:base="fonts/" default-family="Helvetica">
    ...

    <!-- Google Noto Fonts -->
    <font-group xml:base="Noto/" label="Noto" embed="true" subset="true" initial-encoding="standard">
      <font-family name="NotoSansCJKsc">
        <font weight="100"><font-data otf="NotoSansCJKsc-Thin.otf"/></font>
        <font weight="200"><font-data otf="NotoSansCJKsc-Light.otf"/></font>
        <font weight="300"><font-data otf="NotoSansCJKsc-DemiLight.otf"/></font>
        <font><font-data otf="NotoSansCJKsc-Regular.otf"/></font>
        <font weight="500"><font-data otf="NotoSansCJKsc-Medium.otf"/></font>
        <font weight="bold"><font-data otf="NotoSansCJKsc-Bold.otf"/></font>
        <font weight="800"><font-data otf="NotoSansCJKsc-Black.otf"/></font>
      </font-family>
    </font-group>
  </fonts>

Now, a few points of interest here.
The <fonts> section contains the overall set of fonts available to
RenderX. It has an "xml:base" attribute that instructs RenderX to look
inside the "fonts/" directory for all fonts *where explicit paths are not
specified*.
I created a "Noto/" directory under that directory, so I added an "xml:base"
to my <font-group> that points to it.
Now all the files in that directory can be referenced by relative paths.
In other words, otf="NotoSansCJKsc-Thin.otf" resolves to the file
"fonts/Noto/NotoSansCJKsc-Thin.otf" under the RenderX installation.
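
If you would rather keep fonts outside the RenderX installation, the same
mechanism should also work with an absolute location in "xml:base". A sketch,
assuming a hypothetical directory /opt/fonts/noto/ and that XEP resolves a
file: URI there the same way it resolves the relative "Noto/":

```xml
<!-- Hypothetical absolute location: the otf="..." paths below are resolved
     against this xml:base instead of the installation's "fonts/" directory. -->
<font-group xml:base="file:/opt/fonts/noto/" label="Noto" embed="true"
            subset="true" initial-encoding="standard">
  <font-family name="NotoSansCJKsc">
    <font><font-data otf="NotoSansCJKsc-Regular.otf"/></font>
  </font-family>
</font-group>
```

The trade-off is the one mentioned earlier: fonts kept inside the installation
are easier to carry along when deploying to other machines.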

Also note the font weights.
Noto fonts for ideographic languages are provided in several weights
like Thin, Light, DemiLight, etc.
So, in XSL FO/RenderX terms we have:

Thin = 100
Light = 200
DemiLight = 300
Normal or Regular = 400 (or no weight specified)
Medium = 500
Bold = 700 (or "bold" specified for font-weight in XSL FO)
Black = 800

So the above "xep.xml" fragment maps all of the various weights that Google
provides. Of course, you do not have to use (or allow people to use) all of
these. Each weight is designed as an individual font file, so each has to be
handled and included separately. More on this at the end.
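
For example, a document that only needs regular and bold text could register
just those two files. A sketch reusing the file names above; FO that asks for
any other weight (for example font-weight="100") would then have no exact
match, so check what XEP substitutes before relying on it:

```xml
<!-- Trimmed family: only Regular (400, no weight given) and Bold (700)
     are configured, so XEP has only these two files to choose from. -->
<font-group xml:base="Noto/" label="Noto" embed="true" subset="true"
            initial-encoding="standard">
  <font-family name="NotoSansCJKsc">
    <font><font-data otf="NotoSansCJKsc-Regular.otf"/></font>
    <font weight="bold"><font-data otf="NotoSansCJKsc-Bold.otf"/></font>
  </font-family>
</font-group>
```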

3) Format documents referencing the fonts you specify.

You specify the font-family (at least), and in this case also the
font-weight, to select the font. Use the @name of the <font-family> in
"xep.xml" to reference it.
So, for example, here's a bit of FO that would cause all of the above fonts
to be used:

                        <fo:block-container font-family="NotoSansCJKsc">
                                <fo:block>Chinese Sample using
NotoSansCJKsc</fo:block>
                                <fo:block space-before="6pt"
font-weight="100">大橋全長二千六百八十三點五八五米,其中主橋長六百三十六點六
米。主橋為雙塔雙預應力混凝土邊主梁斜拉橋。主塔呈鑽石型,高一百0六點九米,塔
的上方套紅鐫刻江澤民題寫的橋名。雙主塔通過一百七十六根斜拉索承載橋面。主橋橋
面寬二十九點八米;橋下水位通航淨高為二十四米,可通行三千吨級的輪船。
</fo:block>
                                <fo:block space-before="6pt"
font-weight="200">大橋全長二千六百八十三點五八五米,其中主橋長六百三十六點六
米。主橋為雙塔雙預應力混凝土邊主梁斜拉橋。主塔呈鑽石型,高一百0六點九米,塔
的上方套紅鐫刻江澤民題寫的橋名。雙主塔通過一百七十六根斜拉索承載橋面。主橋橋
面寬二十九點八米;橋下水位通航淨高為二十四米,可通行三千吨級的輪船。
</fo:block>
                                <fo:block space-before="6pt"
font-weight="300">大橋全長二千六百八十三點五八五米,其中主橋長六百三十六點六
米。主橋為雙塔雙預應力混凝土邊主梁斜拉橋。主塔呈鑽石型,高一百0六點九米,塔
的上方套紅鐫刻江澤民題寫的橋名。雙主塔通過一百七十六根斜拉索承載橋面。主橋橋
面寬二十九點八米;橋下水位通航淨高為二十四米,可通行三千吨級的輪船。
</fo:block>
                                <fo:block space-before="6pt">大橋全長二千六
百八十三點五八五米,其中主橋長六百三十六點六米。主橋為雙塔雙預應力混凝土邊主
梁斜拉橋。主塔呈鑽石型,高一百0六點九米,塔的上方套紅鐫刻江澤民題寫的橋名。
雙主塔通過一百七十六根斜拉索承載橋面。主橋橋面寬二十九點八米;橋下水位通航淨
高為二十四米,可通行三千吨級的輪船。</fo:block>
                                <fo:block space-before="6pt"
font-weight="500">大橋全長二千六百八十三點五八五米,其中主橋長六百三十六點六
米。主橋為雙塔雙預應力混凝土邊主梁斜拉橋。主塔呈鑽石型,高一百0六點九米,塔
的上方套紅鐫刻江澤民題寫的橋名。雙主塔通過一百七十六根斜拉索承載橋面。主橋橋
面寬二十九點八米;橋下水位通航淨高為二十四米,可通行三千吨級的輪船。
</fo:block>
                                <fo:block space-before="6pt"
font-weight="bold">大橋全長二千六百八十三點五八五米,其中主橋長六百三十六點
六米。主橋為雙塔雙預應力混凝土邊主梁斜拉橋。主塔呈鑽石型,高一百0六點九米,
塔的上方套紅鐫刻江澤民題寫的橋名。雙主塔通過一百七十六根斜拉索承載橋面。主橋
橋面寬二十九點八米;橋下水位通航淨高為二十四米,可通行三千吨級的輪船。
</fo:block>
                                <fo:block space-before="6pt"
font-weight="800">大橋全長二千六百八十三點五八五米,其中主橋長六百三十六點六
米。主橋為雙塔雙預應力混凝土邊主梁斜拉橋。主塔呈鑽石型,高一百0六點九米,塔
的上方套紅鐫刻江澤民題寫的橋名。雙主塔通過一百七十六根斜拉索承載橋面。主橋橋
面寬二十九點八米;橋下水位通航淨高為二十四米,可通行三千吨級的輪船。
</fo:block>
                        </fo:block-container>

That's all there is to it. Get a font, store it somewhere and add it to the
configuration. Then add text and make sure you tell XEP (through XSL FO) to
use that particular font.
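
In practice you would often keep a compact Latin font for the body text and
switch to the CJK family only where Chinese text appears. A minimal sketch
using fo:inline with the "NotoSansCJKsc" family configured above; the English
sentence is made up for illustration:

```xml
<!-- Body text stays in Helvetica (a Base 14 font); only the inline run
     selects the NotoSansCJKsc family from "xep.xml". -->
<fo:block font-family="Helvetica" font-size="11pt">
  The sample text above begins with
  <fo:inline font-family="NotoSansCJKsc" font-weight="bold">大橋全長</fo:inline>.
</fo:block>
```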

Now, some specific hints on fonts.

I love (and hate) the Noto fonts above. Why both?
Love: Well, they provide great coverage of all the characters. If you are
producing technical documentation, this is a great use of these fonts.
Hate: They are HUGE (around 15MB per file) and very large even when
subsetted and embedded. I would not necessarily use them to print 100,000
invoices, but I would use them for a technical manual.

The little text above, with fonts embedded and subsetted, is a CRAZY 8MB.
Insane! The exact same document with the SimSun font (which only has one
weight) is only 32KB.
And if I format the simple little document above, which accesses the 7
fonts listed (15MB each) for the few characters in it, it takes 20 seconds
to format.

We had another example where a customer wanted to use the "Arial Unicode"
font. This font file is 37MB and is copyrighted. They were creating letters
and were using "Arial Unicode" to get a specific bullet character they liked.
The "Arial Unicode" copyright notice that must be inserted when embedding the
font is about 250KB on its own, so a simple letter with one bullet came to
300KB per file. Processing each document took about 3 seconds.
I selected a similar bullet from "ZapfDingbats". The result was a 12KB file
size and 12 documents processed per second. Yes, 3 seconds per document went
to 12 documents per second. A "slight" improvement.
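
The bullet swap itself is a small change in the FO. A sketch of a list item
whose label uses ZapfDingbats; it assumes XEP maps the ZapfDingbats glyph
"a71" (a filled circle) to Unicode U+25CF, as in Adobe's standard ZapfDingbats
encoding, so compare the glyph against your own output. The item text is
hypothetical:

```xml
<!-- Only the bullet comes from ZapfDingbats (one of the standard PDF fonts
     in "fonts/"); the item text keeps the document's normal font.
     Assumption: &#x25CF; maps to the ZapfDingbats filled-circle glyph. -->
<fo:list-block provisional-distance-between-starts="12pt">
  <fo:list-item>
    <fo:list-item-label end-indent="label-end()">
      <fo:block font-family="ZapfDingbats">&#x25CF;</fo:block>
    </fo:list-item-label>
    <fo:list-item-body start-indent="body-start()">
      <fo:block>First point of the letter.</fo:block>
    </fo:list-item-body>
  </fo:list-item>
</fo:list-block>
```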

So, try to use a tight, controlled font: only what you need for a
particular application. Use the smallest font file possible that has all
the glyphs you need.

Kevin Brown
RenderX

Received on Thu Mar 5 11:53:05 2015

This archive was generated by hypermail 2.1.8 : Thu Mar 05 2015 - 11:53:12 PST