RE: [xep-support] PDF redaction strategy

From: Kevin Brown <kevin@renderx.com>
Date: Thu Jun 25 2009 - 10:49:36 PDT

David:

Thanks for the sample and further description provided off-list. Here are my
two cents, offered up as one of my "XEP Tricks".

True redaction should operate within the realm of the formatted document.
You cannot scramble or replace letters and still guarantee exact sizing; you
can only do that with fixed-pitch fonts or by working after composition.
Therefore, I would do this in the XEP Intermediate Format (XEPOUT). I have
done a simple test, which I post here. These are the types of projects our
Services team does all the time.

Key little trick: To mark a specific position in the XEPOUT, you can use
something like this:

<fo:inline font-family="Courier" font-size="0pt">Marker1</fo:inline>

The beauty of this construct is that the marker has no size (font-size="0pt")
and does not kern (Courier is fixed-pitch), so it can be found easily. It is
important that such markers are stripped from the XEPOUT before the final
formatting, though; Adobe doesn't like 0pt fonts. So, considering a simple
example, you might see this in FO:

<fo:flow flow-name="xsl-region-body">
        <fo:block>Let's test the redacting of
                <fo:inline background-color="yellow">
                        <fo:inline font-size="0pt" font-family="Courier">startredact</fo:inline>whatever<fo:inline
                                font-size="0pt" font-family="Courier">endredact</fo:inline>
                </fo:inline>stuff in this document.
        </fo:block>
</fo:flow>

If you run this to XEPOUT you would see the following (you get XEPOUT by
giving -xep as the output type on the command line). I have extracted only
the page element:

<xep:page width="576000" height="792000" page-number="1" page-id="1">
<xep:rgb-color red="1.0" green="1.0" blue="0.0"/>
<xep:rectangle x-from="209040" y-from="635250" x-till="270408"
y-till="646350"/>
<xep:word-spacing value="0"/>
<xep:letter-spacing value="0"/>
<xep:font-stretch value="1.0"/>
<xep:font family="Helvetica" weight="400" style="normal" variant="normal"
size="12000"/>
<xep:gray-color gray="0.0"/>
<xep:text value="Let&apos;s test the redacting of " x="72000" y="637734"
width="137040"/>
<xep:font family="Courier" weight="400" style="normal" variant="normal"
size="0"/>
<xep:text value="startredact" x="212376" y="637734" width="0"/>
<xep:font family="Helvetica" weight="400" style="normal" variant="normal"
size="12000"/>
<xep:text value=" whate" x="212376" y="637734" width="34992"/>
<xep:text value="v" x="247368" y="637734" width="5700"/>
<xep:text value="er " x="253068" y="637734" width="14004"/>
<xep:font family="Courier" weight="400" style="normal" variant="normal"
size="0"/>
<xep:text value="endredact" x="267072" y="637734" width="0"/>
<xep:font family="Helvetica" weight="400" style="normal" variant="normal"
size="12000"/>
<xep:text value="stuff in this document." x="270408" y="637734"
width="116724"/>
</xep:page>
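
Because each marker is drawn in a zero-size Courier font, a script can find the markers by tracking the current font while it walks a page. Here is a minimal Python sketch using only the standard library's ElementTree; the function names are hypothetical, and it matches on local element names so that it does not depend on the XEPOUT namespace URI. It returns each marker's text with its x/y position in XEPOUT units (1/1000 pt).

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Drop the '{namespace-uri}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def find_markers(root):
    """Return (value, x, y) for each text drawn in the 0pt Courier marker font."""
    markers = []
    in_marker_font = False            # is the current font the marker font?
    for page in [el for el in root.iter() if local(el.tag) == "page"]:
        for el in page:               # children in drawing order
            name = local(el.tag)
            if name == "font":
                in_marker_font = (el.get("family") == "Courier"
                                  and el.get("size") == "0")
            elif name == "text" and in_marker_font:
                markers.append((el.get("value"), int(el.get("x")), int(el.get("y"))))
    return markers
```

On the page above, the markers sit at x=212376 and x=267072. The difference, 54696, is exactly the combined width of the three text runs between them (34992 + 5700 + 14004), so the markers bracket the text to be redacted.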

You could then use XSL or a program to process this into something like the
following, by finding the "startredact" and "endredact" markers (I added
blank lines for clarity):

<xep:page width="576000" height="792000" page-number="1" page-id="1">

<!-- Change all yellow rectangle(s) to black. Instead of yellow, pick your
own color scheme not used anywhere else so that they can be easily found -->
<xep:rgb-color red="0.0" green="0.0" blue="0.0"/>
<xep:rectangle x-from="209040" y-from="635250" x-till="270408"
y-till="646350"/>

<xep:word-spacing value="0"/>
<xep:letter-spacing value="0"/>
<xep:font-stretch value="1.0"/>
<xep:font family="Helvetica" weight="400" style="normal" variant="normal"
size="12000"/>
<xep:gray-color gray="0.0"/>
<xep:text value="Let&apos;s test the redacting of " x="72000" y="637734"
width="137040"/>

<!-- Create an XSL or custom program that removes everything from the
font switch to "Courier, size 0" before "startredact" through the line
with "endredact" in it (see below) -->
<!-- <xep:font family="Courier" weight="400" style="normal" variant="normal"
size="0"/>
<xep:text value="startredact" x="212376" y="637734" width="0"/>
<xep:font family="Helvetica" weight="400" style="normal" variant="normal"
size="12000"/>
<xep:text value=" whate" x="212376" y="637734" width="34992"/>
<xep:text value="v" x="247368" y="637734" width="5700"/>
<xep:text value="er " x="253068" y="637734" width="14004"/>
<xep:font family="Courier" weight="400" style="normal" variant="normal"
size="0"/>
<xep:text value="endredact" x="267072" y="637734" width="0"/> -->
<!-- The line with "endredact" in it -->

<xep:font family="Helvetica" weight="400" style="normal" variant="normal"
size="12000"/>
<xep:text value="stuff in this document." x="270408" y="637734"
width="116724"/>
</xep:page>
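
The comments in the example above describe the transformation. A minimal Python sketch of it, using only the standard library, could look like this. It assumes (as in the sample) that the highlight colour is exactly rgb 1.0/1.0/0.0 and used nowhere else, and that each "startredact" text immediately follows its 0pt Courier font switch; all function names are hypothetical. It works one page at a time, so an unmatched "startredact" simply drops the rest of that page.

```python
import xml.etree.ElementTree as ET

START, END = "startredact", "endredact"
HIGHLIGHT = (1.0, 1.0, 0.0)   # assumed marker colour (yellow); pick one used nowhere else

def local(tag):
    """Strip '{namespace-uri}' so the sketch does not hard-code the XEPOUT namespace."""
    return tag.rsplit("}", 1)[-1]

def is_marker_font(el):
    """The zero-size Courier font switch emitted before each marker."""
    return (local(el.tag) == "font" and el.get("family") == "Courier"
            and el.get("size") == "0")

def is_text(el, value):
    return local(el.tag) == "text" and el.get("value") == value

def is_highlight(el):
    return (local(el.tag) == "rgb-color" and
            tuple(float(el.get(c, "0")) for c in ("red", "green", "blue")) == HIGHLIGHT)

def redact_page(page):
    """Blacken highlight colours and drop everything from the marker font
    before 'startredact' through the 'endredact' text (single page only)."""
    children = list(page)
    kept, i = [], 0
    while i < len(children):
        el = children[i]
        if (is_marker_font(el) and i + 1 < len(children)
                and is_text(children[i + 1], START)):
            j = i + 2
            while j < len(children) and not is_text(children[j], END):
                j += 1
            i = j + 1                  # resume just after the 'endredact' text
            continue
        if is_highlight(el):
            for channel in ("red", "green", "blue"):
                el.set(channel, "0.0")
        kept.append(el)
        i += 1
    for el in children:
        page.remove(el)
    page.extend(kept)

def redact(xepout_text):
    """Parse XEPOUT text and redact every xep:page in it; returns the root element."""
    root = ET.fromstring(xepout_text)
    for page in [el for el in root.iter() if local(el.tag) == "page"]:
        redact_page(page)
    return root
```

To write the result back out, ET.tostring(root) works, but ElementTree renames the xep: prefix (to ns0:) unless you first call ET.register_namespace("xep", uri) with the namespace URI your XEPOUT file declares.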

Then process this XEPOUT into the final format (for example, xep -xep modified.xep -pdf).

For block areas, you can use similar markers before and after and surround
the whole area with a <block-container background-color="yourchoice"> in
order to find them. Or, since it is a block area, you could also use
<rx:pinpoint label="startredact"/> before and <rx:pinpoint
label="endredact"/> after to locate them (without using the Courier 0pt
trick).

The only caveat is a redacted area that crosses a page boundary; your code
would need to handle that. It should not be a problem. Just be sure to leave
the </xep:page> intact and continue your rules on the next page.
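
Carrying a flag from one xep:page to the next is enough for the page-crossing case. The Python sketch below does this, with one deliberate change from the example: while a redaction is open it removes only text and image elements (treating "image" as the XEPOUT element name is an assumption) and keeps font, colour and rectangle elements, so any highlight box XEP draws on the continuation page survives and is turned black. Marker text drawn in the 0pt Courier font, and the 0pt font switches themselves, are always removed. The function names and the yellow highlight colour are assumptions.

```python
import xml.etree.ElementTree as ET

START, END = "startredact", "endredact"
HIGHLIGHT = (1.0, 1.0, 0.0)                # assumed unique highlight colour (yellow)
DROP_WHILE_REDACTING = {"text", "image"}   # assumption: XEPOUT element names to hide

def local(tag):
    """Strip '{namespace-uri}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def is_marker_font(el):
    return el.get("family") == "Courier" and el.get("size") == "0"

def is_highlight(el):
    return tuple(float(el.get(c, "0")) for c in ("red", "green", "blue")) == HIGHLIGHT

def redact_document(root):
    """Redact every xep:page in document order. The 'redacting' flag is
    carried from page to page, so a run that starts on one page and ends
    on the next is hidden on both; each page element stays intact."""
    redacting = False      # between a startredact and its endredact marker
    marker_font = False    # the current font is the 0pt Courier marker font
    for page in [el for el in root.iter() if local(el.tag) == "page"]:
        kept = []
        for el in list(page):
            name = local(el.tag)
            if name == "font":
                marker_font = is_marker_font(el)
                if marker_font:
                    continue               # strip 0pt fonts from the output
            elif name == "text" and marker_font:
                if el.get("value") == START:
                    redacting = True
                elif el.get("value") == END:
                    redacting = False
                continue                   # marker text itself is never kept
            elif name == "rgb-color" and is_highlight(el):
                for channel in ("red", "green", "blue"):
                    el.set(channel, "0.0")  # highlight box becomes the black box
            elif redacting and name in DROP_WHILE_REDACTING:
                continue
            kept.append(el)
        for el in list(page):
            page.remove(el)
        page.extend(kept)
    return root
```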

Kevin Brown
Executive Vice President, Sales & Marketing
RenderX, Inc.
(650) 327-1000 Direct
(650) 328-8008 Fax
(925) 395-1772 Mobile
skype:kbrown01
kevin@renderx.com
sales@renderx.com
http://www.renderx.com

-----Original Message-----
From: owner-xep-support@renderx.com [mailto:owner-xep-support@renderx.com]
On Behalf Of David E Nedrow
Sent: Wednesday, June 24, 2009 8:18 PM
To: List - XEP Support
Subject: [xep-support] PDF redaction strategy

I've been looking for a way to mark certain items for redaction. I
would basically like to add 'redact="true"' to any element that should
be redacted. When the PDF is generated any elements marked for
redaction are displayed as a black box whose sides are aligned with
the content area of the element that would otherwise have been rendered.

The closest I've come is the following suggestion from docbook-apps:

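<!-- The characters live in variables because one XPath 1.0 string literal
     cannot contain both ' and ". translate() maps each character of
     redact-from to the X at the same position; surplus X's are ignored. -->
<xsl:variable name="redact-from">ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890-_=+!@#$%^&amp;*();':"&lt;&gt;,./?</xsl:variable>
<xsl:variable name="redact-to">XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX</xsl:variable>

<xsl:template match="text()[ancestor::*[@redact='yes']]">
        <xsl:value-of select="translate(., $redact-from, $redact-to)"/>
        <!-- Add other chars, e.g. accented chars and other languages,
             as needed (and keep redact-to at least as long) -->
</xsl:template>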

The output of this would be printed black on black. There are a couple
of drawbacks:

1) It doesn't handle elements like <imagedata/>
2) It doesn't preserve the actual flow of the document, compared to a
non-redacted version, because a series of replacement letters will almost
never match the content area of the "regular" output.
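
To put a number on drawback 2: with the advance widths from Adobe's standard Helvetica font metrics (AFM; only the characters needed here are listed), the unkerned width of "whatever" at 12pt is 48.684pt, while its X-for-every-letter replacement is 64.032pt, roughly 15pt wider. A small Python sketch of that arithmetic, with str.translate standing in for the XSLT translate() call:

```python
import string

# Advance widths in 1/1000 em from Adobe's standard Helvetica AFM
# (only the characters this example needs).
HELVETICA = {"w": 722, "h": 556, "a": 556, "t": 278, "e": 556,
             "v": 500, "r": 333, "X": 667}

# Every ASCII letter becomes "X", like the XSLT translate() call.
MASK = str.maketrans(string.ascii_letters, "X" * len(string.ascii_letters))

def width_pt(text, size_pt=12):
    """Unkerned advance width of text, in points."""
    return sum(HELVETICA[c] for c in text) * size_pt / 1000

original = "whatever"
masked = original.translate(MASK)
print(masked)                  # XXXXXXXX
print(width_pt(original))      # 48.684
print(width_pt(masked))        # 64.032
```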

Has anyone ever used XEP for something similar to this? I'm thinking
that perhaps between my customization layer and XEP, I could tell XEP
to render the element as a black box.

Thoughts?

-David
-------------------
(*) To unsubscribe, send a message with words 'unsubscribe xep-support'
in the body of the message to majordomo@renderx.com from the address
you are subscribed from.
(*) By using the Service, you expressly agree to these Terms of Service
http://www.renderx.com/terms-of-service.html

Received on Thu Jun 25 11:25:43 2009

This archive was generated by hypermail 2.1.8 : Thu Jun 25 2009 - 11:26:02 PDT