FineReader blog

Redacting a PDF document is not rocket science


Time and time again, information that was meant to stay hidden is disclosed in high-profile legal cases because of epic redaction fails.

It is true that the PDF format has specifics that make redaction somewhat tricky, but understanding the basics and using the right tools makes it simple and efficient.

Why do redaction “accidents” happen?

Placing black bars over the lines of text will certainly not do the job. Simply deleting the bars, searching through the document, or copying and pasting the passage will uncover the text underneath.
Keep in mind that PDF files often consist of multiple layers, for example a page image (i.e. the scan of a document) and a text layer (placed under the page image by applying OCR). Both layers contain the text of the document, so both have to be dealt with.


Proper redaction solutions make sure to remove the text from all layers of the document, including the page image layer, and replace it with a black (or any other color) bar rather than just covering it.
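To make the difference between covering and removing concrete, here is a minimal sketch in Python. It is a toy model, not how FineReader or the PDF format works internally: the `Page`, `Box`, `cover_with_bar` and `redact` names are hypothetical, and a page is simplified to a text layer, a page image and a list of overlay bars.

```python
from dataclasses import dataclass, field

# Toy model only: this sketch does not read or write real PDF files.


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def overlaps(self, other: "Box") -> bool:
        return (self.x0 < other.x1 and other.x0 < self.x1
                and self.y0 < other.y1 and other.y0 < self.y1)


@dataclass
class Page:
    # Text layer (e.g. added by OCR): (word, position) pairs.
    text_layer: list = field(default_factory=list)
    # Regions of the page image that have been blanked out.
    image_blanked: list = field(default_factory=list)
    # Bars drawn on top of everything else.
    overlays: list = field(default_factory=list)

    def extract_text(self) -> str:
        # Search and copy-paste read the text layer and ignore overlays.
        return " ".join(word for word, _ in self.text_layer)


def cover_with_bar(page: Page, region: Box) -> None:
    """Unsafe: only draws a bar; the text underneath is untouched."""
    page.overlays.append(region)


def redact(page: Page, region: Box) -> None:
    """Removes the content from every layer, then draws the bar."""
    page.text_layer = [(w, b) for w, b in page.text_layer
                       if not b.overlaps(region)]
    page.image_blanked.append(region)
    page.overlays.append(region)


def make_page() -> Page:
    return Page(text_layer=[("Witness:", Box(0, 0, 50, 10)),
                            ("Jane", Box(55, 0, 80, 10)),
                            ("Doe", Box(85, 0, 105, 10))])


name_region = Box(54, 0, 106, 10)

covered = make_page()
cover_with_bar(covered, name_region)
print(covered.extract_text())   # Witness: Jane Doe

redacted = make_page()
redact(redacted, name_region)
print(redacted.extract_text())  # Witness:
```

In the covered page the name is still there for anyone who searches or copies the text; in the redacted page it is gone from the text layer and the matching image region is marked as blanked.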

How to redact information in your documents reliably

In ABBYY FineReader you can find a “Redact” tool which removes selected text from all layers of the document.


You can use this tool in two ways. You can mark the text passages that need to be redacted manually while working through the document. Or you can search for a keyword, name, number, etc., select some or all of the places where it appears, and apply redaction to all of them at once throughout the document.


Before redaction: the software highlights where the searched keyword is found


After redaction: there are no search results for the named keyword

Important: some caution is still needed. Technology cannot solve every problem, and some human oversight is recommended. Here are some examples:

  • A PDF document includes a company logo, like the ABBYY logo, for example. In a “digital-born” PDF the company logo is an image, so the search function will not find it and it will not be redacted. The same applies to scanned PDFs: the software can treat the company logo as an image even if it contains some text. In both cases, you can redact it manually by drawing an area around it using the “Redaction” or “Eraser” tools.
  • You are reviewing a so-called “image-only” PDF document (i.e. a document scan), which means there is no digital text available for keyword search during redaction. For ABBYY FineReader this is not an issue: the software automatically detects that the document has no text layer and makes it searchable while it is open. Make sure that the “Enable background recognition” option (which is turned on in the software by default) stays on when you redact documents.
  • Your document includes a photo on which a name can be seen. For many reasons this name may not be found using search. You can use the “Redaction” or the “Eraser” tools to remove the complete photo or only the part containing the text.
  • You can use the “Redaction” or the “Eraser” tools to conceal faces in pictures too.

Redaction should not stop with the obvious

Besides the text layer, which OCR (Optical Character Recognition) adds to scanned documents to make them searchable, PDFs may include other information that is not immediately visible to the reader. Such information may be hidden in the document properties (metadata), in comments, in attached files, in bookmarks, etc. Examples include the author of the document, a discussion between a client and the attorney, or names mentioned in the document.
ABBYY FineReader can find keywords in the document properties (metadata) and in comments, and the search results list these matches separately from those found in the document text itself.
When you apply redaction to these areas, FineReader replaces the redacted keyword with ***.
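The idea of masking a keyword in metadata and comments can be sketched in a few lines of Python. This is only an illustration of the principle, not FineReader's implementation: the `mask_keyword` and `mask_fields` helpers and the sample fields are hypothetical, and the case-insensitive matching is an assumption.

```python
import re


def mask_keyword(value: str, keyword: str, mask: str = "***") -> str:
    """Replace every case-insensitive occurrence of keyword with mask."""
    return re.sub(re.escape(keyword), mask, value, flags=re.IGNORECASE)


def mask_fields(fields: dict, keyword: str) -> dict:
    """Apply the mask to every value in a dictionary of text fields."""
    return {name: mask_keyword(value, keyword)
            for name, value in fields.items()}


# Hypothetical sample data standing in for document properties and comments.
metadata = {"Author": "Jane Doe", "Title": "Settlement draft"}
comments = {"comment-1": "Ask jane doe to confirm the figure."}

print(mask_fields(metadata, "Jane Doe"))
# {'Author': '***', 'Title': 'Settlement draft'}
print(mask_fields(comments, "Jane Doe"))
# {'comment-1': 'Ask *** to confirm the figure.'}
```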


Comments and metadata before redaction


Comments and metadata redacted

If you want to make sure that “hidden objects” such as comments, metadata, attachments, bookmarks, etc. are removed from the document, you can use the “Delete Objects and Data” tool in FineReader.


Here you can select the objects that you would like to permanently remove from your document and then apply the change.


Sanitizing PDF documents

PDF redaction with ABBYY FineReader in action

Try ABBYY FineReader yourself: redact your document, then try to copy and paste the redacted text, search for it, or even convert the whole document to Microsoft Word. You won’t be able to find a way to reveal what the redacted information was.
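If you check many documents, the same test can be scripted. The sketch below is one possible approach in Python and is not part of FineReader: `find_leaks` is a hypothetical helper, and `page_texts` is assumed to hold the text of each page as you extracted it from the redacted file (for example with a PDF library of your choice or by exporting to plain text).

```python
import re


def normalize(text: str) -> str:
    # Collapse whitespace and ignore case, because extracted text
    # often breaks lines in the middle of a name.
    return re.sub(r"\s+", " ", text).strip().lower()


def find_leaks(page_texts: list, keywords: list) -> list:
    """Return (page_number, keyword) pairs for keywords still present."""
    leaks = []
    for number, text in enumerate(page_texts, start=1):
        haystack = normalize(text)
        for keyword in keywords:
            if normalize(keyword) in haystack:
                leaks.append((number, keyword))
    return leaks


# Hypothetical extracted text: page 2 still contains the name.
pages = ["Witness: \nstatement given on 3 May.", "Signed by Jane\nDoe."]
print(find_leaks(pages, ["Jane Doe"]))  # [(2, 'Jane Doe')]
```

An empty list means none of the keywords were found in the extracted text. It does not cover text that is only visible in images, so the manual checks described earlier still apply.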
