Digitizing old texts: Gothic/Fraktur OCR

ABBYY has been developing OCR for digitizing old books since 2003. It now supports black letter, Schwabacher, and most other Gothic fonts in texts in English, German, French, Italian, Spanish and Latvian.

The challenge: digitizing old texts

  • Black letter fonts

    Black letter fonts, also known as “Gebrochene Schriften” or broken scripts, first emerged as early as the 12th century, and evolved over the years to consist of a variety of derivations and font types.

  • Fraktur typeface

    The Fraktur typeface, dominant in Germany, was created on behalf of the German Emperor Maximilian and soon became popular in many parts of Europe.

  • Characteristics and peculiarities

    Common characteristics and peculiarities of the typeface include the long s (ſ) and ligatures, or “joined” letters, for certain letter combinations. Because Fraktur was used so widely, understanding it is essential for studying texts from the period between 1800 and 1938 and for developing recognition technologies for them.

  • Digital information

    Now that the worldwide flow of information is becoming digital and digital library collections are being created, it is important to start making historic documents available online.

  • Scanning

    Scanning is just the first step - Optical Character Recognition is just as important to “open” the content for humans, for search and for other analysis technologies.
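The long s and ligatures mentioned above also matter after recognition: if OCR output preserves them as separate Unicode characters, a search for the modern spelling will not match. The following is a minimal sketch, not part of any ABBYY product, that assumes the recognized text contains such characters and folds them into modern letters with Python's standard `unicodedata` module. The function name `modernize` is hypothetical.

```python
import unicodedata


def modernize(text: str) -> str:
    """Fold the long s and ligature characters into plain modern letters."""
    # NFKC applies Unicode compatibility mappings: U+017F (long s) becomes "s",
    # and ligature code points such as U+FB01 (fi) are split into their letters.
    return unicodedata.normalize("NFKC", text)


sample = "Waſſer"  # recognized text that keeps the long s
print(modernize(sample))  # Wasser
```

NFKC also rewrites other compatibility characters (for example superscript digits), so a narrower character-by-character mapping may be preferable when the original spelling has to be preserved next to the searchable version.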

Recognition of old European documents and Gothic fonts in books printed in the 18th to 20th centuries is available in the software products ABBYY FineReader PDF 15 and ABBYY FineReader Server.

If you are developing your own software systems for recognizing text in historical documents, you can integrate the software development kit ABBYY FineReader Engine or the web-based recognition service ABBYY Cloud OCR SDK into your software to add recognition of Gothic fonts.