Old European languages and Gothic fonts
Digitizing old texts: Gothic/Fraktur OCR
ABBYY has been developing OCR for digitizing old books since 2003. It now supports black letter, Schwabacher, and most other Gothic fonts for texts in English, German, French, Italian, Spanish and Latvian.
The challenge: digitizing old texts
Black letter fonts
Black letter fonts, also known as “Gebrochene Schriften” or broken scripts, emerged as early as the 12th century and evolved over time into a variety of derivations and font types.
The Fraktur typeface, dominant in Germany, was created on behalf of the German Emperor Maximilian and soon became popular in many parts of Europe.
Characteristics and peculiarities
Common characteristics and peculiarities of the type include the elongated s (the long s, “ſ”) and ligatures, or “joined” letters, for certain letter combinations. Because Fraktur was used so widely, understanding it is essential for studying texts from the period between 1800 and 1938 and for developing recognition technologies for them.
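To illustrate why the elongated s and ligatures matter for search, here is a minimal Python sketch that maps a few historical glyphs to their modern equivalents after recognition. It is an assumption made for illustration, not ABBYY's method: the names `HISTORICAL_GLYPHS` and `modernize` are hypothetical, and the mapping covers only a handful of Unicode code points, not the full set of Fraktur ligatures.

```python
# Illustrative sketch only: normalize a few historical glyphs so that
# recognized text can be matched against modern spellings.
# HISTORICAL_GLYPHS and modernize() are hypothetical names, and the
# mapping is deliberately small, not an exhaustive Fraktur table.

HISTORICAL_GLYPHS = {
    "\u017f": "s",   # LATIN SMALL LETTER LONG S (the elongated s)
    "\ufb00": "ff",  # LATIN SMALL LIGATURE FF
    "\ufb01": "fi",  # LATIN SMALL LIGATURE FI
    "\ufb02": "fl",  # LATIN SMALL LIGATURE FL
    "\ufb05": "st",  # LATIN SMALL LIGATURE LONG S T
    "\ufb06": "st",  # LATIN SMALL LIGATURE ST
}

# str.maketrans accepts a dict of single characters mapped to strings.
_TRANSLATION_TABLE = str.maketrans(HISTORICAL_GLYPHS)


def modernize(text: str) -> str:
    """Replace long s and ligature code points with modern letters."""
    return text.translate(_TRANSLATION_TABLE)


if __name__ == "__main__":
    # "Waſſer" is written with two long s characters.
    print(modernize("Wa\u017f\u017fer"))  # prints "Wasser"
```

A search index could store both the original and the modernized form, so a query for “Wasser” also finds “Waſſer” while the historical spelling stays available for display.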
Now that the worldwide flow of information is becoming digital and digital library collections are being created, it is important to start making historic documents available online.
Scanning is only the first step. Optical Character Recognition (OCR) is just as important to “open” the content for human readers, for search and for other analysis technologies.