Artificial intelligence (AI) plays an increasingly significant role in the progress of business technologies. ABBYY's innovative AI-based technologies and solutions are changing the way people work and defining the next phase of digital transformation. But what does AI do specifically for the product to which this blog is devoted, ABBYY FineReader? How important is AI for the quality of OCR, which, as we discussed earlier in the blog, is one of the crucial components that make FineReader an efficient and versatile PDF tool for any kind of document: digital and scanned PDFs, scans, and papers? Today we are talking about that with Ivan Zagaynov, Deputy Head of Core Recognition Technology Group at ABBYY.
AI trends in OCR and FineReader
When working with any kind of images of documents, be it an image file or a scanned PDF, we have to use OCR. OCR is not a new technology, but it is constantly developing. What are the latest trends there?
Ivan Zagaynov: Yes, we continue to develop our OCR technology to achieve even better results in more complex cases. The general trend is to use AI approaches such as machine learning and deep learning more widely to solve increasingly generic tasks. Sometimes the quality of document images is very low, and we need to (at least, many of our customers expect that we should) work with photos, including such complicated examples as documents with folds, with jagged edges, held by hand in front of a mobile camera, and so on.
There are two general approaches to these challenging cases. The first is to correct document images, making them look like ideal scans and thereby improving the results of the existing OCR engine. However, this approach has some limitations and is not suitable for all cases. Besides, there is no single mechanism that corrects all kinds of image degradation perfectly. There are many automated stages in this process (for example, finding document edges, finding the skew angle, correcting brightness and contrast, etc.), and if any of them fails, the result of the OCR will be poor. The second approach is to apply neural networks to the original corrupted image to extract the necessary objects, i.e. text lines, separators, tables, barcodes, stamps, etc. We can make geometric and photometric corrections to individual objects and recognize them using dedicated pre-trained neural networks without making any global corrections to the whole image.
Thinking about OCR in general now, it is no longer just about robust recognition of ideal document scans, but rather about extracting structured textual information from any source image. And recently developed AI approaches are extremely valuable for such non-trivial tasks.
FineReader 15 is an AI-powered PDF solution. What kind of AI is there in it? There’s a common opinion that AI must be a thing that can learn and improve itself over time as people do …
I.Z.: First, let me clarify the definition of AI in this context. We think of it as an area of computer science concerned with the design of artificial machines and algorithms that work and react like humans. Usually it is called “weak AI,” which is in fact a very capable thing, but focused on solving a specific, well-defined problem. Why am I clarifying this? Because sometimes the scientific label “AI” refers to general artificial intelligence, or so-called “strong AI,” which is credited with human-like behavior capable of solving any new, previously unseen problem.
Although FineReader doesn’t learn as people do, on a micro-level, for each page processed, there is some “gained” experience that helps it to obtain a better result. When FineReader starts to recognize a page, it has a number of pre-trained neural network models, for example, for character recognition tasks, but there are also mechanisms that calculate some statistics and “fine-tune” the recognition process “on the fly.” Thus, character by character, line by line, FineReader actually uses previously obtained experience during the processing of the whole page. This helps us to gain in speed and quality and to get better results for the whole page of natural language text, compared to the recognition results of separate lines or a random sequence of characters. Context information plays a significant role here, and our OCR engine uses it much as a human does when reading text: humans often predict words and verify themselves based on the meaning of the whole sentence. However, this “gained” experience is not shared between pages or documents. On the other hand, this helps to keep behavior reproducible on a page level and to use context correctly on every page.
Reading languages with neural networks
What were the latest improvements made for FineReader 15 that couldn’t happen without using AI technologies?
I.Z.: I believe that without AI technologies we could not have achieved the great improvement in speed and quality in the recognition of complex scripts like Chinese, Japanese, and Korean. We achieved it with the help of our newly invented architectures of deep convolutional neural networks for character recognition tasks. Previous approaches, including those using traditional machine learning, had unstable results in difficult cases and worked more slowly in general. In addition, we now use new “word language models” inside the recognition process. These models estimate the probability of a word occurring in the environment of other words in order to resolve ambiguities between similar OCR results. They were pre-trained on a large amount of text, so they actually retain some knowledge about specific natural languages.
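To make the idea of a word language model concrete, here is a minimal sketch of how word context can settle a choice between similar OCR candidates. It is an illustration only, not ABBYY's model: the toy corpus, the add-one-smoothed bigram model, and the names `bigram_log_prob` and `pick_candidate` are all assumptions made for this example.

```python
import math
from collections import Counter

# Toy corpus (assumption): a real model would be trained on far more text.
CORPUS = "the cat sat on the mat . the cat ate the rat . a cat sat on a mat .".split()

UNIGRAMS = Counter(CORPUS)
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))
VOCAB_SIZE = len(UNIGRAMS)


def bigram_log_prob(prev_word, word):
    """Add-one smoothed estimate of log P(word | prev_word)."""
    return math.log((BIGRAMS[(prev_word, word)] + 1) / (UNIGRAMS[prev_word] + VOCAB_SIZE))


def pick_candidate(left, candidates, right):
    """Return the candidate that fits best between its left and right neighbours."""
    def score(word):
        return bigram_log_prob(left, word) + bigram_log_prob(word, right)
    return max(candidates, key=score)


# The recognizer cannot decide between two visually similar readings.
print(pick_candidate("the", ["cat", "cot"], "sat"))
```

In a real engine, a score like this would presumably be combined with the recognizer's own confidence for each candidate rather than used alone.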
The newest AI algorithms help a lot with the languages based on a complex script, such as Chinese, Japanese, and Korean. Is there any AI helping with more traditional (in terms of how the FineReader OCR technology has been developing historically) Latin- and Cyrillic-based languages?
I.Z.: Yes, there are neural networks helping to recognize these languages as well. Script complexity is just a matter of choosing the right model architecture, one that allows us to reach an optimum balance between OCR speed and quality. Therefore, we use different models for Latin, Cyrillic, Arabic, and other script types. For example, for Arabic script we use an end-to-end approach for word recognition without character separation. A special architecture combining convolutional and recurrent neural networks solves that task. For Latin and Cyrillic, we use a mixed approach and switch between different recognition models based on the visual quality of the text. This helped us to significantly improve both speed and accuracy.
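The switching idea for Latin and Cyrillic can be sketched roughly as follows. The interview does not say how visual quality is measured or which models are involved, so everything here is hypothetical: the contrast-based `contrast_score`, the threshold of 0.5, and the "fast" and "robust" model labels are assumptions chosen only to show the control flow.

```python
from statistics import pstdev


def contrast_score(gray_pixels):
    """Hypothetical quality measure: spread of 0..255 grayscale values, scaled to 0..1."""
    # 127.5 is the largest possible standard deviation for values in 0..255.
    return pstdev(gray_pixels) / 127.5


def choose_model(gray_pixels, threshold=0.5):
    """Route crisp text to a fast model and degraded text to a slower, more robust one."""
    return "fast" if contrast_score(gray_pixels) >= threshold else "robust"


crisp_line = [0, 255, 0, 255, 0, 255]           # sharp black-on-white strokes
washed_out_line = [120, 130, 125, 128, 122, 131]  # low-contrast photo

print(choose_model(crisp_line), choose_model(washed_out_line))
```

The design point is that the cheaper model handles the easy majority of text, while the more expensive one is reserved for the cases that need it.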
Not only for “pure” recognition
Is there any AI “outside” of OCR in FineReader? When analyzing a document's structure and its elements, detecting the exact structure of tables, preprocessing images, or working with digital PDFs, maybe?
I.Z.: Of course, we apply AI techniques at almost every stage of our document processing pipeline. Analyzing document structure or preprocessing images usually involves a large number of object detection and classification tasks. Nowadays many of them are solved with the help of deep learning approaches as well as traditional machine learning. When we talk about table or barcode detection, or finding a header or footer on the page, there are pre-trained models that find the relevant objects on document images.
For example, the accuracy of table detection has been further improved in FineReader 15 by using a combination of a convolutional neural network and a linear partition graph for document analysis. The neural network assigns every pixel on the page a probability of belonging to each type of element (text areas, tables, pictures, etc.). The graph then accurately finds all the elements themselves, in their entirety, based on the information obtained from the neural network. The resulting improvements are especially noticeable in difficult cases: on documents with a complex multicolumn structure or background images, on tables with no separators, and so on.
And many of the processing stages you have mentioned are also performed when you simply work with PDF documents themselves, with no intention of converting them, for example when copying tables, searching for keywords in a scanned PDF, or comparing two copies of a document to find differences, right?
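The two-step idea (per-pixel probabilities first, whole elements second) can be illustrated with a small sketch. Note the substitution: the linear partition graph is not reproduced here. As a much simpler stand-in, this example groups neighbouring pixels of the same class with a flood fill and reports bounding boxes. The class list, the `label_pixels` and `find_regions` names, and the toy probability map are assumptions.

```python
from collections import deque

CLASSES = ("background", "text", "table")  # assumed element types


def label_pixels(prob_map):
    """Pick the most probable class index for every pixel."""
    return [[max(range(len(p)), key=p.__getitem__) for p in row] for row in prob_map]


def find_regions(labels):
    """Group 4-connected pixels of one non-background class into bounding boxes."""
    height, width = len(labels), len(labels[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y in range(height):
        for x in range(width):
            cls = labels[y][x]
            if cls == 0 or seen[y][x]:
                continue
            queue = deque([(y, x)])
            seen[y][x] = True
            top, left, bottom, right = y, x, y, x
            while queue:  # breadth-first flood fill over same-class neighbours
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and labels[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append((CLASSES[cls], top, left, bottom, right))
    return regions


# Toy network output: one probability triple per pixel, ordered as in CLASSES.
B, T, TB = [0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]
prob_map = [
    [T, T, B, TB, TB],
    [T, T, B, TB, TB],
    [B, B, B, B, B],
]
print(find_regions(label_pixels(prob_map)))
```

Each tuple holds the element type followed by the top, left, bottom, and right pixel coordinates of its box. A graph-based method can use page-level structure that a plain flood fill ignores, which is presumably why it copes better with cases like tables without separators.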
I.Z.: Absolutely! AI techniques are utilized for enabling many PDF and document comparison functions of FineReader, not just document conversion.
Solving document tasks well needs intelligence
Is there any task that FineReader, as a PDF and document conversion solution, solves that couldn’t be solved successfully without AI?
I.Z.: The definition of AI itself implies that “it is an algorithm that works and reacts like humans,” no matter what is inside of it: we may use some heuristic solution or neural networks that perform just like a human being. So, if some task has been solved successfully (at that high level of customer expectations), then we should say that there is at least so-called “weak AI” behind it. There are some very good examples that our solution delivers: we can make any PDF document searchable on the fly, convert it to other formats like Word DOC, XLS, etc., and even edit scanned PDFs on a paragraph level. Moreover, we can create a PDF from any scan or digital photo and preserve the original document layout and style formatting, like columns, font, and text color. Is all that possible without AI?
To sum up, what could you tell our readers as to how important AI technologies are for software that is intended to facilitate and automate various document-related tasks?
I.Z.: I would say that incorporating the latest technologies in our products allows us to deliver the best possible solutions to our customers, and since it is our mission to satisfy them with a new level of automation, the importance of AI technologies must not be underestimated. It is critical to be on the cutting edge!
Try it yourself: Test AI-powered ABBYY FineReader 15 in your daily work with PDF and scanned documents, and experience the effectiveness and convenience it brings.