Unsurprisingly, OCR is consistently a hot topic in the PDF world and among PDF users in general. In paper-intensive work environments, PDF conversion and OCR engines have proven to be a successful workaround for transferring paper files into word processing applications. Thus, with the help of scanners and the PDF format, any and all types of paperwork can be done electronically and efficiently. Or can they?
As we try to move every non-digital working habit into an electronic equivalent, there are still some things that just can’t be done with ease using the same everyday tools. For instance, what about converting hand-printed or handwritten documents?
Three Flavors Of OCR
Many of you have probably wondered why such a thing can’t be done with the OCR technology in PDF conversion products. Well, this is because OCR technology is only capable of recognizing machine-printed characters and fonts. And seeing as how most of the documents being scanned in are typewritten, OCR is employed in almost all cases.
In other cases, there are documents that contain handwritten sections and/or fields that are used for collecting data (something that is slowly being superseded by the fillable PDF form). You can create a digital copy from such a document simply by scanning it in, right? Yes. However, turning the handwritten parts into text requires a different recognition technology altogether. Using OCR, you can perhaps get one letter to “OCR” into ASCII, if it’s printed clearly and written in ink that’s thick enough to be read. But that’s about it. This is where another flavor of OCR comes in: Intelligent Character Recognition (ICR).
ICR is a more advanced form of OCR that translates hand-printed letters into digital ASCII equivalents. This version of OCR is primarily used for processing applications and forms on which you “print clearly” and place individual letters in boxes. This structured method of reading a hand-printed document is one of the major limitations of the technology, but it also reduces the number of human errors that cause misinterpretations.
In addition, there are documents that contain handwriting, aka cursive writing. Can recognition be performed on such documents? The answer: Yes. The third flavor of OCR is IR (Intelligent Recognition), the latest generation of OCR technology. It is used to read unconstrained writing (text not contained in boxes) and likewise translates the characters into ASCII text. From my online searching, a good number of companies provide full-fledged OCR/ICR/IR solutions, which can be integrated with digital workflows.
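To keep the three flavors straight, here is a minimal Python sketch that maps each kind of writing to the recognition technology described above. It is only an illustration: the names `Writing`, `FLAVOR_FOR`, `pick_flavor` and `flavors_needed` are made up for this example and do not come from any real OCR product, and the sketch does no actual recognition.

```python
from enum import Enum


class Writing(Enum):
    """Kinds of writing found on a scanned page (hypothetical categories)."""
    MACHINE_PRINTED = "machine printed"    # typewritten or printed text
    HAND_PRINTED_BOXED = "hand printed"    # block letters, one per box
    CURSIVE = "cursive"                    # unconstrained handwriting


# Hypothetical lookup table: which flavor suits which kind of writing.
FLAVOR_FOR = {
    Writing.MACHINE_PRINTED: "OCR",
    Writing.HAND_PRINTED_BOXED: "ICR",
    Writing.CURSIVE: "IR",
}


def pick_flavor(writing: Writing) -> str:
    """Return the recognition flavor for one kind of writing."""
    return FLAVOR_FOR[writing]


def flavors_needed(regions: list[Writing]) -> list[str]:
    """Return the distinct flavors a page needs, in first-seen order."""
    seen: list[str] = []
    for region in regions:
        flavor = pick_flavor(region)
        if flavor not in seen:
            seen.append(flavor)
    return seen
```

For example, a form with typed labels and boxed, hand-printed fields would be passed as `[Writing.MACHINE_PRINTED, Writing.HAND_PRINTED_BOXED]`, and `flavors_needed` would report that it needs both OCR and ICR.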
Thus, if you’re looking to OCR handwritten PDFs with the OCR built into an everyday PDF conversion product, you’ll be sorely disappointed. The ability to do everything and anything with technology is perhaps the ultimate goal for developers and users. Putting it into practice, on the other hand, is perhaps the ideal goal for every worker bee out there. It’s sad to say, but there are some cases in which you can only do so much.