Ah, modern technology is wonderful: take a scanned image (or a snap from a mobile camera or digicam) and presto – OCR software extracts the text from the image into an easily editable format.
Optical character recognition (OCR) is a technology for converting scanned images of printed or handwritten text into machine-readable text. OCR software works by analyzing a document and comparing it with fonts stored in its database and/or by noting features typical of characters. Some OCR software also runs the result through a spell checker to “guess” unrecognized words. 100% accuracy is difficult to achieve, but a close approximation is what most software strives for.
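To make those two ideas a little more concrete, here is a toy Python sketch of both steps: matching a scanned glyph against stored templates, and then letting a word list “guess” a misread word. It is only an illustration and not how any of the tools below actually work. The tiny 3x5 `TEMPLATES` table, the function names and the word list are all made up for this example.

```python
from difflib import get_close_matches

# Hypothetical "font database": each glyph is 5 rows of 3 pixels,
# where '#' is ink and '.' is background.
TEMPLATES = {
    "C": ("###", "#..", "#..", "#..", "###"),
    "A": (".#.", "#.#", "###", "#.#", "#.#"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def pixel_matches(glyph, template):
    """Count the pixel positions where the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize_glyph(glyph):
    """Return the template character that agrees with the glyph on most pixels."""
    return max(TEMPLATES, key=lambda ch: pixel_matches(glyph, TEMPLATES[ch]))


def guess_word(word, dictionary):
    """Replace a misread word with its closest dictionary entry, if one is close enough."""
    matches = get_close_matches(word.lower(), dictionary, n=1, cutoff=0.6)
    return matches[0] if matches else word


if __name__ == "__main__":
    # An "A" with one pixel lost in the bottom-right corner.
    noisy_a = (".#.", "#.#", "###", "#.#", "#..")
    print(recognize_glyph(noisy_a))

    # A word where the letter "o" was misread as the digit "0".
    words = ["optical", "character", "recognition"]
    print(guess_word("recogniti0n", words))
```

Real engines work on far larger bitmaps and richer features, but the principle is the same: pick the closest known shape, then use language knowledge to clean up what is left.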
Maybe you have already come across our previous How to Extract Text from Images (OCR) post and used JOCR, a free OCR software tool. Or you might have settled on a few online OCR tools. Then again, if you have thought up ways to exploit OCR software for productivity shortcuts, let us give you a few more tools to play with.
We will be looking at 5 free pieces of OCR software and, to start off, let’s see the two overlooked ones that may already be installed on our systems.
For the occasional basic OCR stuff, MS OneNote’s optical character recognition feature is a timesaver. You might have missed it…it’s called Copy Text from Picture.
OneNote is simplicity personified. But it’s not too great for handwritten characters or even fuzzy ones. But for a quick job, I am all for OneNote’s clip and paste.
Microsoft Office Document Imaging (MODI) is another little-used tool within the Microsoft family. It’s right there under Menu – Microsoft Office – Microsoft Office Tools – Microsoft Office Document Imaging.
Doing OCR using the document imaging tool is a bit limiting because it accepts only TIFF (or MDI) formats. But that’s not too much of a bother as any graphic application can be used to convert an image to TIFF. In the screenshot below, I have used MS Paint to convert a JPEG to a TIFF.
Again, MODI handled printed text ably, but my handwritten text was met with an ‘OCR performed but could not recognize text’ prompt. Of course, do try it out with your own handwriting.
So, now let’s leave the Microsoft family behind and look at three free tools which call themselves OCR Software…
The difficulty I was having with handwriting recognition using the MS tools could have found a solution in SimpleOCR. But the software offers handwriting recognition only as a 14-day free trial. Machine print recognition, though, does not have any restrictions.
SimpleOCR was fine with normal text, but its handling of multi-column layouts was a comedown. In my opinion, the conversion accuracy of the Microsoft tools was considerably better than SimpleOCR.
SimpleOCR (v3.1) is a 9MB download and is compatible with Windows.
Just what I was talking about in the beginning! TopOCR, in a breakaway from typical OCR software, is designed more for digital cameras (at least 3MP) and mobile phones along with scanners. Like SimpleOCR, it has a two window interface – The source Image window and the Text window.
The image sourced from a camera or a scanner in the left window gets converted to the text format in the text editor on the right. The text editor functions like WordPad and can use Microsoft’s Text to Speech engine.
For best results with your camera, read their How to Get the Best Results with TopOCR page.
TopOCR (v3.1) is an 8MB download and is compatible with Windows (not tested on Vista).
This free OCR software uses the Tesseract OCR engine. Tesseract OCR code was developed at HP Labs between 1985 and 1995 and is currently with Google. It is thought of as one of the most accurate open source OCR engines available.
FreeOCR is a simple Windows interface for that underlying code.
FreeOCR (v.2.03) requires the Microsoft .NET Framework 2.0. The Windows XP/Vista compatible 4.38MB software can also be downloaded from this alternate site.
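If you would rather skip the graphical front end, the Tesseract engine can also be run from the command line. The Python sketch below assumes that a `tesseract` executable is installed and on your PATH, that English language data is available, and that `scan.tif` is a hypothetical file name. The helper names are my own and have nothing to do with FreeOCR.

```python
import subprocess


def build_tesseract_command(image_path, language="eng"):
    """Build the command line: tesseract <image> stdout -l <language>."""
    # Using "stdout" as the output base makes Tesseract print the
    # recognized text instead of writing it to a .txt file.
    return ["tesseract", image_path, "stdout", "-l", language]


def ocr_image(image_path, language="eng"):
    """Run Tesseract on one image and return the recognized text."""
    result = subprocess.run(
        build_tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise an error if Tesseract reports a failure
    )
    return result.stdout


if __name__ == "__main__":
    # Assumption: scan.tif exists in the current folder.
    print(ocr_image("scan.tif"))
```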
Free OCR tools come with their own limitations. And the quality of a scanned page has a lot to do with resolution, contrast and the clarity of the fonts. From an average user’s standpoint, 100% OCR accuracy remains a pipe dream.
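If you want a number rather than a gut feeling, you can type out a short passage by hand and compare it with what the OCR tool returned. The sketch below uses the similarity ratio from Python’s standard `difflib` module as a rough stand-in for character accuracy. It is not a formal OCR benchmark, and the function name and sample strings are just for illustration.

```python
from difflib import SequenceMatcher


def char_accuracy(expected, actual):
    """Return a 0.0 to 1.0 similarity score between the true text and the OCR output."""
    if not expected and not actual:
        return 1.0  # two empty strings are trivially identical
    # autojunk=False disables the popularity heuristic, which matters for long texts.
    return SequenceMatcher(None, expected, actual, autojunk=False).ratio()


if __name__ == "__main__":
    truth = "OCR software"
    ocr_output = "0CR software"  # the letter "O" misread as a zero
    print(f"{char_accuracy(truth, ocr_output):.1%}")
```

Running the same passage through each tool gives you a fairer comparison than eyeballing the results.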
Though the free tools were adequate with printed text, they failed with normal cursive handwritten text. My personal preference for offhand OCR use leans towards the two Microsoft products I mentioned in the beginning.
Your own say matters. Which is your tool of choice? Does the free OCR software recognize what you throw at it? And more importantly, do you recognize what it throws back at you? Let us know…