DIY Book Scanner/OCR

From Wikibooks, open books for an open world

Tesseract OCR software was originally developed by HP, released as open source, and has more recently been updated by Google. It is free and produces high-quality results, so it is worth noting. It is driven from the command line. An example line in a Windows .bat file would be:

tesseract image.tif outputbase
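A book scan usually produces one image per page, so in practice this command is repeated for every file. The following Python sketch runs that loop. It assumes tesseract is on the PATH and that the pages are .tif files named so that alphabetical order matches page order (e.g. page001.tif, page002.tif). The helper names build_command and ocr_folder are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(image: Path, out_dir: Path) -> list[str]:
    """Return the argv for one page: tesseract <image> <outputbase>.

    Tesseract appends ".txt" to outputbase, so page001.tif ends up
    as <out_dir>/page001.txt.
    """
    outputbase = out_dir / image.stem
    return ["tesseract", str(image), str(outputbase)]


def ocr_folder(scan_dir: Path, out_dir: Path) -> int:
    """Run tesseract on every .tif in scan_dir, in name order; return the page count.

    Assumes the tesseract executable can be found on the PATH.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    pages = sorted(scan_dir.glob("*.tif"))
    for image in pages:
        # check=True raises CalledProcessError on the first page tesseract fails on
        subprocess.run(build_command(image, out_dir), check=True)
    return len(pages)
```

For example, ocr_folder(Path("scans"), Path("text")) would turn scans/page001.tif into text/page001.txt. Stopping at the first failure makes a bad page easy to spot, at the cost of not finishing the rest of the book in that run.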

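To restrict recognition to a whitelist of characters, create a config file named after the whitelist. For a whitelist called digits, put this line in a text file called tessdata/configs/digits:

 tessedit_char_whitelist 0123456789

then pass the config name on the command line:

tesseract image.tif outputbase nobatch digits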
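Creating the config file by hand is enough for a one-off job. The Python sketch below writes the file and builds the matching command instead, which may help when several whitelists are needed. The tessdata location is an assumption, since it differs between installations (and writing into a system-wide install may need administrator rights). The names whitelist_config, write_config and whitelist_command are hypothetical.

```python
from pathlib import Path


def whitelist_config(chars: str) -> str:
    """Text of a config file that limits recognition to the given characters."""
    return f"tessedit_char_whitelist {chars}\n"


def write_config(tessdata: Path, name: str, chars: str) -> Path:
    """Write tessdata/configs/<name> and return its path.

    tessdata is an assumption: point it at your own installation's
    tessdata folder.
    """
    config_path = tessdata / "configs" / name
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(whitelist_config(chars))
    return config_path


def whitelist_command(image: str, outputbase: str, name: str) -> list[str]:
    """argv equivalent to: tesseract <image> <outputbase> nobatch <name>"""
    return ["tesseract", image, outputbase, "nobatch", name]
```

Calling write_config(tessdata, "digits", "0123456789") produces the same file as the manual step, and whitelist_command("image.tif", "outputbase", "digits") gives the command shown above as a list suitable for subprocess.run.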