OCRFeeder is a document layout analysis and optical character recognition system.
Given a set of images, it automatically outlines their contents, distinguishes between graphics and text, and performs OCR on the latter. It can generate multiple output formats, the main one being ODT.
It features a complete GTK graphical user interface that allows users to correct any unrecognized characters, define or correct bounding boxes, set paragraph styles, clean the input images, import PDFs, save and load the project, export everything to multiple formats, etc. OCRFeeder was developed as the Master's thesis project in Computer Science of Joaquim Rocha.
0.8 05 Aug 2014 17:21
Added support for multi-image TIFFs.
Fixes for importing PIL, an error when exporting a PDF with empty text areas, PDF output options in ocrfeeder-cli, getting the engine name in ocrfeeder-cli, the use of newer versions of Unpaper, and text in the pages icon view.
Fixes for reordering pages in the icon view, issues when no locale is set, loading a project with more than one page, and updating the OCR engines in the BoxEditor.
Improvements: ported the application to GObject Introspection; scanning now uses 300 DPI and color mode; the last visited directory is used when adding a new image; a warning is shown when no OCR engines are found on startup or when performing the recognition; the sensitivity of the box editor's OCR controls is updated according to whether OCR engines are available.