OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted.

    ocrmypdf                      # it's a scriptable command line program
       -l eng+fra                 # it supports multiple languages
       --rotate-pages             # it can fix pages that are misrotated
       --deskew                   # it can deskew crooked PDFs!
       --title "My PDF"           # it can change output metadata
       --jobs 4                   # it uses multiple cores by default
       --output-type pdfa         # it produces PDF/A by default
       input_scanned.pdf          # takes PDF input (or images)
       output_searchable.pdf      # produces validated PDF output

See the release notes for details on the latest changes.

- Generates a searchable PDF/A file from a regular PDF.
- Places OCR text accurately below the image to ease copy / paste.
- Keeps the exact resolution of the original embedded images.
- When possible, inserts OCR information as a "lossless" operation without disrupting any other content.
- Optimizes PDF images, often producing files smaller than the input file.
- If requested, deskews and/or cleans the image before performing OCR.
- Distributes work across all available CPU cores.
- Uses Tesseract OCR engine to recognize more than 100 languages.
- Scales properly to handle files with thousands of pages.

For details: please consult the documentation.

I searched the web for a free command line tool to OCR PDF files: I found many, but none of them were really satisfying:

- Either they produced PDF files with misplaced text under the image (making copy/paste impossible).
- Or they did not handle accents and multilingual characters.
- Or they changed the resolution of the embedded images.
- Or they generated ridiculously large PDF files.
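Because the tool is a scriptable command line program, the invocation shown above can be wrapped in a small shell helper to process a whole folder of scans. The following is only a minimal sketch under stated assumptions: the function names `ocr_one` and `ocr_dir`, the `DRY_RUN` variable, and the directory layout are hypothetical and chosen for illustration. The `ocrmypdf` options are the ones listed in the command above.

```shell
#!/usr/bin/env bash
# Sketch: batch-OCR every PDF in a directory with ocrmypdf.
# ocr_one, ocr_dir and DRY_RUN are illustrative names, not part of ocrmypdf.

# Run ocrmypdf on one input/output pair.
# With DRY_RUN=1 the command is only printed (for inspection), not executed.
ocr_one() {
    local input="$1" output="$2"
    local cmd=(ocrmypdf -l eng+fra --rotate-pages --deskew
               --output-type pdfa "$input" "$output")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# OCR every *.pdf found in directory $1, writing results to directory $2
# under the same file names.
ocr_dir() {
    local src="$1" dst="$2" pdf
    mkdir -p "$dst"
    for pdf in "$src"/*.pdf; do
        [ -e "$pdf" ] || continue   # skip when the glob matches nothing
        ocr_one "$pdf" "$dst/$(basename "$pdf")"
    done
}
```

Running `DRY_RUN=1 ocr_dir scans searchable` prints the commands that would be executed for each file in a hypothetical `scans` folder. Dropping `DRY_RUN=1` runs them, which requires `ocrmypdf` to be installed. The language list `eng+fra` is carried over from the example and should be adjusted to the documents being processed.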