
Tag: scanning

Linux OCR with Tesseract

I'm scanning old Flor y Fauna newsletters for my Dutch Hardwood Investment Wiki. I need to do this because most of these newsletters, although produced digitally, are available in the Sicirec archive only on paper. The only graphical element these newsletters contain is a simple header, so I want to convert the scans to text and put the text in a wiki article for each newsletter; I don't want to upload dozens of image-heavy PDFs just to show the original (crappy) layout.

Removing unwanted grey values in scanning white papers

When doing automated scanning, as I do to keep my paper administration properly organized, the resulting images can get quite large because the background contains near-white variation that is still expensive to store. ImageMagick has a nice solution for that: -white-threshold x%, which turns every pixel lighter than x% into pure white. It also has -black-threshold, should that be necessary.
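As a rough illustration, here is a small Python sketch that builds such an ImageMagick command line for one scan. The function names, the file names, and the 90% default are my own assumptions, not values from the post; the right threshold depends on the scanner and the paper. It also assumes the ImageMagick 6 style `convert` binary is on the PATH (on ImageMagick 7 the binary is usually `magick`).

```python
import subprocess


def threshold_command(src, dst, white_pct=90, black_pct=None, binary="convert"):
    """Build an ImageMagick command that flattens near-white pixels.

    white_pct and black_pct are illustrative defaults, not recommendations.
    """
    for pct in (white_pct, black_pct):
        if pct is not None and not 0 <= pct <= 100:
            raise ValueError("threshold must be between 0 and 100")

    # Pixels lighter than white_pct become pure white.
    cmd = [binary, src, "-white-threshold", f"{white_pct}%"]

    # Optionally force pixels darker than black_pct to pure black.
    if black_pct is not None:
        cmd += ["-black-threshold", f"{black_pct}%"]

    cmd.append(dst)
    return cmd


def clean_scan(src, dst, **kwargs):
    """Run the command; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(threshold_command(src, dst, **kwargs), check=True)
```

Calling `clean_scan("scan.png", "clean.png")` would run the equivalent of `convert scan.png -white-threshold 90% clean.png`. It is worth comparing the file sizes and checking the result by eye before processing a whole batch, since too low a threshold can wash out faint print.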
