Sunday, March 11, 2018

windows - Apply metadata to files based on OCR of their content

Essentially I'm looking for a tool (Windows/Mac) that will allow me the following workflow:





  1. Scan ALL my documents into a folder (200-300 scanned images)

  2. run the tool that will go through all the files and run OCR on them

  3. based on the OCR, meta-data is applied onto each file.

  4. I then read the meta-data and accordingly categorize files through a batch process.



While there are quite a few suggestions in SU & SE for doing plain OCR on files, I wasn't able to find a solution that essentially allows me to do programmatic like stuff based on the OCR-ed data from the documents.



The document template is standard so we know what kind of file to expect. We just want to scan the whole bunch and then run a backend process that neatly categorizes/uploads into respective folders. Having it OCR'ed gives me option to search within the file, while i open it in a program like Acrobat reader/Preview. But I want to run this categorization logic from a batch/shell/apple script. Stuff like prefixing the document number to filename etc.

No comments:

Post a Comment

hard drive - Leaving bad sectors in unformatted partition?

Laptop was acting really weird, and copy and seek times were really slow, so I decided to scan the hard drive surface. I have a couple hundr...