Monday, September 26, 2016

the maximum page limit in OmniPage

I use OmniPage Ultimate (v19) all the time. I am slowly scanning in my academic book collection and converting the books into searchable PDFs. If I were a hard-core FOSS guy I would use some kind of Tesseract-based workflow, but OmniPage just sort of works (once you get the settings tweaked right), it's super-fast, and I'm already time-starved rather than cash-starved. (I do wish it did a better job with equations, though.)

The thing about academic books is that they are often very long, and the OmniPage Batch Manager (aka DocuDirect) tries to limit you to 500 pages per output file. It's actually ok if your input file is longer, but the output files are chunked into units of (at most) 500 pages, which you have to combine by hand. I imagine this may have to do with internal implementation details; or it may just be something to upsell you to their SDK product. I have no idea. They implement this by limiting what you can enter as the maximum page limit in the Batch Manager Options GUI (i.e., as a text entry validation).

However, now that PCs routinely have large memories, and 500 pages is not an exact limit anyway, you may want to risk the occasional OmniPage crash and use a larger page limit.

It turns out that the Batch Manager options are stored in:
C:\Users\<user>\AppData\Roaming\ScanSoft\OmniPage 19\Job\BMSettings.dat
If you can use a binary editor (like hexl-mode in emacs) you can identify which bytes correspond to the page limit and set them.
  1. MAKE A COPY of your current BMSettings.dat in case you trash your working copy and want your current settings back.
  2. Open the DocuDirect GUI and use the Options pane to set the page limit to something between 256 and 500 that is easily detectable as two hex bytes. For example, 319 = 0x013F. Fully close DocuDirect (this means exiting the OmniPage Agent in your toolbar, not just closing the GUI).
  3. Open BMSettings.dat in your binary editor and look for the string "SetPageLimit" (hexl-mode helpfully shows you the ASCII in a display next to the hex values). Then look for your two chosen hex bytes (e.g., 0x01 and 0x3F) in the dozen-odd bytes following it. Change those two bytes to something that represents a larger page limit, like 0x13 and 0x88 (0x1388 = 5000), respectively.
  4. Now open the DocuDirect GUI again. You should now see your chosen page limit in the Options pane. (If you touch this value in the GUI, you will be subject to the "500 page" validation again, so don't do it.)
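
If you would rather not do the byte surgery by hand, the steps above can be sketched in a few lines of Python. This is only a sketch under assumptions I have not verified beyond my own file: that "SetPageLimit" is stored as plain ASCII, that the limit is a two-byte value within 16 bytes after it, and that the byte order is whichever one matches the marker value you set in step 2 (it tries high-byte-first, then low-byte-first). The names patch_page_limit and patch_file are made up for this example; they are not part of OmniPage.

```python
import shutil
from pathlib import Path

MARKER = b"SetPageLimit"  # ASCII string seen in BMSettings.dat (assumed encoding)


def patch_page_limit(data: bytes, old_limit: int, new_limit: int,
                     window: int = 16) -> bytes:
    """Return a copy of data with the two-byte page limit after MARKER replaced.

    old_limit is the recognizable value set in the GUI (e.g. 319 = 0x013F).
    window is an assumed search distance, not a documented field offset.
    """
    start = data.find(MARKER)
    if start < 0:
        raise ValueError("marker string not found")
    begin = start + len(MARKER)
    region = data[begin:begin + window]
    # Try both byte orders and write the new value in the order that matched.
    for order in ("big", "little"):
        offset = region.find(old_limit.to_bytes(2, order))
        if offset >= 0:
            pos = begin + offset
            return data[:pos] + new_limit.to_bytes(2, order) + data[pos + 2:]
    raise ValueError("old page limit not found near marker")


def patch_file(path: Path, old_limit: int, new_limit: int) -> None:
    """Back up the settings file (step 1), then patch it in place (step 3)."""
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_bytes(patch_page_limit(path.read_bytes(), old_limit, new_limit))


# Example call, with DocuDirect and the OmniPage Agent fully closed:
# patch_file(Path(r"C:\Users\<user>\AppData\Roaming\ScanSoft\OmniPage 19\Job\BMSettings.dat"), 319, 5000)
```

It raises an error instead of guessing if it can't find the marker or the old value, which is what you want when poking at an undocumented binary file. Two bytes also cap the new limit at 65535.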