Can't load OSD data #33
This is a duplicate of #23, but I don't have an OS X system, so I can't debug it. Honestly, I think it's an issue with how tesseract-ocr is compiled on OS X, since from what I recall the library doesn't export anything for defining load paths. In short, the only advice I have is to look carefully at the configure options when building tesseract-ocr and hope for the best.
Alrighty, thanks @meh, I'll try to take a poke around Homebrew's tesseract recipe. The thing that I don't quite get is how the PSM settings look for osd.traineddata in a way that differs from the main mechanism for loading language training data (since I'm OCRing non-English documents just fine).
If you look at the This means the load paths for language files are a compile-time option. EDIT: wait, it actually does a
@knowtheory curious whether you had any luck digging into the OS X related issues? It would be nice to play with the other Tesseract configs, like segmentation mode and custom configs, but I get the same errors mentioned above. Not sure I have the experience to help debug, but I'll probably give it a shot if I have time.
@bwinterling unfortunately, no, I haven't had time to dig in :(
Hey meh,
Just looking for some quick advice. I've managed to get ruby-tesseract-ocr working with page_segmentation_mode 1 on Ubuntu (12.04) and the OSD trained data.
I'm having trouble doing the same on OS X (Mavericks), unfortunately. I've got tesseract installed via Homebrew, and although I can use the default tesseract CLI wrapper to extract text using the OSD data, I can't manage the same using ruby-tesseract-ocr. The tesseract CLI has a --list-langs option, which displays "osd" as one of the available languages.
Despite that, this keeps happening:
Do you have any advice as to whether I'm missing a config setting somewhere? I'm mostly perplexed because, as far as I can tell, the data is in the right place and everything else works (no compile errors or anything either).
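For anyone wanting to double-check the "data is in the right place" part from Ruby, here is a minimal sketch that only inspects the filesystem and does not call ruby-tesseract-ocr at all. It assumes Tesseract's TESSDATA_PREFIX environment variable is set; depending on the Tesseract version, that variable points either at the tessdata directory itself or at its parent, so both are checked. The helper names `missing_traineddata` and `candidate_tessdata_dirs` are made up for this example, and a compiled-in default path (which is what this thread suspects on OS X) would not be visible to it.

```ruby
# Hypothetical helper: return the language codes whose <code>.traineddata
# file is not present as a regular file in dir.
def missing_traineddata(dir, langs)
  langs.reject { |lang| File.file?(File.join(dir, "#{lang}.traineddata")) }
end

# Hypothetical helper: directories worth checking, derived from TESSDATA_PREFIX.
# Assumption: the prefix is either the tessdata directory or its parent.
def candidate_tessdata_dirs(env = ENV)
  prefix = env["TESSDATA_PREFIX"]
  return [] if prefix.nil? || prefix.empty?
  [prefix, File.join(prefix, "tessdata")]
end

# Report, per candidate directory, which of the expected files are absent.
candidate_tessdata_dirs.each do |dir|
  missing = missing_traineddata(dir, %w[eng osd])
  puts "#{dir}: missing #{missing.inspect}"
end
```

If TESSDATA_PREFIX is unset the script prints nothing, which is itself a hint that the library is falling back to whatever path was chosen at build time.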