OCR for Gothic?

Hello

Please, is there any OCR software that can read old German Gothic print?

Bob Doerr in the beautiful Missouri Ozarks
Sole surviving founding officer, Missouri Chapter, Nature Conservancy, 1956
http://www.nature.org/wherewework/northamerica/states/missouri/
Editor, since 1992, Missouri State Genealogical Association Journal

I don't know of any OCR software made specifically for Fraktur, but OmniPage from Nuance is trainable. Start with very clear pages, set up a profile (and make sure the language is set to German) so that what it "learns" is saved in one place, then scan and laboriously correct all the errors. After a while, you'll notice it makes fewer and fewer errors. With version 15 I never got it to do a whole page error-free, but I did get it down to one or two errors per page, on pages I'd rate a 7 out of 10 for clarity.

Mike

Hi Bob,

Please, is there any OCR software that can read old German Gothic print?

I suppose you refer to "Fraktur print".

Have a look here => http://susi.e-technik.uni-ulm.de:8080/Meyers2/Digitalisierung.html
This university group OCR-ed the 4th edition of Meyers Lexikon from 1888.
It can be done with the Russian-developed FineReader ... (best of the best, crème de la crème!) ... even with the standard version.

I know ... as I did the 5th edition for myself ...

The problem with Fraktur is that it was not yet a standardized typeface, i.e. every publisher ("Verlag") had its own set of Fraktur letters (Meyers differs from Brockhaus, etc.).
We usually don't notice that, but the OCR software does, and then it needs human training ...

Sincerely
Hanno (V.J.Kolbe)

P.S. The Goths actually had a written language, but they were around a wee bit too early to profit from Gensfleisch's (Gutenberg's) invention ... the idea for which he got here in Alsace when he saw a press used to squeeze the juice out of grapes to make wine 8-)

Bob, I use FineReader too. I also have OmniPage, and in my opinion it
doesn't match up to FineReader. Still, I'm not happy with the results
of any OCR program. I doubt that any one program can ever handle all
of the varying fonts found in old publications. Newer books can be
handled, as the print quality is more uniform, and the program can be
trained, as has been mentioned, but certain letters continue to cause
problems no matter how much training is done. It's the nature of the
beast with old German print.

The older texts, which are of more interest to me, are very difficult
to scan. For example, I have mid-19th-century address books which
were scanned and sent to me digitally. I've given up trying to create
text from them: there are too many abbreviations and other things that
need a human reader. Then there are the Preussischen Ranglisten
(Prussian rank lists) of the early 19th century, of which I have some
actual yellowed copies. In them it's not really possible to
distinguish certain characters except from the context in which they
are used.

FineReader has a professional version sold in Germany which costs well
over $1000, has certain use limitations, and is supposed to be the
be-all and end-all, but I wonder. (By use limitations I mean that the
program has a fee structure based on the amount of data scanned.)

A lot of material is being scanned in Germany and then manually
corrected by volunteers using FineReader. Scanning entire volumes of
history or encyclopedias is a backbreaking job.

Fred
P.S. Best of luck with whatever you hope to scan. It can be done, but
be ready for lots of manual intervention.

Hi, Hanno,

It seems to me that OCR that could read Meyers could read any Gothic!

I find the Fraktur used in St. Louis much easier to read than Meyers.

At one time I regularly received beautifully laser-printed documents, but the OCR output was usually slightly off. At worst, I had a page on which only once were three successive letters OCR'd correctly!

Thank you for your response.


Hi Bob,

I find the Fraktur used in St. Louis much easier to read than Meyers.

Two more things:

- Training for Fraktur only makes sense when you have a book of 100+ pages (only then do you "recover" the time you spent "training it to read Fraktur").

- It is important that the scans are "clean and crisp". I swear by black/white (I got the best results with it); others say "Nope! Greyscale does the trick!"
I made my decision by counting the errors on the same page, scanned under different conditions and read with the same trained set.
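Counting errors on the same page under different scan settings can be made a bit more repeatable by measuring the character-level edit distance (Levenshtein distance) between each OCR result and a hand-corrected transcription of that page. The sketch below is not part of FineReader or OmniPage; it assumes you have saved each OCR result as plain text and typed up a corrected reference once. The function names (`char_errors`, `compare_scans`) and the sample strings are made up for illustration, not real scan results.

```python
def char_errors(reference: str, ocr_text: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn the OCR output
    into the hand-corrected reference."""
    prev = list(range(len(ocr_text) + 1))  # distances for an empty reference prefix
    for i, ref_ch in enumerate(reference, start=1):
        curr = [i]
        for j, ocr_ch in enumerate(ocr_text, start=1):
            cost = 0 if ref_ch == ocr_ch else 1
            curr.append(min(
                prev[j] + 1,         # reference character dropped by the OCR
                curr[j - 1] + 1,     # extra character inserted by the OCR
                prev[j - 1] + cost,  # match, or misread character (substitution)
            ))
        prev = curr
    return prev[-1]


def compare_scans(reference: str, outputs: dict[str, str]) -> list[tuple[str, int, float]]:
    """Rank scan settings by error count, fewest errors first.
    The rate is errors per character of the reference text."""
    results = []
    for setting, text in outputs.items():
        errors = char_errors(reference, text)
        results.append((setting, errors, errors / max(len(reference), 1)))
    return sorted(results, key=lambda r: r[1])


if __name__ == "__main__":
    # Hypothetical example: one corrected line and two invented OCR readings
    # of the same page, scanned as greyscale and as black/white.
    reference = "Der Rhein fließt durch Basel"
    outputs = {
        "greyscale": "Der Rhein fliefst durch Bafel",
        "black/white": "Dcr Rhein fließt durch Basel",
    }
    for setting, errors, rate in compare_scans(reference, outputs):
        print(f"{setting:12} {errors:3d} errors ({rate:.1%})")
```

Using the same reference page and the same trained set for every run, as described above, keeps the comparison fair: only the scan setting changes between runs.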

A bit hard to get into ... but when you see page after page getting recognized ... that's the third-best feeling in the world!!

Greetings
Hanno