Yesterday I completed, or at least thought I completed, a program for rapidly indexing names from history books. It is for use with some of the books I found on the Internet Archive for the Erie County site; those books are listed in the Records section of that site, with links to the originals on the Internet Archive.

Several versions of each book are available, but only two concern me here: an OCR scan (i.e., plain text) and a PDF file of the scanned original pages. OCR is not perfect, so I go through the text version, copy names into my program, then check the PDF version to make sure the names have no errors.

I can copy lists of names, then set a few radio-button options in the program to tell it how to parse them: what year the names apply to, whether the names are in normal order or surname first, and whether they are separated by commas, semicolons, or line breaks. All these options change from page to page in the original book, so I am entering many short lists. Lists of first settlers, of church members when a church was founded, of members of militia groups, and dozens of other kinds appear in the book, and each can be quickly copied, corrected, and entered.
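The parsing step described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual program: the function `parse_names`, the `Person` record, and the option strings `"comma"`, `"semicolon"` and `"newline"` are hypothetical stand-ins for the radio buttons. It assumes that in normal order the last word is the surname, and that surname-first entries look like `Smith, John` or `Smith John`.

```python
from dataclasses import dataclass

# Hypothetical option values standing in for the separator radio buttons.
SEPARATORS = {"comma": ",", "semicolon": ";", "newline": "\n"}


@dataclass
class Person:
    given: str
    surname: str
    year: int  # the year the whole pasted list applies to


def parse_names(text: str, year: int, surname_first: bool, separator: str) -> list[Person]:
    """Split a pasted list into Person records using the chosen options (a sketch)."""
    if separator not in SEPARATORS:
        raise ValueError(f"unknown separator: {separator!r}")
    people = []
    for raw in text.split(SEPARATORS[separator]):
        # Collapse runs of whitespace, including OCR line wraps, into single spaces.
        entry = " ".join(raw.split())
        if not entry:
            continue  # skip empty pieces, e.g. from a trailing separator
        if surname_first:
            # Assumed forms: "Smith, John" or "Smith John"; surname comes first.
            if "," in entry:
                surname, given = (part.strip() for part in entry.split(",", 1))
            else:
                surname, _, given = entry.partition(" ")
        else:
            # Assumed form: "John Smith" or "Mary Ann Jones"; last word is the surname.
            given, _, surname = entry.rpartition(" ")
        people.append(Person(given=given, surname=surname, year=year))
    return people
```

For example, `parse_names("Smith, John; Jones, Mary;", 1800, True, "semicolon")` yields two records, John Smith and Mary Jones, both for 1800. The last-word rule will mis-split compound surnames such as "Van Buren", and a surname-first list separated by commas is ambiguous, so entries like those would still need correcting by hand.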

At least that is the plan. There seems to be a glitch: I entered about 75 names in 15 or 20 minutes this morning, and the individuals were listed correctly in the database, but the supporting factoids (just one per person in this case, recording that the person was living at that place and time) did not come out correctly. I'll fix that and carry on.

Doing this will, I believe, make the content of such books more accessible by bringing it to the attention of researchers. Instead of searching thousands of pages of books, you can search one database on the county site and find who is listed, where, and when. That is the kind of thing these county sites are intended for: making research easier and faster, so you can get more done in less time. You still have to check the original records, of course (that is genealogy), but we try to organize and centralize the data so you know which sources to consult.
