Archive for the ‘County Sites’ Category
Currently I’m working on data entry software for census records. There are a lot of variables to consider: how much information should be included, which facts are to be recorded, how closely is the data to be source-referenced (i.e. page/folio or exact line for each entry?), are short-cuts possible that can make the indexing process easier or more effective?
First, I don’t understand why anyone would want to make a complete transcription of census data, such as some Genweb sites have attempted. Seems like a massive waste of time to me. If the transcription is complete, you don’t need to check the original, right? Well, no, it does not work that way. Genealogists should always confirm the data in the original anyhow — so what do you gain?
The only rational reason for a complete transcript of any original record is that it provides a backup if the original is destroyed or is not accessible. That certainly does not apply to US Federal Census records: there are so many microfilm and digital copies around that only a global disaster could wipe them all out, and in that case the transcriptions would probably go too, and there would be nobody left to care.
So what we are talking about here is indexing. Indexes are vital for making information accessible. But how much detail is enough? We want people to be able to determine which of the dozens of John Smith listings in a city is the John Smith they seek. Obviously, year of birth is one of the main identifiers. It is available on Federal censuses from 1850 forward, though in most years it must be calculated from the person’s age. Those same censuses also list the birthplace of each individual (at least the state or, for the foreign born, the country), and that is a big help in distinguishing similarly named individuals too. But it is usually the family relationships that are the clincher in distinguishing individuals.
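To show what "calculated from the person’s age" involves, here is a minimal Python sketch. The function name is my own invention for illustration, not part of Rec2Gen. It assumes the age was recorded as of the official census day (June 1 for the 1850 census), so the birth year can only be narrowed to two candidate years.

```python
from typing import Tuple


def birth_year_range(census_year: int, age: int) -> Tuple[int, int]:
    """Return the (earliest, latest) possible birth year for a census entry.

    Hypothetical helper: someone aged 30 on census day in 1850 was born
    in either 1819 or 1820, depending on whether their birthday fell
    before or after census day.
    """
    if age < 0:
        raise ValueError("age cannot be negative")
    latest = census_year - age
    return (latest - 1, latest)


# Example: a person listed as 30 years old on the 1850 census
print(birth_year_range(1850, 30))
```

An index could store the later year as the single "birth year" and let the search allow a year or two either side, since ages on censuses are often approximate anyway.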
With our Rec2Gen software, relational information is available, so the entire household will be listed for each individual, by clicking on the name and looking at the relatives, friends and associates sections of the full report. So obviously our data entry software needs to enter the data for entire households together.
Another problem I have with current indexing practice is this notion that you have to copy exactly what the original record shows. If you are citing information in your compiled genealogy, or a genealogy report of some kind, then yes, you need to report exactly what is shown in the record. But an INDEX is not a RECORD — it is just a finding aid. Making changes that make it easier for people to find the entries they seek just makes sense. Why should people have to check both the Wi and Wm part of any alphabetized list looking for William? And while common abbreviations can be accounted for in computerized search programs, it is impossible to include every possible variation. Why not just index it as William, and when the researcher views the original record they will record Wm or Willm, or whatever they find?
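As a rough sketch of that idea, the index entry can be normalized at data entry time with a small lookup table. The table and function name below are illustrative assumptions, and the list of abbreviations is nowhere near complete.

```python
# Illustrative (and deliberately short) table of abbreviated given names.
ABBREVIATIONS = {
    "wm": "William",
    "willm": "William",
    "jno": "John",
    "jas": "James",
    "thos": "Thomas",
    "chas": "Charles",
    "geo": "George",
}


def index_form(given_name: str) -> str:
    """Return the spelled-out form of a given name for use in the index.

    Names that are not in the table are returned unchanged (apart from
    surrounding whitespace), so nothing is lost for unusual names.
    """
    cleaned = given_name.strip()
    key = cleaned.rstrip(".").lower()
    return ABBREVIATIONS.get(key, cleaned)


print(index_form("Wm."))    # William
print(index_form("Willm"))  # William
print(index_form("Mary"))   # Mary
```

The original spelling stays in the original record, which is where the researcher will see it when they check the source.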
So, getting back to my programming, the indexing program for 1850+ censuses will simply include name, birth year (calculated if need be) and birthplace. Data will be entered for a complete household, even if it happens to span more than one page or folio. That allows for the most rapid data entry, lets me write just one program for many different census years, and still provides easy access to the most important identifying characteristics. It will also allow members to enter just their own family from the censuses.
Yesterday I completed, or at least thought I completed, a program for rapid indexing of names from history books. It is for use with some of the books I found on the Internet Archive for the Erie County site. Those books are listed in the Records section of that site, with links to the originals on the Internet Archive.
There are various versions of the books available, but only two concern me here: one is an OCR scan (i.e. plain text) and the other is a PDF file with scanned original pages. OCR is not perfect, so I go through the text version, copy names to my program, then check the PDF version to make sure the names have no errors.
I can copy lists of names, then set a few radio-button options on the program to tell it how to parse those names. I need to tell it what year the names apply to, if the names are in normal order or surname first, and if the names are separated by commas, semi-colons or line breaks. All these options change page by page in the original book, so I am entering many short lists. First settlers, church membership when the church was founded, members of militia groups or dozens of other types of lists appear in the book, and can be quickly copied, corrected, and entered.
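The parsing options described above could be sketched in Python along these lines. This is only an illustration of the idea, not the actual program: the class, function and option names are all hypothetical, and it assumes that in surname-first lists the surname is a single word.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexedName:
    given: str
    surname: str
    year: int


# The three separator choices mentioned in the post.
SEPARATORS = {"comma": ",", "semicolon": ";", "line": "\n"}


def parse_name_list(text: str, year: int, surname_first: bool = False,
                    separator: str = "comma") -> List[IndexedName]:
    """Split a pasted list of names and attach the year they apply to."""
    entries = []
    for raw in text.split(SEPARATORS[separator]):
        name = " ".join(raw.split())  # collapse stray whitespace from OCR
        if not name:
            continue
        if surname_first:
            # "Smith, John" or "Smith John": first word is the surname
            surname, _, given = name.partition(" ")
        else:
            # "Mary Ann Jones": last word is the surname
            given, _, surname = name.rpartition(" ")
        entries.append(IndexedName(given.strip(), surname.strip(" ,"), year))
    return entries


# Made-up sample lists, one in each name order
print(parse_name_list("John Smith, Mary Ann Jones", 1810))
print(parse_name_list("Smith, John; Jones, Mary Ann", 1810,
                      surname_first=True, separator="semicolon"))
```

Both calls produce the same two entries, which is the point: the options change from list to list, but what goes into the database has one consistent shape.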
At least that is the plan. There seems to be a glitch — I entered about 75 names in 15 or 20 minutes this morning, and the individuals got listed correctly in the database — but the supporting factoids (just one per person in this case, that the person was living at that place and time) did not work correctly. I’ll fix that and continue on.
Doing this, I believe, will make the content of such books more accessible, by bringing them to the attention of researchers. Instead of searching thousands of pages of books, you can search one database, on the county site, and find who is listed, where and when. That is the kind of thing these county sites are intended for — making research easier and faster, so you can get more done in less time. You still always have to check the original records of course — that is genealogy — but we try to help organize and centralize the data so you know which sources to consult.
Today I finally got the first Rec2Gen county sub-site online, for Erie County, New York. For now, think of it as a skeleton of what I have in mind. You can watch it grow into a full-fleshed behemoth over the next year or two.
Under surnames, click on the A or B and you will see lots of names — but further down the alphabet there is little yet. That is because one of my first projects is an index of the 1870 City Directory for Buffalo. Each day I spend a little under an hour to add another page from the book, a digital copy of which is available online (see the Records section under Buffalo for a link).
One of the things I want to do is provide the kind of information that is present in the data, but not easily discovered. For this city directory, for example, I plan to add the capability to search for neighbors of any individual, based on their listed address. It doesn’t do that yet, however; that is only one example from a long wish-list of features that I plan to add as time goes on. In terms of programming it is relatively simple, and will only take a few hours to implement, but it will not be much good until the whole directory is in the database, and that will take a year if I only add 5 pages a week, as I am now. But who knows, maybe I’ll have more time in the future and can spend longer hours on that project, but I’m not counting on it.
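To give a sense of why the neighbor search is simple in programming terms, here is one possible sketch in Python. It is not the planned implementation. It assumes each directory entry has been split into a house number and a street name, treats "neighbor" as "same street, within a few house numbers", and uses made-up sample entries rather than real listings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DirectoryEntry:
    name: str
    house_number: int
    street: str


def find_neighbors(entries: List[DirectoryEntry], person: DirectoryEntry,
                   max_distance: int = 10) -> List[DirectoryEntry]:
    """Return entries on the same street within max_distance house numbers."""
    return [
        e for e in entries
        if e is not person
        and e.street == person.street
        and abs(e.house_number - person.house_number) <= max_distance
    ]


# Made-up sample data, not taken from the 1870 directory
entries = [
    DirectoryEntry("Abbott John", 12, "Main"),
    DirectoryEntry("Adams Mary", 16, "Main"),
    DirectoryEntry("Allen Peter", 14, "Elm"),
    DirectoryEntry("Baker Ann", 90, "Main"),
]
for neighbor in find_neighbors(entries, entries[0]):
    print(neighbor.name)
```

The sample also shows why the feature is not much good with a partial directory: a neighbor whose surname starts with W simply isn't in the database until that page has been entered.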
There may not be much apparent activity at the Rec2Gen site, but that is because I’m working on the software for the first sub-site, for Erie County (Buffalo and environs) in New York. I want to be sure that site-owners will be able to personalize their own county sites, but still have all the basic data that makes up the heart of the system. So I’m writing the program that will store the information about a site that controls its appearance, and allows the site owner to add web pages, edit existing pages, or modify the appearance of any page or of the site as a whole.
Mostly, that means providing a variety of templates, so the owner can choose to have one, two or three columns of data, control the width of each column, as well as background color or images. Type font styles, size and color will all be individually controllable for different headings and text areas. In computer code this is mostly done with CSS — cascading style sheets — so the site can be modified in ways that substantially change the appearance, without affecting the underlying data or content that is displayed. That means that appearances can be changed at any time, without any loss of data or need to reformat everything.
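A simplified sketch of how stored appearance settings might be turned into a style sheet is shown below, in Python. The setting names, CSS class names and function are all hypothetical; the real site software may store and generate these quite differently.

```python
def build_stylesheet(settings: dict) -> str:
    """Turn a site owner's appearance settings into CSS rules.

    Only the appearance is generated here; the page content lives
    elsewhere and is not touched when the settings change.
    """
    rules = [f"body {{ background-color: {settings['background_color']}; }}"]
    # One rule per column, so one, two or three columns all work
    for number, width in enumerate(settings["column_widths"], start=1):
        rules.append(f".column-{number} {{ float: left; width: {width}%; }}")
    rules.append(
        f"h1 {{ font-family: {settings['heading_font']}; "
        f"color: {settings['heading_color']}; }}"
    )
    return "\n".join(rules)


# Example: a two-column layout with a narrow left column
example_settings = {
    "background_color": "#ffffff",
    "column_widths": [30, 70],
    "heading_font": "Georgia, serif",
    "heading_color": "#333366",
}
print(build_stylesheet(example_settings))
```

Switching to three columns or a different color only means changing the settings and regenerating the style sheet.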
To make sure everything works as planned, and the software is easy to use, I’ll be using this program to create the first county site. Then I’ll have several programs for the owner to enter and manipulate data. Some of these I’ve already written for the main site, and others will be written as needed to generate data for that first site. With data entry software, the emphasis is on ease of use, efficiency and speed. In a future post I’ll go into more detail about the planned content for those county sites. The ultimate goal is to make them the most useful locality sites for genealogists of all levels of experience and expertise.