Genealogy databases tend to be huge. For instance, one new database that I am aware of and hope to write about in a few weeks contains 1.5 terabytes of information about French-Canadians, including dates of christening, marriage, and death. In case you are not familiar with the term, a terabyte is one thousand gigabytes, or one million megabytes, or roughly the capacity of 700,000 floppy disks. That's a lot of data! Would you want the task of backing up all of it?
I don't have the specifications at my fingertips, but I suspect that the databases of the LDS Church and of Ancestry.com are even larger. I also know that the New England Historic Genealogical Society already has more than 1.5 terabytes of disk storage capacity on its web server, although the disk array is not yet full.
As big as these disk storage arrays may be, they are still a drop in the bucket when compared to some devices being built today. Capricorn Technologies says it has recently completed delivery of more than a petabyte of storage to the Internet Archive, a non-profit organization based in San Francisco that creates periodic snapshots of the Internet. (I wrote about the Internet Archive and its plans to store everything on the Internet in a Plus Edition article in this newsletter on August 19, 2004.)
A petabyte is 1,000 terabytes. In other words, it is the equivalent of about seven hundred million floppy disks. A petabyte could easily store every word ever written about genealogy, including every record ever digitized. It would then have plenty of room left over for storing hundreds of millions of records to be digitized in the future.
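For readers who like to check the arithmetic, here is a minimal sketch in Python of the floppy disk comparison. It assumes the familiar 3.5-inch high-density floppy with a formatted capacity of 1,474,560 bytes and decimal (power-of-ten) terabytes and petabytes. Those are my assumptions, and the function name is made up for illustration.

```python
# Assumption: a 3.5-inch high-density floppy holds 1440 KiB when formatted.
FLOPPY_BYTES = 1440 * 1024          # 1,474,560 bytes

# Assumption: decimal units, as disk makers usually quote them.
TERABYTE = 10**12                   # one thousand gigabytes
PETABYTE = 1000 * TERABYTE          # one thousand terabytes


def floppies_needed(num_bytes, floppy_bytes=FLOPPY_BYTES):
    """Return how many floppy disks it takes to hold num_bytes (rounded up)."""
    return -(-num_bytes // floppy_bytes)   # ceiling division with integers


if __name__ == "__main__":
    print(f"One terabyte: about {floppies_needed(TERABYTE):,} floppy disks")
    print(f"One petabyte: about {floppies_needed(PETABYTE):,} floppy disks")
```

Both results land a little under the round figures quoted above, which is close enough for a back-of-the-envelope comparison.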
The entire system from Capricorn Technologies fills sixteen equipment racks. You won't want that in your living room, but you could squeeze it into a two-car garage. This seems like a modest size, one that can easily be installed in most professional data centers.
The new disk systems cost about $2 per gigabyte. I suspect that any commercial or non-profit genealogy organization with a lot of data can afford a system of this kind, sized to its needs. In short, disk storage space is no longer an issue.
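To put that price in perspective, here is a rough sketch in Python that multiplies it out for the sizes mentioned in this article. It assumes the $2 per gigabyte figure holds at any size and ignores racks, power, and staff. The function name is hypothetical.

```python
# From the article: roughly $2 per gigabyte of disk storage.
COST_PER_GB = 2

GB_PER_TB = 1_000            # decimal units
GB_PER_PB = 1_000_000


def storage_cost(gigabytes, cost_per_gb=COST_PER_GB):
    """Return the approximate disk cost in dollars (a flat-rate assumption)."""
    return gigabytes * cost_per_gb


if __name__ == "__main__":
    sizes = {
        "1.5 terabytes": 1.5 * GB_PER_TB,
        "1 petabyte": 1 * GB_PER_PB,
    }
    for label, gigabytes in sizes.items():
        print(f"{label}: about ${storage_cost(gigabytes):,.0f}")
```

On those assumptions, 1.5 terabytes of disk comes to a few thousand dollars, while a full petabyte runs to about two million.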
Now all we need is a petabyte of digitized genealogy data…