The Library of Congress stores thousands of rare public domain documents relating to America's history, documents that are slowly decaying. Now the library of the U.S. citizenry is about to begin an ambitious project to digitize these fragile documents and publish the results online in multiple formats. The project is built on free and open source software (OSS), including a cluster of more than 1,000 machines running the Linux operating system. The documents will be made available to the public at no charge.
Thanks to a $2 million grant from the Sloan Foundation, "Digitizing American Imprints at the Library of Congress" will begin the task of digitizing these rare materials -- including Civil War and genealogical documents, technical and artistic works concerning photography, scores of books, and the 850 titles written, printed, edited, or published by Benjamin Franklin. According to Brewster Kahle of the Internet Archive, which developed the digitizing technology, open source software will play an "absolutely critical" role in getting the job done.
The main component is Scribe, a combination of hardware and free software. "Scribe is a book-scanning system that takes high-quality images of books and then does a set of manipulations, gets them in optical character recognition and compressed, so you can get beautiful, printable versions of the book that are also searchable," says Kahle.
You can read more about this ambitious project on Linux.com at http://www.linux.com/article.pl?sid=07/03/26/1157212.
