There are a number of reasons for copying an entire web site to your own hard drive. Perhaps you want to save a copy of all the information available on the site. You can then disconnect from the Internet and peruse the site at your leisure. In fact, you can read the pages of a web site while on a train or airplane.
Another reason, and one of mine, is searching: even though I have a broadband Internet connection, I find it much faster to search a copy of a web site on my own hard drive than to search the original pages online with Google or other search tools. A final reason applies if you own the site: you may want a backup copy in case of disaster.
A number of programs are available that will simplify the copying of web sites to your local hard drive. I have experimented with three or four of them and have now settled on HTTrack.
HTTrack is a powerful and easy-to-use tool that can back up your entire web site for offline browsing and archival purposes. The program will make a mirror image of an entire site or, optionally, of part of a site. It recursively builds all directories, downloading HTML, images, and other files from the server to your computer. As part of the process, HTTrack preserves the original site's relative link structure.
You can later open a web browser to view the web pages stored on your hard drive. The copy will look just like the original web site, except that performance is much better: as you move from web page to web page, the new pages appear almost instantly.
HTTrack's default settings will copy an entire web site. I suggest you not try that with Ancestry.com, Google, Switchboard.com, or other sites with huge databases. You might need a few terabytes of disk storage to accomplish that! However, HTTrack's menus make it easy to narrow the parameters down to copy only a subset of the web site, such as only the message board or only the pages in the "Canadian genealogy" subfolders.
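HTTrack's own options for limiting a copy are described in its documentation; the short Python sketch below only illustrates the general idea of narrowing a mirror with include and exclude patterns, so that a copying program fetches, say, the Canadian genealogy pages but skips large download files. The rule format (a leading + to include, a leading - to exclude), the last-match-wins ordering, and the www.example.com addresses are all assumptions made for illustration, not a description of HTTrack's exact behavior.

```python
from fnmatch import fnmatchcase

# Hypothetical rules for a partial copy: "+" includes matching URLs,
# "-" excludes them. In this sketch the LAST matching rule decides,
# and a URL that matches no rule is not copied.
RULES = [
    "+www.example.com/genealogy/canada/*",
    "-*.zip",
]


def should_copy(url: str, rules: list[str]) -> bool:
    """Return True if the URL passes the include/exclude rules."""
    target = url.split("://", 1)[-1]  # drop "http://" so rules read host/path
    decision = False
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(target, pattern):  # "*" also matches "/"
            decision = sign == "+"
    return decision


for url in [
    "http://www.example.com/genealogy/canada/ontario.html",  # copied
    "http://www.example.com/genealogy/canada/records.zip",   # skipped
    "http://www.example.com/forum/index.html",               # skipped
]:
    print(should_copy(url, RULES), url)
```

Putting the exclusion last lets one broad "include this folder" rule be trimmed by narrower "but not these files" rules.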
None of the web site copying (mirroring) programs will work on interactive web pages. When copying sections of Ancestry.com or Switchboard.com (a popular telephone directory web site), these programs will have difficulty with any page that asks the user to enter a name to be searched. For instance, a web page on Ancestry.com that says, "Enter first and last names" will be copied, blank entry fields and all. However, none of these programs will let you enter names to be searched, nor will they record the results of such searches. As a result, you cannot use any of these programs to copy an entire database from Ancestry.com, Switchboard.com, or other database-driven interactive sites.
Another limitation is that web pages that require a program to be executed on the web server will not function properly when copied to your hard drive. Many web pages whose names end in .cgi, .php, or .pl rely on a program run on the web server to supply information to the viewer. HTTrack and other web mirroring programs will typically save the page as the server produced it at the moment of copying, but they have no way to run the underlying programs, which exist only on the server, not on your PC. Therefore, any fresh results those programs would produce cannot be displayed on your local screen.
For instance, you might be copying data from www.weather.com and encounter a page ending in .cgi that queries the current weather report for your town. The original page will be copied to your hard drive. However, if you read that page two or three days later while offline, there is no way to execute that CGI script and display the current weather on your screen.
The "mirror" also is not an exact duplicate of the original. For one thing, all the internal links are changed to make them work properly from the hard drive. For instance, let's say I copy my web site (www.eogn.com) to my hard drive. The link on one of my web pages that points to http://www.eogn.com/index.html (another page on my site) gets converted so that it points to the local copy, C:\My Web Sites\EOGN\www.eogn.com\index.html. This way, when I later view the page on my own hard drive, the link takes me to the correct document as stored on my own hard drive rather than trying to connect to the equivalent page on the web.
HTTrack adds some information to the beginning and end of each page stored on the hard drive to show that it is a copy of the original, created by HTTrack. This is a good idea because it shows that the page is not the original. However, it defeated one of my original plans: I had thought that I could quickly edit the pages stored on my hard drive and then upload the results back to the web server. Because of all the changes HTTrack makes inside the copied web pages, that plan is not practical.
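The link conversion described above can be sketched in a few lines of Python. This is not HTTrack's code, only an illustration of one common approach, assuming that each page is saved under a folder named for its host (as in the C:\My Web Sites\EOGN\www.eogn.com example), that a URL ending in / is saved as index.html, and that links to other sites are left untouched. Rather than writing the absolute C:\ path into each page, the sketch writes a relative link, which resolves to the same local file and keeps working if the mirror folder is moved.

```python
import ntpath
import posixpath
from urllib.parse import urlsplit

# Assumptions for this sketch: the mirrored host and the Windows folder
# that holds the copy (taken from the example in the text).
MIRROR_HOST = "www.eogn.com"
MIRROR_ROOT = r"C:\My Web Sites\EOGN"


def local_path(url: str) -> str:
    """Map a URL on the mirrored site to a path inside the mirror folder.

    Query strings and fragments are ignored in this simplified version.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"  # assumed name for directory pages
    return parts.netloc + path  # e.g. "www.eogn.com/index.html"


def windows_file(url: str) -> str:
    """Full Windows file name of the saved copy of a page."""
    return ntpath.join(MIRROR_ROOT, *local_path(url).split("/"))


def rewrite_link(page_url: str, link_url: str) -> str:
    """Rewrite a link found on page_url so it works from the hard drive."""
    if urlsplit(link_url).netloc != MIRROR_HOST:
        return link_url  # off-site links still point at the web
    page_dir = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(local_path(link_url), start=page_dir)


print(windows_file("http://www.eogn.com/"))
# C:\My Web Sites\EOGN\www.eogn.com\index.html
print(rewrite_link("http://www.eogn.com/archives/2005.html",
                   "http://www.eogn.com/index.html"))
# ../index.html
```

The page address http://www.eogn.com/archives/2005.html is a hypothetical example; any page one folder below the home page would get the same "../index.html" link.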
Even so, HTTrack will properly copy most genealogy web pages, since perhaps 99% of such pages online do not depend on .cgi, .php, or .pl scripts. Just bear in mind that HTTrack will not be effective for sites created with The Next Generation or other sophisticated interactive genealogy software. It also will not work well with interactive online services such as www.weather.com, stock market reports, or other sites that display constantly changing data.
All in all, I am pleased with HTTrack, even with these minor limitations. I found the program easy to download, install, and operate. For most simpler web sites, all the user needs to do is run the program, supply the URL of the site to be copied, and then sit back while the program copies everything. Later, you use a standard web browser to open the main index file of the copied site (usually index.html) on your hard drive and use it as if you were on the web. The one major difference is speed: web pages appear in your browser much faster when retrieved from your own hard disk rather than from the web.
The best part of HTTrack is its price: FREE.
This one is a keeper. I use it periodically to make backup copies of web sites that I own. I also occasionally copy other sites to read while I am traveling and disconnected from the Internet. That is a great way to pass the time on a long flight.
For more information about HTTrack for Windows, Linux, and UNIX, or to download the program, go to http://www.httrack.com
