How to Download an Entire Website for Offline Reading

NOTE: This article has nothing to do with genealogy. If you are looking for genealogy-related articles, you might want to skip this one. However, the article describes some very useful methods of storing information from web sites onto your local hard drive. That can be useful for genealogists and for many other people as well.

It is easy to save individual web pages for offline reading, but what if you want to download part or all of a web site and store it on your computer’s hard drive? Several programs for Windows and Macintosh will let you do just that, and some of them are available free of charge.

Joel Lee has published an article on the Make Use Of web site that describes four such programs. He even describes my favorite one, called SiteSucker. He writes:

“If you’re on a Mac, your best option is SiteSucker. This simple tool rips entire websites and maintains the same overall structure, and includes all relevant media files too (e.g. images, PDFs, style sheets). It has a clean and easy-to-use interface that could not be easier to use: you literally paste in the website URL and press Enter.”

I have to agree with Joel Lee’s description. I have been using SiteSucker for years to make backup copies of my own web sites and have been pleased with its operation. I even wrote about SiteSucker in this newsletter in 2014 at http://bit.ly/2yZhcwZ.

SiteSucker costs $5, a modest amount considering how useful the program has been for me.
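Joel Lee's remark that such a tool "maintains the same overall structure" can be illustrated with a short Python sketch. This is not how SiteSucker itself is implemented. The function name local_path and the folder name "mirror" are made up for this example, and the rule of saving a directory address as index.html is an assumption about how a typical mirroring tool might behave.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_path(url, root="mirror"):
    """Map a page URL to a file path that mirrors the site's layout.

    Hypothetical helper: 'root' is an arbitrary local folder name.
    """
    parts = urlparse(url)
    path = parts.path
    # Assumption: an address ending in "/" (or with no path at all)
    # is stored as index.html inside the matching folder.
    if path == "" or path.endswith("/"):
        path += "index.html"
    return str(PurePosixPath(root, parts.netloc, path.lstrip("/")))


print(local_path("https://www.example.com/photos/2014/trip.jpg"))
print(local_path("https://www.example.com/"))
```

Because the local folders follow the paths in the original addresses, links between the saved pages can keep pointing at the right files once they are rewritten to local paths.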

I have also tried another program he describes, called Wget. I wasn’t too enthused about Wget, and I noticed that Joel Lee didn’t say much about it either, other than to give a basic description of the program. He didn’t offer any opinion of how useful Wget is compared to the other three programs.

One thing that Joel Lee does not mention is that these programs download all static web pages but will not retrieve dynamic web pages.

NOTE: Dynamic web pages are those created at the moment a user requests information. For instance, a visitor to MyHeritage.com or Ancestry.com or FamilySearch.org might enter a query for an ancestor named John Jacob Jingleheimer Schmidt. The web site then creates a BRAND NEW web page at that moment and displays it to the user. The web page is then deleted from the web site as it is no longer needed.

SiteSucker cannot create queries so it will not ask for new web pages. You won’t be able to download the billions of records that are available on these large genealogy databases. That’s a good thing as you probably don’t own enough disk drives to hold all that data anyway. However, these programs will copy all pages that are not database-driven, such as the static web pages at https://www.eogn.com.
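To see why a site downloader never reaches search results, here is a minimal Python sketch of the link-discovery step such programs generally rely on. It is an illustration only, not SiteSucker's or Wget's actual code, and the names LinkCollector and site_links and the example.com addresses are invented for this example. The sketch collects the links that already exist in a page and ignores search forms, because it has no way of knowing what to type into them.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag; forms are ignored on purpose."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def site_links(page_url, html):
    """Return the same-site pages that are linked from the given HTML."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    found = []
    for href in collector.hrefs:
        # Resolve relative links and drop any "#section" suffix.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto: and similar links
        if parts.netloc != site:
            continue  # stay on the site being copied
        if absolute not in found:
            found.append(absolute)
    return found


page = """
<a href="/about.html">About</a>
<a href="/about.html#contact">Contact</a>
<a href="https://other.example.org/">Elsewhere</a>
<form action="/search"><input name="ancestor"></form>
"""
print(site_links("https://www.example.com/index.html", page))
```

The search form in the sample page produces no entry in the result. A page of search results exists only after someone submits a query, so a program that merely follows existing links never asks for it.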

You can read Joel Lee’s article, How Do I Download an Entire Website for Offline Reading?, at http://bit.ly/2D1k2pJ.

3 Comments

I use wget all the time in automated scripts for downloading content from websites. However, it is a hard-core (and powerful) Unix/Linux tool with a bit of a steep learning curve. Every time I set up a new way to use wget there’s always a lot of head scratching and trial-and-error until I get it right. Not a tool for the casual user, but I would not be surprised if it is used behind the scenes by many GUI-based site-scraping tools.

I’ve been able to do it for many years with Adobe Acrobat, but why would I want thousands of pages when I’m only interested in a few?

On my Android tablet I find dynamic web pages simple to deal with — take a screenshot and OCR the text when you need it (all stored on Drive automatically, of course).
