How to Download Entire Websites for Offline Use

Information on the World Wide Web may not remain online forever. However, it is easy to download and save information when you do see it. The information then remains available to you in case you ever want to go back and read it again in the future.

With today’s low prices for large-capacity internal and external disk drives, plus excellent software that can search through many gigabytes of saved data to find the specific items you are looking for, it often makes sense to save large amounts of data in the hope that you can find items of interest in the future.

In fact, you can download and save entire web sites.

Downloading an entire site isn’t practical for FamilySearch, MyHeritage.com, Ancestry.com, Findmypast.com, or the other mega-sites, each with many terabytes of genealogy information. (One terabyte equals 1,000 gigabytes.) However, copying a complete web site works well for smaller sites, such as a genealogy society’s web site or one person’s personal web site.

Downloading an entire website is also handy for those who want to archive a site in case it goes down. If you own a personal web site, making a copy of the entire site is an excellent method of maintaining backup copies to be available in case of hardware malfunctions or if you simply want to move the web site to a new hosting provider.

Ryan Lynch has published an article on the MakeTechEasier web site that describes several programs for Macintosh and Windows systems that will download an entire web site and save it to your own hard drive(s) or to your own private storage space in the cloud. You can find How to Download Entire Websites for Offline Use at: https://www.maketecheasier.com/download-entire-websites-for-offline-use/.
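
For readers curious about what such programs do behind the scenes, here is a minimal Python sketch of the general idea: start at one page, save it, collect the links that stay on the same site, and repeat. It is an illustration under stated assumptions, not how any of the programs in that article actually work. The names same_site_links, local_path_for, and mirror are made up for this example, and society.example is a placeholder address. The sketch ignores query strings, does not rewrite links for offline browsing, and stops with an error if a page cannot be fetched.

```python
import time
import urllib.request
from html.parser import HTMLParser
from pathlib import Path, PurePosixPath
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href/src targets found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def same_site_links(html, page_url):
    """Return absolute http(s) URLs on the same host as page_url, minus #fragments."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for target in collector.targets:
        absolute, _fragment = urldefrag(urljoin(page_url, target))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            if absolute not in found:
                found.append(absolute)
    return found


def local_path_for(url):
    """Map a URL to a relative file path for the offline copy (query strings ignored)."""
    parsed = urlparse(url)
    path = parsed.path
    if path == "" or path.endswith("/"):
        path += "index.html"  # a directory-style URL is saved as its index page
    return str(PurePosixPath(parsed.netloc) / path.lstrip("/"))


def mirror(start_url, out_dir, max_pages=50, delay_seconds=1.0):
    """Hypothetical helper: save up to max_pages same-site files under out_dir."""
    queue, seen = [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        with urllib.request.urlopen(url) as response:
            body = response.read()
            content_type = response.headers.get_content_type()
        target = Path(out_dir) / local_path_for(url)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(body)
        if content_type == "text/html":
            # Only HTML pages are scanned for further links to follow.
            page_text = body.decode("utf-8", errors="replace")
            queue.extend(same_site_links(page_text, url))
        time.sleep(delay_seconds)  # pause between requests to go easy on the server
    return seen
```

Calling mirror("https://society.example/", "offline-copy") would attempt to save up to 50 files into a folder named offline-copy. The page limit and the pause between requests are deliberate: they keep a small experiment from hammering someone else's server. The dedicated programs described in the article handle many more cases than this sketch does.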

4 Comments

Dick,
This works great for static web sites, those that have all their content in pre-generated pages. It does not work for sites that dynamically generate web pages on demand in response to a search or other query. Fortunately, most smaller web sites, from smaller organizations like local historical or genealogical societies, are static. The fact that you can’t download the entire contents of a site like familysearch.org has as much to do with its dynamic nature as with its size.
The other problem is that many sites have security mechanisms to prevent access from software robots, exactly what these programs are. We have all seen the “I am not a robot” prompt.
Almost all modern web browsers have a “save-as” option that permits you to save the web page that is currently displayed to your hard disk, including graphics and other related files. You can then display the page at any time while you are offline, and it works with dynamic pages as well as static ones. This does not require any additional software beyond the web browser that you currently use and may well be the easiest and simplest solution for the average genealogist who wants to save the results pages of a few search queries.
Clark Bagnall
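
On the point about sites that turn away software robots: one related and purely voluntary convention is the robots.txt file, in which a site lists the paths it asks automated programs to stay out of. It is not the same thing as an “I am not a robot” prompt, and it says nothing about a site's terms of service, but a download program can check it before fetching anything. The short Python sketch below uses the standard library's urllib.robotparser module. The function name allowed_by_robots, the user agent name my-archiver, and the society.example address are all made up for this illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example robots.txt content that asks all robots to stay out of /private/.
EXAMPLE_ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"

print(allowed_by_robots(EXAMPLE_ROBOTS_TXT, "my-archiver",
                        "https://society.example/newsletter.html"))
print(allowed_by_robots(EXAMPLE_ROBOTS_TXT, "my-archiver",
                        "https://society.example/private/members.html"))
```

In real use, the text would come from the site's own robots.txt file rather than from a string typed into the program.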

Not only that, but an awful lot of websites make it a violation of the terms of service to automatically and/or systematically download the content. So what you’re suggesting may well actually be a violation of the CFAA (the U.S. Computer Fraud and Abuse Act).

I have to comment about some of the comments that have been posted here.

First, a disclaimer: I am not an attorney. This is a subject that I am interested in and I have read a lot about it and discussed this with several attorneys. However, that doesn’t mean that my comments here constitute legal advice.

I see two different things being mixed together here and yet they are really separate issues:

1. Downloading data of any sort and saving it for your own personal use.

2. Downloading data of any sort and then giving it or selling it to others.

While U.S. copyright laws are somewhat vague at times, various court cases have verified that downloading information, images, music, or other items and SAVING IT FOR YOUR OWN PERSONAL USE is usually considered to be 100% legal, even if it is copyrighted material.

The landmark case for this was “Sony Corp. of America v. Universal City Studios, Inc.” See https://en.wikipedia.org/wiki/Sony_Corp._of_America_v._Universal_City_Studios%2C_Inc. for the details. In short, the Supreme Court of the United States ruled that the making of individual copies of complete television shows for purposes of time shifting does not constitute copyright infringement, but is fair use.

Later court decisions verified that the same rules apply to text materials, movies, images, and music. For instance, millions of people legally download or purchase music or television programs and then copy the material to iPod music players or record them on home digital video recorders (DVRs) to legally watch and listen to again and again for their own personal use. You do that every time you record a television program at home for later viewing.

In contrast, downloading any copyrighted material and then GIVING IT OR SELLING IT TO SOMEONE ELSE without permission of the copyright holder is considered to be a violation of copyright laws. Don’t do it!

In short, downloading part or all of a web site for your own personal use by itself is not illegal. The question of legalities only arises when you give or sell that material to someone else. It makes no difference whether you give it away free or if you sell it. The copyright laws prohibit sharing copyrighted material without permission.

There are some exceptions for LIMITED AMOUNTS of material that qualify as “fair use” but that is not an issue when downloading an entire web site.

The above concerns U.S. copyright laws only. I have no expertise in the copyright laws of other nations.

Again, I am not an attorney. My comments here do not constitute legal advice. If you have further questions, you need to consult with an attorney who specializes in intellectual property issues.

– Dick Eastman

I’ve found this process incredibly useful in research, archiving, and scraping for various datasets. It is also great for grabbing the blog posts and replies of humble genealogy sites!
I’ve used a few different applications over the years but SiteSucker is the one I usually default to.
