December 01, 2005

Comments

Dino (All Dino, All the Time)

And how does copying an entire website without the permission of the owner (who is usually the copyright holder) comply with current copyright law?

It violates it big time. It even violates the fair use sections. It goes far beyond what copyright law would consider 'fair' when it refers to the "amount and substantiality of the portion used in relation to the copyrighted work as a whole."

Dick, if you own the site you should have the original source that you can republish if you need to. I can see a use for the tool if you want to make an offline copy of your own web site to give to someone.

Making copies of an entire web site to use later when you see fit is absolutely no different than going to the library and xeroxing an entire book.

Dick Eastman

Disclaimer: I am not a lawyer.

However, as I understand it, there are no copyright issues with making copies of publicly-available information for your own personal use. There may be an issue, though, with what you do with that copy.

In other words, if you make a copy and keep it on your own hard drive and simply read it yourself, I believe you are perfectly within legal boundaries. Anybody can make a copy of eogn.com for their own use and read it while riding the commuter train without worrying about legalities.

Should you later republish or reprint or otherwise redistribute the information to others, you might then be in danger of violating copyright laws.

- Dick Eastman

Trishymouse

I'm also puzzled since you said you can't use it to edit and then upload because of all the changes it makes to the pages, but later say you use it for backing up your websites. I'd rather do a straight, quick, FTP to my HD of my folders and files from my website directory and get them clean, than use this program that messes them up and have to clean them up later...

Dino (All Dino, All the Time)

Dick,

I'm happy that you allow so much freedom with copies of your web site. Unfortunately (or fortunately, depending on your views) web sites have exactly the same copyright protection granted to books and other printed matter.

Just as it is copyright infringement for you to go to a library and xerox (or scan) an entire book without the consent of the copyright holder, it is also copyright infringement to make a copy of an entire web site.

Didn't you review a book on copyright not too long ago? The following is from Cyndi's List:
Web pages are protected by a copyright. Information contained on those web pages and all original information that is not in the public domain is protected by copyright. A compilation of works, including a set of links arranged into a compilation, IS protected by copyright.
http://www.cyndislist.com/copyrite.htm

Dick, thank you for allowing everyone to freely use your column. Not all authors are so generous.

Dick Eastman

Dino, I think you need to re-read my earlier posts on this topic. In short, anyone is free to make copies FOR THEIR OWN PERSONAL USE. That is true of web sites, books, records and most anything else. Copyright becomes an issue only when someone tries to redistribute the information or to re-use it in something else.

As I wrote earlier, "if you make a copy and keep it on your own hard drive and simply read it yourself, I believe you are perfectly within legal boundaries. Anybody can make a copy of eogn.com for their own use ..."

and

"Should you later republish or reprint or otherwise redistribute the information to others, you might then be in danger of violating copyright laws."

In short, you can legally copy it all you wish but don't give it to anyone else or re-use the data in any manner without permission.

- Dick Eastman

Dick Eastman

The backups made by HTTrack and most other web site copiers are not suitable for re-uploading. They are not identical images and cannot be identical if they are to be viewed offline. However, all the text is backed up.

Chris Dunham

Reality check: No one is likely to be sued for copying a public website for personal offline browsing. That being said, the question whether one could be successfully sued will never be answered until someone actually is. That seems to be the way copyright law works.

Offline browsing is simply an extreme case of caching, which web browsers do all the time by default. I currently have over 50MB of website info stored on my hard drive. Have I infringed upon anyone's copyright? How about if I download a free utility that allows me to easily view these cached files?

Some software even allows you (or your ISP) to "precache" webpages you haven't yet viewed. All of this is intended to speed up the delivery of web content--not to deprive anyone of his livelihood.

In offline browsing, content is used as it was intended to be used, just at a more convenient time (called "time-shifting").

Copying an entire website is not really analogous to photocopying a book at the library. It's more analogous to recording an episode of Desperate Housewives for later viewing. The U.S. Supreme Court in the 1984 "Betamax case" ruled that recording a program to view later fell under "fair use":

"When one considers the nature of a televised copyrighted audiovisual work ... and that time-shifting merely enables a viewer to see such a work which he had been invited to witness in its entirety free of charge, the fact ... that the entire work is reproduced ... does not have its ordinary effect of militating against a finding of fair use."

Even this is not strictly analogous, since most television programs are not available "on demand" the way websites are. Again, Internet copyright law is still in its infancy, and future cases will undoubtedly clarify the fair-use provisions.

Also, notice the phrase "free of charge" in the court decision. I wouldn't try downloading large portions of the Ancestry.com website for later viewing--especially since this is expressly forbidden in the TOS.

Ray Marshall

The major reason that I would want to copy an entire web page (and I copy partial web pages all the time) is that in my 16 years of experience with a computer, I have seen an awful lot of web pages disappear on me.

And since I have saved all of my bookmarks from three different computers, I have a huge bookmark file.

There are probably plenty more pages that are gone that I have not had cause to revisit, but might some day.

Web pages disappear for various reasons, one of which would be the death of an owner with a survivor who isn't interested in the subject. This will happen more and more as we baby boomers (and older) get into our 60s and 70s. Another reason would be moving to another server and changing the page title.

Thanks for the tip, Gene.

Ray Marshall
Minneapolis

KosherJava

Keep in mind that many sites will be brought to their knees by the use of HTTrack. Any dynamically generated genealogy site has tens of thousands of "virtual pages". Most small genealogy websites will exhaust all of their allotted bandwidth for the month from a single site copy using HTTrack.
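
To put rough numbers on the bandwidth concern, here is a minimal back-of-the-envelope sketch in Python. The page count, average page size, and monthly quota are hypothetical values chosen only for illustration; they are not measurements of any real site or hosting plan.

```python
def crawl_transfer_mb(page_count: int, avg_page_kb: float) -> float:
    """Approximate data transferred by one full-site copy, in megabytes."""
    return page_count * avg_page_kb / 1024


def fraction_of_quota(page_count: int, avg_page_kb: float,
                      monthly_quota_mb: float) -> float:
    """Share of a monthly bandwidth quota consumed by one full-site copy."""
    return crawl_transfer_mb(page_count, avg_page_kb) / monthly_quota_mb


# Hypothetical figures, not taken from any real host or site.
PAGES = 20_000          # dynamically generated "virtual pages"
AVG_PAGE_KB = 50.0      # assumed average size of one page, in KB
QUOTA_MB = 1_000.0      # assumed monthly bandwidth allotment, in MB

if __name__ == "__main__":
    used = crawl_transfer_mb(PAGES, AVG_PAGE_KB)
    share = fraction_of_quota(PAGES, AVG_PAGE_KB, QUOTA_MB)
    print(f"One full copy transfers about {used:.0f} MB "
          f"({share:.0%} of the monthly quota)")
```

Under these assumed figures a single complete copy uses nearly the whole monthly allotment, which is the situation the comment above describes. Real sites will differ, so substitute your own numbers.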

Chris Dunham

>>Keep in mind that many sites will be brought to their knees by the use of HTTrack.

Good point. This is especially true of sites using free webhosting.

leifbk

I've got an online genealogy database of 14,000 persons with dynamic pages, and I hate it when I see these vultures in action. Letting people view your material online and letting them download your entire site in what looks like a DoS attack are two totally different things. Unless it is explicitly allowed, I feel that you should be very careful about doing it. There are ways to prevent it from happening, but they require active steps from the site owner.

See the news thread at http://groups.google.com/group/lucky.freebsd.questions/browse_frm/thread/eb55e1d51cfebc97/da5d3b70664b6381?tvc=1&q=httrack++abuse&hl=en#da5d3b70664b6381 for an alternative to Dick's rosy view as well as some technical advice for dealing with site rippers.

At least HTTrack honors the "robots.txt" file by default. But that won't stop anyone who doesn't take 'no' for an answer and finds out how to change this setting.
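
For readers unfamiliar with robots.txt, here is a small sketch of how a well-behaved crawler could consult it before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs, and the "ExampleBrowser" name are all hypothetical, and treating "HTTrack" as the user-agent token a site owner would list is an assumption for illustration. This is not HTTrack's own code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish: it asks a crawler
# identifying itself as "HTTrack" (an assumed token) to stay out entirely,
# and asks all other agents to skip the dynamic /cgi-bin/ pages.
EXAMPLE_ROBOTS_TXT = """\
User-agent: HTTrack
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(user_agent: str, url: str,
            robots_txt: str = EXAMPLE_ROBOTS_TXT) -> bool:
    """Return True if the rules permit this agent to fetch this URL."""
    return build_parser(robots_txt).can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("HTTrack", "http://example.com/index.html"),
        ("ExampleBrowser", "http://example.com/index.html"),
        ("ExampleBrowser", "http://example.com/cgi-bin/search"),
    ]:
        print(agent, url, allowed(agent, url))
```

The check is purely voluntary on the crawler's side, which is why, as noted above, it does nothing against someone who switches it off.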
