
September 13, 2007

Comments


Infinite Ancestors

Safari does something similar, just Save As... Web Archive.

theKiwi

But Safari only gets the page being viewed. Software like SiteSucker can download an entire site - all the pages of a site as long as they're linked up to a page you start from, or linked to a page linked to the page you start from.

Roger

Happy Dae

So we CAN legally copy entire websites? There's no copyright infringement? We don't have to ask permission? Has Steve Danko read this, yet?

I would like to do this legally, of course, because I could learn a lot from the structure of the files, XHTML, CSS, PHP, JavaScript and whatever else. I don't wish to steal -- just to learn.

Someone chime in here with the rules and protocol, please. Perhaps Mr. Manson?

Happy Dae.
http://www.ShoeStringGenealogy.com/ssg1.htm

Dick Eastman

---> So we CAN legally copy entire websites? There's no copyright infringement? We don't have to ask permission?

Yes. Absolutely. 100% legal. The same is true for music, books, videos and more. Well, there is ONE hitch: All of the copies must be for your own personal use only.

That has been true for decades.

As soon as you copy or republish any information/music/videos and provide part or all of the copy to someone else, you may have the copyright police knocking at your door.

Disclaimer: I am not a lawyer. If you have questions, you are encouraged to seek professional legal help.

- Dick Eastman

rdx

Most of the time you are safe downloading an entire site for personal use. However, some sites have written policies that expressly forbid the systematic download of their pages, sections, or the site as a whole. For example, the digital library I worked for prohibited the download of entire books.

Marjorie

BOY... I hope this works. My own website is on Google, but their backgrounds don't copy under Safari.
I'd like to get my website _off_ Googlepages and ON to my own ISP host.
==Marjorie

Dick Eastman

It worked when I tested it on a small web site I own, only about 6 or 8 web pages. I decided not to try it on eogn.com as there are thousands of pages there. I back that up using other methods.

- Dick Eastman

Ed

We all have our favorites for offloading genealogy info from the Web. I have used SiteSucker (Mac) once, with great success for static .html pages. On a PC I use EasyWebSave ($10) for saving single whole pages: just right-click and choose it, the page is saved and filed in its special directory, and I move on. For interactive info-saving [a selection on a dynamic page], I use TechSmith's Snagit 'text capture' to the clipboard, coupled with Fookes' NoteTab Pro, which has a 'Pasteboard Feature' that append-saves whatever is sent to the clipboard, semi-automatically building a .txt file of just those snippets on 'Uncle Roger.' (This method is also great for interactive lookup comparison pages, as in MyHeritage.com or GenCircles.com, that often won't 'copy/paste'.) When I am done, I search the contents of my findings/files with Gaviri's PocketSearch, which not only covers all my hard disks, but also zip drives, thumb drives, etc., and highlights the search word in full context. I have tried all the 'big' desktop search engines, but they are always grinding away in the background, slowing everything down to the point of aggravation. PocketSearch is the only one I know of that is small, nimble (quick returns), and searches beyond my regular hard disks. None of these are free (except SiteSucker), but they are low cost, and IMHO are worth the money in terms of time and effort saved.

Marilyn

When you say entire website, does that mean files that are being stored, but not shown, on my website? I have a website that I often use to store files I plan to access from remote locations. I upload with FTP a Word document or a video, for example, and then download it later by entering the URL address of the document on my website. For example: "www.mywebsite.com/mydocument.doc". I know there are other ways to do this, but it is so easy this way, and some of my files are too large for sending via email.

These are private files and I don't want to share them with other people. However, my html pages are in the same folder on my website and are visible on my site. Am I at risk of these being copied with the procedure you are talking about? Would burying them one level deeper in a folder on my website help? For example: "www.mywebsite.com/mydrivewayfolder/mydocument.doc"?

Suzie

On your recommendation this morning I tried out SiteSucker. WOW!

I am taking over as a County Coordinator for a GenWeb site and have been trying for 2 weeks to get all the files [18,500] downloaded to my Mac. Last Tuesday I got to the halfway point. Yesterday, in 7 hours, I finally reached the two-thirds mark.

Today, with SiteSucker, the whole job [STARTING OVER!] is complete in 6 hours. AND, it gave me an error log to show which files were missing, apparently basing that determination on an analysis of links within the files.

The file count is almost exactly 1000 pages smaller than two different FTP programs found. I'm thinking that SiteSucker downloaded only those files linked to the webpage and did not download any that were formerly linked but aren't any more. That's my guess but I'm planning to contact Rick Cranisky at SiteSucker and double check.

I'm not throwing anything out until the revision is complete, just in case.

SiteSucker is GREAT! Thanks for the "heads up", Dick.

Dick Eastman

You are correct: SiteSucker (and most other programs that download entire sites) can only find pages that have other pages linking to them. SiteSucker follows links. It has no other method of finding web pages.

Any pages that are "marooned" (unlinked) will not be found by the various web site download programs.
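To make the link-following idea concrete, here is a minimal Python sketch. It is not how SiteSucker is actually written (I have not seen its code); it only illustrates the general technique of starting at one page and following links. The `site` dictionary, its page names, and the `crawl` function are all made up for the illustration, and the dictionary stands in for a real web server so that nothing is fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site, start):
    """Return the set of pages reachable from `start` by following links.

    `site` is a hypothetical stand-in for a web server: a dict mapping
    a page name to its HTML. A real tool would fetch each page over HTTP.
    """
    found = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        parser = LinkCollector()
        parser.feed(site.get(page, ""))
        for link in parser.links:
            # Only follow links to pages that exist and are not yet seen.
            if link in site and link not in found:
                found.add(link)
                queue.append(link)
    return found


# Hypothetical four-page site; "old.html" exists but nothing links to it.
site = {
    "index.html": '<a href="about.html">About</a>',
    "about.html": '<a href="index.html">Home</a> <a href="family.html">Family</a>',
    "family.html": "<p>No links here.</p>",
    "old.html": "<p>Marooned page.</p>",
}

reachable = crawl(site, "index.html")
print(sorted(reachable))
```

In this toy site the crawl never reaches "old.html", because no page links to it.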

Actually, that is a benefit. If you have an FTP listing of ALL pages and then compare that list against the files that SiteSucker retrieves, you can easily identify marooned pages: they are the ones listed by FTP but not found by SiteSucker.
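That comparison is a simple set difference. Here is a rough Python sketch, assuming you have both lists as plain file names; the file names and the `find_marooned` helper are invented for the example, and in practice you would paste in your own FTP listing and the list of files the download program saved.

```python
def find_marooned(ftp_listing, crawled):
    """Return files present on the server but never reached by the crawl."""
    return sorted(set(ftp_listing) - set(crawled))


# Hypothetical lists: everything FTP shows vs. what the downloader retrieved.
ftp_listing = ["index.html", "about.html", "family.html", "old.html"]
crawled = ["index.html", "about.html", "family.html"]

marooned = find_marooned(ftp_listing, crawled)
print(marooned)
```

Anything left in `marooned` is a file that sits on the server with no page linking to it.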

- Dick Eastman

Lee

I have been keeping blogs in various places for years, and today I used SiteSucker to quickly archive them on my Mac, allowing me to search and edit everything... a very basic example of the utility of this program with one's own web content.
