Ancestry.com Migrates Its Entire Server Infrastructure to the Cloud

I have written often about the advantages and the disadvantages of storing your data, especially backup copies, in the cloud. Mostly, it is an efficient and effective method of keeping your information safe. A lot of industry leaders agree. Now Ancestry.com’s entire data center has been moved to cloud computing.

A few years ago, I visited Ancestry.com’s data center on two different occasions. While impressive, it was a typical data center. (I have been inside hundreds of data centers over the years.) One major disaster, such as a fire or earthquake, could have left the company without a lot of data processing capabilities. To be sure, Ancestry.com maintains almost constant backups of their data. However, building a new data center after a disaster, probably in a new location, and restoring the backups would have required months, possibly years.

Now Ancestry.com has a new solution. According to an article by Natalie Gagliordi in ZDNet:

“Genealogy service provider Ancestry.com is the latest data-heavy company to migrate its entire infrastructure to Amazon Web Services.

“Ancestry is a 34-year-old company and is rarely mentioned for its technological prowess, but it deals in data at a massive scale. Its services rely on artificial intelligence and machine learning to help subscribers uncover connections in millions of family trees and historical records.

“On the technology front, Ancestry currently manages about 10 petabytes of structured and unstructured data generated by more than 2.6 million subscribers, including 20 billion historical records detailing births, marriages, deaths, military service, and immigration. On average, more than 75 million searches are handled by Ancestry servers daily.”

NOTE by Dick Eastman: 10 petabytes is 10,000 terabytes or 10 trillion bytes.

The old data center is being decommissioned.

You can read the entire article at: http://zd.net/2sIypH0.

17 Comments

10 petabytes = 10 quadrillion bytes
A small factor of 1000 😉
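
The arithmetic behind that correction can be checked in a few lines. This is only a sketch, assuming decimal (SI) prefixes where each step up is a factor of 1000; the constant names are illustrative and not from the article.

```python
# Decimal (SI) storage units: each prefix is 1000 times the previous one.
TERABYTE = 10**12   # one trillion bytes
PETABYTE = 10**15   # one quadrillion bytes

ten_petabytes = 10 * PETABYTE

print(ten_petabytes // TERABYTE, "terabytes")
print(f"{ten_petabytes:,} bytes")
```

Ten petabytes is therefore 10,000 terabytes, and 10 trillion bytes would be only 10 terabytes.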

Remember, the cloud is just someone else’s data centre. This will still leave Ancestry up the creek if the Amazon data centre goes off line. Which has happened, as I recollect. They’ll only see no effect if they pay for the disaster recovery options, and since they’re not paying for it now (you imply), they’re unlikely to pay for it in the future. Unless it’s radically cheaper, in which case this is a good move.

    —> Remember, the cloud is just someone else’s data centre. This will still leave Ancestry up the creek if the Amazon data centre goes off line.

    Actually, the cloud is someone else’s MULTIPLE data centers, all configured in such a way that if one data center gets knocked offline by a disaster (fire, flood, tsunami, earthquake, etc.) the other data centers take over the workload within seconds. In most cases, if a cloud-based system gets knocked offline, the users never notice. The users are simply automatically switched to the other data centers operated by the same service and operations continue as normal.

    This was well proven a few years ago when the earthquake and tsunami hit Japan. Amazon’s AWS (Amazon Web Services) data center in Japan was knocked offline for weeks but most of the users never noticed the loss. The other data centers operated by Amazon simply took over and everything continued to operate as normal from the end users’ viewpoints.

    True cloud computing includes live, constant backups (often called “replication”) amongst the various servers in different locations around the world. Most of the companies that offer cloud-based services deliberately place their various data centers in locations widely dispersed around the world so that any local disaster will not have any effect on end users.

    —> Unless it’s radically cheaper, in which case this is a good move.

    It is always radically cheaper than each company building its own “hot backup” servers in widely dispersed locations.

    “Actually, the cloud is someone else’s MULTIPLE data centers, all configured in such a way that if one data center gets knocked offline by a disaster … the other data centers take over the workload within seconds”
    Sorry Dick – you describe the potential of the cloud quite accurately – but you have to pay for that. When Amazon Dublin went out a few years ago, a number of services went down and when we all asked, “What about the Cloud?”, the answer was that these were the companies that had not paid for alternative hot-sites or whatever you care to call it.
    In order to switch “within seconds” to “the other data centers operated by the same service”, you need to mirror all the disk updates continually to all the alternate data centres. Just having one alternate means doubling your disk and processing costs continually. Now, with any critical system, you’d be mad not to – but for everyone in the cloud? Also, it is a non-trivial exercise to ensure that disk updates are kept in step – there is no inherent guarantee that two mirror “disk drives” update in the same sequence as the originals. Ensuring that you haven’t lost data in transmission costs money.
    Companies do take the decision not to do some of this.
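
The point about keeping mirrored updates in step can be illustrated with a toy example. This is a minimal sketch of one common idea (numbering each update so a mirror can apply them in the original order), not a description of how Amazon or any other provider actually replicates data; the `Replica` class and the sample values are hypothetical.

```python
class Replica:
    """Applies numbered updates strictly in order, holding back any that arrive early."""

    def __init__(self):
        self.data = {}
        self.next_seq = 1      # sequence number expected next
        self.pending = {}      # early arrivals, keyed by sequence number

    def receive(self, seq, key, value):
        if seq < self.next_seq:
            return             # duplicate of an update already applied
        self.pending[seq] = (key, value)
        # Apply every update that is now contiguous with what has been applied.
        while self.next_seq in self.pending:
            k, v = self.pending.pop(self.next_seq)
            self.data[k] = v
            self.next_seq += 1


replica = Replica()
# Updates 3 and 2 arrive before update 1; all three write the same key.
replica.receive(3, "surname", "Smythe")
replica.receive(2, "surname", "Smith")
replica.receive(1, "surname", "Smyth")
print(replica.data)
```

Without the sequence numbers, the mirror would end up holding whichever value happened to arrive last rather than the value that was written last, which is the kind of silent divergence described above.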

    Correct. The cloud services can be configured in many ways. One method is to only have a single server or single group of servers. I suspect that many of the low-budget businesses do that. The problem is that such single-server operation is not a true cloud operation, even if the servers happen to be installed in a data center that is part of the cloud.

    Any TRUE cloud operation runs on multiple servers in multiple locations, typically locations around the world.

    Any corporation that is concerned with high uptimes and high reliability, such as a multi-million-dollar online genealogy service that cannot afford to have big outages (such as the RootsWeb fiasco of some months ago), will ignore the low-budget options and will configure systems that are designed to keep their business in operation. They will always specify true cloud operation, using multiple simultaneous servers in multiple locations around the world. An additional benefit is that such redundancy in multiple locations also allows for quicker response times to customers around the world.

    In addition, in a cloud operation, if a particular server or group of servers becomes overloaded (such as the news services on election night), it is easy for the cloud service to add additional servers within a matter of minutes to handle the load, then to reduce the number of servers when the heavy load goes away. In traditional, non-cloud data centers, adding servers is perhaps a six-month process, involving planning, funding, ordering, waiting for manufacturers, shipping, and installation. Then, once the load is reduced, the company is stuck with lots of expensive hardware they no longer need.

    Cloud computing allows the addition of more hardware to handle the load within minutes.
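
As a rough illustration of that kind of elastic scaling, here is a minimal sketch of the decision a scaling rule might make. The function name, the per-server capacity, and the load figures are all assumptions made up for the example; real cloud services expose their own scaling policies.

```python
import math


def servers_needed(requests_per_second: float, capacity_per_server: float,
                   minimum: int = 2) -> int:
    """Return how many servers are needed to carry the load, never fewer than `minimum`."""
    return max(minimum, math.ceil(requests_per_second / capacity_per_server))


# Hypothetical load over an election night, in requests per second.
for load in (800, 4500, 12000, 900):
    print(load, "req/s ->", servers_needed(load, capacity_per_server=500), "servers")
```

The server count rises with the peak and falls back to the minimum once the load goes away, so nothing is left sitting idle afterward.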

    Even then, nothing is ever perfect. Outages can still occur, primarily from software problems. However, having redundant servers in redundant data centers in multiple locations worldwide certainly REDUCES the chances of an outage.

    For instance, how many outages has Google had? How about Amazon? How about Netflix? Those services all run in multi-location clouds.

    OK Dick – the methodology that you describe is excellent and if that’s what you want to define Cloud Computing as, then I would agree with all those advantages totally. I am unconvinced that everyone uses the definition in the same manner, which is why I want people to understand that just putting your software on someone else’s servers isn’t enough.

    —> which is why I want people to understand that just putting your software on someone else’s servers isn’t enough.

    I agree 100%. Simply placing your server someplace in the Internet, even a single server in a cloud computing data center, is not true cloud computing. Anyone who is interested in the topic can find a rather detailed article about cloud computing, including lots of “how to” information, at https://en.wikipedia.org/wiki/Cloud_computing

    That article states, “Cloud computing is a type of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand. It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, applications and services), which can be rapidly provisioned and released with minimal management effort. Cloud computing and storage solutions provide users and enterprises with various capabilities to store and process their data in either privately owned, or third-party data centers that may be located far from the user–ranging in distance from across a city to across the world. Cloud computing relies on sharing of resources to achieve coherence and economy of scale, similar to a utility (like the electricity grid) over an electricity network.”

    Anything less than a “shared pool of configurable computing resources” is not cloud computing.

We’re sorry, you should try our basic viewer. Perhaps not all that advanced.

I wonder if this explains why it has been running so slow!

Arthur J Mastera June 9, 2017 at 4:59 pm

How Google Copes When Even It Can’t Afford Enough Gear? Another application of the cloud.

Ancestry is doing 2 transitions right now that may be in conflict.
The move to AWS and a new family tree synchronization interface.

Right now, today and for the last 2 months, they have suspended FamilySync and TreeSync because of technical problems. Yes, that is even true if you are in the test group for Family Tree Maker 2017. I understand the challenges they are having with sync technology and am waiting patiently for them to resolve everything.

It’s just dawned on me – what happens if Amazon have a huge inferno or other major catastrophe to their cloud based system? I use Dropbox extensively but what would happen if they had a major catastrophe? We are relying on Cloud storage but does the Cloud have a backup plan?

    —> We are relying on Cloud storage but does the Cloud have a backup plan?

    Yes.

    By definition, everything that runs on true cloud systems is fully backed up all the time in multiple locations. In addition, the servers are all run in parallel with different servers in different locations sharing the workload. If any server in one location goes offline for any reason, the servers in other locations take over within seconds in a manner that is usually invisible to end users. As a result, services on cloud computing platforms rarely go down for hardware reasons. (They are still susceptible to software problems, however.)

    That is one of the biggest advantages of the cloud: almost 100% uptime. Fires, floods, earthquakes, tsunamis, hardware failures, and other major disasters should never have any impact on the users of cloud systems. Most cloud services guarantee at least 99.9% uptime and a few offer even higher percentages than that, such as 99.999%.
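
To put those percentages in perspective, the following sketch converts an uptime figure into the downtime it still permits over a year. It assumes a 365-day year and that the guarantee is measured over the whole year; actual service agreements define their own measurement periods.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Hours per year a service may be down while still meeting the given uptime."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for pct in (99.9, 99.999):
    minutes = allowed_downtime_hours(pct) * 60
    print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.9% still leaves room for almost nine hours of outage per year, while 99.999% leaves only a little over five minutes.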

    Wikipedia has an excellent introduction to cloud computing that starts with the following:

    Cloud computing is a type of Internet-based computing that provides shared computer processing resources and data to computers and other devices on demand. It is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, applications and services), which can be rapidly provisioned and released with minimal management effort. Cloud computing and storage solutions provide users and enterprises with various capabilities to store and process their data in either privately owned, or third-party data centers that may be located far from the user–ranging in distance from across a city to across the world. Cloud computing relies on sharing of resources to achieve coherence and economy of scale, similar to a utility (like the electricity grid) over an electricity network.

    You can read a lot more at: https://en.wikipedia.org/wiki/Cloud_computing

    PC Magazine also has an excellent article that explains cloud computing at http://www.pcmag.com/article2/0,2817,2372163,00.asp

Given the chaos and the nearly four-month delay in getting new, fully functional software for their business from their spin-off vendor, on top of their unexplained destruction of YDNA records several years ago, one has to wonder what is going on with the company and why they chose now to sell it to the market.

    Jeff,
    I suspect economics is the reason. It is probably much cheaper and less complicated to move to AWS rather than running your own data center.
    Ancestry.com is a complex web site. During 2016, they constantly had outages in some of their services. Features would disappear from the home page for several days.
    I suspect they are doing a rearchitecture to simplify their site and to make it more reliable.

So leaving aside the technical details surely the next thing will be Amazon taking over Ancestry as a subsidiary.
