A few weeks back, somebody asked me, “What’s the big deal with the cloud? I don’t even understand WHAT it is!” This is a common problem, and one I’m going to try to clear up here and now.
Why is it so hard to define?
The cloud is hard to define because it is made up of several different ideas and technologies. As I see it, the cloud comprises the following things:
- File Storage
- Remote Computing Power
- Clustered Web Hosting
- Data Storage
- Web Applications
- Data Exchange
I will attempt to define each of these, and end with a real-world (though fictional) scenario showing how several of them can be brought together.
There are, in my opinion, two large players in this market: Amazon and, through its Mosso brand, Rackspace. Google also plays a large part.
File Storage
Disks are cheap, we know that. You can buy 1TB for US$75; that’s peanuts! The problem is high availability and data throughput. This is where “old skool” CDNs typically played a role. However, with the introduction of Amazon’s Simple Storage Service (S3), things changed. For the reasons you would use a CDN in the first place, there is little difference between the two services; what S3 brought to the table was a unique pricing plan (no huge setup fees, just pay pennies for what you use) that made it available to every company at every level. More importantly, Amazon also introduced an API.
Through the API, those looking for a standard CDN-type service can upload their resources transparently as an integral part of their process. In addition, many services capitalize on this API to provide non-CDN services, such as data backup.
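To make “upload their resources transparently as an integral part of their process” concrete, here is a minimal sketch of a deploy step that pushes static assets to an object store. It assumes a client object with a `put_object(Bucket=..., Key=..., Body=...)` method (those keyword names match boto3’s S3 client); `InMemoryObjectStore`, `publish_assets`, the per-release key prefix and the `example.invalid` URL scheme are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations


class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-like service, keyed by bucket then object key.

    put_object() uses the same keyword names as boto3's S3 client, so a real
    client could be passed to publish_assets() instead; nothing here touches S3.
    """

    def __init__(self) -> None:
        self.buckets: dict[str, dict[str, bytes]] = {}

    def put_object(self, *, Bucket: str, Key: str, Body: bytes) -> dict:
        self.buckets.setdefault(Bucket, {})[Key] = Body
        return {}


def publish_assets(client, bucket: str, release: str, assets: dict[str, bytes]) -> dict[str, str]:
    """Upload every static asset under a per-release prefix; return name -> URL.

    Meant to run as one step of a build or deploy script, so pushing files to
    the storage service happens automatically rather than by hand.
    """
    urls = {}
    for name, body in sorted(assets.items()):
        key = f"{release}/{name}"
        client.put_object(Bucket=bucket, Key=key, Body=body)
        # Hypothetical public URL scheme, for illustration only.
        urls[name] = f"https://{bucket}.example.invalid/{key}"
    return urls
```

Because the client is passed in, the same function could in principle be handed a real boto3 S3 client, assuming credentials and a bucket already exist.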
Since the introduction of S3, Rackspace has also entered the space with its Cloud Files service.
Remote Computing Power
Another facet of The Cloud is remote computing power, which originally took the form of Amazon’s Elastic Compute Cloud (EC2). The idea behind this service is that you configure what I can best describe as a virtual machine to perform a specific task (e.g. crunching data) and save it as a disk image. Then, using the API, you can “spin up” as many virtual appliances from that disk image as you need, when you need them.
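The “configure once, then spin up copies from the disk image” workflow can be sketched as below. `FakeComputeAPI` is a hypothetical in-memory stand-in that launches nothing real; its `run_instances` keyword names and the `{"Instances": [{"InstanceId": ...}]}` response shape follow boto3’s EC2 client, but the image id `ami-12345678` and the default instance type are placeholders, not recommendations.

```python
from __future__ import annotations

import itertools


class FakeComputeAPI:
    """Hypothetical stand-in for an EC2-like API that only records launches."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.running: dict[str, str] = {}  # instance id -> image it was launched from

    def run_instances(self, *, ImageId: str, InstanceType: str, MinCount: int, MaxCount: int) -> dict:
        instances = []
        for _ in range(MaxCount):
            instance_id = f"i-{next(self._ids):08x}"
            self.running[instance_id] = ImageId
            instances.append({"InstanceId": instance_id, "ImageId": ImageId, "InstanceType": InstanceType})
        return {"Instances": instances}


def spin_up(api, image_id: str, count: int, instance_type: str = "m1.small") -> list[str]:
    """Launch `count` identical appliances from one pre-configured disk image."""
    if count < 1:
        return []
    response = api.run_instances(
        ImageId=image_id, InstanceType=instance_type, MinCount=count, MaxCount=count
    )
    return [inst["InstanceId"] for inst in response["Instances"]]
```

For example, `spin_up(FakeComputeAPI(), "ami-12345678", 3)` returns three instance ids, all launched from the same image.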
This means you have the resources of a giant enterprise company at your disposal, on an as-needed basis, and again, one of the breakthroughs is Amazon’s pricing: pay for what you use.
In 2008, SliceHost, a small company loved by geeks around the world, entered this space. Known (at least, by a savvy few) for its excellent VPS services, SliceHost introduced an API that put it in direct competition with Amazon’s EC2. In October 2008, Rackspace purchased SliceHost, and while SliceHost is still a separate company, its technology now powers Rackspace’s Cloud Servers offering.
Clustered Web Hosting
Clustered web hosting is nothing new. Companies have been creating clusters of servers for eons, for many tasks, ranging from number crunching and data analysis through to web and database serving. Where this space enters The Cloud is through a service like Rackspace’s Mosso/Cloud Sites. Like a traditional cluster, it provides high availability, lots of power, and reliability. (Note: I use Mosso for this blog and a number of other sites.)
However, what blends this with the cloud, as opposed to a traditional cluster, is that Mosso operates one giant cluster shared by huge numbers of websites, with the infrastructure in place to let each site grow as large as it needs to, autonomously and transparently.
Another player in this space (perhaps the first, but I’m not that familiar with them) is MediaTemple’s Grid-Service.
Data Storage
You might ask yourself: what is the difference between File Storage and Data Storage? The answer is the same as the difference between a file system and a database.
This area is the newest addition to the cloud, and one I think most people saw as necessary to really replace old-style, non-cloud systems. The biggest player in this market is Amazon’s SimpleDB (beta); Google’s BigTable is only available through its Python-based App Engine.
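A small sketch of the file-system-versus-database distinction, assuming a plain dict is an adequate stand-in for a file store and an in-memory `sqlite3` database is an adequate stand-in for a hosted data store like SimpleDB (whose real query interface is different). The table, users and follower counts are invented for illustration.

```python
from __future__ import annotations

import sqlite3

# File storage: opaque name -> bytes (think S3 or Cloud Files).
# You can fetch a blob by name, but you cannot ask questions about what is inside.
file_store: dict[str, bytes] = {
    "avatars/alice.png": b"<alice png bytes>",
    "avatars/bob.png": b"<bob png bytes>",
}

# Data storage: structured records you can query by attribute.
# sqlite3 is only a local stand-in for a hosted data store here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT PRIMARY KEY, followers INTEGER, avatar_key TEXT)")
db.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("alice", 1_000_000, "avatars/alice.png"), ("bob", 12, "avatars/bob.png")],
)


def users_with_at_least(min_followers: int) -> list[str]:
    """A question only the data store can answer: who has at least N followers?"""
    rows = db.execute(
        "SELECT name FROM users WHERE followers >= ? ORDER BY name", (min_followers,)
    )
    return [name for (name,) in rows]


def avatar_bytes(name: str) -> bytes | None:
    """Use the database to find the key, then the file store to fetch the blob."""
    row = db.execute("SELECT avatar_key FROM users WHERE name = ?", (name,)).fetchone()
    return None if row is None else file_store[row[0]]
```

The two usually work together: the data store answers questions and holds pointers, while the file store holds the bulky bytes those pointers refer to.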
Web Applications
Arguably the meat of Web 2.0, web applications allow people to create and work in the cloud without any knowledge of the underlying technology. To them, the data held by web applications lives in the same place as their webmail. API access for integrating these applications into other services is part of how they are used within the cloud. The obvious player in this area is Google, with its Gmail, Google Calendar, and other Google Apps such as Google Docs.
Data Exchange
Data exchange using web services is the heart of Web 2.0: the mashup. Data exchange is not strictly part of the cloud, but web services are; almost all interaction with the cloud happens through web services. In addition, thanks to ideas like single sign-on using OpenID, we are starting to see different facets of our data migrating across the websites we use, making it more useful and accessible. This, too, is part of the cloud.
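A mashup in miniature: the sketch below joins JSON from two hypothetical web services on a shared OpenID-style identity URL. Both payload formats and the `example.com` identities are invented for illustration; real services define their own schemas, and a real mashup would fetch these payloads over HTTP rather than from string literals.

```python
from __future__ import annotations

import json

# Hypothetical responses from two unrelated services that share an identity URL.
profile_service_json = """[
  {"openid": "https://alice.example.com/", "name": "Alice"},
  {"openid": "https://bob.example.com/", "name": "Bob"}
]"""
photo_service_json = """{"https://alice.example.com/": ["beach.jpg", "cat.jpg"]}"""


def mashup(profiles_raw: str, photos_raw: str) -> list[dict]:
    """Join two services' data on the shared identity to build one combined view."""
    profiles = json.loads(profiles_raw)
    photos_by_openid = json.loads(photos_raw)
    return [
        # Users unknown to the photo service simply get an empty list.
        {"name": p["name"], "photos": photos_by_openid.get(p["openid"], [])}
        for p in profiles
    ]
```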
A Real-World Scenario
For this, I’m going to make up a fictional scenario involving Twitter. I have absolutely no idea what they have going on technically, or whether this is how they would handle it; it’s just a well-known situation that could be solved using the cloud.
The scenario is this: Oprah joins your service, and suddenly you have an influx of new users. In addition, Ashton Kutcher and CNN are duking it out to reach 1 million followers.
You have two weeks to prepare. You could call your Dell representative, order 50 new servers, clone disks, and put them into your cluster… but what if that’s not enough? And how do you justify spending that much money when the hype might last only two weeks? A month? The simple answer is: you don’t. Instead, you configure a couple of EC2 or Cloud Servers instances, and as your load starts to ramp up, you simply launch more and more appliances on the fly using their respective APIs.
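The buy-versus-rent question comes down to simple arithmetic. The functions below are a minimal sketch: any server price or hourly rate you feed them is a placeholder to be replaced with a real quote, and the model deliberately ignores power, bandwidth, storage and staff costs.

```python
from __future__ import annotations


def owned_cost(servers: int, price_per_server: float) -> float:
    """Up-front hardware cost, paid in full whether or not the hype lasts."""
    return servers * price_per_server


def rented_cost(instances_per_hour: list[int], hourly_rate: float) -> float:
    """Pay-per-use cost: instances running in each hour, times the hourly rate."""
    return sum(instances_per_hour) * hourly_rate


def break_even_hours(servers: int, price_per_server: float, hourly_rate: float) -> float:
    """How long `servers` rented instances can run before renting costs more than buying."""
    return owned_cost(servers, price_per_server) / (servers * hourly_rate)
```

For example, `rented_cost([50] * 14 * 24, rate)` prices 50 instances running around the clock for two weeks, which you can compare directly with `owned_cost(50, price)`; if the spike fades after a few days, the hourly list shrinks and so does the bill.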
Knowing that Oprah’s show airs at a specific time, you might fire up several instances an hour beforehand to get the ball rolling.
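One way to express “launch more as load ramps up, and pre-warm before the show” is a function that turns the current request rate into a target instance count. This is a sketch under stated assumptions: each instance handles a fixed, known number of requests per second, and the `prewarm_floor`, `minimum` and `maximum` defaults are arbitrary illustrative values, not tuning advice.

```python
from __future__ import annotations

import math
from datetime import datetime, timedelta


def desired_instances(
    requests_per_sec: float,
    capacity_per_instance: float,
    now: datetime,
    scheduled_events: list[datetime],
    prewarm: timedelta = timedelta(hours=1),
    prewarm_floor: int = 5,
    minimum: int = 2,
    maximum: int = 100,
) -> int:
    """Target instance count for the current load, clamped to [minimum, maximum].

    During the `prewarm` window before any scheduled event (e.g. a TV show
    airing), at least `prewarm_floor` instances are requested even if the
    current load does not need them yet.
    """
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    if any(event - prewarm <= now < event for event in scheduled_events):
        needed = max(needed, prewarm_floor)
    return max(minimum, min(maximum, needed))
```

For instance, with the show at 16:00 and each instance assumed to handle 100 requests per second, a call at 15:30 asks for at least 5 instances even while traffic is still quiet; once the show airs, the live request rate takes over.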
You have one appliance image that functions as a web server for twitter.com, one for handling API requests, perhaps even one dedicated to registration, and then of course clustered copies of your traditional RDBMS (i.e. you wouldn’t typically use Amazon SimpleDB for regular storage, as its functionality just isn’t up to par).
You already have S3 in place for user avatars, but instead of calculating the filename hash on every request, or retrieving it from your local database, you push that lookup into Amazon SimpleDB.
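The avatar idea amounts to a read-through cache: look the key up in a shared store, and compute and save it only on a miss. In the sketch below, the salted-SHA-256 naming scheme is hypothetical (the post does not say how the filename hash is built), and a plain dict stands in for SimpleDB.

```python
import hashlib


def avatar_filename(user_id: int, salt: str = "hypothetical-salt") -> str:
    """Hypothetical naming scheme: avatar key = salted SHA-256 of the user id."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return f"avatars/{digest}.png"


class AvatarKeyCache:
    """Read-through cache: check the shared store first, compute only on a miss."""

    def __init__(self, store: dict) -> None:
        self.store = store      # stand-in for a hosted key-value store
        self.computations = 0   # how many times the hash actually had to be computed

    def key_for(self, user_id: int) -> str:
        cached = self.store.get(user_id)
        if cached is not None:
            return cached
        self.computations += 1
        key = avatar_filename(user_id)
        self.store[user_id] = key
        return key
```

Because every web server instance would share the same store, a key computed by one appliance is immediately reusable by the others.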
And that’s it. As the load starts to drop off, you shut down EC2 instances, knowing that if you get another sudden influx, you can always spin them back up.
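Scaling down is the mirror image of scaling up. A minimal reconciliation step, comparing what is running with what is wanted, might look like this; it assumes the instance list is ordered oldest to newest and that retiring the newest instances first is acceptable, which is a policy choice rather than a rule.

```python
from __future__ import annotations


def reconcile(running: list[str], desired: int) -> tuple[int, list[str]]:
    """Return (how many instances to launch, which ids to terminate) to reach `desired`.

    Assumes `running` is ordered oldest to newest; the newest are retired first.
    """
    if desired < 0:
        raise ValueError("desired must be non-negative")
    surplus = len(running) - desired
    if surplus > 0:
        return 0, running[len(running) - surplus:]
    return -surplus, []
```

The returned ids would then be handed to your provider’s terminate call. In practice you would probably also wait for the load to stay low for a while before terminating, so that a brief dip does not cause instances to be shut down and immediately relaunched.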
Eventually, you get a handle on what your new average load will be (presumably only a small portion of the initial influx of “zomg Oprah says this is awesome so it must be” people will stay), and then you can purchase the right amount of hardware to add to your own systems.
Or not. Keep it in the cloud. That’s a decision you can now make at your leisure, instead of scrambling to make your best guess in that two-week period before things go nuts.
The cloud is so hard to define because it’s no single thing. It is, like its namesake, nebulous. It is simply there, and it will look like whatever you make it.
Please read Rob’s reply below; he is a Rackspace employee and usually (always?) the guy behind @mosso.
Rob La Gesse
I think you hit the nail on the head in many respects. There are a couple of things I would add to your initial data points about what cloud is: it is also contract-free, and it is instant, or near-instant (I just checked: it took me 38 seconds to spin up a new 1024K Cloud Server running Ubuntu).
Availability and accessibility are a huge part of this discussion.
You also didn’t hit on support at all, which I think deserves a lot more focus than it is getting – for Cloud to permeate, customers need an SLA they can believe in (and people they can talk to). We think we offer a differentiating advantage there. A lot of people are good at racking servers and writing code – few companies are known as great service companies. We aim to be one of those well known great service companies. We also happen to be racking more servers for customers than anyone else on the planet. We are a support company though. First.
Another note – we have three cloud product offerings (so far!) – Cloud Sites, Cloud Files, and Cloud Servers. Note the spaces – two words – important to us :)
Finally – we agree with you – the hosted database is a huge issue. How can we scale databases in the cloud, yet build something open, and shared with the community – so everyone can benefit? We’re actually investing in third-party database software (read more here: http://blog.mosso.com/2009/05/a-key-to-cloud-standards-the-cloud-database/).
We believe the community can build a better cloud over time – and for now, this early in the game, building open source platforms is probably at least as important as building open APIs.
Thanks for the time you spent on this.
Rob La Gesse
Director of Customer Development
The Rackspace Cloud
Thanks, gents, for the cloud explanation. Although I was familiar with the concepts, I hadn’t spent much time considering the real players/carriers. The example scenario was magnificent in its simplicity for getting a feel for the power of “the dial” for adding additional computing/storage/bandwidth.
Some related posts I enjoyed on the subject (I’m sure I’ll add my own once I read a few dozen more posts):
Creating the cumulus, Software will be transformed into a combination of services
Cloud Computing: A System of Control
Dark cloud computing (very curious about what we can learn from pirates)
Thanks to pj from HackerNews for pointing these out