Revving up Google App Engine
Last week, Google announced the fees for its Google App Engine (GAE) service.
Although the Web application hosting service was free before (and for moderate usage remains gratis), the new payment structure will allow organizations to use the service in a systematic fashion. In other words, GAE has gone from beta to production.
It was a bit of good timing last week when consulting company Information Concepts held a morning-long introductory seminar on GAE in the brightly colored Google offices in Reston, Va. Although primarily a company for Microsoft .NET projects, Information Concepts started a cloud-computing practice a few years back. Wayne Beekman, co-founder and co-owner of the firm, gave the presentation.
For those of you who are trying to figure out how this newfangled concept called cloud computing will work on the operational level, GAE is as good a place as any to start. And Beekman offered plenty to think about.
Most of the time was spent not on GAE, the details of which, Beekman admitted, are a bit anticlimactic. But, oh, was the conversation lively concerning the architectural issues that GAE raised, which is to say the issues that cloud computing in general brings up.
Basically, GAE is a Google-hosted platform that can run applications written in Python. (Other languages — such as PHP, Java and Ruby — are being considered.) With the downloadable software development kit (SDK) and a copy of the Python runtime, you develop your application on a local machine and then upload it to Google. Google will run the app and worry about bandwidth, CPU and storage issues. Google provides a dashboard that allows you to keep track of how often the application runs.
Beekman noted that the app engine is not suited to every enterprise application. "It would be presumptuous to say that everything you need to do with every workload could be done by Google App Engine," Beekman said. "If you need a legacy stack to run some batch processing, this isn't the place to do that."
But GAE is well-suited to transaction-based Web applications. And a surprisingly large number of Web applications fit that profile.
What is the advantage of using GAE instead of running those Web apps in-house? Scalability and costs, Beekman said. "It's all about building an application and deploying it quickly without investment in hardware or licensing," he said.
Usually, when an organization needs a new application, the project team building that app must requisition the necessary equipment from the IT department: Web servers, application servers and database servers. That approach can take a considerable investment of time and money before the first user is even served. Worse yet, the development team never knows how many servers it will actually need.
Beekman offered a few numbers for consideration. Say you're setting up a Web application for 5,000 to 10,000 users. Conservatively speaking, you'd need three Web servers, two app servers and two database servers. Between buying the equipment and running it for three years, the total cost for the organization would be about $500,000.
The initial outlay for setting up that app through GAE? Zero, at least as far as hardware and software licensing costs are concerned.
Beekman didn't offer a long-term comparison of the costs of hosting an application in-house versus running it on GAE. But if you know the use characteristics of the app you plan to run, you can do a quick cost comparison. With GAE running in full-bore production mode, the bill accrues at 10 cents per CPU core hour, 10 cents per gigabyte of bandwidth inbound and 12 cents per gigabyte outbound, and about 15 cents per gigabyte of data stored.
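As a rough illustration of that quick comparison, the sketch below plugs the quoted rates into a monthly estimate. The workload figures are invented for the example, the storage rate is assumed to be charged per gigabyte per month, and any free quota is ignored.

```python
# Rates quoted above, in dollars.
CPU_PER_CORE_HOUR = 0.10
BANDWIDTH_IN_PER_GB = 0.10
BANDWIDTH_OUT_PER_GB = 0.12
STORAGE_PER_GB_MONTH = 0.15  # assumption: billed per gigabyte per month


def monthly_cost(cpu_hours, gb_in, gb_out, gb_stored):
    """Estimate one month's bill, ignoring any free quota."""
    return (cpu_hours * CPU_PER_CORE_HOUR
            + gb_in * BANDWIDTH_IN_PER_GB
            + gb_out * BANDWIDTH_OUT_PER_GB
            + gb_stored * STORAGE_PER_GB_MONTH)


# Hypothetical workload: these usage numbers are made up for illustration.
estimate = monthly_cost(cpu_hours=2000, gb_in=100, gb_out=500, gb_stored=200)
three_year_total = estimate * 36
print(f"monthly: ${estimate:,.2f}, three years: ${three_year_total:,.2f}")
```

With these invented numbers the estimate comes to about $300 a month, or roughly $10,800 over three years. Swap in your own usage figures before setting the result against the $500,000 in-house estimate.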
However, beyond comparing those basic numbers, GAE also challenges how efficiently any IT department can respond to changes in utilization, Beekman said.
In the home-grown approach, if you buy too many servers for your app, you're wasting money because those servers are sitting idle. But if you order too few servers, then your users will experience slow — or nonexistent — service, which depresses the whole value of the service. Explained that way, Beekman made it seem as though running apps in-house is a losing proposition, though, in fairness, a shop that uses virtualization could shave a lot of that excess cost and underutilization.
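The trade-off can be made concrete with a small sketch that counts idle and missing capacity for a fixed number of servers. The hourly demand figures are hypothetical, and "servers needed" is a deliberately crude measure of load.

```python
def provisioning_report(demand_per_hour, servers_owned):
    """Return (idle_server_hours, unmet_server_hours) for a fixed fleet."""
    idle = sum(max(servers_owned - d, 0) for d in demand_per_hour)
    unmet = sum(max(d - servers_owned, 0) for d in demand_per_hour)
    return idle, unmet


# Hypothetical demand: servers needed during each of six hours.
demand = [1, 2, 2, 7, 3, 1]
for fleet in (2, 4, 7):
    idle, unmet = provisioning_report(demand, fleet)
    print(f"{fleet} servers: {idle} idle server-hours, {unmet} unmet")
```

Buying for the peak (seven servers) leaves no demand unmet but the most hardware idle. Buying for the typical hour (two servers) does the reverse.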
In contrast, GAE can scale to however many users it serves. "That's the beauty of this," Beekman said. "You can put something out there, and whether it is a dirt path or 12-lane highway, the platform will expand to your needs."
Computing power, bandwidth and storage space are added on the fly in an automated fashion. The program manager doesn't have to designate those resources. Google adds them automatically and keeps track of how much you use, billing you accordingly. You can set filters that establish limits for how much you want to pay each month.
Thus far, about 45,000 apps have been built on GAE, and about 10 million developers have registered for the service. Because the SDK is open source, any new apps can be moved to other platforms, even internal ones, if Google's pricing grows too demanding. And data storage could be redirected to MySQL or some other database, Beekman said.
Although GAE's value proposition does sound fine and dandy, the audience had more than a few questions, many of which highlighted the limitations of using GAE, at least in its current incarnation. However, such concerns could easily arise over any cloud-based offering — not only Google’s but those from Amazon or Terremark.
For instance, when you keep your data in the cloud, where does it actually reside? Many organizations have regulations that prohibit storing their data out of the country. Also, knowing where the data is stashed is important in continuity-of-operations planning. Someone in the audience said his organization created a COOP plan that requires that data and apps reside in two separate geographic locations. Should a flood or earthquake or some other natural disaster wipe out the first data center, the second location would keep all the material safe and dry and the services running.
Here's the problem: Google does not divulge where it keeps your data, and the company makes no promises that your data will reside in the U.S. or be geographically dispersed.
In fact, the whole idea of keeping data in two locations would probably seem quaint to Google engineers.
The complicating issue is how Google saves data. When most people think of data being saved, they think of data being committed to a single database. But Google uses a more distributed approach, accomplished through a combination of the Bigtable distributed storage system and the Google File System. A Google engineer in the audience said the company considers a piece of data to be written only when it is written to three separate disks and the index is updated.
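The engineer's rule, that a write counts only once three disks hold the data and the index is updated, can be sketched as a toy model. The class and method names below are hypothetical and this is not Google's implementation. It only illustrates the commit condition as it was described.

```python
class ReplicatedStore:
    """Toy model: a value is visible only after all replicas hold it
    and the index has been updated. Not Google's actual design."""

    REPLICAS = 3  # the figure the engineer mentioned

    def __init__(self):
        self.disks = [dict() for _ in range(self.REPLICAS)]
        self.index = {}  # key -> list of disk numbers holding the value

    def write(self, key, value, failed_disks=()):
        """Return True only if the write is fully committed."""
        available = [n for n in range(self.REPLICAS) if n not in failed_disks]
        if len(available) < self.REPLICAS:
            return False  # cannot reach three disks, so nothing is written
        for n in available:
            self.disks[n][key] = value
        self.index[key] = available  # final step: update the index
        return True

    def read(self, key):
        """Read through the index, so uncommitted data is never visible."""
        if key not in self.index:
            return None
        return self.disks[self.index[key][0]][key]


store = ReplicatedStore()
print(store.write("user:1", "alice"))                  # all disks up
print(store.write("user:2", "bob", failed_disks={1}))  # one disk down
print(store.read("user:1"), store.read("user:2"))
```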
In other words, Google data is probably parceled out across multiple locations worldwide. And given how reliable the Google search is, that approach seems to work, even if pinpointing the geographic location of a piece of data is pretty difficult. But although that is probably the most secure approach engineering-wise, Google isn't talking in the language of the enterprise customer that needs to comply with a requirement that its data reside in two geographically distinct locations.
Beekman said Google would probably need to offer terms that are a bit more concrete for the enterprise market. Trusting in the technical goodness of Google in lieu of contract-enforceable specifics probably wouldn't cut the mustard with most government procurement officers, even as Google gives their managers the warm fuzzies.
Google’s distributed approach to storing data might also force some data managers to rethink how they store their information. If the data will not reside in a relational database, much of the relationships and other logic that used to be inherent in the database must be defined in the application layer instead.
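To illustrate what moving relationships into the application layer can mean, here is a minimal sketch in plain Python. It does not use the GAE datastore API. The dictionaries stand in for a non-relational store, and every name is hypothetical.

```python
# Two "tables" held as key-value data, with no foreign keys or joins
# enforced by the storage layer.
customers = {
    "c1": {"name": "Acme"},
    "c2": {"name": "Globex"},
}
orders = {
    "o1": {"customer_id": "c1", "total": 120.0},
    "o2": {"customer_id": "c1", "total": 80.0},
    "o3": {"customer_id": "c2", "total": 45.0},
}


def add_order(order_id, customer_id, total):
    """Referential integrity is checked in code, not by the database."""
    if customer_id not in customers:
        raise ValueError(f"unknown customer: {customer_id}")
    orders[order_id] = {"customer_id": customer_id, "total": total}


def orders_for(customer_id):
    """The equivalent of a SQL join, written by hand."""
    return sorted(
        order_id for order_id, order in orders.items()
        if order["customer_id"] == customer_id
    )


print(orders_for("c1"))
```

A relational database would reject an order for a nonexistent customer on its own. Here that check lives in add_order, and every code path that writes orders has to go through it.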
"The app engine is not a relational database. You have to think differently," Beekman said.
The security issue came up as well. How do you know that your data and application can't be intercepted by other parties or even by other applications running in the GAE environment? Data separation was one of Gartner's seven security considerations for organizations thinking about moving to the cloud.
All the GAE apps are isolated, the Google engineer in the audience insisted. We later caught up with Google for more clarification: GAE starts each app as a single-threaded process. As traffic increases, the app is cloned into multiple processes. The various services that Google offers through GAE, such as the data store, data caching or access to e-mail, are accessible by remote procedure calls in a format called Protocol Buffers.
To keep apps from interfering with one another, each app is sandboxed through various techniques. Google won't divulge most of those techniques, but the company has said that Python libraries that could be used for snooping — the ones that rely on native code or allow a program to open a socket — have been removed from the library set available through GAE.
Uptime was another concern audience members voiced. Google offers no service-level agreement (SLA), the industry's term for a specific guarantee of how reliable a service will be. When your users come calling, you want to make sure the app is ready. The discussion of downtime is pertinent given that one Google service has had a few unscheduled downtimes of late.
Beekman argued that, to a certain extent, uptime concerns for cloud computing are a bit overblown. Whatever downtime Gmail users, say, encounter in a month might be small compared with that of the average in-house Microsoft Exchange environment. Your in-house IT staff probably isn't as well trained in matters of keeping the servers running as is the average Google administrator.
It is worth noting that, with the paid version of Google Apps, the company guarantees that your apps will be available at least 99 percent of the time in any given calendar month. Maybe the company will eventually apply the same SLA to GAE.
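To put an availability percentage in practical terms, the short sketch below converts it into the hours of downtime it still permits. The 30-day month is an assumption, and the 99.9 percent figure is included only for comparison; it is not a number from the seminar.

```python
def allowed_downtime_hours(availability_percent, days_in_month=30):
    """Hours of downtime a monthly availability guarantee still permits."""
    total_hours = days_in_month * 24
    return total_hours * (100 - availability_percent) / 100


for percent in (99, 99.9):
    hours = allowed_downtime_hours(percent)
    print(f"{percent}% uptime allows about {hours:.2f} hours down in 30 days")
```

At 99 percent, the guarantee still leaves room for a little more than seven hours of outage in a 30-day month.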
Ultimately, using Google to house data and apps takes a leap of trust, Beekman said, and he seemed to believe it is one worth taking.
"It's the same as moving your money out of the mattress and into a bank," Beekman said. "The give is trust, the get is the value proposition."
Posted by Joab Jackson on Mar 03, 2009 at 7:05 PM