Reaching For The Clouds


Cloud computing is a term that has been used quite a bit lately. It seems every other website offers some sort of “cloud solution” for developers: Microsoft, Google, Amazon.com, and even Rackspace.com. There is also some confusion as to what the term cloud computing actually means. This blog post intends to define cloud computing from a developer’s perspective and compare some of the options available to developers.

What Is Cloud Computing?

For a developer, cloud computing is really a combination of development style and software deployment. Cloud-based development happens when developers build applications that are to be hosted on a third party’s servers. These servers (or “the cloud”) host the application on the developers’ behalf for others to use. Although this might sound a lot like a standard web hosting service, the key differences are scalability and availability. Cloud providers usually allow developers to rapidly add and remove server resources, often by simply adjusting a configuration setting. The resource can be anything from something as simple as disk space to something more complex, like virtual machine images. Providers also usually charge fees based on processor or image usage, as opposed to flat monthly fees.

For example, a start-up company that builds a .NET web service can have it hosted at almost any web hosting provider. They’d likely pay for a web host, a domain, possibly a database, and bandwidth. Now, if that web service becomes very popular, the company would need to contact the host to see if larger servers could be used, bandwidth increased, and so on. But what happens if there are no larger servers? What happens if the host can’t meet the developers’ uptime requirements? At this point, the company that developed the service would have to host it at its own location. They’d pay for the server(s), the internet connection, the support staff, and so on. They might also have to re-code the web service to take full advantage of the computing power now available in the locally hosted environment. We can see the costs rising here.

Enter the Cloud. Cloud computing offers an environment hosted by a third-party which handles the hardware, bandwidth, support staff, and the internet connection. More importantly, it provides all of the raw computing resources the web service might need: storage, web access, and so on. As more resources are needed, they are added dynamically.

That is cloud computing at the most basic level. There are differentiating features that each cloud provider offers as well. Some offer built-in workflow support, some offer integration with certain authentication mechanisms, and so on. Although outside the realm of this blog post, there are other definitions of cloud computing, usually involving “software as a service” (SaaS). These include a provider on the internet providing distinct services, such as weather, current traffic conditions, or even credit card payments.

What’s The Difference Between The Cloud Providers?

Although not an exhaustive list, here are some of the major players in the cloud computing arena.

Microsoft Azure

Microsoft’s foray into the cloud is its Azure platform. Azure is in “Community Technology Preview” (CTP) status at this time. It offers a set of developer services which allow .NET applications to be hosted and processed via one or more Worker and Web Roles. These applications can take advantage of a variety of features, including Access Control (allowing integration with standards-based authentication providers), Service Bus (allowing connectivity between applications across the internet), Workflow Services (for workflow management), Live Services (offering syncing between a diverse range of hardware), and SQL Data Services (which is soon going to support relational database solutions). Upcoming access to SharePoint and CRM functionality has also been announced. Since Azure is in CTP status, there are no charges for the services at this time. However, this will undoubtedly change in the future, and Microsoft has not announced any anticipated pricing models.

Google App Engine

Google offers Google App Engine as a way for developers to start developing for the cloud. It currently supports development using a limited version of Python 2.5; the limits are mainly due to prohibitions on using C libraries and file I/O. The HTML portions of the applications can use JavaScript (for AJAX development), and some Python frameworks can be imported. Google App Engine (or just “App Engine”) supports a datastore for development, which is queried with a SQL-like language called GQL. The data is not stored in a traditional relational model; instead, it uses a DataStore / Entity / Property model. This might require a different data storage design than a developer is used to. Application access is provided by Google Accounts. There are free quotas for basic development, and charges are applied for emails, bandwidth, and data access after the quota has been met. Google also offers high-speed URL fetching, mail, a key/value cache, and image manipulation services for Google App Engine applications.

Amazon Web Services

Amazon has a series of cloud offerings under an umbrella called “Amazon Web Services.” These are not web services in the W3C sense. Rather, this is a brand name, which includes “Amazon Elastic Compute Cloud” (also called Amazon EC2), “Amazon SimpleDB”, “Amazon Simple Storage Service” (also called Amazon S3), “Amazon CloudFront”, and “Amazon Simple Queue Service” (also called Amazon SQS). To avoid confusion, all references to the W3C-defined web services will be called web services (in lower case). The Amazon umbrella term will use capital letters.

Amazon EC2 offers virtual servers to clients for development and hosting. In addition to letting clients add new virtual server instances within minutes, Amazon offers a variety of operating systems for those instances, including OpenSolaris, Linux, and Windows Server. Billing is based on the capacity used and on the instance types used. For example, a standard “small” instance with 1.7 GB of memory and a single virtual core can cost very little, while the highest-CPU on-demand instances (8 virtual cores and 15 GB of memory) cost a lot more. There is no development platform specific to the Amazon cloud; however, web services are available for configuration.

Amazon SimpleDB offers a basic data storage system for development. It uses a “Domain -> Item -> Attribute” model and is not relational. Developers access it through web service calls and a SQL-like query language.
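To make the Domain -> Item -> Attribute shape more concrete, here is a small in-memory sketch in C#. It is not the SimpleDB API and calls no Amazon service; the Domain class and its Put and Select methods are hypothetical names used only for illustration. The sketch assumes that attributes are name/value string pairs and that a single attribute may hold more than one value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory model of a Domain -> Item -> Attribute store.
// It only illustrates the data shape; it does not call any Amazon service.
public class Domain
{
    // item name -> (attribute name -> values)
    private readonly Dictionary<string, Dictionary<string, List<string>>> items =
        new Dictionary<string, Dictionary<string, List<string>>>();

    // Adds one value to an attribute of an item, creating both as needed.
    public void Put(string itemName, string attribute, string value)
    {
        Dictionary<string, List<string>> attributes;
        if (!items.TryGetValue(itemName, out attributes))
        {
            attributes = new Dictionary<string, List<string>>();
            items[itemName] = attributes;
        }

        List<string> values;
        if (!attributes.TryGetValue(attribute, out values))
        {
            values = new List<string>();
            attributes[attribute] = values;
        }
        values.Add(value);
    }

    // Returns the names (sorted) of all items that carry the given attribute value.
    public List<string> Select(string attribute, string value)
    {
        return items
            .Where(item => item.Value.ContainsKey(attribute)
                        && item.Value[attribute].Contains(value))
            .Select(item => item.Key)
            .OrderBy(name => name)
            .ToList();
    }
}
```

Note that two items in the same domain do not have to share any attributes. There is no schema to declare up front, which is the main way this model differs from a relational table.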

Amazon S3 is a service which provides non-SQL-based storage of files up to 5 GB. This is essentially a fast, reliable means to store individual files for many people to access across the Internet via web services.

Amazon CloudFront is a service which provides rapid distribution of popular content. It supports high transmission speeds with low latency. Access is through a distribution’s domain name, as opposed to web services (as with Amazon S3).

Amazon SQS provides queuing services via SOAP and Query web services. There is no guaranteed FIFO (first in, first out) support and this must be handled at the application level.
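One way to handle ordering at the application level is to have the sender stamp each message with a sequence number and have the receiver hold back anything that arrives early. The sketch below assumes that convention. ReorderBuffer and its members are hypothetical names, and nothing here calls the SQS API; it only shows the receiver-side bookkeeping.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical receiver-side buffer that releases messages in sequence order,
// assuming the sender numbers messages 0, 1, 2, ... with no gaps.
public class ReorderBuffer
{
    private readonly Dictionary<long, string> pending = new Dictionary<long, string>();
    private long nextExpected = 0;

    // Accepts a message that may arrive out of order and returns every
    // message that can now be delivered in order (possibly none).
    public List<string> Accept(long sequence, string body)
    {
        List<string> ready = new List<string>();

        // As a precaution, ignore duplicates and messages already delivered.
        if (sequence < nextExpected || pending.ContainsKey(sequence))
        {
            return ready;
        }
        pending[sequence] = body;

        // Drain the buffer for as long as the next expected message is present.
        string next;
        while (pending.TryGetValue(nextExpected, out next))
        {
            ready.Add(next);
            pending.Remove(nextExpected);
            nextExpected++;
        }
        return ready;
    }
}
```

The trade-off of this design is that one missing message holds back everything behind it, so a real application would also need a policy for gaps, such as a timeout.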

Rackspace

Rackspace offers some basic “cloud” services. Through the “Mosso” brand name, Rackspace offers what they call “Cloud Sites”, “Cloud Files” and “Cloud Servers”.

Cloud Sites offers a basic way of creating additional scalable websites very quickly. Billing is done on a computational cycle system (in addition to bandwidth and SAN storage).

Cloud Files allows high performance hosting of files and static content through REST APIs.

Cloud Servers is an offering similar to Amazon’s EC2. Although it has only just launched, it offers dynamic creation of Linux instances.

In summary, many companies are reaching out to developers to provide some form of hosted services. While trying to navigate through all of the offerings may be confusing, developers should take the time to research their options and make an informed decision. By having your application hosted on or interacting with a cloud, you will have access to different services which could well provide a needed competitive edge.

| Feature | Microsoft Azure | Google App Engine | Amazon Web Services | Rackspace (Cloud Servers) |
|---|---|---|---|---|
| Development Language | .NET Languages (Primarily C# and VB.NET) | Python 2.5 | Only limited by OS | Limited by choice of Linux OS |
| Native Authentication/Authorization | Yes - Access Control | Yes - Google Accounts | No | No |
| Persistent Data Storage | Yes - Relational model being created | Yes - Non-relational | Yes - Non-relational with Amazon SimpleDB | Only static content via Cloud Files |
| Native Hardware Sync Support | Yes - Live Services | No | No | No |
| Native Service Bus | Yes | No | No | No |
| Native “Workflow” Support | Yes | No | No | No |
| Native CRM Support | Pending | No | No | No |
| Image Manipulation Services | No | Yes | No | No |
| URL Fetching | No | Yes | No | No |
| Need to maintain servers (apply updates, install applications, etc.) | No | No | Yes | Yes |
| Physical Processor Guarantee | Yes | No | No | No |
| Physical Memory Guarantee | Yes | No | No | Yes |
| Native Queue Support | Yes | No | Yes (FIFO is not guaranteed) | No |

* Please note – this list is not exhaustive of features. These services have been researched and the information in this posting is believed to be accurate at this point in time. However, since many of these services are in some form of “beta”, feature sets will likely change in the future.

Improving upon vanilla virtualization with cloud computing

In our article on Developing a Software Energy Usage Profile (shameless self promotion - I know), my colleagues Rajesh Chheda, Steve Stefanovich, Joe Toscano and I discussed the value of assessing the energy cost of your business applications. Reducing the storage, processing, and network bandwidth required by your software applications has a direct effect on their energy consumption. One of the remedies we prescribed for energy-hungry applications was virtualization. I'd like to build on this idea by including cloud computing as an option.

What are the differences between Virtualization and cloud computing?
In the past, businesses paid for and maintained their own servers. In exchange for complete control of their server farm, businesses were responsible for purchasing the hardware, installing the operating systems, configuring the network, paying for the cooling costs, and replacing damaged equipment.

Virtualization helps by reducing the costs of maintaining your own infrastructure, but it does not eliminate them. A hosted solution removes some of the work associated with server maintenance and also offers an economy of scale. However, a hosted solution doesn’t necessarily have the computing power to scale out, at least not as easily as a real cloud solution.

In general, there is a considerable amount of virtualization involved in cloud computing, but it differs primarily in the fact that this virtualization occurs someplace else. You don't know how many machines are involved, what else they're doing, and so on. Microsoft's Azure platform and Amazon's offerings (EC2, SimpleDB, and the rest of Amazon Web Services) are examples of cloud computing. These services remove the need for you to focus on the hardware specifics of your deployment, allowing you to concentrate on your business requirements.

Cloud computing would be an incremental advance on basic virtualization if that were all it provided, but there are many other advantages.


While virtualization allows you to scale your application out as necessary, you are still limited by your available number of servers and your network capacity. Organizations faced with unpredictable application usage must maintain a large, fallow server farm to support the peak demand. Cloud computing pushes this requirement off to the cloud where there is greater capacity. Moving your application to the cloud eliminates the need to maintain extra application servers for those “what if” scenarios.
Services provided by Amazon and Microsoft also are distributed across the globe, increasing fault-tolerance and reducing latency to users.

Conclusion
Virtualization and Cloud computing both provide a flexible alternative to traditional application deployment, doing so in a way that encourages developers to focus on their software Energy Usage Profile. Cloud computing provides additional benefits beyond simple virtualization, reducing the infrastructure cost to a product of CPU, Storage, and Bandwidth. These are the same factors developers need to consider when writing greener applications.
Prior to cloud computing, businesses, both established and starting up, were forced to devote considerable capital to server farms. If unused, this hardware does nothing but convert costly electricity into heat, which must be removed via cooling that requires additional energy. Even if fully utilized, this hardware is expensive to maintain and replace in regular cycles. The advent of cloud computing allows more effective and efficient use of processing power, energy, and your budget.

MIX09 Keynote Recap

Today's MIX 09 Keynote in Las Vegas was hosted by Scott Guthrie. As much as I had hoped to attend, my Vegas trip didn't materialize. However, I watched the live stream and, to be honest, I was blown away by the sheer number of announcements. Here's a consolidated summary, with links to the various downloads.

Expression Web 3
  • SuperPreview feature – view rendering across all browser types and versions. Split-screen designer, fullscreen-mode, allows comparison of “baseline” browser view and alternative browser view side-by-side. Also “overlay” mode – stacks two browser views, to help identify differences.
  • Supports both local browsers and cloud service, to allow for rendering on browsers not installed locally (e.g. Safari for Mac)
  • Easy to compare multiple versions of IE without having each browser version installed
  • Beta available right now: http://www.microsoft.com/expression/try-it/blendpreview.aspx
ASP.NET 4.0
  • Web forms: more control over viewstate, html markup, improvements in data binding and url routing
  • Improvements to Ajax stack and jQuery support
  • Client–side templates and databinding, along with additional REST support
  • Velocity caching engine
Visual Studio 2010
  • Improved JavaScript, Ajax, jQuery scripting support, including IntelliSense
  • SharePoint editing/debugging within Visual Studio IDE
  • Publishing and deployment improvements, including database deployment
  • Still a CTP - get the latest here
IIS7
  • Ten new extensions available, including secure FTP, WebDAV, app request routing
  • Grab the new extensions here.
Microsoft Web Platform Installer
  • Provides single source for all components required for Web platform installation.
  • Version 2 (beta) already includes support for just-released ASP.NET MVC 1.0.
  • Grab the latest v2 beta here.
Microsoft Web App Gallery
  • Free apps to download and use
  • Anyone can add an app to the gallery
  • Visit the gallery here.
Commerce Server 2009
  • Available now.
  • View detailed information here.
Azure Services Platform
  • FastCGI support (allows 3rd-party programming languages like PHP)
  • .NET full trust
  • Raw ADO.NET support
  • View details here.
Microsoft BizSpark
  • New program designed to help startup companies get up and running quickly
  • Includes software and licensing support for 3-year period
  • Includes marketing, business development opportunities, hosting partners
  • Free to qualifying startups
  • Read details here.
  • Demo performed by startup StackOverflow.com, a BizSpark member
Silverlight V3
  • GPU acceleration support (on Windows and Mac)
  • New CODEC support (h.264, aac, MPEG-4)
  • Raw bitstream API
  • Improved logging for media analytics
  • Perspective 3D
  • Bitmap / pixel API
  • Deep linking, navigation page framework (for interfacing with browser navigation)
  • ClearType support
  • Multi-touch support
  • Library caching (download and cache to local system)
  • Data binding improvements
  • Multi-tier data (support for data context updates in Silverlight pushing updates to the server)
  • Binary XML
  • Out-of-browser experience - Silverlight app runs like a desktop app, yet still within a sandbox (and still dependent only on the Silverlight runtime bits). Supported on both Windows and Mac
  • Offline awareness (with network status change events)
  • V3 download is apparently 40K less than V2 download!
  • Get the latest beta here. Warning: once you install the Beta tools, you now have a Silverlight 3-only development environment, so maybe go with a VPC...
IIS Media Services
  • New product, available today.
  • Includes live streaming (in beta, available today)
  • Edge-caching. Akamai has already announced services to support this
  • For more info and installation details, go here.
Expression Blend 3
  • Includes new SketchFlow tool offering sketching/prototyping, complete with transitions, collaboration tool (free SketchFlow player), all built into Blend.
  • Version control support
  • Xaml IntelliSense
  • Photoshop PSD import, including preservation of layers (as well as the ability to selectively toggle layers)
  • Grab the preview here.

LINQ on Objects Performance

One of the major additions to .NET Framework 3.5 is LINQ. Recently, I decided to learn LINQ to Objects, that is, applying LINQ to collections. I personally find LINQ quite easy to use once you understand the building blocks of the feature. However, I started to wonder what kind of performance LINQ has. Therefore, I decided to create some tests and compare LINQ with the other iteration features in C#.

This article assumes the reader already has a basic understanding of LINQ. If you don’t know LINQ, I highly recommend reading the series of blog posts written by Sahil Malik: http://blah.winsmarts.com/2006/05/17/demystifying-c-30--part-1-implicitly-typed-local-variables-var.aspx.

In C#, there are several ways to loop through a collection. You can use the ‘for’ statement, the ‘foreach’ statement, Collection.FindAll() with a predicate, and the newcomer, LINQ. In my tests, I measure the performance of each of these iteration constructs when looping through two different types of lists:
  • List<int> - A list of value type, in this case, integers.
  • List<Student> - A list of reference type, in this case, a class called Student. The class has a public property called ID, which is an integer.
The idea is to determine whether the performance of LINQ differs between value types and reference types. Each list contains 10000 items. So for the integer list, the value of each item in the list will be in the range of 0 to 9999.
int listCount = 10000;
List<int> list = new List<int>();
for (int i = 0; i < listCount; ++i)
{
    list.Add(i);
}
And for the Student list, the value of ID of each student will be in the range of 0 to 9999.
List<Student> list = new List<Student>();
for (int i = 0; i < listCount; ++i)
{
    list.Add(new Student(i));
}
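The Student class itself is not listed in this article. A minimal version consistent with the description above (a reference type with a public integer ID property, constructed from an integer) might look like the following. The exact shape, including the private setter, is an assumption.

```csharp
// Assumed shape of the Student class used in the tests:
// a reference type exposing a public integer ID.
public class Student
{
    public int ID { get; private set; }

    public Student(int id)
    {
        ID = id;
    }
}
```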
Each list is going to be iterated by using ‘for’, ‘foreach’, ‘List<T>.FindAll/Find’ and ‘LINQ’. Moreover, each construct will iterate each list with different criteria to return different results. The following shows the results returned by each test:
  1. Returning an empty list. This scenario is our control case. The search criterion will be set up so that the entire list is iterated, but no item is added to the returned list. This is done by searching for (x == -1) for List<int> and (student.ID == -1) for List<Student>. The LINQ queries for the two lists are:
    var testList = (from anItem in list where (anItem == -1) select anItem).ToList<int>();

    var testList = (from anItem in list where (anItem.ID == -1) select anItem).ToList<Student>();
  2. Returning a list with one item. In this scenario, the search criterion will be set up so that every item in the list is visited, but only one item is added to the returned list. This is done by searching for (x == 1) for List<int> and (student.ID == 1) for List<Student>. The LINQ queries for the two lists are:
    var testList = (from anItem in list where (anItem == 1) select anItem).ToList<int>();

    var testList = (from anItem in list where (anItem.ID == 1) select anItem).ToList<Student>();
  3. Returning a list with multiple items. In this scenario, the search criterion will be set up so that any item with an even value is added to the returned list. In short, half of the items in the original list will be added to the new list. The LINQ queries for the two lists are:
    var testList = (from anItem in list where ((anItem % 2) == 0) select anItem).ToList<int>();

    var testList = (from anItem in list where ((anItem.ID % 2) == 0) select anItem).ToList<Student>();
  4. Returning the first item in the list that matches a criterion. In this case, the value (an integer or a Student instance) will be returned instead of a list. The criterion is set up so that the whole list is iterated. This is done by searching for (x == 9999) for List<int> and (student.ID == 9999) for List<Student>. This is where I invoke List<int>.Find() and List<Student>.Find() instead of calling FindAll(). The LINQ queries for the two lists are:
    var firstItem = (from anItem in list where (anItem == listCount-1) select anItem).FirstOrDefault<int>();

    var firstItem = (from anItem in list where (anItem.ID == listCount-1) select anItem).FirstOrDefault<Student>();
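Only the LINQ form of each query is listed above. For comparison, the sketch below shows how the "even numbers" scenario for List<int> might be written with each of the four constructs. The class and method names are hypothetical, and the for, foreach and FindAll bodies are a reconstruction for illustration rather than the original test code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EvenFilters
{
    // 'for' loop, with the count read once before the loop starts.
    public static List<int> WithFor(List<int> list)
    {
        List<int> result = new List<int>();
        int listCount = list.Count;
        for (int j = 0; j < listCount; ++j)
        {
            int item = list[j];
            if ((item % 2) == 0)
            {
                result.Add(item);
            }
        }
        return result;
    }

    // 'foreach' loop over the list's enumerator.
    public static List<int> WithForeach(List<int> list)
    {
        List<int> result = new List<int>();
        foreach (int item in list)
        {
            if ((item % 2) == 0)
            {
                result.Add(item);
            }
        }
        return result;
    }

    // List<T>.FindAll with an anonymous method as the predicate.
    public static List<int> WithFindAll(List<int> list)
    {
        return list.FindAll(delegate(int item) { return (item % 2) == 0; });
    }

    // The LINQ query from scenario 3.
    public static List<int> WithLinq(List<int> list)
    {
        return (from anItem in list where ((anItem % 2) == 0) select anItem).ToList<int>();
    }
}
```

All four methods return the same items in the same order, so any difference measured between them comes from how the list is walked and how the criterion is invoked.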
In the end, we will be performing 32 tests altogether (16 for List<int> and 16 for List<Student>), as shown in the following table:


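| Construct | Return empty list | Return a list with one item | Return a list with multiple items | Return the first item |
|---|---|---|---|---|
| For | X | X | X | X |
| Foreach | X | X | X | X |
| List<T>.Find or List<T>.FindAll | X | X | X | X |
| LINQ | X | X | X | X |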


I created a console application for each test, and hence there are a total of 32 applications. In each test, the list is looped over 100,000 times. To compare how long each test takes, I use the .NET Framework’s System.Environment.TickCount to retrieve the tick count right before and after each test. The difference between the two tick counts gives us the total number of ticks a test takes. For example, this is the code snippet showing how I get the tick count when using LINQ to find the first item in a list of type List<int>.
int count1 = System.Environment.TickCount;
for (int i = 0; i < 100000; ++i)
{
    var firstItem =
        (from anItem in list
         where (anItem == listCount-1)
         select anItem).FirstOrDefault<int>();
}
int count2 = System.Environment.TickCount;
double result = count2-count1;
In order to create a fair result for each test, I also impose extra rules on each test:
  1. Each test will be run 10 times in total.
  2. The tick count from the first run is discarded. This ensures that JIT compilation does not affect the results, since the IL code is compiled by the JIT compiler the first time it is executed.
  3. An average of the tick counts of the remaining 9 runs will be calculated. This is to average out any discrepancies caused by other processes that are running on my computer.
  4. All tests are built in release mode with Visual Studio 2008 SP1. The tests are run from the command prompt instead of Visual Studio.
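Rules 1 to 3 can be collected into a small harness. The sketch below is an assumption about how such a harness could look, not the original test code. It uses System.Diagnostics.Stopwatch in place of Environment.TickCount, which is a swapped-in technique chosen because Stopwatch offers finer resolution, and the names TimingHarness and AverageMilliseconds are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class TimingHarness
{
    // Runs 'test' the given number of times, discards the first run so that
    // JIT compilation is not measured, and returns the average elapsed
    // milliseconds of the remaining runs.
    public static double AverageMilliseconds(Action test, int runs)
    {
        if (test == null)
        {
            throw new ArgumentNullException("test");
        }
        if (runs < 2)
        {
            throw new ArgumentOutOfRangeException("runs", "At least two runs are required.");
        }

        double total = 0;
        for (int run = 0; run < runs; ++run)
        {
            Stopwatch watch = Stopwatch.StartNew();
            test();
            watch.Stop();

            if (run > 0) // skip the warm-up run
            {
                total += watch.Elapsed.TotalMilliseconds;
            }
        }
        return total / (runs - 1);
    }
}
```

Calling AverageMilliseconds with a delegate that performs the 100,000 iterations and a run count of 10 would follow the rules above: ten runs, the first one thrown away, and the other nine averaged.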

The following table shows the results in milliseconds for the List of integers.


| Construct | Return empty list | Return a list with one item | Return a list with multiple items | Return the first item |
|---|---|---|---|---|
| For | 0.2254 | 0.2253 | 0.675 | 0.1854 |
| Foreach | 0.3572 | 0.3611 | 0.7555 | 0.3586 |
| List<T>.Find or List<T>.FindAll | 0.6145 | 0.6152 | 0.9025 | 0.6580 |
| LINQ | 0.8570 | 0.8623 | 1.9522 | 0.8564 |


The following table shows the results in milliseconds for the List of reference type (Student).


| Construct | Return empty list | Return a list with one item | Return a list with multiple items | Return the first item |
|---|---|---|---|---|
| For | 0.3812 | 0.3822 | 0.9846 | 0.2541 |
| Foreach | 0.4234 | 0.4680 | 1.0113 | 0.4258 |
| List<T>.Find or List<T>.FindAll | 0.6141 | 0.6184 | 1.3440 | 0.6148 |
| LINQ | 1.5797 | 1.5820 | 3.0297 | 1.5799 |


The results show some interesting facts.

For vs Foreach
There is always a debate about whether a foreach loop or a for loop is faster. On one hand, a foreach loop can be slower because it is expanded to something like the following:
using (IEnumerator<int> enumerator = list.GetEnumerator())
{
    int item;
    while (enumerator.MoveNext())
    {
        item = enumerator.Current;
    }
}
A foreach carries the overhead of calling MoveNext() and Current within the loop. Moreover, at the beginning, it needs to call the virtual method GetEnumerator(). At the end, it has to dispose of the enumerator object as well.

On the other hand, a for loop can be slower because it has to make more virtual method calls. Here is an example:
for (int j = 0; j < listCount; ++j)
{
    item = list[j]; // <-- virtual method call
}
In fact, when I first wrote the tests, the foreach loop was actually faster than the for loop. However, after I made a small optimization to the for loop, it outperformed the foreach loop. This is what I changed:
    for (int j = 0; j < list.Count; ++j)
to
    int listCount = list.Count;
    for (int j = 0; j < listCount; ++j)
Overall, I consider the performance difference between for and foreach to be minimal. I prefer using the foreach statement because the code is cleaner and easier to read.

List.Find()/FindAll()
It is a known fact that calling Invoke() on a delegate is expensive. Jan Gray wrote an article on the performance of managed code which can be found here: http://msdn.microsoft.com/en-us/library/ms973852.aspx.
Note that the article was written back in 2003 and the compiler has improved a lot since then. Nevertheless, using an anonymous method in a FindAll() or Find() call is still slower than for and foreach.

LINQ
Here comes the real reason for this article. As you can see, LINQ performance is the worst among all the test sets. The inferior performance of LINQ can be explained by looking at what a LINQ statement actually gets translated to. Here is an example:
tempList =
    (from student in list
     where student.ID == listCount-1
     select student).ToList<Student>();
Through Query Expression Translation, the above LINQ statement is translated to
this.list.Where<Student>(student => (student.ID == listCount-1)).ToList<Student>()
  1. The LINQ statement uses a lambda expression, which is basically a delegate behind the scenes. As mentioned earlier, invoking a delegate is expensive.
  2. It uses the extension method “Where” to do most of the work.
  3. Once the LINQ query executes, it creates and returns an IEnumerable<T> sequence.
  4. It has to call ToList(), which copies the results into another List<T> object.
There is no free lunch in using LINQ on collections. The simple LINQ statement is actually translated back into comparable C# code, which can be expensive to run.
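To see where the cost comes from, it can help to write out roughly what Where does. The following is a simplified illustration and not the framework's actual implementation: SimpleWhere is a hypothetical stand-in, and WhereSketch and CountPredicateCalls are names made up for this example. It shows the delegate being invoked once per element, and the iterator that ToList() then has to walk to build the new list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereSketch
{
    // Simplified stand-in for Enumerable.Where: an iterator that invokes
    // the predicate delegate once for every element of the source.
    public static IEnumerable<T> SimpleWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    // Counts how many times the predicate is invoked when the filtered
    // sequence is copied into a list.
    public static int CountPredicateCalls(List<int> list, int target)
    {
        int calls = 0;
        IEnumerable<int> query = SimpleWhere(list, x => { calls++; return x == target; });

        // The iterator is lazy: the predicate only runs while ToList() enumerates.
        query.ToList();
        return calls;
    }
}
```

For a list of 10000 items, the predicate runs 10000 times per query, each time through a delegate call, on top of the enumerator and the extra list allocation.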

Conclusion
We can see that LINQ performs worse than the other C# constructs when iterating a generic List. However, readers should not discard LINQ based on these simple tests, because LINQ gives us some compelling reasons to use it:
  1. A LINQ statement is easy to write and understand. Code maintainability carries a lot of hidden value in software development. When LINQ statements are used for more complicated constructs, developers can understand the code more easily than with complicated for and foreach statements.
  2. The generic collections have other extension methods, like GroupBy or OrderBy, which are quite powerful. It is definitely easier to write a LINQ statement with Select, Where and GroupBy than the equivalent for or foreach statements. Moreover, you may end up giving up a good portion of the performance gain from for and foreach due to the complicated custom sorting and grouping code.
  3. A LINQ statement can be assigned to a variable. That implies you can reuse the same query statement or pass it to a method.
  4. Since a LINQ statement returns an IEnumerable, you can run a subquery on its result using the extension methods of IEnumerable. For example:
    tempList =
        (from student in list
         where student.ID == listCount-1
         select student).ToList<Student>();
    tempList.All(student => student.ID == 10);
  5. Microsoft will continue to improve the performance of LINQ. The performance issues that we see currently may not be such a big factor in the future.
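Points 2 and 3 can be illustrated briefly. The sketch below uses a List<int> to stay self-contained, and QueryReuse, BelowLimit, CountMatches and CountByParity are hypothetical names chosen for this example. It shows a query being built once, handed to another method, and a grouping that would take noticeably more code with for or foreach.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryReuse
{
    // Builds a query without running it; it executes each time it is enumerated.
    public static IEnumerable<int> BelowLimit(List<int> list, int limit)
    {
        return from anItem in list where anItem < limit select anItem;
    }

    // A query is an ordinary IEnumerable<T>, so it can be stored in a
    // variable and passed to any method that accepts one.
    public static int CountMatches(IEnumerable<int> query)
    {
        return query.Count();
    }

    // Groups the items by parity and counts each group.
    public static Dictionary<bool, int> CountByParity(IEnumerable<int> items)
    {
        return (from anItem in items
                group anItem by (anItem % 2) == 0 into parityGroup
                select new { IsEven = parityGroup.Key, Count = parityGroup.Count() })
               .ToDictionary(x => x.IsEven, x => x.Count);
    }
}
```

Because the query is re-evaluated on every enumeration, passing the same query variable to CountMatches before and after adding items to the underlying list gives different counts. That is convenient for reuse, but it also means the cost discussed earlier is paid again each time.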
Nevertheless, readers should be careful in selecting which type of construct to use based on the situation. If you are writing GUI code, the readability and simplicity of LINQ may outweigh its performance lag. Remember, human beings are always a lot slower than machines. On the other hand, if you are writing high-traffic, system-level code where performance is critical, you may want to consider using a C# construct other than LINQ.

I would like to thank my co-workers David Makogon and Damon Squires for reviewing this article and giving me valuable feedback.

WCF and MSMQ

The third in my series of articles on WCF is now available on DevX. The article explores how a WCF application can be used as a client for an existing MSMQ based service using the MsmqIntegrationBinding. Next month, I’ll cover using WCF at both endpoints with NetMsmqBinding.