Massive Data in the Cloud
A significant portion of the recent MIT talk on cloud computing by Barry Briggs, CTO for Microsoft IT, was about data, and for good reason. The cloud generates tremendous amounts of data, and these huge datasets can be mined: Walmart claims that its point-of-sale (POS) data can track a flu epidemic as well as the CDC can, simply by looking at sales of flu medicines. Barry also touched on the limits of even today's high network bandwidth when it comes to transferring massive amounts of data. Engineers at the Large Hadron Collider, which generates 27 TB of data per day, evaluated the quickest and most economical ways to move those volumes of data from one part of the globe to another. They found that shipping the hard drives themselves overnight was faster and cheaper than sending the data over the network.
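The shipping-versus-network comparison comes down to simple arithmetic. The sketch below is a back-of-the-envelope calculation, not anything from the LHC team: it converts a daily data volume into the sustained link rate needed to keep up, and estimates how long one day's output would take over a given link. It assumes decimal units (1 TB = 10^12 bytes) and a link running at full utilization with no protocol overhead; the function names and the 1 Gbit/s link speed in the example are illustrative assumptions, not figures from the talk.

```python
# Back-of-the-envelope sketch: can a network link keep up with a daily data volume?
# Assumptions (not from the source): decimal units (1 TB = 10**12 bytes,
# 1 Gbit = 10**9 bits) and a link running at full utilization with no overhead.

BITS_PER_BYTE = 8
BYTES_PER_TB = 10**12
BITS_PER_GBIT = 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def required_gbps(tb_per_day: float) -> float:
    """Sustained link rate (Gbit/s) needed to move tb_per_day within one day."""
    bits_per_day = tb_per_day * BYTES_PER_TB * BITS_PER_BYTE
    return bits_per_day / SECONDS_PER_DAY / BITS_PER_GBIT


def transfer_hours(tb: float, link_gbps: float) -> float:
    """Hours needed to move tb terabytes over a link of link_gbps Gbit/s."""
    bits = tb * BYTES_PER_TB * BITS_PER_BYTE
    return bits / (link_gbps * BITS_PER_GBIT) / 3600


if __name__ == "__main__":
    daily_tb = 27  # LHC figure cited in the talk
    print(f"Required sustained rate: {required_gbps(daily_tb):.2f} Gbit/s")
    # Hypothetical 1 Gbit/s link, chosen only for illustration.
    print(f"Time to move one day's data at 1 Gbit/s: {transfer_hours(daily_tb, 1.0):.1f} h")
```

At 27 TB per day, the sketch gives a required sustained rate of 2.5 Gbit/s; a fully utilized 1 Gbit/s link would need 60 hours to move a single day's output, so it would fall further behind every day. Real links also lose throughput to protocol overhead and sharing, which only widens the gap.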
Data-as-Service and NoSQL
Part of the Azure suite is the service codenamed Dallas, which aggregates paid feeds such as D&B and free feeds such as www.data.gov into SQL Azure, Microsoft's cloud version of SQL Server. Dallas uses protocols such as the Open Data Protocol (OData) to provide access. However, not everyone thinks SQL is the answer for data at cloud scale. The last year or so has seen the emergence of the "NoSQL movement," which promotes non-relational databases that, many claim, scale better and are better suited to the needs of many cloud applications. The choices of major cloud application providers support this view: rather than using traditional SQL databases, Facebook created Cassandra, Google created BigTable, Twitter created FlockDB, and Amazon created Dynamo.
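OData exposes data as addressable resources over HTTP, and a client narrows a request with system query options such as `$filter`, `$select`, and `$top` appended to the URL. The sketch below shows how a client might build such a query URL. The service root `https://example.com/odata`, the `Sales` entity set, and its `Year`, `Product`, and `Units` properties are hypothetical placeholders, not actual Dallas endpoints or schema; a real feed would also require authentication, which is omitted here.

```python
# Minimal sketch of building an OData query URL.
# The service root, entity set, and property names used below are hypothetical.
from typing import Optional, Sequence
from urllib.parse import quote


def odata_query_url(
    service_root: str,
    entity_set: str,
    filter_expr: Optional[str] = None,
    select: Optional[Sequence[str]] = None,
    top: Optional[int] = None,
) -> str:
    """Return a GET URL for entity_set with optional OData system query options."""
    options = []
    if filter_expr:
        # Percent-encode the filter expression (spaces, quotes, etc.).
        options.append("$filter=" + quote(filter_expr))
    if select:
        options.append("$select=" + ",".join(select))
    if top is not None:
        options.append(f"$top={int(top)}")
    url = service_root.rstrip("/") + "/" + entity_set
    return url + ("?" + "&".join(options) if options else "")


if __name__ == "__main__":
    print(odata_query_url(
        "https://example.com/odata",
        "Sales",
        filter_expr="Year eq 2009",
        select=["Product", "Units"],
        top=10,
    ))
```

The `$filter` expression uses OData's own comparison operators (`eq`, `gt`, `and`, and so on) rather than SQL syntax.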
Data Democratization and Visualization
Microsoft recently released PowerPivot (formerly codenamed Gemini) as part of its campaign to "democratize BI." It is an add-in for Excel 2010 that lets end users access millions of rows of data from remote systems and perform ad hoc analysis without needing an SQL programmer. Internally, Microsoft employees informally call it the "safe needle program for BI." It supports quick, easy creation of a variety of charts and dashboards. For many types of analysis, visualization is a far more effective way to understand data. This goes well beyond traditional charts and graphs, as explored in our Fourth Dimension series.
Our ongoing Cloud Series discusses these and many other issues in an effort to bring clarity to this whole arena.