Why Big Data Means a Big Year for Hadoop

By Michael Friedenberg

CIO — You can't have a conversation in today's business technology world without touching on the topic of big data.

Simply put, it's about data sets so large—in volume, velocity and variety—that they're impossible to manage with conventional database tools. In 2011, our global output of data was estimated at 1.8 zettabytes (each zettabyte equals 1 billion terabytes). Even more staggering is the widely quoted estimate that 90 percent of the data in the world was created within the past two years.

Behind this explosive growth in data, of course, is the world of unstructured data. At last year's HP Discover Conference, Mike Lynch, executive vice president of information management and CEO of Autonomy, talked about the huge spike in the generation of unstructured data. He said the IT world is moving away from structured, machine-friendly information (managed in rows and columns) and toward the more human-friendly, unstructured data that originates from sources as varied as e-mail and social media and that includes not just words and numbers but also video, audio and images.

Given the rise of big data, I'm sure you're hearing the buzz around Apache Hadoop, the software framework that supports data-intensive distributed applications under a free license. It enables applications to work with thousands of nodes and petabytes (a thousand terabytes) of data. It certainly looks like the Holy Grail for organizing unstructured data, so it's no wonder everyone is jumping on this bandwagon. A quick Web search will show you that in just the past few months, companies including EMC, Microsoft, IBM, Oracle, Informatica, HP, Dell and Cloudera (to name a few) have adopted this software framework.

What I find even more notable is that companies such as Yahoo, Amazon, comScore and AOL have turned to Hadoop to both scale their businesses and lower storage costs.

According to some recent research from Infineta Systems, a WAN optimization startup, traditional data storage runs $5 per gigabyte, but storing the same data costs about 25 cents per gigabyte using Hadoop.

That's one number any CEO will remember.
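
To put those per-gigabyte figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal units (1,000 gigabytes per terabyte, 1,000 terabytes per petabyte) and a hypothetical one-petabyte data set. The function name and the capacity are illustrative assumptions, not part of the Infineta research, and the sketch ignores real-world factors such as replication, hardware and staffing.

```python
# Per-gigabyte figures quoted in the post (attributed to Infineta Systems).
TRADITIONAL_USD_PER_GB = 5.00
HADOOP_USD_PER_GB = 0.25

# Decimal units, matching the post's "petabytes (a thousand terabytes)".
GB_PER_TB = 1_000
TB_PER_PB = 1_000


def storage_cost_usd(terabytes: float, usd_per_gb: float) -> float:
    """Return the storage cost in dollars for a capacity given in terabytes."""
    return terabytes * GB_PER_TB * usd_per_gb


if __name__ == "__main__":
    capacity_tb = 1 * TB_PER_PB  # hypothetical one-petabyte data set
    traditional = storage_cost_usd(capacity_tb, TRADITIONAL_USD_PER_GB)
    hadoop = storage_cost_usd(capacity_tb, HADOOP_USD_PER_GB)
    print(f"Traditional: ${traditional:,.0f}")
    print(f"Hadoop:      ${hadoop:,.0f}")
    print(f"Ratio:       {traditional / hadoop:.0f}x")
```

Under these assumptions, a petabyte works out to roughly $5 million on traditional storage versus about $250,000 on Hadoop, a 20-to-1 difference.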

So get ready for Hadoopalooza 2012. I'd love to hear what you're doing to tackle big data storage, so please drop me a line anytime.

Michael Friedenberg is the president and CEO of CIO magazine's parent company, IDG Enterprise. Email him at mfriedenberg@cxo.com.

 

Discussion
pcalento
Paul Calento 256 Points | Wed, 02/29/2012 - 17:40

Have no doubt that Big Data is a top trend driving IT investment, but outside of media/vendor/analyst circles the folks not talking about it are the IT leaders driving the trend. As an example, early in February, I started two conversations on LinkedIn: Hadoop-centric (at CIO Forum) and Big Data-centric (at CIO Network). Neither generated any response. This is not the case with many other topics including cloud computing, CIO management, BYOD/personal device use/consumerization etc. Is there a vocabulary breakdown as it applies to Big Data?

--Paul Calento

(note: I work on projects sponsored by EnterpriseCIOForum.com and HP)

Goddardd
Doug Goddard 122 Points | Tue, 02/07/2012 - 15:15

" He said the IT world is moving away from structured, machine-friendly information (managed in rows and columns) and toward the more human-friendly, unstructured data that originates from sources as varied as e-mail and social media and that includes not just words and numbers but also video, audio and images."

 

That may be true for applications built around unstructured data. However, I don't think it is true for transaction-oriented applications, where 24-hour uptime and data integrity are demanded under high concurrency. Old-world relational databases are certainly not capable of handling the demands of the unstructured world, but neither are the NoSQL databases showing a capacity for handling business-oriented transaction systems. Once organizations start trying to use NoSQL data stores for transaction applications, I'm certain there will be great nostalgia for something like SQL and the ACID model again, as the alternative will be to program all that stuff in the client applications. We are already seeing new relational databases emerge, architected for both the big data world and data consistency.

jdodge
John Dodge 1332 Points | Tue, 02/07/2012 - 17:49

There's tons of both and always will be....it's just that volumes of unstructured data will grow much faster....in part, because we thought it was useless and now see it as an asset to be mined. It was never really counted before except by capacity planners. And there's just more and more of it every second of every day, thanks to social media, video etc....

When does unstructured get pushed into the background? Does it ever get pushed into the background (aka archived)?

pearl
Pearl Zhu 89 Points | Mon, 02/06/2012 - 18:14

Hi, Michael, excellent blog about Big Data. I also like a few of the statistics you put there: the data growth fact and the storage cost perspective. Big data is now becoming a significant part of an organization's data life cycle management, from data storage to data quality/integrity, data security and data analytics. It brings both opportunities and risks. The right talent with holy spirit, the holy grail platform & process, and cutting-edge technology may just need to be united to tame the beast, and to see through the data to perceive the optimized products/services, the business trend and the better world. Thanks.