
Should big data reside in the Cloud?

Blog-post by,
HP Blogger

Have you read this year’s predictions? Last year it was all about cloud; this year the number one topic of discussion is “big data”. It looks like cloud is already old news for most analysts.

Let’s first define the term, so we’re sure what we are talking about. According to Wikipedia, big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target, currently ranging from a few dozen terabytes to many petabytes of data in a single data set. Examples include web logs; RFID; sensor networks; social networks; social data (due to the social data revolution); Internet text and documents; Internet search indexing; call detail records; astronomy, atmospheric science, genomics, biogeochemical, biological, and other complex and/or interdisciplinary scientific research; military surveillance; medical records; photography archives; video archives; and large-scale e-commerce.

Managing the data

Now that’s clear. The world’s ‘digital universe’ is in the process of adding 1.8 zettabytes in 2011, with continuing exponential growth: projections are 8 zettabytes in 2015 and 35 zettabytes in 2020. (By the way, in preparing this post I learned a new term, the zettabyte, which is 10^21 bytes.) 70 percent of that data is generated by individuals and 85 percent is unstructured. We call that human information. Every second 97,000 tweets are added, every minute 12 million texts and every day 294 million emails.

Did you know that today, a single commercial flight across the U.S. generates 240 terabytes (TB) of wireless sensor data? The key issue is no longer capturing the data, but actually storing it. In an ongoing project, we gather 1.35 TB in a 15-minute experiment using 1 million wireless sensors. With a stable 56MB wireless link, we need around 42 hours to gather and store the data, so new data transfer mechanisms and approaches need to be invented. It’s interesting to realize that over the last 10-15 years, we have gone from not having enough data to make decisions to complete data overload. Yet we still cannot make decisions, as we are often unable to sift the relevant data from the noise.
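As a rough sanity check on the transfer-time estimate above, here is a minimal Python sketch. It assumes an ideal link with no protocol overhead, decimal terabytes, and, because the post does not say whether "56MB" means megabits or megabytes per second, it computes both readings. The function name and constants are illustrative assumptions, not part of the project described.

```python
def transfer_hours(size_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time in hours, ignoring protocol overhead and retries."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 3600


DATA_BYTES = 1.35e12  # 1.35 TB, assuming decimal terabytes (assumption)

# The unit of the "56MB" link is ambiguous in the post, so both readings
# are computed here (assumption).
hours_if_megabits = transfer_hours(DATA_BYTES, 56e6)       # 56 Mbit/s
hours_if_megabytes = transfer_hours(DATA_BYTES, 56e6 * 8)  # 56 MB/s

print(f"56 Mbit/s: {hours_if_megabits:.1f} hours")
print(f"56 MB/s:   {hours_if_megabytes:.1f} hours")
```

Under these idealized assumptions the two readings come out at roughly 54 and 7 hours, so the figure of around 42 hours presumably rests on different assumptions about throughput or overhead. Either way, moving the data takes far longer than the 15 minutes needed to generate it.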

Whether it is for disaster recovery, for back-up or to perform compute-intensive analytics or calculations, the time and cost of storing the data in the cloud are often forgotten. So, the first question to raise is whether the data needs to be in the cloud in the first place, or whether a hybrid approach, integrating public cloud with enterprise IT resources (be it private cloud or legacy), should be taken instead. Let’s look at an example where we combine social networking data, already in the cloud, with enterprise information.

Understanding your customer behavior

If you want to know what the world thinks about your product, your brand, your services, you had better take a look at tweets, blog entries and forums. In the past, people moaned about how bad a service was at the local bar; today they do it on Twitter. You can no longer ignore that fact if you want to stay competitive.

A survey of senior business and technology executives that we commissioned showed that enterprises typically leverage only 5 percent of the available information, that 48 percent do not have an effective information strategy in place, and that only 2 percent can deliver the right information at the right time to support enterprise outcomes 100 percent of the time.

The social media data I talked about is located in the cloud, by definition. But ideally, companies want to cross-correlate this data with their own customer information. HP Labs did just that with its “Project Fusion.” They learned to predict customer behavior by merging social media and company data. Obviously, many larger companies are not interested in migrating all their customer data to the cloud, so it’s key to be able to integrate data from multiple sources into a common analysis.
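To make the idea of cross-correlating cloud-hosted social data with in-house customer data a little more concrete, here is a minimal Python sketch. It is not how Project Fusion works; the records, field names and the crude sentiment score are all hypothetical and only illustrate that the public data can be pulled in and joined locally while the customer records stay inside the enterprise.

```python
from collections import defaultdict

# Hypothetical in-house customer records keyed by social handle.
# These stay on enterprise systems in this sketch.
customers = {
    "@alice": {"customer_id": 1, "segment": "enterprise"},
    "@bob": {"customer_id": 2, "segment": "consumer"},
}

# Hypothetical public mentions collected from a cloud source,
# each with a crude sentiment score (+1 positive, -1 negative).
mentions = [
    {"handle": "@alice", "sentiment": -1},
    {"handle": "@alice", "sentiment": -1},
    {"handle": "@bob", "sentiment": 1},
    {"handle": "@carol", "sentiment": 1},  # not a known customer, skipped
]


def sentiment_by_segment(customers, mentions):
    """Sum sentiment per customer segment, ignoring unknown handles."""
    totals = defaultdict(int)
    for mention in mentions:
        record = customers.get(mention["handle"])
        if record is None:
            continue  # mention from someone who is not a known customer
        totals[record["segment"]] += mention["sentiment"]
    return dict(totals)


print(sentiment_by_segment(customers, mentions))
```

The join runs where the customer data lives, so only the public mentions cross the network. A real deployment would need far more careful identity matching and sentiment analysis than this toy lookup.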

HP’s Approach

HP is conscious of how important it is to search not just through structured data, but also through the huge amount of unstructured data. By combining Vertica’s Analytic Platform, focused on the analysis of structured data, with Autonomy’s Meaning Based Computing approach, HP now offers you an environment through which you can really understand what’s happening. The combination of multiple information sources allows you to keep the data where it is while taking full advantage of the information embedded in it. This is what we call the Human Information Era.

So, big data may still be hype, but the data is there and enterprises need to take that into account. Tools exist today, as we demonstrated with Project Fusion, and there is more to come. You really want to look at this because it may give you an unfair advantage in doing business. And if you don’t do it, your competitor might. That would be a real pity, wouldn’t it?


Paul Calento 255 Points | Fri, 01/27/2012 - 18:50

Seems like the challenge with Big Data may be dealing with the myriad data sources, some of which will pragmatically reside in the cloud, along with legacy data sources and additional unstructured data (similar to what HP did with Project Fusion). The issue, to me, may be context. While Big Data is an opportunity, it is also a problem. Every additional data source creates additional complexity, doesn't it? The key is distinguishing real, predictable patterns from just data, I'd imagine.

--Paul Calento

(note: I work on projects sponsored by and HP)

Pearl Zhu 90 Points | Tue, 01/24/2012 - 18:34

Hi, Christian, insightful blog about big data. I would say the majority of big data comes from the cloud, and it may also need to reside in the cloud in some way. The goal of big data is predicting future trends, such as customer behavior, as you pointed out, or the collective wisdom of employees, partners or the public about improving products or services.

So big data is not the end; it's the means to the end. How to deliver a single version of the truth, how to integrate it with operational data, and how to manage the full life cycle of data, both unstructured and structured, are all great challenges and opportunities for any business today. Thanks.