
Big Data Double Jeopardy

Blog-post by,

Big data comes from everywhere: existing databases, mountains of unstructured and semi-structured documents, and entirely new sources on the web or in the evolving “network of things” in the enterprise. It creates unprecedented volumes of data to winnow and analyze for meaning. The presence of duplicate data in those streams (which may be necessary, since the same blocks of information can mean different things in different contexts) creates a kind of double jeopardy for the enterprise.

First, of course, there’s just the question of where to keep it all. Storage capacity is cheap and getting cheaper to acquire, but it isn’t free, and even free capacity is not free to configure, manage, and safeguard.

Faced with the need to cope with new volumes of data, you have an opportunity to drive storage efficiency deeper into the data center. De-duplication is the key technology for making storage itself more efficient (where content management and the like may make our use of storage more efficient). Making sure that you store only one copy of data, at the file level or better still at the block level, can vastly reduce the amount of space a given business’s data consumes. This reduces the load on backup systems as well, and by extension reduces the WAN loads generated by storage array replication and by backup among sites.
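To make the block-level idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular storage array works: the fixed 4 KB block size, the use of SHA-256 digests to identify blocks, and the names `dedup_store`, `restore`, and `stored_bytes` are all hypothetical choices made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size in bytes; real systems vary


def dedup_store(files, block_size=BLOCK_SIZE):
    """Split each file into fixed-size blocks and keep one copy per distinct block.

    `files` maps a file name to its contents as bytes.
    Returns (block_store, recipes):
      block_store maps a SHA-256 digest to the block's bytes,
      recipes maps a file name to the ordered list of digests that rebuild it.
    """
    block_store = {}
    recipes = {}
    for name, data in files.items():
        digests = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only the first time this digest is seen.
            block_store.setdefault(digest, block)
            digests.append(digest)
        recipes[name] = digests
    return block_store, recipes


def restore(name, block_store, recipes):
    """Rebuild a file's original bytes from its list of block digests."""
    return b"".join(block_store[d] for d in recipes[name])


def stored_bytes(block_store):
    """Total bytes actually kept after deduplication."""
    return sum(len(block) for block in block_store.values())


if __name__ == "__main__":
    # Two made-up files that share one 4 KB block of identical content.
    files = {
        "report_a": b"A" * 8192 + b"B" * 4096,
        "report_b": b"A" * 4096 + b"C" * 4096,
    }
    store, recipes = dedup_store(files)
    raw = sum(len(data) for data in files.values())
    print(f"raw bytes: {raw}, stored bytes: {stored_bytes(store)}")
```

In this toy case the two files hold five blocks between them but only three distinct ones, so fewer bytes need to be kept, backed up, or replicated. Each file is still fully recoverable from its recipe, which is also where the cost shows up: every read has to look blocks up by digest before the data can be reassembled.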

Second, there’s the problem of how to make use of the big data. Analytics software turns the mountains of raw data into manageable foothills of useful information. However, its speed and efficiency depend in part on the volume of data it has to retrieve from storage, ingest, and process. Any reduction in the volume retrieved from storage will improve performance.

Not all kinds of data can be deduplicated in this context, and there may be tradeoffs in performance that undercut the advantages.

Bottom line: A savvy CIO will try to avoid the double-jeopardy pinch of redundant data in an emerging big data environment, and examine closely the roles that deduplication can play.


Paul Calento 255 Points | Mon, 06/25/2012 - 17:56

Re: Big Data Double Jeopardy. John, how do you prioritize priorities if there are conflicting points of context? Should you prioritize at all? Role of metadata? Analytics tool/solution? Something else entirely?

--Paul Calento

(note: I work on projects sponsored by and HP)

Pearl Zhu 90 Points | Mon, 06/25/2012 - 16:28

Hi John, interesting perspectives on Big Data, from data storage through to data analytics. Transforming double jeopardy into a success story takes both technology and methodology. Beyond deduplication, storage tiering, thin provisioning, and big data frameworks all need to be considered in order to achieve a high level of efficiency. In addition, every bit of data is important, but not every piece of information is created equal: how to handle data life cycle management and how to retire old data are great challenges too. Thanks.