
Hadoop tackles difficult job of getting Big Data into the cloud


I read a blog the other day which got me thinking. It challenges the appropriateness, or lack thereof, of cloud as a solution for the burgeoning growth in Big Data. It doesn't say cloud is NOT a solution, but argues that technical issues should prevent it from becoming the panacea.

The problem lies in the sheer scale of data growth and how to actually transfer it over to a cloud environment. The blog suggests many enterprises are shipping data off-site on USB drives. I can't imagine many CIOs being impressed with that solution. As a Technology Consulting marketer with an interest in Big Data from the Hadoop perspective, I recognise that we've all only just finished waxing lyrical about Cloud, and along comes Big Data to stimulate the panic juices. Challenging cloud as a solution, though, was a new concept for me, so I read into it more to understand the scale of the problem. It turns out that analysts believe current data, including unstructured data, already exceeds current storage capacity by more than a factor of two. Not only that, but unstructured data is growing at twice the rate of structured data, and by 2020, depending on who you listen to, there will be between 85 and 140 ZB of data. Bringing it down to the level of one organization: if YOU have 1 PB today, in five years you will have around 12 PB, and then it goes exponential. Between you and me, that's a lot.
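The 1 PB-to-12 PB figure implies a compound annual growth rate of roughly 64%, and a quick back-of-envelope script shows how fast that compounds beyond year five. This is just a sketch of the arithmetic behind the claim, not a forecast model:

```python
# Implied compound annual growth rate for 1 PB -> ~12 PB over 5 years.
rate = 12 ** (1 / 5) - 1  # ~0.64, i.e. ~64% growth per year

# Project storage needs forward, year by year.
pb = 1.0
for year in range(1, 11):
    pb *= 1 + rate
    print(f"year {year:2d}: {pb:7.1f} PB")
# By year 10 the same rate puts you at ~144 PB -- "exponential" indeed.
```

Run it yourself with your own starting volume and growth rate; the shape of the curve is the point, not the exact numbers.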

So why the challenge to Cloud? Well, the challenge is in getting it there. How are you going to move all that data? Moving 1 TB through an OC3 will take roughly two days, assuming it doesn't drop packets. The other chin-rubbing objection is security. And it's not just about moving it; there's the interface with it too: acquiring and processing it before getting it securely stored. Then there's accessing and using all that unstructured data for mining purposes, before finally archiving and destroying it when Sarbanes-Oxley allows you to.
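The two-day figure is easy to sanity-check. An OC3 runs at 155.52 Mbit/s; at full line rate, 1 TB would move in about 14 hours, but sustained real-world throughput is far lower once protocol overhead, contention, and retransmits bite. The 50 Mbit/s effective rate below is my own assumption, chosen only to illustrate how the post's "roughly 2 days" comes about:

```python
# Rough bulk-transfer time for 1 TB over an OC3 link.
TB_BITS = 1e12 * 8            # 1 TB expressed in bits
LINE_RATE = 155.52e6          # OC3 line rate, bits per second
EFFECTIVE = 50e6              # assumed sustained throughput (not from the post)

ideal_hours = TB_BITS / LINE_RATE / 3600
real_days = TB_BITS / EFFECTIVE / 86400
print(f"ideal: {ideal_hours:.1f} hours, realistic: {real_days:.1f} days")
```

At the assumed effective rate the transfer lands at just under two days, which squares with the objection being raised; and remember this is for a single terabyte, against petabytes of projected growth.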

It's at times like this that I'm happy to address just the Hadoop element of migration to the platform and analytics. It isn't easy, but I have access to big-brain consulting principals and architects. And, in all honesty, my other big-brain engineers design and build data centers, so I'm covered either way: either YOU need them because you need all this extra space, or the cloud host needs them if you take the risk of going there.

So here's your choice, having accepted the projections on data growth and the need to actually use that data. And let's not forget, this isn't just a storage and access issue. It impacts your whole IT infrastructure: servers, storage, network, capacity, power and cooling, data protection, business intelligence, back-up and recovery....

1. Deal with it within your own infrastructure, in which case you need to get ready to build the storage space, which translates almost instantly into sprawl. Also, good luck with auto-moving data between systems, which rather defeats the object here.

2. Move it to the cloud, in which case how comfortable are you with the objections and practicalities above?

As I pondered this dilemma, two answers struck me. The technical one is all to do with virtual machines using virtual pipes, breaking the data out at the switch level using virtualized I/O. The other, far more practical one: why wait for the problem to grow? I would suggest that any technical reasons thrown out there NOT to use a cloud solution are far outweighed by the data tsunami you're going to face. Conclusion? Start moving your data to the cloud NOW.

