By Calvin Zito
My co-worker Brad Parks had a great blog post on my Around the Storage Block blog a couple of weeks ago that several people encouraged me to share here. Brad is the ultimate DIY (do it yourself) guy and is in the midst of a remodel project that prompted him to draw some analogies to big data. Here's Brad's blog post:
I’m in the midst of the ongoing remodel of my 1910 brick bungalow. While I love the house and all the character that comes with a historical home, I’m constantly challenged with sloping walls, decaying mortar and strange design choices… like the roof that is on the INSIDE of my kitchen. They’ve led me to the conclusion that architecture matters… a lot.
Every day we generate information. On our daily commutes to work, we generate information for the highway authorities that are monitoring traffic patterns. During our morning stop at the café, we generate data when we order a pastry with our cup of coffee. And while we wait for a meeting to start, we generate data when we share weekend photos on Facebook. To some, these are random acts of social interaction, but to a well-armed few they are the key to new business models, product lines and competitive advantage.
Highway authorities could use big data analytics to help them design and build new roadways that relieve traffic bottlenecks. Retailers can predict the success of new products by analyzing buying preferences.
The evolution of big data – and the challenges that come with it
The challenge isn't just handling the increase in data volume, but also dealing with its diversity and unpredictability. This brings us back to the subject of architecture… in this case, both for the software used to compile and analyze the information and for the physical storage infrastructure that supports it.
“Big data” is a term that has been used in many different ways recently. Traditionally reserved for analytics applications, it now extends into other areas of extreme content growth. It can come in the form of “human information” – meaning person-to-person communication like emails, videos, phone calls, voicemails and social media updates, as well as extreme data sets filled with information generated through sensors, tags, telecommunications and GPS signals.
Analytic software has moved well beyond the limits of traditional RDBMS designs and row-based databases. Columnar databases like HP Vertica and meaning-based computing engines such as the HP Autonomy IDOL platform are new architectural approaches designed specifically for these new data types and massive data sets.
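To make the row-based versus columnar distinction concrete, here is a minimal Python sketch. It is a toy illustration only: the table, the field names and the helper functions are all made up for this example, and it says nothing about how HP Vertica or any other product is actually implemented. It simply shows why an analytic query that needs one field can read far less data when values are stored column by column.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# Assumption: the table, field names and values below are invented
# purely for illustration; no real product works exactly like this.

rows = [
    {"order_id": 1, "item": "coffee", "price": 3.50},
    {"order_id": 2, "item": "pastry", "price": 2.25},
    {"order_id": 3, "item": "coffee", "price": 3.50},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_price_row_store(records):
    # Row layout: every whole record is visited, even though the
    # query only needs the "price" field.
    return sum(record["price"] for record in records)


def total_price_column_store(cols):
    # Column layout: only the "price" column is read; the other
    # columns are never touched.
    return sum(cols["price"])


columns = to_columns(rows)
row_total = total_price_row_store(rows)
column_total = total_price_column_store(columns)
```

Both functions return the same total. The difference is in how much data each one has to walk through, which is what starts to matter once the table holds billions of rows rather than three.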
The world has changed and storage architecture needs to catch up
Big data analytics is one of the four major enterprise technology trends HP chief technology officers are talking about today. These discussions include our approach to information optimization to help organizations power, protect, know, integrate, share and create all of the enterprise-relevant information in a consistent, uniform way.
Milan Shetti, VP and CTO of HP Storage, recognizes the need for business analytics that support meaningful business decisions as a definite trend driving big data. He gives healthcare as an example of how business analytics could benefit the industry: “Imagine if the information from a bio-tech organization doing genomics profiling could be integrated with data from pharmaceutical companies, the FDA and patient data from physicians. All that information could be integrated and correlated with drugs information and the applicable information could be given to the customer. Doctors could analyze groundbreaking research to speed the time it takes to get FDA-approved drugs to market quicker. None of this analysis is integrated today.”
To do information optimization right, you need a flexible, scale-out data storage infrastructure. Legacy storage architectures simply can't support our new world of human information and extreme data.
The monolithic storage architectures that are deployed in many organizations today were designed 20-plus years ago for predictable workloads and predictable growth. However, now we are bumping into the limitations of monolithic architectures. The old world is colliding with the new human information world, where storage needs are highly unpredictable.
Why scale-out storage? Because change happens…
If the last five years of massive data growth and economic turbulence have taught us anything, it is that you have got to be able to deal with change in a seamless and non-disruptive way. Scale-out storage architectures provide the scalability and flexibility required to do information optimization right.
Scale-out storage differs from the scale-up architectures of traditional storage, which primarily scale by adding many individual disk drives to a single pair of storage controllers. When you need more horsepower or headroom, you face the painful and often expensive task of stair-stepping to a bigger pair of storage controllers. Mark Peters, storage analyst with Enterprise Strategy Group, writes that scale-out storage can help to provide timely IT provisioning, improve system availability and provide better resource utilization.
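The contrast between the two approaches can be sketched with a small, deliberately simplified Python model. Everything here is an assumption made for illustration: the class names, the capacity and IOPS figures, and the idea that a scale-out cluster scales perfectly linearly (real clusters carry coordination overhead). The point is only to show where the controller ceiling bites in a scale-up design.

```python
from dataclasses import dataclass

# Simplified model, not a sizing tool. All names and numbers below are
# hypothetical and chosen only to illustrate the controller bottleneck.


@dataclass
class ScaleUpArray:
    controller_iops: int  # fixed ceiling of the single controller pair
    drives: int
    tb_per_drive: int
    iops_per_drive: int

    def capacity_tb(self):
        return self.drives * self.tb_per_drive

    def iops(self):
        # Adding drives adds capacity, but performance is capped by
        # what the one controller pair can deliver.
        return min(self.controller_iops, self.drives * self.iops_per_drive)


@dataclass
class ScaleOutCluster:
    nodes: int
    tb_per_node: int
    iops_per_node: int

    def capacity_tb(self):
        return self.nodes * self.tb_per_node

    def iops(self):
        # Idealized linear scaling: each node brings its own controllers.
        return self.nodes * self.iops_per_node


array = ScaleUpArray(controller_iops=100_000, drives=400,
                     tb_per_drive=1, iops_per_drive=200)
bigger_array = ScaleUpArray(controller_iops=100_000, drives=800,
                            tb_per_drive=1, iops_per_drive=200)

cluster = ScaleOutCluster(nodes=4, tb_per_node=100, iops_per_node=25_000)
bigger_cluster = ScaleOutCluster(nodes=8, tb_per_node=100, iops_per_node=25_000)
```

In this model, doubling the drives behind the array doubles capacity but performance stalls at the controller limit, which is the moment you would have to stair-step to bigger controllers. Doubling the nodes in the cluster raises capacity and performance together.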
When combined with federated storage technologies, scale-out storage enables the creation of virtually limitless, persistent resource pools. When dealing with a rapidly changing universe of petabyte-scale data sets, this is an absolute necessity.
Scale-out allows capacity, performance and resiliency to be scaled incrementally and independently, while federating groups of scale-out clusters provides for seamless workload mobility. The result is an increase in computing power, bandwidth and storage capacity that can dramatically exceed that of a single traditional storage array or high-performance computer.
The impact of unpredictability and big data on performance
Storage in our new world also has to be able to handle lots of variable requests and mixed workloads. Some storage solutions work great for large sequential operations but fall short when it comes to random I/O requests or if the information contains a lot of metadata. Enterprises with a lot of unstructured data (and I don't know of any organization that isn't generating a lot of unstructured data) will need to manage many different types of workloads at the same time with predictable levels of performance, something we describe as dynamic multi-tenancy.
Converged storage, big data and trends in 2012
These requirements around change management and unpredictability are among the reasons that HP is focused on Converged Storage as part of our approach to Converged Infrastructure. At the heart of our Converged Storage strategy is a design center that includes federated scale-out solutions deployed on industry-leading platforms. Yes, architecture matters – and it is important that big data analytic applications are running on modern storage architectures designed from the ground up for today’s requirements.
That's why the marriage of HP IBRIX, which handles unstructured data, and HP 3PAR, which handles structured data, works very well. These complementary converged storage technologies power some of the largest cloud data centers and unstructured content depots in the world and are a critical infrastructure component to support big data applications.
HP CTOs play a key role in creating technologies and strategies that address your business needs now and in the future. Each day, they analyze business and social trends and create technology solutions that help address business needs.
It’s something I’ll be pondering as I continue my great home remodel project. Why not let me know your thoughts on big data, converged storage and other 2012 trend spotting? We can continue the conversation here. Or follow me on Twitter: @HPBradParks