Decades ago, we handled our most valued data in a very manual and individual manner. The explosion of data sources has made most of us hoarders, without a plan to leverage the value. Here are 5 steps to address big data.
I have been brokering data solutions for over 40 years as a programmer, solution architect and enterprise architect. Every once in a while I pull out an old punch card deck and green bar computer printer paper from the archives and mull over how we have evolved from the days of fostering and venerating every literal bit of information. Back in the day, I recall spending hours writing, keypunching, sorting, running, correcting and rearranging the cards to generate some significant data. My head was in the data in those days, typically looking for the one mistyped character in hundreds or thousands of cards. That was the data strategy of the day.
Today, we treat our data like hoarders: we collect everything and plan to sort it out later. IDC predicts that big data will join mobile and cloud as the next “must have” competency, as the volume of digital content grows to 2.7 zettabytes (ZB) in 2012, up 48 percent from 2011, and rockets toward 8 ZB by 2015 (1 ZB = 1 billion terabytes). The data keeps building, and we still haven’t fully realized the value of our hoard, but we don’t want to throw it out either.
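To put those figures in perspective, here is a back-of-the-envelope sketch in Python. It uses only the numbers quoted above. The variable names are mine, and the assumption of steady compound growth between 2012 and 2015 is a simplification for illustration, not part of the IDC forecast.

```python
# Figures quoted from the IDC forecast in the text above.
ZB_2012 = 2.7             # zettabytes of digital content in 2012
GROWTH_2011_2012 = 0.48   # 48 percent growth over 2011
ZB_2015 = 8.0             # projected zettabytes in 2015

# Decimal units: 1 ZB = 10^21 bytes, 1 TB = 10^12 bytes.
TB_PER_ZB = 10**21 // 10**12

# Volume in 2011 implied by "up 48 percent from 2011".
implied_2011 = ZB_2012 / (1 + GROWTH_2011_2012)

# Assumption: steady compound growth over the three years 2012 -> 2015.
annual_growth_to_2015 = (ZB_2015 / ZB_2012) ** (1 / 3) - 1

print(f"1 ZB = {TB_PER_ZB:,} TB")
print(f"Implied 2011 volume: {implied_2011:.2f} ZB")
print(f"Implied yearly growth 2012-2015: {annual_growth_to_2015:.1%}")
```

Under that assumption, the hoard grows by a bit more than 40 percent every year, which is why "sort it out later" rarely happens on its own.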
Where does this data hoard come from? It comes from all the sensors we have in place for IT, RFID manufacturing and retail, and from scans of bar codes, pictures, logs, blogs, web sites, communities, mobility, and the general digitalization of our world. We actually have come full circle to the days of the punch cards – once again looking for needles in the haystack, attempting to transform a pile of dung into gold. Thankfully, it is not a manual process anymore and the process doesn’t have to stink. Here are 5 steps to addressing the big data hoard.
1. Take a Data Inventory - Most enterprise architectures have already been driven to consolidating to an Enterprise Data Warehouse for common leverage and de-duplication. In the realm of big data, we now also consider the logs, blogs, Facebook, and other nontraditional and unstructured data sources stored in files and blobs. Some of this data is sensitive and must be treated as such. We need to understand what is available and what is possible.
2. Create a Big Data Strategy - Understand your business vision and goals. Create a strategy for the data hoard as a part of your data architecture. This should be a part of your annual or periodic Enterprise Architecture review, or perhaps a “spring cleaning” project to address the hoarding. What does your data contain that yields value? This is not just an IT decision about what can be offered, but a discussion with the business – marketing, sales, manufacturing, order fulfillment, and any organization that can potentially benefit from the hoard. This is an important dialogue between the business and IT, resulting in an executable strategy. IT brings data to the table; the business brings its goals and needs. This strategy may be related to demographics, ad targeting, churn, risk, predictive analytics or other imaginative uses.
3. Develop or revise the data architecture roadmap - This would typically integrate Hadoop into the architecture. The traditional relational Enterprise Data Warehouse and other existing data sources would typically become just some of the sources for the Hadoop Distributed File System (HDFS). HDFS could also draw on log files, other flat files, and other SQL databases. The goal of the Hadoop integration is to consolidate the data hoard into a useful subset for analysis or other use. HP Vertica is a good example of a Data Warehouse connected to an analytic engine which can leverage Hadoop. HP Autonomy is a premier solution for drawing meaning, and business value, from the “human information” collected.
Cleaning the hoard after you have harvested the benefits is part of the roadmap as well.
4. Modify the remaining EA to support - Architecture is an iterative process, and the applications, data, infrastructure, and perhaps governance need to be reviewed to support a big data solution. Hadoop is generally a key enabler, for instance, and that requires a deployment of servers and storage. How do the web and mobility play into the solution? There is a rapidly developing ecosystem of Hadoop-related solutions that may serve your strategy and purposes.
5. Deploy your solution - There is simplicity to deploying this, actually. By their nature, Hadoop projects tend to be read-much and write-little types of efforts that add on to your architecture. The results are potentially a part of your Business Intelligence system, or can perhaps be used in other imaginative ways.
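As a rough illustration of steps 1 and 3, here is a minimal Python sketch. It is not Hadoop code and uses no Hadoop API. It only mimics, in plain Python, the idea of cataloguing sources and then boiling raw records down to a useful subset with a map step and a reduce step. Every name in it (`DataSource`, `INVENTORY`, the log format, the sample records) is hypothetical and chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    """One entry in the data inventory (step 1). All fields are illustrative."""
    name: str
    kind: str          # e.g. "relational", "log", "social"
    structured: bool
    sensitive: bool    # sensitive data must be treated as such


# Hypothetical inventory; replace with your own sources.
INVENTORY = [
    DataSource("enterprise_data_warehouse", "relational", True, True),
    DataSource("web_server_logs", "log", False, False),
    DataSource("social_media_posts", "social", False, False),
]


def sources_needing_controls(inventory):
    """Return the names of sources flagged as sensitive."""
    return [s.name for s in inventory if s.sensitive]


def map_log_line(line):
    """Map phase: emit (page, 1) for each successful request.

    Assumes a made-up 'user page status' log format.
    """
    _user, page, status = line.split()
    if status == "200":
        yield page, 1


def reduce_counts(pairs):
    """Reduce phase: sum the counts for each key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


SAMPLE_LOG = [
    "alice /pricing 200",
    "bob /pricing 200",
    "carol /support 404",
    "dave /support 200",
]

page_views = reduce_counts(
    pair for line in SAMPLE_LOG for pair in map_log_line(line)
)

print(sources_needing_controls(INVENTORY))  # ['enterprise_data_warehouse']
print(page_views)                           # {'/pricing': 2, '/support': 1}
```

The point of the sketch is the shape of the work: the inventory tells you what you have and what needs care, and the map and reduce steps turn a pile of raw records into a small result that the business can act on.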