Everyone who has run an operations group inside a sizeable enterprise knows the value of truly good insight. I worked as an Operations Director for almost two years, though that was nearly a decade ago. I didn’t have access to some of the amazing tools and analytics that today’s Ops managers have, but this new offering from the HP mother ship makes me wish I could have had OpsAnalytics way back “in my day.”
Look, we all strive for more insight into what is going on in the data center, and virtualization makes this even harder as Operations Managers try to link performance issues in virtualized services back to the physical servers underneath them. When a customer reports a poorly performing or, worse, unavailable service, your operations management team typically turns to dashboards and monitoring widgets scattered across your hybrid environment to try to figure out the problem. The thing is, you’re likely stuck manually thumbing through spreadsheets or server documents (or, if you’re lucky, some dashboard somewhere) to work out which service or application runs on which virtual server, and then mapping that to the physical environment. All of this takes time, and it’s time you don’t have: your customers grow impatient and start to lose confidence in your abilities, and you start to lose money.
If only we had a magic crystal ball that could tell us why a specific service or application is really having issues… and wouldn’t it be awesome if that crystal ball could give us a heads-up, based on correlated historical information, before bad things started happening? We’re actually a lot closer to that reality than you may think.
Odds are you’re already an HP Operations Manager customer (part of the BSM product suite), hopefully using OMi. It’s also likely that you’re an ArcSight customer, or at least that you’re using the free (trial) edition of Logger available today. If you’re not, go get it; you won’t regret it. What you’re probably not doing is combining the strengths of these products with the Service Health Analyzer (SHA). This powerful combination gets really, really close to giving you that crystal ball… let me explain.
As you monitor operational run data from your hybrid data center, you see CPU utilization, memory utilization, disk and network telemetry and much more, all telling you that you have a problem right now. Add on top of that the ability to tie services and applications to the physical or virtual components they run on, and you can determine that a service health issue can be remediated by solving a hardware-level problem, which speeds up diagnosis and gets you through troubleshooting more quickly. Add the ability to collect, store and analyze log data, and you’ve got the closest thing to a crystal ball available today.
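To make the service-to-infrastructure idea concrete, here is a minimal Python sketch of walking from a service to the virtual machines that back it, then to the physical hosts underneath, and flagging hosts whose telemetry breaches a threshold. It is a toy illustration, not how HP Operations Manager, OMi or SHA model topology: the dictionaries `SERVICE_TO_VMS` and `VM_TO_HOST`, the `HostMetrics` fields, the host and VM names and the threshold values are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical topology (assumption): which VMs back each service,
# and which physical host each VM runs on.
SERVICE_TO_VMS = {
    "online-store": ["web-vm-01", "db-vm-02"],
    "payroll": ["app-vm-07"],
}
VM_TO_HOST = {
    "web-vm-01": "esx-host-a",
    "db-vm-02": "esx-host-b",
    "app-vm-07": "esx-host-a",
}


@dataclass
class HostMetrics:
    """Hypothetical per-host telemetry snapshot."""
    cpu_pct: float
    mem_pct: float
    disk_errors: int


def suspect_hosts(service, metrics, cpu_limit=90.0, mem_limit=90.0):
    """Walk service -> VMs -> physical hosts and return the hosts whose
    telemetry breaches a threshold, with the affected VMs and the reasons."""
    suspects = {}
    for vm in SERVICE_TO_VMS.get(service, []):
        host = VM_TO_HOST.get(vm)
        m = metrics.get(host)
        if m is None:
            continue  # no telemetry for this host, nothing to correlate
        reasons = []
        if m.cpu_pct > cpu_limit:
            reasons.append(f"cpu {m.cpu_pct:.0f}%")
        if m.mem_pct > mem_limit:
            reasons.append(f"memory {m.mem_pct:.0f}%")
        if m.disk_errors > 0:
            reasons.append(f"{m.disk_errors} disk errors")
        if reasons:
            # Several VMs may share one host; group them under that host.
            entry = suspects.setdefault(host, {"vms": [], "reasons": reasons})
            entry["vms"].append(vm)
    return suspects
```

For example, if `esx-host-a` is running hot on CPU and `esx-host-b` is logging disk errors, `suspect_hosts("online-store", metrics)` points at both hosts and names the VM each one carries, which is exactly the lookup the spreadsheet-thumbing approach makes slow.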
When an incident happens, it is categorized, and data is gathered from operational telemetry as well as from the logs of the various components in and around the incident. The next time those conditions start to show up, the system can predict a strong likelihood of the same kind of incident. Voilà! Where was this stuff ten years ago, when I was monitoring CPU and memory on servers with separate dashboards for Linux, UNIX and Windows? If your environment is virtual, or at least partly virtual, you can also benefit from tools like the Virtualization Performance Viewer; this is just too cool not to mention.
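The learn-then-predict loop described above can be illustrated with a toy nearest-neighbour sketch in Python: store a scaled "signature" of the conditions around each past incident, then check whether current conditions fall close to any stored signature. This is an assumed, simplified technique chosen for illustration only, not the algorithm SHA actually uses; the feature names, the `SCALES` values, the incident categories and the `max_distance` cutoff are all hypothetical.

```python
import math

# Hypothetical features and scale factors (assumption): dividing by a typical
# range keeps metrics with large units (e.g. milliseconds) from dominating.
SCALES = {
    "cpu_pct": 100.0,
    "mem_pct": 100.0,
    "disk_latency_ms": 50.0,
    "log_errors_per_min": 20.0,
}


def to_vector(conditions):
    """Turn a dict of telemetry and log metrics into a scaled feature vector."""
    return [conditions.get(name, 0.0) / scale for name, scale in SCALES.items()]


class IncidentHistory:
    """Toy store of past incident signatures (hypothetical, for illustration)."""

    def __init__(self):
        self.signatures = []  # (category, scaled feature vector) per past incident

    def record(self, category, conditions):
        """Remember the conditions observed around a categorized incident."""
        self.signatures.append((category, to_vector(conditions)))

    def predict(self, current, max_distance=0.2):
        """Return (category, distance) for the closest past incident whose
        signature lies within max_distance of current conditions, else None."""
        vec = to_vector(current)
        best = None
        for category, sig in self.signatures:
            dist = math.dist(vec, sig)  # Euclidean distance (Python 3.8+)
            if dist <= max_distance and (best is None or dist < best[1]):
                best = (category, dist)
        return best
```

With a "storage-saturation" incident recorded at high disk latency and elevated log errors, a later reading that looks similar (say 50% CPU, 72% memory, 38 ms latency, 10 errors per minute) matches it at a distance of roughly 0.12, while a healthy reading matches nothing. A real product would weigh far more signals and learn its thresholds rather than hard-code them.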
The platform, a suite of components working in concert, lets you:
- Remediate known problems before they occur with predictive analytics that forecast problems and prioritize issues based on business impact
- Proactively solve unanticipated issues by collecting, storing and analyzing IT operational data to automatically correlate service abnormalities with the problem source
- Resolve incidents faster by searching across logs and events for similar prior incidents and drawing on historical analysis of them
You could just keep doing things the way you do them today: part manual spreadsheets, part multi-dashboard Franken-monitoring-solution, and part hoping that things don’t fail in ways you don’t understand at times when you can’t afford a failure. Or you could give OpsAnalytics a look. You know what they say: “hope is not a strategy.” If you believe that, I strongly recommend you check out the new platform HP has just launched. As a security guy for the past decade, but a former Operations Manager too, I’m bursting with excitement about this platform’s potential and can’t wait to start sharing the success stories.