IT management teams in charge of monitoring the IT infrastructure (network, servers, applications, etc.) mostly have little insight into what it is they are monitoring. (Obvious or shocking?)
The tools we use make this clear. Monitoring tools all have some sort of “discovery” functionality to figure out what is out there to monitor. More often than not, when we set discovery loose on the network (using SNMP, ICMP, etc.), it finds devices and network connections the customer did not know existed.
Server and application monitoring tools start their cycle by scanning ports to see which ones respond, or by sniffing traffic to figure out which servers are out there, which applications may be running on which servers, and so on.
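To make the port-scanning half of that concrete, here is a minimal sketch of a plain TCP connect probe, the simplest form of what discovery does. It is an illustration only, not how any particular monitoring product works; real discovery adds UDP, SNMP, ICMP, service fingerprinting, rate limiting and much more. The function name probe_ports and its parameters are hypothetical.

```python
import socket


def probe_ports(host, ports, timeout=0.5):
    """Return the subset of `ports` on `host` that accept a TCP connection.

    A plain TCP connect check: if the three-way handshake completes, the
    port counts as "open". Refused connections and timeouts both surface
    as OSError and are treated as "not open".
    """
    open_ports = []
    for port in ports:
        try:
            # create_connection raises OSError (including timeouts) on failure
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            pass
    return open_ports


if __name__ == "__main__":
    # Only probe hosts you are authorized to scan; this is exactly the kind
    # of traffic that trips security alerts.
    print(probe_ports("127.0.0.1", [22, 80, 443, 5432]))
```

Note that the probe learns only that something answers on a port, not what it is or who owns it, which is the gap the rest of this post is about.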
The process would not be much different if you were attacking the infrastructure to find a way in. (How many of you have triggered security alerts while running discovery?) We’re outsiders. In enterprise environments, we often don’t even know who owns or develops the applications.
In the monitoring field, this has been the norm for so long that it no longer bothers us. It should. Of course, monitoring teams and tools don’t do this for fun. It has to be done because, in most cases, there is no source of truth: no place to get this kind of information. It is not uncommon for monitoring tools to feed the data they discover into inventory tools and the like.
There are efforts, such as CMDB projects, that attempt to create a repository providing this information to all management tools, but these projects often run into organizational as well as technical obstacles. And things are getting harder by the day with the dynamism introduced by virtualization and cloud technologies.
What if we didn’t have to do all this crap to know what’s what? What if monitoring tools could be told which applications run on which server, where that server sits in the network, and so on?
There is indeed a better way, at least for some use cases. The proliferation of infrastructure automation tools (aka configuration management tools) such as Chef, along with the management APIs exposed by VMware and others, has the potential to change not only how we deploy and maintain servers and applications but also how we monitor them.
The most obvious impact is that, with these tools in place, monitoring tools can have a reliable source from which to learn about the infrastructure that should be monitored: what role each server plays, how it is configured, which application components run on which server, what the change history is, and so on. This is a huge step forward.
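As a rough illustration of "being told" instead of discovering, the sketch below derives a list of checks from a declared inventory. The inventory format, the role names and the role-to-check mapping are all hypothetical, shaped loosely like what a configuration management server could export; this is not Chef's actual API or data model.

```python
from dataclasses import dataclass

# Hypothetical export from a configuration management server.
# Field names are illustrative, not any specific tool's API.
INVENTORY = [
    {"name": "web01", "ip": "10.0.1.10", "roles": ["base", "webserver"]},
    {"name": "db01", "ip": "10.0.2.20", "roles": ["base", "postgres"]},
]

# Hypothetical mapping from a declared role to the checks that role implies.
ROLE_CHECKS = {
    "base": [("ssh", 22)],
    "webserver": [("http", 80), ("https", 443)],
    "postgres": [("postgres", 5432)],
}


@dataclass(frozen=True)
class Check:
    host: str
    ip: str
    service: str
    port: int


def checks_from_inventory(inventory, role_checks):
    """Build the checks implied by each node's declared roles.

    Roles with no entry in `role_checks` simply contribute no checks.
    """
    checks = []
    for node in inventory:
        for role in node["roles"]:
            for service, port in role_checks.get(role, []):
                checks.append(Check(node["name"], node["ip"], service, port))
    return checks


if __name__ == "__main__":
    for check in checks_from_inventory(INVENTORY, ROLE_CHECKS):
        print(check)
```

The design point is that the monitoring configuration becomes a function of the declared infrastructure: when a node gains a role, its checks follow without anyone running discovery.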
When you know how things should be, it’s much easier to detect the exceptions. A significant portion of problems are caused by changes somewhere in the infrastructure. The ability to automate changes, see the change history, and roll back when needed is invaluable. And being able to correlate configuration changes with monitoring data can significantly reduce troubleshooting time and hence improve availability.
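Both ideas, spotting exceptions against a known-good state and correlating an alert with recent changes, can be sketched in a few lines. This assumes a deliberately simplified, hypothetical notion of "state" (the set of open ports per host) and a hypothetical change log of (timestamp, host, description) tuples; a real configuration management tool records far richer state and history.

```python
from datetime import datetime, timedelta


def find_drift(declared, observed):
    """Compare declared state with observed state.

    Both arguments map host name -> set of open ports (a simplified,
    hypothetical notion of state). Hosts present in only one mapping are
    compared against an empty set. Returns
    {host: {"missing": ..., "unexpected": ...}} for hosts that differ.
    """
    drift = {}
    for host in declared.keys() | observed.keys():
        want = declared.get(host, set())
        have = observed.get(host, set())
        if want != have:
            drift[host] = {"missing": want - have, "unexpected": have - want}
    return drift


def recent_changes(changes, alert_time, host, window=timedelta(hours=2)):
    """Return changes to `host` made within `window` before `alert_time`.

    `changes` is a list of (timestamp, host, description) tuples.
    Results are sorted newest first, since the most recent change is
    usually the first suspect.
    """
    hits = [
        c for c in changes
        if c[1] == host and alert_time - window <= c[0] <= alert_time
    ]
    return sorted(hits, key=lambda c: c[0], reverse=True)
```

For example, an alert on web01 at noon would surface a configuration change made at 11:30 but ignore one made at 08:00 with the default two-hour window; the window size is an arbitrary choice for illustration.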
Another impact is that a safe framework enabling operations folks to take actions to troubleshoot and resolve problems (combined with run book automation, workflow, wikis, etc.) may finally mean that level 1/2 support staff can do more than record and route, without being given full access to the systems (which is not feasible). That would reduce the number of problems escalated to higher levels and increase overall productivity.
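One way to picture "more than record and route, without full access" is an allow-list of predefined actions per support tier, with every attempt audited. The sketch below is hypothetical throughout (tier names, action names, the run_action function); it only describes what would be executed rather than calling any real automation tool.

```python
# Hypothetical allow-list: which predefined runbook actions each support
# tier may trigger. Tiers get scoped actions, never a shell on the box.
ALLOWED_ACTIONS = {
    "L1": {"restart_service", "clear_tmp"},
    "L2": {"restart_service", "clear_tmp", "rollback_last_change"},
}


class NotPermitted(Exception):
    """Raised when a tier requests an action outside its allow-list."""


def run_action(tier, action, host, audit_log):
    """Run a predefined action if the tier is allowed to; audit every attempt."""
    allowed = action in ALLOWED_ACTIONS.get(tier, set())
    audit_log.append((tier, action, host, "allowed" if allowed else "denied"))
    if not allowed:
        raise NotPermitted(f"{tier} may not run {action} on {host}")
    # A real system would hand off to the automation tool here; this sketch
    # only returns a description of what would be executed.
    return f"would run {action} on {host}"
```

Unknown tiers get an empty allow-list, so the default is to deny; the audit log records denied attempts as well as allowed ones.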
For the record, I don’t mean that infrastructure automation tools like Chef introduce brand-new technology. Opsware (now HP), BladeLogic (now BMC), and ConfigureSoft (now VMware) for server configuration management, and TrueControl (now HP via Opsware), Voyence (now EMC), and AlterPoint for network configuration management have been around for some time. But a confluence of factors, such as the success of Chef’s (Apache-licensed) open source model, the increasing acceptance of cloud economics, and patterns such as the availability of open APIs, is moving Chef to center stage. It does not take great wisdom to infer that a price point such as $50/month for 20 devices will make Chef very hard to ignore. Price is indeed a feature.
I’m looking forward to seeing how infrastructure tools like Chef evolve as they move further into the enterprise world. Monitoring folks need to pay attention.