Every NetOps organization that we work with tells us the same thing.
In this blog, we’ll explore five questions that most companies cannot answer, even though they have best-of-breed solutions deployed in their network infrastructure.
What is not monitored?
The purpose of most Network Performance Monitors (NPMs), like PRTG, SolarWinds, Nagios, and SevOne, is to provide an alerting function: the red, yellow, and green light dashboard that presents a plethora of operational data from your network devices. What happens when these tools are not monitoring a device? How do you know that something has gone wrong or is degrading on that device? The answer is, you don’t. If the device isn’t configured to respond to polling, the network operations team is blind to it, and your visibility into the network is limited.
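One way to start answering this question is to compare a list of devices you know you own against the list your monitoring tool actually polls. The sketch below is a minimal illustration, not a feature of any product named above: the device names and both lists are hypothetical, and in practice they would come from an inventory system and an export from your NPM.

```python
def find_unmonitored(inventory, monitored):
    """Return the devices in the inventory that the monitoring tool does not poll."""
    return sorted(set(inventory) - set(monitored))


# Hypothetical data: a source-of-truth inventory and an export from the NPM.
inventory = ["core-sw-01", "core-sw-02", "edge-rtr-01", "fw-dmz-01"]
monitored = ["core-sw-01", "edge-rtr-01"]

unmonitored = find_unmonitored(inventory, monitored)
print(unmonitored)  # ['core-sw-02', 'fw-dmz-01']
```

The result is only as good as the inventory. A device that is missing from both lists stays invisible.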
Is the service redundant, end-to-end, and on all layers?
There are many tools on the market that bill themselves as Application Performance Monitors (APMs), such as Dynatrace, AppDynamics, and New Relic. They do a really good job of monitoring the application from a server and microservices perspective, but they do not understand the network that sits in between.
APM and network operations teams are always at odds, pointing the finger at each other, which results in tremendous delays when turning up new applications and when resolving outages and degradations in existing ones. It’s not just the APM teams that are at a disadvantage here, however.
The network team is typically unaware of the applications running on the network, or can’t keep up with the CI/CD teams’ agile operations. It needs a tool that lets the application and CI/CD teams check network functionality in a self-service manner, reducing friction between the teams, creating a way to define the network intent, and democratizing this critical information.
Is the logging level correctly configured on all devices?
The rapid adoption of tools like Splunk and IBM QRadar has enabled huge strides in log monitoring by applying AI and machine learning to parse and correlate the massive volume of data generated by devices. The problem is that these tools only work when devices are configured to send their logs correctly, and at the right logging level. In most of our engagements with companies, we find that 20-30% of the network isn’t properly reporting to the SIEM toolset, again leaving the customer in the dark and lowering the value and ROI of the products purchased.
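A logging audit along these lines can be sketched in a few lines of code. The example below assumes each device's logging settings have already been collected into a simple dictionary. The device names, the collector address 10.0.0.50, and the field names `syslog_hosts` and `level` are all hypothetical. The severity names and their ordering follow the standard syslog severities, where a higher number means more verbose logging.

```python
# Standard syslog severities: a higher number means more verbose logging.
SEVERITY = {
    "emergency": 0, "alert": 1, "critical": 2, "error": 3,
    "warning": 4, "notice": 5, "informational": 6, "debug": 7,
}


def audit_logging(devices, siem_host, min_level):
    """Return {device: [issues]} for devices that fail the logging policy."""
    findings = {}
    for name, cfg in devices.items():
        issues = []
        if siem_host not in cfg.get("syslog_hosts", []):
            issues.append("not sending logs to SIEM")
        level = cfg.get("level")
        if level not in SEVERITY or SEVERITY[level] < SEVERITY[min_level]:
            issues.append("logging level less verbose than required")
        if issues:
            findings[name] = issues
    return findings


# Hypothetical settings collected from three devices.
devices = {
    "core-sw-01": {"syslog_hosts": ["10.0.0.50"], "level": "informational"},
    "edge-rtr-01": {"syslog_hosts": [], "level": "informational"},
    "fw-dmz-01": {"syslog_hosts": ["10.0.0.50"], "level": "error"},
}

findings = audit_logging(devices, "10.0.0.50", "informational")
for device, issues in findings.items():
    print(device, issues)
```

Here two of the three devices fail the check: one never sends logs to the collector, and one logs at a level that filters out most of what the SIEM needs.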
What caused the problem?
There are a few tools in the ITSM space, like ServiceNow, BMC Remedy, and Jira, that focus primarily on service management. They are responsible for tracking incidents and changes through the network, in an attempt to be a single source of documentation for everything that happens in the network.
While they are another step in the right direction, these tools are decoupled from the actual work that is performed. Most changes happen through some form of change management tool or network automation, while troubleshooting happens primarily in CLI or web-based GUIs. The onus is on the operator to do a good job of recording the steps taken and the actual resolution of the problem.
Given the sheer demand on an operator’s time and the pressure of SLA or MTTR metrics, the record of what was done and how the problem was resolved is often an afterthought, leaving the rest of the IT organization unable to access this vital information when the problem happens again.
How does the change affect the intent of my network?
As discussed in the previous question, most changes are now pushed through change management products or network automation like Ansible, Puppet, and Chef.
The problem is that these tools are not network aware. The process that pushes a change typically only tests that the change was pushed successfully. It cannot check whether the desired outcome was achieved or, more importantly, whether the change impacted another intent for the network.
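To make the idea of checking intent concrete, here is a minimal sketch that treats an intent as "this source must be able to reach this destination" and re-checks every intent after a change. Everything in it is an assumption for illustration: the topology is a hand-written adjacency map with hypothetical node names, and real intents usually cover far more than reachability (redundancy, security policy, QoS, and so on).

```python
from collections import deque


def reachable(topology, src, dst):
    """Breadth-first search: can traffic get from src to dst?"""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in topology.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def broken_intents(topology, intents):
    """Return the (src, dst) intents that the topology no longer satisfies."""
    return [(s, d) for s, d in intents if not reachable(topology, s, d)]


# Hypothetical intents: the app must reach the database and the internet.
intents = [("app", "db"), ("app", "internet")]

# Hypothetical forwarding state before and after a change on sw1.
before = {"app": ["sw1"], "sw1": ["fw", "sw2"], "sw2": ["db"], "fw": ["internet"]}
after = {"app": ["sw1"], "sw1": ["fw"], "sw2": ["db"], "fw": ["internet"]}

print(broken_intents(before, intents))  # []
print(broken_intents(after, intents))   # [('app', 'db')]
```

In this example the change itself could have been pushed without error, yet it removed the only path from the app to the database. Only a check against the stated intents surfaces that.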
It is no wonder that somewhere between 50 and 80% of all network outages are reported to be caused by changes.
DIRE NetOps can help.
If you find yourself struggling to answer these questions, you are not alone. Let DIRE NetOps, our parent and professional services company, help you find the right tools to fill the gap and transform your NetOps organization into a world-class, well-oiled machine.