Internet outages are becoming more common. Can we still predict them?
The Internet has become the new backbone of business operations. Delivery services, banking, VPN connections from anywhere and anytime – it is all powered by the Internet as the new delivery mechanism for customer and employee applications and services.
Ensuring access to an always-on experience of those digital services is essential for any business. Predictive analytics and AI-driven intelligence allow us to build forecasting models that help optimize performance and minimize downtime, but internet outages can unfold in a practically unlimited number of ways. So how do you identify, let alone predict and mitigate, these outages, given that they occur on external networks and within third-party providers that sit outside your own IT perimeter?
Principal Solutions Analyst at Cisco ThousandEyes.
No longer a finite number of events
A failure is presented to the end user in a standard set of ways, including slower load times or a complete inability to access an application or service. Often there is a commonality in the underlying pattern – or chain of events – that led to that failure.
By itself, each pattern is detectable and observable. Most IT teams perform post-incident analysis to map out the pattern or series of events that led to an outage. This helps them understand the chain of events in detail, so that if the same pattern repeats in the future, they can detect it and intervene before it ends in a disruption that impacts users.
The challenge facing operations teams today is that things are no longer simple and failures are no longer based on a limited number of isolated events.
The layered and multi-interdependent failure
Networks and applications have become more complex, and this has affected the characteristics of failures. In particular, the underlying patterns of system behavior that cause failures are no longer as predictably repetitive as they once were. Failure causes are now much more complex and difficult to diagnose. For example, a system or application no longer follows a linear client-network-server architecture; instead, it operates as a “mesh” of connectivity links, IT infrastructure, and software components.
The challenge for ops teams is that a mesh architecture dramatically increases the number of interconnected components and therefore the permutations of conditions that can cause a failure. Compared to a more linear architecture, both the number of connections between components in the mesh and the number of sequences that can form a failure pattern grow far faster than the number of components itself.
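To put rough numbers on that growth, here is a minimal sketch in Python. It rests on simplifying assumptions that are not from any real system: the mesh is treated as a full mesh, where every component connects to every other one, and a failure pattern is treated as one possible order in which the components could each be affected. Real applications are partial meshes, so these figures are upper bounds rather than measurements, and the function names are purely illustrative.

```python
from math import factorial


def chain_links(n: int) -> int:
    """Links in a linear chain of n components (client-network-server style)."""
    return max(n - 1, 0)


def mesh_links(n: int) -> int:
    """Links in a full mesh, assuming every component connects to every other."""
    return n * (n - 1) // 2


def event_orderings(n: int) -> int:
    """Distinct orders in which n components could each be affected once."""
    return factorial(n)


# Compare a few system sizes side by side.
for n in (3, 5, 10):
    print(
        f"{n} components: chain links={chain_links(n)}, "
        f"mesh links={mesh_links(n)}, orderings={event_orderings(n)}"
    )
```

Under these assumptions, ten components have 9 links in a chain but 45 in a full mesh, and more than 3.6 million possible orderings of events, which is why cataloguing every failure pattern in advance stops being practical.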
In addition, the number of components in the mesh is also constantly changing. As more functionality is added to an application, more third-party components or services are incorporated into the end-to-end delivery chain of the application – and into the mesh that supports it. As the complexity of the application grows, so does the range of potential causes that could bring down part or all of the application. And it’s not just the direct dependencies that are a problem; third-party infrastructure services and components have their own interdependencies, with systems and services often several steps out of sight.
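One way to picture those out-of-sight interdependencies is to walk a dependency map outward from the application. The sketch below is illustrative only: the service names and the map itself are hypothetical, and in practice this information would have to be discovered from monitoring data rather than written down by hand. It uses a breadth-first search to record how many hops away each dependency sits.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it calls directly.
DEPENDENCIES = {
    "checkout-app": ["payments-api", "cdn"],
    "payments-api": ["fraud-check", "dns"],
    "cdn": ["dns"],
    "fraud-check": ["cloud-db"],
    "dns": [],
    "cloud-db": [],
}


def dependency_depths(graph, root):
    """Return {service: hops from root} for everything root depends on (BFS)."""
    depths = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for dep in graph.get(node, []):
            # Record each dependency once, at its shortest distance from root.
            if dep not in depths and dep != root:
                depths[dep] = depth + 1
                queue.append((dep, depth + 1))
    return depths


depths = dependency_depths(DEPENDENCIES, "checkout-app")
# Anything more than one hop away is an indirect, easily overlooked dependency.
hidden = sorted(name for name, hops in depths.items() if hops > 1)
print(hidden)
```

In this made-up map, the application names only two direct dependencies, yet three further services sit behind them, and a failure in any one of them could still surface to the end user.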
Is an unpredictable pattern even a pattern?
These failure patterns do not manifest in a predictable manner.
To have the best chance of accurate pattern recognition in this scenario, organizations need a reliable way to “read between the lines” – to understand the complex interplay of observed events and patterns and how they contextually relate to the performance of their specific application or infrastructure.
That level of contextual insight across all domains, including networks and services outside the enterprise’s view and control, requires a new approach to the way we think about detecting and mitigating failures, and to the volume of data and contextual insight that IT leaders must now care about.
When data-driven insights go beyond the enterprise level
When it comes to seeing, and also predicting, outages, what matters is access to high-fidelity data across every relevant environment, including cloud and web. That data is what will ultimately enable us to identify and navigate this new world of patterns upon patterns upon patterns, surfacing where a performance issue exists, why, and whether it matters. It takes visibility across the end-to-end service delivery chain to see, correlate, and triangulate the patterns that matter and so ensure always-on digital experiences.
So while outage patterns can change endlessly, new technologies are also advancing at a pace beyond human scale. Applied well, they can allow us to see outages inside and outside our perimeter, wherever they occur, and produce the automated insight needed to predict the different patterns, along with recommended actions to avoid them.
This article was produced as part of Ny BreakingPro’s Expert Insights channel, where we showcase the best and brightest minds in the technology sector today. The views expressed here are those of the author and do not necessarily represent those of Ny BreakingPro or Future plc. If you’re interested in contributing, you can read more here: https://www.techradar.com/news/submit-your-story-to-techradar-pro