Network monitoring and visibility solutions have always been the cornerstone of a good network architecture. In modern networks, these requirements have been amplified by the need for simple, scalable, real-time solutions that meet operational and business SLAs. Legacy approaches such as SNMP, CLI screen-scraping and query-based APIs lack the scale, analytics and automation needed to meet today’s requirements.
Here is a recent talk from Google at NANOG 78 that reviews how networking has evolved over the past several decades and how Google has evolved its monitoring and visibility frameworks along with it. Amazon, Facebook, Microsoft and several other operators of networks at scale have evolved their own solutions in the same vein. All networks, big and small, can benefit from a modern monitoring, visibility and analytics framework. Here are some of the key attributes I would like to highlight:
Have you ever wondered how most applications today are just available all the time? And how services are seamless?
Today, compute infrastructure is built as a fungible pool of resources, and workloads are regularly migrated from one physical server or cluster to another for reasons such as planned maintenance, load distribution, and power or thermal management. This migration is automated, completes in under a second, and is abstracted from end users, who in most cases notice no disruption. The network plays an equal role in making this happen. EVPN has become the de facto standard for multi-tenant overlay networks that need distributed Layer 2 and Layer 3 domains. This, however, adds complexity to achieving unified visibility across the network.
In this blog, I will highlight how ArcOS and ArcIQ can help provide a simple and elegant solution for enhanced visibility in EVPN VxLAN networks.
ArcOS has been designed with a foundational capability to stream telemetry data from various system components, such as BGP, IS-IS, RIB, FIB, ACLs and EVPN. This data can be streamed to any external data collection engine, modeled as JSON schemas over a Kafka bus or via gNMI. However, by using ArcIQ, our deep visibility and analytics platform, to process this data, monitor and report on the health of your network, and correlate events that span different timestamps, you can gain better real-time, cognitive insights into the network.
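Because the telemetry is modeled as JSON, an external collector mainly needs to decode each message and route it by the component that produced it. The sketch below assumes a hypothetical message envelope with `component` and `data` keys, and a hypothetical `TelemetryRouter` class; the actual ArcOS JSON schema, Kafka topics and gNMI paths are not described in this post, so every name here is illustrative only.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical envelope: {"component": "<name>", "data": {...}}.
# This is NOT the real ArcOS schema, just an illustration.
Handler = Callable[[dict], None]


class TelemetryRouter:
    """Dispatch decoded JSON telemetry messages to per-component handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, component: str, handler: Handler) -> None:
        """Register a handler for messages from one component (e.g. "evpn")."""
        self._handlers[component].append(handler)

    def feed(self, raw_messages: Iterable[bytes]) -> int:
        """Decode each raw message and pass its payload to matching handlers.

        Returns how many messages had at least one subscribed handler;
        messages from components nobody subscribed to are skipped.
        """
        delivered = 0
        for raw in raw_messages:
            msg = json.loads(raw)
            # .get() on a defaultdict does not create an entry for unknown keys.
            handlers = self._handlers.get(msg.get("component", ""), [])
            for handler in handlers:
                handler(msg.get("data", {}))
            if handlers:
                delivered += 1
        return delivered
```

In practice the `raw_messages` iterable would be fed from a Kafka consumer or a gNMI subscription; that transport wiring is deliberately left out so the routing logic stays self-contained.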
Let’s see how a typical datacenter network running EVPN VxLAN on ArcOS can gain deep visibility into workload mobility by using streaming telemetry and the cognitive capabilities of ArcIQ.
When a workload moves locally within the same switch, or from a remote to a local location across an EVPN instance, ArcOS devices immediately generate a telemetry update whose metadata tells the operator the type of move, time of move, workload MAC address, source VTEP address, source interface, destination VTEP address, destination interface, VNI and sequence number.
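To make the parameter list above concrete, here is a minimal sketch of how a collector might model one such move update. The `MacMoveEvent` class and its field names are hypothetical: they mirror the parameters listed in the text, not the actual ArcOS telemetry schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class MacMoveEvent:
    """Hypothetical model of one workload-move telemetry update.

    Field names are illustrative and do not reflect the real ArcOS schema.
    """
    move_type: str                 # e.g. "local" or "remote-to-local"
    timestamp: float               # time of move, seconds since the epoch
    mac: str                       # workload MAC address
    src_vtep: str                  # source VTEP address
    src_interface: Optional[str]   # source interface, if known
    dst_vtep: str                  # destination VTEP address
    dst_interface: str             # destination interface
    vni: int                       # VXLAN network identifier
    seq: int                       # EVPN MAC mobility sequence number

    def to_json(self) -> str:
        """Serialize with sorted keys so identical events produce identical JSON."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "MacMoveEvent":
        """Rebuild an event from its JSON form."""
        return cls(**json.loads(text))
```

A frozen dataclass keeps each event immutable and hashable, which is convenient when events from many switches are later de-duplicated or grouped.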
ArcIQ collects the data from several such micro-events, combines them into a single meaningful event, and clearly indicates whether it is a MAC move (mobility in an L2-only domain) or a MAC-IP move (mobility in an L3 domain). Data from multiple nodes is presented in one place to help you correlate events or take further action.
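The sketch below illustrates one way such correlation could work; it is an assumption, not ArcIQ’s actual algorithm. It groups hypothetical micro-event dictionaries by (MAC, VNI, sequence number), treats the group as a MAC-IP move if any member carries an IP address, and otherwise labels it a MAC move. The `correlate` function and the `mac`, `vni`, `seq`, `node`, `ts` and `ip` keys are illustrative names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def correlate(micro_events: Iterable[dict]) -> List[dict]:
    """Collapse micro-events for the same (mac, vni, seq) into one move event.

    Hypothetical logic: a group is a "mac-ip-move" if any micro-event in it
    carries an IP address (L3 mobility), otherwise a "mac-move" (L2 only).
    """
    groups: Dict[Tuple[str, int, int], List[dict]] = defaultdict(list)
    for ev in micro_events:
        groups[(ev["mac"], ev["vni"], ev["seq"])].append(ev)

    moves = []
    for (mac, vni, seq), evs in groups.items():
        ips = sorted({ev["ip"] for ev in evs if ev.get("ip")})
        moves.append({
            "mac": mac,
            "vni": vni,
            "seq": seq,
            "kind": "mac-ip-move" if ips else "mac-move",
            "ips": ips,
            # Every node that reported part of this move, for cross-node views.
            "nodes": sorted({ev["node"] for ev in evs}),
            "first_seen": min(ev["ts"] for ev in evs),
        })
    # Present correlated moves in the order they were first observed.
    return sorted(moves, key=lambda m: m["first_seen"])
```

Keying on the mobility sequence number is a design choice in this sketch: reports from different nodes about the same move share a sequence number, while a later move of the same MAC gets a new one.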
In datacenter networks running EVPN, MAC address duplication is an anomaly. Without proper visibility, such anomalies go undetected, and EVPN will either blacklist the workload or pin it to its last known location. Events like MAC address duplication can also indicate a potential security breach or a Layer 2 loop, which the operator needs to know about immediately. When ArcOS detects a MAC duplication, either locally or in an extended L2 domain, it generates a real-time update that notifies the visibility engine. ArcIQ renders this data and shows which VTEP generated the alert, along with metadata such as the VNI, VTEP address, MAC mobility sequence number and interface.
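For context, RFC 7432 (section 15.1) describes MAC duplication detection as a PE seeing N moves of the same MAC within M seconds, with defaults of N = 5 and M = 180 seconds. Below is a minimal sliding-window sketch of that rule. It follows the RFC procedure and is not ArcOS’s implementation; the `DuplicateMacDetector` class is a hypothetical name.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set, Tuple


class DuplicateMacDetector:
    """Flag a MAC as duplicate after N moves within M seconds.

    Modeled on the RFC 7432 section 15.1 procedure (defaults N=5, M=180 s).
    Hypothetical helper for illustration, not the ArcOS implementation.
    """

    def __init__(self, max_moves: int = 5, window_s: float = 180.0) -> None:
        self.max_moves = max_moves
        self.window_s = window_s
        # Timestamps of recent moves, per (MAC, VNI).
        self._moves: Dict[Tuple[str, int], Deque[float]] = defaultdict(deque)
        self.duplicates: Set[Tuple[str, int]] = set()

    def record_move(self, mac: str, vni: int, ts: float) -> bool:
        """Record one move; return True only when the MAC first crosses the threshold."""
        key = (mac, vni)
        window = self._moves[key]
        window.append(ts)
        # Drop moves that fall outside the sliding M-second window.
        while window and ts - window[0] > self.window_s:
            window.popleft()
        if len(window) >= self.max_moves and key not in self.duplicates:
            self.duplicates.add(key)
            return True
        return False
```

Returning True only on the first crossing means a single alert is raised per duplicate MAC, which matches the goal of notifying the visibility engine once rather than on every subsequent flap.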
ArcOS leverages the EVPN machinery for workload mobility across stretched L2 domains and generates the telemetry update by examining the EVPN state before and after the workload migration. For example, in a remote-to-local migration, ArcOS already knows the workload’s earlier location from the EVPN Type-2 (MAC/IP) route. After the workload migrates locally, ArcOS combines its current location (the interface it sits behind) with the pre-migration information, such as the VTEP address, VNI and VLAN, and packs them into a telemetry update for the visibility engine.
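The before/after combination described above can be sketched as follows. The `Type2Route` class and `build_move_update` function are hypothetical and model only a small subset of a Type-2 route. The sequence-number step follows RFC 7432 MAC mobility: a PE that newly learns a MAC locally advertises it with a sequence number one greater than the one in the remote route, and a route without a MAC Mobility extended community counts as sequence 0. VLAN is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Type2Route:
    """Illustrative subset of an EVPN Type-2 (MAC/IP Advertisement) route."""
    mac: str
    ip: Optional[str]
    vni: int
    next_hop_vtep: str  # remote VTEP that advertised the MAC
    seq: int            # MAC mobility sequence number (0 if no extended community)


def build_move_update(prior: Type2Route, local_vtep: str,
                      local_interface: str, ts: float) -> dict:
    """Combine pre-migration EVPN state with the workload's new local location.

    Hypothetical telemetry payload; field names are not the ArcOS schema.
    """
    return {
        "type": "remote-to-local",
        "timestamp": ts,
        "mac": prior.mac,
        "ip": prior.ip,
        "vni": prior.vni,
        "src_vtep": prior.next_hop_vtep,   # where the workload used to be
        "dst_vtep": local_vtep,            # this switch
        "dst_interface": local_interface,  # interface the workload now sits behind
        # Per RFC 7432 MAC mobility, the new owner advertises seq + 1.
        "seq": prior.seq + 1,
    }
```

The key point the sketch shows is that no extra probing is needed: the "before" half of the update comes straight from EVPN control-plane state the switch already holds.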
Check out the demo video below that showcases the detection of MAC moves and MAC address duplication in EVPN networks using ArcOS+ArcIQ.
Above, we saw a few examples of how ArcOS+ArcIQ can deliver increased visibility specific to EVPN networks. However, as noted earlier, streaming telemetry is an integral part of ArcOS, just as the ability to store, process and analyze data at scale to produce meaningful information is integral to ArcIQ. Together, they can be extended to deliver the analytics and insights that matter to your deployment, your environment and your ecosystem. With that thought, here are some other use cases where we have leveraged ArcOS+ArcIQ to provide security insights.
Visit us at https://www.arrcus.com to learn more about our EVPN products and our visibility and analytics solutions, and how they can help you build a modern monitoring and visibility framework.