Time: October 21st, 2024

Does Business-Driven Network Technology Innovation Bring “Angels” or “Devils”?

With the vigorous development of Internet business, technologies such as big data, AI (artificial intelligence) and RDMA (Remote Direct Memory Access) have been widely adopted, driving continuous growth in data centre traffic. This growth in turn requires the underlying network to provide end-to-end, low-latency, lossless forwarding, which is driving the rapid upgrade of Ethernet switch chips.

The chip performance has been upgraded from traditional 10G Ethernet to the now popular 25G Ethernet. Some users have even started deploying HPC (High-Performance Computing) clusters based on 100G Ethernet.

Additionally, the chip provides more enhanced capabilities to support operations and maintenance. These include features such as a fully shared buffer (Shared Buffer), INT (In-band Network Telemetry), PFC (Priority-based Flow Control), ECN (Explicit Congestion Notification), MOD (Mirror-On-Drop), and TCB (Transient Capture Buffer).

When RDMA is in use, the switch needs a complex combination of these features to operate stably. This tight coupling with business requirements has made operation and maintenance more difficult.


Giant Sword of Operation and Maintenance

In the increasingly complex world of network equipment technology, having deep control over internal network equipment and achieving comprehensive visualization is essential for reliable business operation. In today's DevOps (Development and Operations) automated operation and maintenance, the selection of the northbound interface of the switch has become crucial.

Traditional methods such as CLI (Command-Line Interface) and SNMP (Simple Network Management Protocol) clearly cannot meet the needs of automated operation and maintenance in terms of performance, efficiency, and automation capabilities. Drawing on the practices of industry-leading Internet companies and a deeper understanding of gRPC (Google Remote Procedure Call, Google RPC), it can be anticipated that in the future, the operation and maintenance interface based on gRPC technology will be the most important means of automated operation and maintenance. Before we delve into gRPC, let's first analyze the specific bottlenecks encountered in the current operation and maintenance of data centre switches.


Bottlenecks Encountered In Switch Operation and Maintenance

When it comes to automating operation and maintenance tasks, switch requirements boil down to the following actions:
Get: Actively obtaining status and configuration information
● The operation and maintenance platform retrieves key configuration information or software and hardware status from the switch device as needed, such as BGP configuration, security configuration, as well as status information like interface traffic, interface status, buffer queue length, packet loss, etc.
● Meet the needs of equipment room inspection, troubleshooting, etc.

Set: Actively sending configuration
● The operation and maintenance platform sends necessary configuration changes to the switch, such as shutting down a port, configuring the IP address, or setting the waterline threshold.
● Meet the needs of daily business changes.

Alarm: Actively reporting abnormal status
● Inside the switch, when certain trigger conditions are met, notification information is actively reported to the operation and maintenance platform, such as CPU utilization exceeding the safety threshold, queue waterline reaching the threshold, port Up/Down, etc.
● Meet the alarm requirements for abnormal conditions.

Push: Periodically reporting key status information
● The device proactively reports key status information periodically, such as interface traffic, queue waterline, interface error packets, etc.
● Meet the need for continuous monitoring of key indicators.
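
The four actions above can be sketched as a small in-memory model. This is only an illustration: `FakeSwitch`, its method names, and keys such as `cpu.threshold` are hypothetical and do not correspond to any vendor interface.

```python
from typing import Callable, Dict, List


class FakeSwitch:
    """Hypothetical in-memory model of a switch; not a vendor API."""

    def __init__(self, on_alarm: Callable[[str], None]) -> None:
        self.config: Dict[str, float] = {}   # values written by the platform
        self.status: Dict[str, float] = {}   # values measured by the device
        self.on_alarm = on_alarm             # channel back to the platform

    def get(self, key: str) -> float:
        """Get: the platform reads a configuration or status value."""
        return self.config[key] if key in self.config else self.status[key]

    def set(self, key: str, value: float) -> None:
        """Set: the platform writes a configuration value."""
        self.config[key] = value

    def measure(self, key: str, value: float) -> None:
        """Device-side update; raises an Alarm if a configured threshold is exceeded."""
        self.status[key] = value
        limit = self.config.get(f"{key}.threshold")
        if limit is not None and value > limit:
            self.on_alarm(f"{key}={value} exceeds threshold {limit}")

    def push(self, keys: List[str]) -> Dict[str, float]:
        """Push: one periodic report containing the requested indicators."""
        return {k: self.status[k] for k in keys if k in self.status}


alarms: List[str] = []
switch = FakeSwitch(on_alarm=alarms.append)
switch.set("cpu.threshold", 80.0)               # Set
switch.measure("cpu", 35.0)                     # below threshold, no alarm
switch.measure("cpu", 92.5)                     # above threshold, Alarm fires
switch.measure("eth0.rx_bytes", 1024.0)
report = switch.push(["cpu", "eth0.rx_bytes"])  # Push
print(switch.get("cpu"), alarms, report)        # Get
```

In a real deployment the platform would call `push` on a timer, or the device would stream the report itself; the sketch leaves scheduling out.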

For the four daily operations above, traditional CLI + Syslog and SNMP, as well as the more recent Netconf and OpenConfig, each fulfil only some of the current requirements, and they run into limitations in performance, compatibility, scalability, and standardization. As a result, multiple operation and maintenance interfaces must be combined to support the rapid, continuous integration of the automated operation and maintenance platform. These interfaces are briefly analyzed as follows:

CLI + Syslog
Get
Ability: Complete show commands
Shortcomings:
● Show commands are inconsistent across manufacturers
● Show output is inconsistent, so secondary adaptation on the platform is complex
Set
Ability: Complete configuration commands
Shortcomings:
● CLIs of different manufacturers are incompatible
● Poor development flexibility; continuously standardizing the CLI carries risk
● Low reliability for remote operations; the platform must adapt to command echo
Alarm
Ability: Trigger conditions generate alarms
Shortcomings:
● Poor flexibility; relies on the switch OS to generate alarms for critical events
● Few adjustable parameters
Push
Ability: N/A
Shortcomings:
● The CLI itself has no periodic reporting capability
● Scripts must be written on the console to collect data regularly through the CLI

SNMP
Get
Ability: Basic and complete MIB, readable
Shortcomings:
● Manufacturer's private MIB
● Inconsistent MIB coverage among different vendors
● Long polling interval and poor real-time performance
● Poor traversal performance
● Low time accuracy
Set
Ability: A small number of features can be modified through MIB
Shortcomings:
● The MIB nodes that can be set are inconsistent across manufacturers
● Limited coverage of functional items
● Manufacturer's private MIB
Alarm
Ability: Basic and complete Trap alarms
Shortcomings:
● Manufacturer's private MIB
● Inconsistent abilities among different vendors
Push
Ability: N/A
Shortcomings:
● SNMP itself does not support periodic reporting; scripts on the console must periodically retrieve critical information
● Long polling interval and poor real-time performance

Netconf
Get
Ability: Obtain configuration and status based on the YANG model
Shortcomings:
● Manufacturer's private YANG
Set
Ability: Distribute configuration based on the YANG model
Shortcomings:
● Manufacturer's private YANG
Alarm
Ability: Notification alarms based on the YANG model
Shortcomings:
● Manufacturer's private YANG
Push
Ability: N/A
Shortcomings:
● Netconf itself does not support periodic reporting; scripts on the console must periodically obtain critical information

OpenConfig
Get
Ability: Obtain configuration and status based on the standard YANG model
Shortcomings:
● Incomplete; many features are not defined
Set
Ability: Distribute configuration based on the standard YANG model
Shortcomings:
● Incomplete; many features are not defined
Alarm
Ability: Notification alarms based on the standard YANG model
Shortcomings:
● Incomplete; many features are not defined
Push
Ability: Reporting period set through subscription
Shortcomings:
● Few features are available for subscription

Table 1: Capability analysis of four operation and maintenance interfaces

Based on the analysis provided, the following is a summary:
CLI + Syslog
PROS
● The comprehensive Config and Show commands can complete all configuration and status query tasks
CONS
● The CLIs of different manufacturers are not compatible, so the operation and maintenance platform must adapt commands and echo output or wrap them in additional software, which requires a large amount of work
● An upgrade of the switch OS may change the CLI and echo output, and the operation and maintenance platform must follow those changes
● Alarm and periodic reporting capabilities are weak and inflexible

SNMP
PROS
● Mature technology with many third-party network management tools
● MIB coverage is largely complete
CONS
● Manufacturer's private MIB, so the platform needs to be adapted multiple times
● The polling mechanism is inefficient and cannot monitor a large number of nodes, which limits network size
● Time granularity is too coarse to obtain real-time data
● Frequent Get operations may increase the load on the switch's CPU
● The Set capability is weak and the periodic Push capability is missing

Netconf
PROS
● Unified and mature transmission framework
● The technology is widely applied
● Well-developed Get, Set and other mechanisms
CONS
● Manufacturer's private YANG, so the platform needs to be adapted multiple times
● XML as a data description language is hard to write and read, and results in low transmission efficiency
● Lack of periodic push capability

OpenConfig
PROS
● The YANG model is defined by the OpenConfig working group and adapted uniformly
● Flexible and standardized underlying transmission framework
● The YANG model is described as a tree-like structure, which is easy to manage and expand
CONS
● The features covered by the YANG model are not comprehensive enough, and standardization progress cannot keep up with architecture evolution
● The definitions of Notification and Periodic Reporting are rigid and incomplete
● Support and application are not yet widespread or mature enough

Table 2: Summary of the PROS and CONS of the four operation and maintenance interfaces

Based on the previous analysis, it is evident that the existing northbound interfaces have limitations and are not adaptable for the unified operation, maintenance, and continuous integration of multi-vendor networking in the future. Furthermore, these interfaces are not easily modifiable and lack control, which means that operation and maintenance personnel do not have the flexibility to redefine them. Therefore, it is important to consider what the ideal northbound operation and maintenance interface should look like.


Ideal Northbound Operation and Maintenance Interface In The Future

After careful consideration, it is clear that there is a need to redefine the northbound O&M interface to seamlessly support the continuous, simple, and unified integration of the O&M automation platform. The ideal northbound O&M interface in the future should possess the following features:

Manufacturer independence:
● The standardized model, with the operation and maintenance platform at its core, should not require differentiation of equipment from each manufacturer for continuous adaptation and change.

YANG model standardization:
● It should be based on a unified standard YANG model defined by the operator's own operation and maintenance system, allowing for continuous iteration and evolution, and should not be restricted by the OpenConfig organization or a manufacturer's private YANG model.

Comprehensive operation and maintenance capabilities:
● It should comprehensively support Get, Set, Alarm, and Push capabilities, allowing for the issuance and subscription of these four capabilities on a unified interface.

Single operation and maintenance interface:
● It should redefine a single operation and maintenance interface, enabling the automated operation and maintenance platform to achieve unified management of various manufacturers through a unique standard interface.

The future O&M northbound interface should have the following capabilities:

Structured northbound interface:
● The data encoding, capability model, remote call, data transmission, security, and other modules should be separated and decoupled through a layered protocol architecture, drawing on the Netconf and OpenConfig layered protocol architecture to ensure rapid iteration of standard interfaces.

Intuitive and efficient data description:
● The data model should be described in JSON, which simplifies writing and improves readability without affecting the serialized transmission of the underlying data.

Unified tree-like YANG model:
● Tree-like YANG modelling should be implemented for different functional modules such as BGP, OSPF, security, Interface, etc., integrating Get, Set, Alarm, and Push capabilities under different functional modules.

Efficient data transmission:
● Binary serialization and deserialization should replace traditional text-mode encoding for efficient data transmission.
● Additionally, a single TCP connection should be reused to achieve multi-stream transmission and improve efficiency.
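
To give a rough sense of why binary encoding is more compact than text, the sketch below encodes the same counters once as JSON text and once in a fixed binary layout using Python's standard `struct` module. The field names and the layout are assumptions made for this example; Protocol Buffers uses its own tagged, variable-length wire format, so this is a stand-in for the general idea rather than a model of Protobuf itself.

```python
import json
import struct

# Hypothetical interface counters.
sample = {
    "if_index": 12,
    "rx_bytes": 9876543210,
    "tx_bytes": 1234567890,
    "queue_depth": 4096,
}

# Text mode: field names and digits are sent as characters.
text_payload = json.dumps(sample).encode("utf-8")

# Binary mode (assumed layout): network byte order, unsigned 32-bit index,
# two unsigned 64-bit counters, unsigned 32-bit queue depth.
LAYOUT = struct.Struct("!IQQI")
binary_payload = LAYOUT.pack(
    sample["if_index"],
    sample["rx_bytes"],
    sample["tx_bytes"],
    sample["queue_depth"],
)

# The receiver needs the same layout to restore the values.
decoded = dict(zip(sample.keys(), LAYOUT.unpack(binary_payload)))

print(len(text_payload), len(binary_payload))
```

The binary form omits field names entirely, which is why both ends must share a schema; in the gRPC design that role is played by the .proto file.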

Remote call decoupling based on RPC:
● Remote calls should be made based on the interface implemented by the RPC framework to decouple the switch and the operation and maintenance platform transparently and independently.

Safe and reliable data transmission:
● Remote RPC calls should require a complete authentication mechanism.
● Data transmission should be securely encrypted.

The description above outlines the future northbound operation and maintenance interface. The underlying demand is real and pressing: to manage switch devices comprehensively and uniformly, with the operation and maintenance platform at the core, through Get, Set, Alarm, and Push operations. Such an interface may already be achievable, and gRPC + Protocol Buffer is a potential choice.


Design of Unified Operation and Maintenance Interface Based on gRPC Framework

The operation and maintenance model based on gRPC + Protocol Buffer follows these steps:



1. The controller subscribes/unsubscribes to real-time/periodic events.
2. The switch saves/deletes the subscribed server address, port number, and subscribed events.
3. Based on the subscribed events, the switch constructs the JSON format of the corresponding data, encapsulates the message using Protobuf, and sends the Proto Request message to the server through the gRPC protocol.
4. The server receives the Proto Request message, uses Protobuf to decapsulate the message, restores the data structure in JSON format, and performs business processing.
5. After the server processes the data, it uses Protobuf to encapsulate the response data and sends a Proto Reply message to the switch through the gRPC protocol.
6. The switch receives the Proto Reply message, and the gRPC interaction ends.
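
The six steps can be walked through with a small simulation that uses only the Python standard library. It is a sketch under loud assumptions: `encapsulate` and `decapsulate` are length-prefixed JSON stand-ins for Protobuf encoding, a direct method call stands in for the gRPC call, and the class names, event names, address `192.0.2.10`, and port `50051` are all hypothetical.

```python
import json
import struct
from typing import Dict, List, Optional, Tuple


def encapsulate(payload: dict) -> bytes:
    """Stand-in for Protobuf encoding: 4-byte length prefix plus JSON bytes."""
    body = json.dumps(payload).encode("utf-8")
    return struct.pack("!I", len(body)) + body


def decapsulate(message: bytes) -> dict:
    """Inverse of encapsulate(): read the length, then parse the JSON body."""
    (length,) = struct.unpack("!I", message[:4])
    return json.loads(message[4:4 + length].decode("utf-8"))


class Server:
    """Collector on the O&M platform side (steps 4 and 5)."""

    def __init__(self) -> None:
        self.received: List[dict] = []

    def handle(self, request: bytes) -> bytes:
        data = decapsulate(request)                  # step 4: restore the JSON data
        self.received.append(data)                   # placeholder for business processing
        return encapsulate({"ack": data["event"]})   # step 5: build the reply


class Switch:
    """Device side (steps 2, 3 and 6)."""

    def __init__(self) -> None:
        # step 2: subscribed event -> (server address, port)
        self.subscriptions: Dict[str, Tuple[str, int]] = {}

    def subscribe(self, event: str, address: str, port: int) -> None:
        self.subscriptions[event] = (address, port)

    def unsubscribe(self, event: str) -> None:
        self.subscriptions.pop(event, None)

    def report(self, event: str, data: dict, server: Server) -> Optional[dict]:
        if event not in self.subscriptions:
            return None                              # nobody subscribed: send nothing
        request = encapsulate({"event": event, "data": data})  # step 3
        reply = server.handle(request)               # direct call stands in for gRPC
        return decapsulate(reply)                    # step 6: interaction ends


server = Server()
switch = Switch()
switch.subscribe("queue-threshold", "192.0.2.10", 50051)       # step 1
reply = switch.report("queue-threshold", {"port": "eth0", "depth": 4096}, server)
ignored = switch.report("port-down", {"port": "eth1"}, server)
print(reply, ignored, server.received)
```

A real implementation would replace the direct call with a gRPC stub generated from a .proto file and would open a connection to the stored address and port.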

In this unified operation and maintenance interface design, gRPC is the key transmission framework, but it is not the only component. The design consists of the following parts:

Data: This includes instructions, supporting Get, Set, Alarm, and Push operations.
Unified YANG model: Unified description of the data model based on JSON, a unified YANG tree model integrating network architecture and operation and maintenance requirements.
gRPC: A unified northbound interface that uses RPC methods to send or retrieve data, similar to calling local objects.
Protocol Buffer: Defines RPC interface services (.proto files) and completes data serialization and deserialization encapsulation, improving data transmission efficiency and reducing bandwidth requirements.
Netty + HTTP/2: Provides bidirectional stream multiplexing on a reliable network connection and simplifies network programming with Netty.

gRPC is a high-performance, open-source, universal RPC framework based on the HTTP/2 protocol. The most important and challenging part of implementing this unified interface, however, is establishing a unified YANG model. Although OpenConfig has defined a large number of standard YANG models and addressed unification and compatibility, the working-group approach to standardization cannot keep pace with the rapid iteration of today's basic network operation and maintenance. Therefore, we call on leading Internet companies to take the lead in drafting a unified YANG model that everyone can then continue to supplement and improve. This would reduce the cost of integrating the operation and maintenance platform with multiple vendors and let the focus return to the operation and maintenance capability requirements themselves.


Summary

Ruijie switches have implemented gRPC + Protocol Buffer to meet the operation and maintenance requirements of various features. These include comprehensive management of switch buffers, real-time monitoring of ingress/egress port and queue buffers, periodic collection of indicators such as the number of times a port or queue buffer exceeds its threshold, and automatic alarms for problems such as packet loss caused by insufficient ingress/egress port buffers and port buffers exceeding their limits. This setup meets the visualization and real-time requirements of operation and maintenance. However, fully replacing protocols such as SNMP is still a work in progress. We believe that more unified management and control of operation and maintenance capabilities will be achieved on the basis of gRPC in the future.


