Orchestration for SD-WAN Environments

There is a lot of discussion, and there are many ideas, around what we mean by “orchestration.” In essence, and in the context of software-defined environments, the consensus is that it refers to the centralization of network element configuration and management. It also appears to suggest that this centralized “brain” becomes the controller of the network for rapid, seamless configuration changes. These changes may be driven by factors other than network configuration, for example, application demand for network resources.

Centralized configuration is nothing new. Nearly all vendors have some kind of management paradigm that allows network elements to be provisioned from a single location. However, orchestration is much more than that. We need to evolve beyond configuring our WAN in this way. Orchestration should be about considering the network as a single thing that is derived from the user or business need. As the network grows or changes, it needs to self-adjust to compensate for this change, without the need for manual reconfiguration.


We hear a lot of talk about the separation of the control plane from the forwarding plane. In the case of the SD-WAN, I think this is also a little skewed. The forwarding plane needs to be able to act intelligently and react to network and application changes. If we rely solely on the central network brain, do we lose our ability to react quickly? Physics is still physics, after all. It takes time to measure and then react. I think a better statement (and a more likely reality) is that we separate our configuration/management plane from our forwarding plane, i.e., we don’t use the box-level CLI/UI any more.

WAN networks are complicated, and that is unlikely to change. So the idea of zero-touch configuration is somewhat of a false idol. What I think vendors really mean by it is node discovery, either a phone-home mechanism to the controller or some kind of simple provisioning process. But you still have to build a config. Sure, you can automatically derive some things (such as bandwidth discovery), but largely we have to create an element/branch-level configuration somewhere. Simplifying this becomes the first challenge and is the bane of every vendor. This is further complicated in an SD-WAN solution, because there are so many factors to consider, including network, application, location and user, not to mention the variations in network quality, which is the actual thing we are trying to fix.

The bottom line is that no one solution can solve all our problems…yet. As we look at standards, I think a common definition language for networks becomes key. A standard way of communicating these elements also is important. REST is a likely candidate for that.

Understanding the WAN network elements
In the very early days of Talari, trying to configure devices was a barrier to success. There were too many options that needed to be tweaked to achieve the desired network configuration. It just didn’t work. Our engineering team tossed away the first attempt and started again, because it was going to be too hard to fix. What the team came up with was a configuration compiler that would take type definitions and build a network-wide master configuration file that would then create the individual node config files. This was a major step because it allowed us to focus on defining the key elements and the relationship between them rather than trying to build this at the network level (which was nearly impossible).
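To make the compiler idea concrete, here is a minimal sketch of that pipeline: type definitions are expanded into a network-wide master configuration, which is then split into per-node configs. This is not Talari's actual compiler or file format. Every name in it (`SITE_TYPES`, `compile_master`, `split_per_node`, the `role` and `qos_profile` fields) is a hypothetical chosen for illustration, and the hub-and-spoke peering rule is an assumption.

```python
import json

# Hypothetical type definitions: every name and field here is illustrative.
SITE_TYPES = {
    "branch": {"qos_profile": "default", "role": "client"},
    "datacenter": {"qos_profile": "default", "role": "hub"},
}

SITES = [
    {"name": "dc1", "type": "datacenter"},
    {"name": "branch-a", "type": "branch"},
    {"name": "branch-b", "type": "branch", "qos_profile": "voice"},
]


def compile_master(site_types, sites):
    """Expand type defaults into one network-wide master configuration."""
    master = {}
    for site in sites:
        resolved = dict(site_types[site["type"]])  # start from the type defaults
        resolved.update(site)                      # per-site overrides win
        master[site["name"]] = resolved
    return master


def split_per_node(master):
    """Derive each node's config: its own settings plus the peers it must know."""
    node_configs = {}
    for name, settings in master.items():
        if settings["role"] == "hub":
            # Assumption: a hub peers with every other site.
            peers = sorted(n for n in master if n != name)
        else:
            # Assumption: a client peers only with hubs.
            peers = sorted(n for n, s in master.items() if s["role"] == "hub")
        node_configs[name] = {"local": settings, "peers": peers}
    return node_configs


master = compile_master(SITE_TYPES, SITES)
node_configs = split_per_node(master)
print(json.dumps(node_configs["branch-b"], indent=2))
```

The point of the split is that nobody edits a node config directly: the definitions are the only input, and everything downstream is derived from them.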

Quite frankly, that first compiler was a pain to use. Everything had to be syntactically perfect to even work, and it still had to be done by hand. It took me three days to build the first config, but that config worked.

SD-WAN was not easy to develop, but it's easy to deploy and brings many advantages.

Fast forward eight years, and we now have a nicely polished UI that is easy to follow and does a lot of inference based on what is defined. There is still a very deep level of tunability, but the defaults and automation work in 99 percent of the deployments. Many years of real-world networks have enabled us to keep making it better, but there is always room for improvement. I think learning or deriving network topology from actual traffic and physical state is the next area of interest for us.

If we think of a WAN network in simple terms, we can generally assume some things:

  • There are multiple branches with users defined as IP endpoints or subnets.
  • There are fewer data centers with applications.
  • These sites all have WAN links.
  • These WAN links have bandwidth (in and out).
  • Not all WAN links can see all other WAN links.
  • There is a relationship between the branches and data centers which we will call traffic (application packets to and from).
  • Not all traffic should be treated equally.
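The assumptions above map fairly naturally onto a small data model. The sketch below is one possible shape, not a description of any product's schema: the class names, the `network` label used to decide which WAN links can see each other, and the traffic class names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical traffic classes: a lower number means higher priority.
TRAFFIC_CLASSES = {"voice": 0, "interactive": 1, "bulk": 2}


@dataclass(frozen=True)
class WanLink:
    name: str
    network: str   # assumption: links see each other only on the same network
    kbps_in: int   # bandwidth is tracked separately for each direction
    kbps_out: int


@dataclass
class Site:
    name: str
    role: str                                    # "branch" or "datacenter"
    subnets: list = field(default_factory=list)  # users or apps as IP subnets
    links: list = field(default_factory=list)


def candidate_paths(src, dst):
    """Pair up the WAN links at two sites that can actually see each other."""
    return [
        (a.name, b.name)
        for a, b in product(src.links, dst.links)
        if a.network == b.network
    ]


dc = Site("dc1", "datacenter", ["10.0.0.0/24"], [
    WanLink("dc1-mpls", "mpls", 100_000, 100_000),
    WanLink("dc1-inet", "internet", 500_000, 500_000),
])
branch = Site("branch-a", "branch", ["10.1.0.0/24"], [
    WanLink("a-inet", "internet", 50_000, 10_000),
    WanLink("a-lte", "internet", 20_000, 5_000),
])

paths = candidate_paths(branch, dc)
print(paths)
```

Here the branch has no MPLS link, so only the two internet-to-internet pairings come back as usable paths to the data center.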

So we define some of these elements and we set reasonable defaults. Now the key here is self-adjustment. We don’t want to have to go back and manually edit existing element configs when we add or change something. In our world the controller (what we call Talari Aware) recompiles the configuration based on the changes made. So adding a site or WAN link, and associating it with what it needs to connect to, automatically adjusts the other existing sites and WAN links to compensate for the new element. Being able to clone an existing site configuration is also useful, because it reduces repetition.
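A rough sketch of that self-adjustment follows, under stated assumptions: connectivity is declared only on the new site, a full recompile makes it symmetric, and a simple comparison shows which existing nodes need an updated config. The function names and the dictionary layout are hypothetical and do not describe how Talari Aware works internally.

```python
import copy


def compile_network(sites):
    """Rebuild every node's config from the full set of site definitions."""
    peers = {name: set() for name in sites}
    for name, site in sites.items():
        for target in site["connects_to"]:
            peers[name].add(target)
            peers[target].add(name)  # the existing site learns its new peer
    return {
        name: {"links": list(site["links"]), "peers": sorted(peers[name])}
        for name, site in sites.items()
    }


def changed_nodes(old, new):
    """List the nodes whose compiled config differs and so needs a push."""
    return sorted(name for name in new if old.get(name) != new[name])


def clone_site(sites, source, new_name, links):
    """Add a site by copying an existing definition and swapping its links."""
    site = copy.deepcopy(sites[source])
    site["links"] = links
    sites[new_name] = site


sites = {
    "dc1": {"links": ["dc1-inet"], "connects_to": []},
    "branch-a": {"links": ["a-inet"], "connects_to": ["dc1"]},
}
before = compile_network(sites)

clone_site(sites, "branch-a", "branch-b", ["b-inet"])
after = compile_network(sites)

changed = changed_nodes(before, after)
print(changed)
```

Nobody edited dc1, yet it shows up in the changed set because its peer list grew, while branch-a is untouched. That list of changed nodes is also exactly the set that needs an update pushed.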

The final piece of this is being able to push the change seamlessly to the network. Since our system is very tightly controlled, we have to push updates to the affected nodes when a new branch/site is introduced. In prior years this could affect service, but today we can seamlessly update the configurations on all nodes without impacting the forwarding plane.

In summary, a good SD-WAN controller must be able to:

  • Automatically adjust the existing configurations across the network as changes are made.
  • Have deep error checking to prevent misconfiguration.
  • Allow changes to be made seamlessly without impacting the forwarding plane.
  • Not require individual node configuration (CLI/UI).
  • Allow default sets (profiles) and cloning of existing nodes/elements where possible.
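As an illustration of the error-checking requirement, a controller could refuse to push any configuration that fails a validation pass. The sketch below checks three things that are assumptions chosen for the example, not a list taken from any product: every site has at least one WAN link, every connection points at a known site, and no two subnets overlap. It uses Python's standard `ipaddress` module, and `validate` and the dictionary layout are hypothetical.

```python
import ipaddress


def validate(sites):
    """Return a list of readable errors; an empty list means safe to push."""
    errors = []
    seen = []  # (network, site name) pairs already checked
    for name, site in sites.items():
        if not site.get("links"):
            errors.append(f"{name}: no WAN links defined")
        for peer in site.get("connects_to", []):
            if peer not in sites:
                errors.append(f"{name}: connects to unknown site {peer}")
        for cidr in site.get("subnets", []):
            net = ipaddress.ip_network(cidr)
            for other_net, other_name in seen:
                if net.overlaps(other_net):
                    errors.append(
                        f"{name}: subnet {net} overlaps {other_net} at {other_name}"
                    )
            seen.append((net, name))
    return errors


good = {
    "dc1": {"links": ["dc1-inet"], "subnets": ["10.0.0.0/24"], "connects_to": []},
    "branch-a": {"links": ["a-inet"], "subnets": ["10.1.0.0/24"], "connects_to": ["dc1"]},
}
bad = {
    "dc1": {"links": ["dc1-inet"], "subnets": ["10.0.0.0/24"], "connects_to": []},
    "branch-a": {"links": [], "subnets": ["10.0.0.128/25"], "connects_to": ["dc2"]},
}

good_errors = validate(good)
bad_errors = validate(bad)
for line in bad_errors:
    print(line)
```

Collecting every error rather than stopping at the first one lets an operator fix a whole definition in one pass before anything reaches the forwarding plane.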

Enjoy this video on Defining Software Defined by Talari’s CTO and Co-Founder, John Dickey.
