OpenStack Networking, Explanation for Humans

A while ago I published a post on OpenStack Networking principles; you can find it here:

Based on the feedback I got, it was too complex and hard to digest - basically, not written in a language that humans can understand. This motivated me to try to explain it in a simpler way, so that anyone, even network engineers like myself, could get it; hence the name of the post.

OpenStack is an open source platform that is basically composed of different projects. The networking project is called Neutron. To fully understand how this all comes together, I will cover the following concepts:
  • Linux Networking
  • OVS (Open Virtual Switch) Networking
  • Neutron
  • Why OpenStack requires SDN.


Linux Networking
In virtualization, network devices such as switches and NICs are virtualized. A virtual Network Interface Card (vNIC) is the NIC equivalent for a Virtual Machine (VM). The hypervisor can create one or more vNICs for each VM, and each vNIC looks identical to a physical NIC (the VM doesn't "know" that it's not running on a physical server).

A switch can also be virtualized as a virtual switch (vSwitch). A virtual switch works the same way as a physical switch: it populates a CAM table that maps ports to MAC addresses. Each vNIC is connected to a vSwitch port, and the vSwitch reaches the external physical network through the physical NIC of the physical server.

Before we get into how this all comes together, we need to clarify 3 concepts:
  • Linux Bridge is a virtual switch used with the KVM/QEMU hypervisor. Remember this: Bridge = L2 switch, as simple as that.
  • TAP and TUN are virtual network devices implemented in the Linux kernel. TUN works with Layer 3 (IP) packets, while TAP operates with Layer 2 frames, like Ethernet.
  • VETH (Virtual Ethernet pair) acts as virtual wiring. To connect two TAPs you need a virtual wire, or VETH; essentially you are creating the virtual equivalent of a patch cable - what goes in one end comes out the other. It can be used to connect two TAPs that belong to two VMs in different namespaces, or to connect a container or a VM to OVS (see the sketch below).
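As a quick illustration of the VETH idea, here is a minimal sketch of creating a virtual patch cable and stretching it between two namespaces; all the names (veth-a, veth-b, ns1, ns2) are placeholders I picked for the example:

# Create the VETH pair - a virtual patch cable with two ends
ip link add veth-a type veth peer name veth-b

# Put each end into a different network namespace
ip netns add ns1
ip netns add ns2
ip link set veth-a netns ns1
ip link set veth-b netns ns2

# Bring both ends up; whatever goes in on veth-a comes out on veth-b, and vice versa
ip netns exec ns1 ip link set veth-a up
ip netns exec ns2 ip link set veth-b up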

So, why the hell are all these concepts needed - TAP, VETH, Bridge…? These are just Linux concepts that are used to construct the virtual switch and give connectivity between VMs, and between the VMs and the outside world. Here is how it all works:
  • When you create a Linux Bridge, you can assign TAPs to it. To connect a VM to this bridge, you then associate the VM's vNIC with one of the TAPs.
  • The vNIC is associated with the TAP programmatically, in software. When the Linux bridge sends Ethernet frames to a TAP interface, it is actually writing the bytes to a file descriptor; an emulator like QEMU reads the bytes from this file descriptor and passes them to the guest operating system inside the VM, via the VM's virtual network port. TAP interfaces are listed in the output of the ifconfig (or ip link) command, if you want to make sure everything is where it should be. A command-level sketch follows below.
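Here is a minimal command-level sketch of those steps using the iproute2 tools; the bridge and TAP names (br0, tap0) are placeholders, and in real life the hypervisor creates and attaches the TAP for you:

# Create a Linux bridge - the virtual L2 switch
ip link add br0 type bridge
ip link set br0 up

# Create a TAP device that an emulator such as QEMU can attach a vNIC to
ip tuntap add dev tap0 mode tap

# Plug the TAP into the bridge - the equivalent of patching a NIC into a switch port
ip link set tap0 master br0
ip link set tap0 up

# Verify that the bridge and its ports are where they should be
bridge link show
ip link show master br0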


OVS
OVS is a multilayer virtual switch, designed to enable massive network automation through programmatic extension. A Linux Bridge can also be used as a virtual switch in a Linux environment; the difference is that Open vSwitch is targeted at multi-server virtualization deployments where automation is used.

An Open vSwitch bridge is also used for L2 switching, exactly like the Linux Bridge, but with a pretty important difference when it comes to automation: it can operate in two modes, Normal mode and Flow mode.
  1. OVS in “normal” mode acts as a traditional switch, learning source MAC addresses and populating its CAM table.
  2. OVS in “flow” mode is why we use OVS and not the Linux Bridge. It lets you “program the flows” using OpenFlow or OpFlex (whatever instructions come from the SDN controller), or manually (by calling ovs-ofctl add-flow). Only the installed flows are used; no other behavior is implied. Regardless of how a flow is configured, it has a MATCH part and an ACTION part. The match part defines which fields of a frame/packet/segment must match in order to hit the flow; you can match on most fields of the Layer 2 frame, Layer 3 packet or Layer 4 segment, for example a specific destination MAC and IP address pair, or a specific destination TCP port. The action part defines what is actually done with a message that matched the flow: forward it out a specific port, drop it, rewrite most parts of any header, build new flows on the fly (for example, to implement a form of learning), or resubmit the message to another table. Each flow is written to a specific table and is given a specific priority. Messages enter the pipeline at table 0, where they are evaluated against that table's flows from highest to lowest priority. If a message does not match any flow in table 0, it is implicitly dropped, unless an SDN controller is defined, in which case the message is sent to the controller asking what to do with it. A few example flows are shown below.
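To make the MATCH/ACTION idea more concrete, here is a small sketch of manually installed flows; the bridge name (br-int), port numbers and MAC address are placeholders:

# MATCH: destination MAC fa:16:3e:00:00:01 -> ACTION: send it out port 2
ovs-ofctl add-flow br-int "table=0,priority=100,dl_dst=fa:16:3e:00:00:01,actions=output:2"

# MATCH: TCP traffic to destination port 80 -> ACTION: drop it
ovs-ofctl add-flow br-int "table=0,priority=200,tcp,tp_dst=80,actions=drop"

# MATCH: everything else -> ACTION: behave like a normal learning switch
ovs-ofctl add-flow br-int "table=0,priority=0,actions=normal"

# Inspect what is installed
ovs-ofctl dump-flows br-int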

Additional differences between the Linux Bridge and the OVS are represented in the Table below:


Open vSwitch can operate both as a soft switch running within the hypervisor, and as the control stack for switching silicon (physical switch).


OpenStack Networking and Neutron

One of the main features that OpenStack brings to the table is multi-tenancy. Therefore, the entire platform needs to support a multi-tenant architecture, including networking. This means that different tenants must be able to use overlapping IP spaces and overlapping IP addresses. This is enabled using the following technologies:
  • Network Namespaces, which are, in networking language, the equivalent of VRFs (a quick sketch follows this list).
  • Tenant Networks are owned and managed by the tenants. These networks are internal to the Tenant, and every Tenant is basically allowed to use any IP addressing space they want.
  • Provider Networks are networks created by administrators to map to a physical network in the data center. They are used to publish services from particular tenants, or to allow OpenStack VMs (called instances) to reach networks outside of the OpenStack environment.
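To see why namespaces give you VRF-like separation, here is a minimal sketch; the namespace names are hypothetical, and the same subnet lives in both without any conflict:

# Two tenants, each with its own interfaces, routing table and ARP table
ip netns add tenant-blue
ip netns add tenant-red

ip netns exec tenant-blue ip link set lo up
ip netns exec tenant-red  ip link set lo up
ip netns exec tenant-blue ip addr add 10.0.0.1/24 dev lo
ip netns exec tenant-red  ip addr add 10.0.0.1/24 dev lo

# Each tenant sees only its own copy of 10.0.0.0/24
ip netns exec tenant-blue ip route
ip netns exec tenant-red  ip route

# On an OpenStack network node, Neutron creates namespaces like qrouter-<uuid> and qdhcp-<uuid>
ip netns list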

To understand the concept of Provider Networks, I'll explain the two mechanisms that give VMs connectivity to the outside network (a CLI sketch follows this list):
  • SNAT (Source NAT) is similar to the NAT service an office uses on a firewall to get out to the Internet. All the VMs can use a single IP (or a group of IPs) that the admin configured when deploying OpenStack to reach networks outside of the OpenStack environment (the Internet, or a LAN).
  • Floating IP is used for publishing services. Each VM that needs to be accessible from the outside must be manually assigned a Floating IP.
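As a rough sketch with the standard OpenStack CLI (the network, subnet and instance names are placeholders), this is what the two mechanisms look like from the tenant side:

# SNAT for the whole tenant: attach a router to the external (provider) network
openstack router create tenant-router
openstack router set tenant-router --external-gateway public
openstack router add subnet tenant-router tenant-subnet

# Floating IP for a single instance that must be reachable from the outside
openstack floating ip create public
openstack server add floating ip instance-1 203.0.113.10   # use the IP allocated by the previous command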


Bridges and bridge mappings are a crucial concept in OpenStack networking; it's all about how the different bridges and TAPs come together.
  • br-int, the integration bridge, is used to connect the virtual machines to the internal tenant networks.
  • br-ex, the external bridge, is used for the connection to the PROVIDER networks, to enable connectivity to and from virtual instances. br-ex is mapped to a physical network, and this is where the Floating IP and SNAT IP addresses are applied to instances going out of OpenStack via the Provider Networks (a configuration sketch follows this list).
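To make this concrete, here is a minimal sketch of how br-ex is typically wired up and mapped on an OVS-based node; the bridge, NIC and physnet names are placeholders, and the exact config file location varies by distribution:

# Create the external bridge and plug in the physical uplink that faces the provider network
ovs-vsctl add-br br-ex
ovs-vsctl add-port br-ex eth1        # eth1 is a placeholder for the physical NIC

# Tell the Neutron OVS agent which provider network name maps to which bridge
# (a typical [ovs] section setting in the ML2/OVS agent configuration):
#   bridge_mappings = physnet1:br-ex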

Let's check the Data Flow now, on an example of a single OpenStack instance (Instance-1) being assigned a Floating IP and accessing the Public Network.


As previously explained, NAT is done on br-ex, so the Floating IP is also assigned on br-ex; from that point on, the instance accesses the public networks with the assigned Floating IP. If no Floating IP has been assigned, the instance reaches the outside world using SNAT.


Why OpenStack requires SDN

As explained before, Neutron is the OpenStack networking project, with an API for defining network configuration. It offers multi-tenancy with self-service. Neutron uses plugins to provide L2 connectivity, IP address management, L3 routing, NAT, VPN and firewall capabilities.

Here is why SDN is an essential requirement for any OpenStack production deployment:
- OpenStack cannot configure the physical network according to its needs in order to interconnect VMs on different Compute nodes.
- Neutron does the basic networking correctly, but it cannot properly handle routing, security policies, HA of the external connectivity, network performance management, etc.
- OpenStack Neutron defines the services needed for VM provisioning within an OpenStack deployment, including NAT, DHCP, Metadata, etc. All of these services have to be highly available and scalable to meet the environment's demands. SDN reduces the load on Neutron.
- Last, but not least, I've never seen a production OpenStack deployment with no SDN. Just saying...

On PaloAlto and NSX Integration

The VM-Series firewall for VMware NSX is jointly developed by Palo Alto Networks and VMware. NetX APIs are used to integrate the Palo Alto Networks next-generation firewalls and Panorama with VMware ESXi servers. Before getting into the technical part, make sure you understand what NSX is, how micro-segmentation is deployed, and what the difference is between the Distributed Firewall and a traditional firewall that protects the perimeter. You can check out some of my previous posts in the Blog Map.

The idea is to deploy the Palo Alto Networks firewall as a service on a cluster of VMware ESXi servers where NSX has been enabled. The objective is to protect the East-West traffic in your VMware environment and "steer" traffic between the NSX "native" firewall and the Palo Alto firewall. We are doing this integration in order to be able to later enforce different types of security policies, depending on whether we want to protect the traffic within the VMs of the same tier (intra-tier) or between different tiers (inter-tier). Best practice would be:

  • Inter-tier traffic (Web server to App or DB server) is protected by the Palo Alto Networks VM-Series firewall, which provides advanced security capabilities with its single-pass architecture in the form of App-ID, Content-ID, and User-ID. On the diagram below, a PA NGFW is protecting the traffic between the HR Web and DB servers.
  • Intra-tier traffic (web server to web server) is protected by the NSX DFW, which provides near line-rate performance for L2-L4 security functions. On the diagram below, the NSX DFW is protecting the traffic between the two HR Web servers.

Components


Before we proceed with the detailed explanation of how to deploy and configure the environment, let's clarify what the components of the VM-Series for NSX solution are, how they work together, and what the benefits are. The components of the integrated solution are the following:

  • vCenter Server, the centralized management tool for the vSphere suite. The vCenter server is required to manage the NSX Manager and the ESXi hosts in your data center. This joint solution requires that the ESXi hosts be organized into one or more clusters on the vCenter server and be connected to a distributed virtual switch.
  • NSX Manager, the VMware networking and security platform, or simply put - SDN. The NSX Firewall and the Service Composer are key features of the NSX Manager. The NSX firewall is a logical firewall that allows you to attach network and security services to the virtual machines, and the Service Composer allows you to group virtual machines and create policy to redirect traffic to the VM-Series firewall.
  • Panorama, centralized management tool for the Palo Alto NGFW (Next Generation Firewalls). In this solution, Panorama works with the NSX Manager to deploy, license, and centrally administer configuration and policies on the VM-Series firewalls for NSX. Panorama is used to register the VM-Series firewall for NSX as the Palo Alto Networks NGFW service on the NSX Manager. This allows the NSX Manager to deploy the VM-Series firewall for NSX on each ESXi host in the ESXi cluster. When a new VM-Series firewall is deployed in NSX, it communicates with Panorama to obtain the license and receives its configuration/policies from Panorama. Panorama must be able to connect to the NSX Manager, the vCenter server, the VM-Series firewalls and the Palo Alto Networks update server.
  • VM-Series Firewall for NSX (VM-100, VM-200, VM-300, VM-500, and VM-1000-HV, support NSX). The VM-Series firewall for NSX is the VM-Series firewall that is deployed on the ESXi hypervisor. The integration with the NetX API makes it possible to automate the process of installing the VM-Series firewall directly on the ESXi hypervisor, and allows the hypervisor to forward traffic to the VM-Series firewall without using the vSwitch configuration. The VM-Series firewall for NSX only supports virtual wire interfaces. On this firewall, ethernet 1/1 and ethernet 1/2 are bound together through a virtual wire and use the NetX dataplane API to communicate with the hypervisor. Layer 2 or Layer 3 interfaces are neither required nor supported on the VM-Series firewall for NSX, and therefore no switching or routing actions can be performed by the firewall.


Ports/Protocols you need to enable for the Network Communication:

  • Panorama: To obtain software updates and dynamic updates, Panorama uses SSL to access updates.paloaltonetworks.com on TCP/443; this URL leverages the CDN infrastructure. If you need a single IP address, use staticupdates.paloaltonetworks.com.
  • The NSX Manager and Panorama use SSL to communicate on TCP/443.
  • VM-Series Firewall for NSX: If you plan to use Wildfire, the VM-Series firewalls must be able to access wildfire.paloaltonetworks.com on port 443. This is an SSL connection and the App-ID is PaloAlto-wildfire-cloud.
  • The management interface on the VM-Series firewall uses SSL to communicate with Panorama over TCP/3789.
  • vCenter Server: The vCenter Server must be able to reach the deployment web server that is hosting the VM-Series OVA. The port is TCP/80 by default, or App-ID web-browsing.


Which version of Panorama, vSphere, NSX and PA-VM should I use?

  • Panorama: For a long time the VM-1000-HV was the only Palo Alto VM firewall available for this integration. Don't get me wrong, it's a great option, but if the cost of the solution is something that might worry you - I've got some good news. Since Panorama 8.0 all the PA-VM versions are supported (VM-100, VM-300, VM-500 and of course VM-1000). It gets even better - you can start with the VM-100 and upgrade from there if you need more capacity in the future.
  • NSX: For my lab I used NSX 6.2.5. I recommend you go directly for 6.3.x; all the concepts explained here still apply.

Integration

Now that we know the components, let's see how it all fits together. NSX Manager, the ESXi servers and Panorama work together to automate the deployment of the VM-Series firewall, as shown in the diagram below. Let's get deeper into this...





1.1 Install the VMware NSX Plugin

Before you start the integration, you need to make sure that your NSX is operational, with the NSX Controllers in the "Connected" state (vSphere > NSX > Installation > Management). I strongly advise you to upgrade your Panorama to 8.0.x, if you haven't already. In my lab I used only 2 hosts at first; once I had everything fully functional, I added the other hosts.


You need to Download the Plugin from here (you will need a Palo Alto Support account):
https://support.paloaltonetworks.com/Updates/SoftwareUpdates/1904

Log in to Panorama, and go to "Panorama Tab > Plugins". Upload the Plugin, and press "Install". A new "VMware NSX" sub-menu will appear on the left, as shown below.



Next you need to set up access to the NSX Manager. Select Panorama > VMware NSX > Service Managers and click Add. Enter the Service Manager Name and the other required info. If you do this step correctly, on the NSX Manager, this name will be displayed in the Service Manager column on Networking & Security > Service Definitions > Service Managers.

IMPORTANT: The ampersand (&) special character is not supported in the NSX manager account password. If a password includes an ampersand, the connection between Panorama and NSX manager fails.

TIP: Once the services are synchronized, in PAN-OS 8.0 you won't be able to see the Service Manager Status. Don't panic, this is OK. As long as you see that the new Service Manager has been configured in NSX (Networking & Security > Service Definitions > Service Managers), you're good to go.

In Panorama you will also see that a new administrator user called "__vmware_nsx" has been configured. In NSX, try to edit the newly created Service Manager; you will notice that the credentials are associated with this new user.

1.2 Create Template(s) and Device Group(s) on Panorama

To manage the VM-Series firewalls for NSX using Panorama, the firewalls must belong to a device group and a template. Device groups allow you to assemble firewalls that need similar policies and objects as a logical unit; this configuration is defined using the Objects and Policies tabs on Panorama. Use templates to configure the settings that are required for the VM-Series firewalls to operate on the network; this configuration is defined using the Device and Network tabs on Panorama (grouped as templates). Each template containing zones used in your NSX configuration on Panorama must be associated with a service definition; at a minimum, you must create a zone within the template so that the NSX Manager can redirect traffic to the VM-Series firewall.

Go to Panorama > Device Groups, and click Add. Name your Device Group something Intuitive, like NSX Firewalls. After the firewalls are deployed and provisioned, they will display under Panorama > Managed Devices and will be listed in the device group.

Now add a template or a template stack. Select Panorama > Templates, and click Add. After this you need to create the Zone for each template (be sure to set the interface Type to Virtual Wire.). Panorama creates a corresponding service profile on NSX Manager for each qualified zone upon commit.

IMPORTANT: For a single-tenant deployment, create one zone. If you have a multi-tenant deployment, create a zone for each sub-tenant.

Now you need to add a new Service Definition. This is basically used so that Panorama knows how to provision a Palo Alto firewall on the hosts where it is needed. Select Panorama > VMware NSX > Service Definitions.

TIP: Before you define the Service Definition, you need to place your PA-XXX.ova file on a Web Server. I know, not as cool as the Architects of the solution imagined it, but still... it's logical that Panorama needs an Image Repository with different types of PA-VM, because a big environment might require a variety of different Firewalls.

Once the Service Definition is created, Select Panorama > VMware NSX > Service Manager and click the link of the service manager name. Under Service Definitions, click Add and select your service definition from the drop-down.

Now you need to add the authorization code to license the firewalls. I hope you already have the auth code by now. Select Panorama > Device Groups and choose the device group you associated with the service definition you just created. Under Dynamically Added Device Properties, add the authorization code you received with your order fulfillment email and select a PAN-OS software version from the SW Version drop-down. When a new firewall is deployed under NSX and added to the selected device group, the authorization code is applied and the firewall is upgraded to the selected version of PAN-OS.

IMPORTANT: You need to install a License Deactivation API key in Panorama before you proceed with the firewall deployment in the ESXi cluster. This is important if you want your Panorama to take care of the licenses using the auth code.

admin@Panorama> request license api-key set key bea265bdb4c832793b857cfa1bf047845dc82e3b3c1b18c1b2e59796147340eb

API Key is successfully set
admin@Panorama>

2. Register the VM-Series Firewall as a Service on the NSX Manager

2.1 The first step is to register the Palo Alto Networks NGFW as a service on the NSX Manager. The registration process uses the NetX management plane API to enable bi-directional communication between Panorama and the NSX Manager. Panorama is configured with the IP address and access credentials to initiate a connection and register the Palo Alto Networks NGFW service on the NSX Manager. The service definition includes the URL for accessing the VM-Series base image that is required to deploy the VM-Series firewall for NSX, the authorization code for retrieving the license and the device group and template to which the VM-Series firewalls will belong. The NSX manager uses this management plane connection to share updates on the changes in the virtual environment with Panorama.

2.2 Deploy the VM-Series automatically from NSX —The NSX Manager collects the VM-Series base image from the URL specified during registration and installs an instance of the VM-Series firewall on each ESXi host in the ESXi cluster. From a static management IP pool or a DHCP service (that you define on the NSX Manager), a management IP address is assigned to the VM-Series firewall and the Panorama IP address is provided to the firewall. When the firewall boots up, the NetX data plane integration API connects the VM-Series firewall to the hypervisor so that it can receive traffic from the vSwitch.



2.3 Establish communication between the VM-Series firewall and Panorama : The VM-Series firewall then initiates a connection to Panorama to obtain its license. Panorama retrieves the license from the update server and pushes it to the firewall. The VM-Series firewall receives the license (VM-1000-HV) and reboots with a valid serial number.

2.4  Install configuration/policy from Panorama to the VM-Series firewall : The VM-Series firewall reconnects with Panorama and provides its serial number. Panorama now adds the firewall to the device group and template that was defined in the service definition and pushes the configuration and policy rules to the firewall. The VM-Series firewall is now available as a security virtual machine that can be further configured to safely enable applications on the network.

2.5 Push traffic redirection rules to NSX Manager : On Panorama, create security groups and define network introspection rules that specify the guests from which traffic will be steered to the VM-Series firewall. See Integrated Policy Rules for details.

2.6 Receive real-time updates from NSX Manager : The NSX Manager sends real-time updates on the changes in the virtual environment to Panorama. These updates include information on the security groups and IP addresses of guests that are part of the security group from which traffic is redirected to the VM-Series firewall. See Integrated Policy Rules for details.

2.7 Use dynamic address groups in policy and push dynamic updates from Panorama to the VM-Series firewalls : On Panorama, use the real-time updates on security groups to create dynamic address groups, bind them to security policies and then push these policies to the VM-Series firewalls. Every VM-Series firewall in the device group will have the same set of policies and is now ready to secure the SDDC. See Policy Enforcement using Dynamic Address Groups for details.

3. Create Steering Rules

IMPORTANT: The default policy on the VM-Series firewall is set to deny all traffic, which means that all traffic redirected to the VM-Series firewall will be dropped. Have this in mind before you activate PA NGFW in your VMware environment.

Panorama serves as the single point of configuration that provides the NSX Manager with the contextual information required to redirect traffic from the guest virtual machines to the VM-Series firewall. The traffic steering rules are defined on Panorama and pushed to NSX Manager; these determine what traffic from which guests in the cluster are steered to the Palo Alto Networks NGFW service. Security enforcement rules are also defined on Panorama and pushed to the VM-Series firewalls for the traffic that is steered to the Palo Alto Networks NGFW service.

Steering Rules —The rules for directing traffic from the guests on each ESXi host are defined on Panorama and applied by NSX Manager as partner security services rules.


For traffic that needs to be inspected and secured by the VM-Series firewall, the steering rules created on Panorama allow you to redirect the traffic to the Palo Alto Networks NGFW service. This traffic is then steered to the VM-Series firewall and is first processed by the VM-Series firewall before it goes to the virtual switch.

Traffic that does not need to be inspected by the VM-Series firewall, for example network data backup or traffic to an internal domain controller, does not need to be redirected to the VM-Series firewall and can be sent to the virtual switch for onward processing.

Rules centrally managed on Panorama and applied by the VM-Series firewall —The next-generation firewall rules are applied by the VM-Series firewall. These rules are centrally defined and managed on Panorama using templates and device groups and pushed to the VM-Series firewalls. The VM-Series firewall then enforces security policy by matching on source or destination IP address—the use of dynamic address groups allows the firewall to populate the members of the groups in real time—and forwards the traffic to the filters on the NSX Firewall.

Policy Enforcement using Dynamic Address Groups

Unlike the other versions of the VM-Series firewall, because both virtual wire interfaces (and subinterfaces) belong to the same zone, the VM-Series firewall for NSX uses dynamic address groups as the traffic segmentation mechanism. A security policy rule on the VM-Series firewall for NSX must have the same source and destination zone, therefore to implement different treatment of traffic, you use dynamic address groups as source or destination objects in security policy rules.

Dynamic address groups offer a way to automate the process of referencing source and/or destination addresses within security policies because IP addresses are constantly changing in a data center environment. Unlike static address objects that must be manually updated in configuration and committed whenever there is an address change (addition, deletion, or move), dynamic address groups automatically adapt to changes.

Any dynamic address groups created in a device group belonging to the NSX configuration and configured with the match criterion _nsx_ trigger the creation of corresponding security groups on the NSX Manager. In an ESXi cluster with multiple customers or tenants, the ability to filter security groups for a service profile (zone on Panorama) on the NSX Manager allows you to enforce policy when you have overlapping IP addresses across different security groups in your virtual environment.

If, for example, you have a multi-tier architecture for web applications, on Panorama you create three dynamic address groups for the WebFrontEnd servers, Application servers and the Database servers. When you commit these changes on Panorama, it triggers the creation of three corresponding security groups on NSX Manager.



CONCLUSION: Panorama Dynamic Address Group = NSX Security Group

On NSX Manager, you can then add guest VMs to the appropriate security groups. Then, in security policy you can use the dynamic address groups as source or destination objects, define the applications that are permitted to traverse these servers, and push the rules to the VM-Series firewalls.

Each time a guest is added or modified in the ESXi cluster or a security group is updated or created, the NSX Manager uses the PAN-OS REST-based XML API to update Panorama with the IP address, and the security group to which the guest belongs.

When Panorama receives the API notification, it verifies/updates the IP address of each guest and the security group and the service profile to which that guest belongs. Then, Panorama pushes these real-time updates to all the firewalls that are included in the device group and notifies device groups in the service manager configuration on Panorama.

On each firewall, all policy rules that reference these dynamic address groups are updated at runtime. Because the firewall matches on the security group tag to determine the members of a dynamic address group, you do not need to modify or update the policy when you make changes in the virtual environment. The firewall matches the tags to find the current members of each dynamic address group and applies the security policy to the source/destination IP address that are included in the group.

Is this a Multi Tenant environment? For enabling traffic separation in a multi-tenancy environment, you can create additional zones that internally map to a pair of virtual wire sub-interfaces on the parent virtual wire interfaces, Ethernet 1/1 and Ethernet 1/2.

Nuage Networks VSP Deep Dive

Ever since Cisco bought Insieme and created Cisco ACI, and VMware bought Nicira and created NSX, I've been intensively deep-diving and blogging about both of these solutions, how they compare to each other and to some open source SDN solutions out there, such as OpenDaylight and OpenContrail (check out the Blog Map section for some of my older posts). I even did boot camps and got the highest certifications in both NSX and ACI. SDN is still a rather new technology, and I wanted to make sure I have enough expertise to always explain to a customer which SDN solution is the right one for their organization and why. Apart from ACI, NSX and the open source solutions, there is another player on the SDN market, and from what I've seen - they mean business! I'm talking about Nuage Networks, acquired by Nokia from Alcatel-Lucent in November 2016. Even though I've known about this solution for a while, my opinion was that their strongest side was marketing, so I didn't spend a lot of time investigating Nuage (it's also pretty difficult to find information about Nuage; there is an apparent lack of experts/blogs/technical info about the product). I finally decided to give them an opportunity: I did a boot camp, a lot of hands-on, and recently I passed the 4A0-N01 Nuage Network Professional – Datacenter (NNP-DC) certification. Let me share with you what Nuage Networks is all about, give an unbiased opinion about their understanding of SDN, and explain how they compare to the other SDN solutions on the market.

Disclaimer: Some of the materials I used come directly from Nuage technical documentation, which for some reason is not available to the public (and it should be!). If someone from Nokia is reading this, please note that revealing technical information about what your product does gives it more market reach and more visibility. I strongly advise you to make as much Nuage documentation public as possible. If your product is good (and in my opinion it is), invite bloggers and technical experts to give you feedback; if they feel comfortable with your product, they will gladly share it with potential customers.

Before I get deeper into what Nuage VSP is good for, let's make sure we understand the difference between IaaS, PaaS and SaaS. In order to really get what your company is doing (or should be doing), whether you are a Service Provider (SP) or an Enterprise consuming resources provided by another Service Provider, you need to have a clear picture of what is handled by whom in each of these architectures. Basically:

  • IaaS (Infrastructure as a Service) - SP Provides Network, Compute and Storage, Customer builds OS and Apps
  • PaaS (Platform as a Service) - SP also provides the OS. Who takes care of the OS upgrades and other stuff? Good question… It depends on the PaaS provider; it could go either way.
  • SaaS (Software as a Service) - SP owns everything, including the application

Let's start with the basics. We already know what SDN is all about: separating the Control Plane from the Data Plane, and providing a single Management plane that exposes Northbound APIs. Nuage follows the same concepts. Nuage created a platform called VSP, which stands for Virtualized Services Platform; it orchestrates the deployment, handling the following planes:

  • Management plane, represented by Nuage Virtual Service Directory (VSD) and the Cloud Management System or CMS (OpenStack, CloudStack etc.)
  • Control plane, handled by Nuage Service Controller (VSC)
  • Data Plane, handled by a Virtual Router & Switch (VRS)



VSP includes the software suite comprising of three key products:

  • VSD (Virtual Services Directory), which holds the policy and network service templates.
  • VSC (Virtual Services Controller), which is the SDN controller that communicates to the hypervisors.
  • VRS (Virtual Routing and Switching) agent that resides within the hypervisor on the server hardware.

Let's now take a deep dive into what communication protocols are deployed between different VSP components:

  • Communication between the CMS (Cloud Management System, such as OpenStack, CloudStack, vCenter, vCloud, etc.) and the VSD is done via RESTful APIs. We're talking about the Northbound APIs that allow us to configure Nuage Platform, or VSP.
  • Communication between the VSD and VSC is via the industry-standard XMPP (Extensible Messaging and Presence Protocol), using the Management network. SSL is optional, but recommended.
  • Communication between the VSC and the hypervisors (including the VRS) is via OpenFlow, using the Underlay Network. SSL is again, optional but recommended.
  • SDN is all about virtualization, but luckily - physical servers have not been forgotten. To integrate “bare metal” assets such as non-virtualized servers and appliances, Nuage Networks also provides a comprehensive Gateway solution: software-based VRS gateway (VRS-G) and hardware-based 7850 VSG.

I'd recommend getting acquainted with the individual components of the architecture by reading the rest of this post first, and then re-visiting the previous paragraph. It will all make much more sense.

Let's now check out individual Nuage VSP components, and see what each one does. Once again, I'll try to be methodical (not an intuitive task for my mind), and try to structure the post, so that you can follow:

  1. VSD, or the Virtualized Services Directory at the Network Management Plane
  2. VSC, or the Virtualized Services Controller, at the Network Control Plane
  3. VRS and VSG , or the Virtualized Routing & Switching and Virtualized Services Gateway, at the network Data Plane
  4. Security Policies: NFV and Service Chaining


1. VSD - Virtualized Services Directory, holds the Policy and Network Templates. VSD uses the XMPP protocol to communicate with the VSC.

VSD is where we do the Service Definition by defining Network Service Templates. The service definition includes domain, zone, subnet and policy templates. A domain template can also include policies (e.g. security, forwarding, QoS, etc.) to be applied at the different levels (vPort, subnet, zone, domain). I will cover all these concepts in just a while. It's an essential component that manages everything. It can be deployed as a physical or a virtual machine; it comes as an OVA file (for ESXi), a QCOW2 file (for KVM), or as an ISO image (recommended for production environments). You can choose whether you want to do a standalone deployment, or a cluster of 3 VMs. To work properly, VSD requires an NTP server and a DNS server in the network.


The VSD also contains a powerful analytics engine (optional, based on Elastic Search). The VSD supports RESTful APIs for communicating to the cloud provider’s management systems. In the case of OpenStack, it is between nova and nova-compute, while vCloud uses the vCenter API to access the ESXi HVs.

VSD has two types of users:
  • Administrator/CSP users, who will have full visibility into all of the functionality of VSD
  • Enterprise/Organization users. An enterprise user belongs to one, and only one, specific enterprise.

TIP: Keep in mind that if you're using LDAP, users must still be manually created in VSD, even if they have already been created in the LDAP directory.

VSD Service Abstraction is the VSD way of creating an object tree, where the domain is the single root and zones, subnets and other objects each have an exact place in the tree. VSD then translates the service abstraction into service instances, following the same object tree. Keeping in mind that a domain maps to a distributed VPRN instance (dVPRN) while a subnet maps to a distributed routed VPLS instance (dRVPLS), we end up with:

  • L2 Service Instances (dRVPLS)
  • L3 Service Instances (dVPRN)


Domain:  An enterprise contains one or more domains. A domain is a single “Layer 3” space, which can include one or more subnetworks that can communicate with each other. In standard networking terminology, a domain maps to a VPRN (Virtual Private Routed Network) service instance. Route distinguisher (RD) and route target (RT) values for the VPRN service are generated automatically by default, but can be modified. CSP Root users can create domain template for all the enterprises. Enterprise Administrators and Network Designers group users can create domain templates for their enterprises. Users that belong to other groups cannot create domain templates.
Layer 2 Domain:  A standard domain is a Layer 3 construct, including routing between subnets. A Layer 2 domain, however, is a mechanism to provide a single subnet, or a single L2 broadcast domain within the datacenter environment. It is possible to extend that broadcast domain into the WAN, or legacy VLAN.
Zone:  Zones are defined within a domain. A zone does not map to anything on the network directly, but instead it acts as an object with which policies are associated such that all endpoints in the zone adhere to the same set of policies.
Subnet: Subnets are defined within a zone. A subnet is a specific IP subnet within the domain instance. The subnet is instantiated as a routed virtual private LAN service (R‐VPLS). A subnet is unique and distinct within a domain; that is, subnets within a domain are not allowed to overlap or to contain other subnets in accordance with the standard IP subnet definitions.
vPorts: Intended to provide more granular configuration than at the subnet level, and also to support a split workflow. The vPort is configured and associated with a VM port (or gateway port) before the port exists on the hypervisor or gateway. Ports that connect Bare Metal Servers to an Overlay are also called vPorts. Whenever a vPort is instantiated, an IP address is assigned to it, unique at the Domain level, from the Subnet that the vPort belongs to. VSD is responsible for assigning the correct IP address, regardless of whether the VM asks for a specific IP (statically configured in the OS) or gets one from a DHCP pool. The same Virtual IP can be assigned to multiple vPorts for redundancy (it must be different than any of the IPs assigned to the vPorts).
All ports will have a corresponding vPort, either auto-configured or configured via REST API. Configuration attributes may optionally be configured on the vPort.

VM is formed from its profile, which contains the VM metadata. This metadata defines which Domain, Zone, Subnet and vPort to apply to every vNIC of the VM. It also defines which Enterprise and User Group it belongs to.  Additionally, some metadata may be specified if attaching to a specific vPort is required. When a new VM is created, a VM creation request is sent to the VSC from the VRS agent in an OpenFlow message using the Underlay Network. This message contains the VM-related metadata. VSC forwards the request one level higher in the hierarchy, to the VSD in an XMPP message using the Management Network. The VSD receives the VM creation request, reads its metadata and checks them against the policy definitions. The VSD learns the MAC address assigned to this VM from the metadata, and in a VSD managed IP address allocation scenario, it assigns an IP address for it from the subnet (usually the next available IP address).

VSD has a somewhat complex architecture. The components of the VSD can be centralized on a single machine or distributed across multiple machines for redundancy and scale. Some of the most important to have in mind at this point are:

  • TNC stands for trusted network connect, which is an open architecture for network access control.
  • Policy management engine evaluates the policy rules configured on the VSD (Security and QoS policies, IP assignments etc.) It sends policies to VSC based on network events.
  • VSD mediator is a VSD Southbound interface used for communication to the VSC. It receives requests for policy information and updates from the VSC, and pushes policy updates to the VSC. The VSD itself is an XMPP client: it communicates with an XMPP server, or server clusters.
  • Statistics engine collects fine-grained network information at the VRS, VSC and VM levels. It can collect various packet-based statistics such as Packets in/out, dropped packets in/out, dropped by rate limit etc. It provides an open interface for Nuage and third-party analytics applications. Have in mind that by default, Statistics collection is disabled on the VSD. A separate VSD node running Elastic Search needs to be deployed (can also be deployed as a Cluster).
  • REST API is the VSD Northbound interface, which exposes all the VSD functionalities via API calls. It can be used by Nuage CMS plug-ins for integration with many CMSs.


2. VSC - Virtualized Services Controller - the SDN controller. It controls the network, communicates with the hypervisors and collects VM-related information such as MAC and IP addresses. VSC uses OpenFlow to control the VRS. On each VRS we need to define which VSC is active and which is standby (you can configure multiple active VSCs for load balancing). OpenFlow uses TCP port 6633, and it is used to download the actual L2/L3 FIBs to the virtual switch components on the hypervisor.


VSC is only installed as a VM (or as an integrated module on a Nuage NSG, when the NSG is used as a VXLAN gateway), and it comes as an OVA file, a QCOW2 file or a VMDK file. VSC has a control interface connected to the Underlay. It is based on the Nokia Service Router Operating System (SR OS), which is somewhat similar to Cisco IOS (not the same commands, but… intuitive, if you come from Cisco).

Now comes a really cool part about why Nuage. The controllers act like a router control plane, and routing is established between the VSCs and other routers. This makes it so much easier to implement DCI (Data Center Interconnect). VSC needs a routing protocol to exchange routes with the other VSCs; it can be IS-IS, OSPF or static routes. MP-BGP EVPN also needs to be established between all the VSCs.

The VSC has three main communication directions:
  • Northbound: to the VSD via XMPP
  • East/West: federation functions to other VSCs or IP/MPLS Provider Edge nodes via MP-BGP
  • Southbound: to the VRSs via OpenFlow

3. VRS (Data Plane) - Virtual Routing and Switching plugin inside the hypervisor. It's based on OVS, and it's responsible for L2/L3 forwarding and encapsulation.

On the VRS you can define multiple VSCs for redundancy and load balancing (one active and one standby); each of them establishes an OpenFlow session over the Underlay network (not the Management network), using TCP port 6633 (SSL is optional).

VRS includes two main Nuage components:

  • VRS Agent, which talks to the VSC using OpenFlow. It's responsible for programming the L2/L3 FIBs, and it replies to all ARP requests (no flooding). It also reports changes in VMs to the VSC. The forwarding table is pushed to the VRS from the VSC via OpenFlow. It has a view not only of all the IP and MAC addresses of the VMs being served by the local hypervisor, but also of those which belong to the same domain (L2 and L3 segments), that is, all possible destinations of traffic for the VMs served by that HV.
  • Open vSwitch (OVS), provides Switching and Routing components and Tunneling to forward the traffic.

VRS supports a wide range of L2 and L3 encapsulation methods (VXLAN, VLAN, MPLSoGRE) so that it can communicate with a wide range of external network endpoints (other hypervisors, IP- or MPLS-based routers).


Let's get even deeper into the connection between the Control Plane and the Data Plane, or VSC and VRS in Nuage language. Nuage Networks uses open source components, such as libvirt, OVS and OpenFlow. Nuage Networks makes use of the libvirt library in the VRS component that runs in Linux-based hypervisor environments (Xen and KVM) to get VM event notifications (new VM, start VM, stop VM, etc.). Libvirt is a package installed on the hypervisor. Nuage also installs the Nuage VRS. This enables the usage of user space tools:

  • Virt-Manager: For GUI
  • Virsh: Commands (CLI)

Before we continue, let's make sure we understand the basic concepts needed to understand the VRS and the VSG (VRS-G included). Basically we need to understand:

  • What is OVS (Open virtual Switch).
  • Difference between the Underlay and the Overlay.
  • What is VxLAN, what are VTEPs, and how it all works.

Open vSwitch (OVS) is a major building block for Nuage SDN. It implements an L2 bridge, including MAC learning, and OpenFlow is used to configure the vSwitch. It's used for Linux networking and is part of the Linux kernel, nowadays commonly used instead of the Linux Bridge. OVS can be configured via CLI, OpenFlow or the OVSDB management protocol. OVS doesn't work like VMware VDS or the Cisco Nexus 1000V; instead, it only exists on each individual physical host, and it makes it easier for developers of virtualization/cloud management platforms to offer distributed vSwitch capabilities. In Nuage, OpenFlow is used to program the virtual switch within the hypervisor, with the vSwitch becoming the new edge of the datacenter network. The OVS becomes the access layer of the network. The access is where control policies are typically implemented: ACLs, QoS policies, monitoring (NetFlow, sFlow). OVS has these features, and also provides an SDN programmatic interface (OpenFlow and OVSDB management).

The three main components of OVS are (the commands right after this list show how to inspect each one):

  • ovsdb-server is the configuration database which contains details about bridges, interfaces, tunnels, QoS, etc.
  • OVS kernel module handles the data path, including packet header handling, table lookup and tunnel encapsulation and decapsulation. The first frame of a flow goes to ovs-vswitchd to make the forwarding decision; the following frames are then processed by the kernel.
  • ovs-vswitchd matches the first frame for a “flow” action (L2 forwarding, mirroring, tunneling, QoS processing, ACL filtering, etc.) and caches these in the flow table in the kernel module.
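Each of these components can be inspected with the standard OVS utilities; a quick sketch, with the bridge name (br0) as a placeholder:

# Query ovsdb-server: bridges, ports and interfaces from the configuration database
ovs-vsctl show

# Query ovs-vswitchd: dump the OpenFlow tables of a bridge
ovs-ofctl dump-flows br0

# Look at the flows cached in the kernel datapath module
ovs-dpctl dump-flows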


The Open vSwitch is configured by the “control cluster” through a combination of the following methods:

  • SSH and the CLI can be used to manually configure the switch locally
  • The OVSDB management protocol is used to create switch instances, attach interfaces and define QoS and security policies.
  • OpenFlow is used to establish flow states and the forwarding tables for these flows
  • Netlink is the Linux communication API used between kernel and user space

Open vSwitch can also be implemented on hardware switches, for example an SDN white box switch, as OVSDB management protocol is also implemented on some vendors’ switches.

Overlay Network: Virtual abstraction built on top of a Physical Network. There are Network-Centric overlays (VPLS, TRILL, Fabric Path) where hosts are not aware of the Overlay, and Host-Centric (VxLAN, NV-GRE, STT) where hosts help create the virtual tunnels.

VxLAN: You can check out my previous posts (go to Blog Map) for more details on how VxLAN Control Plane and Encapsulation take place. VXLAN has a 24 bit VXLAN identifier, which allows for 16 million different tenant IDs. The VXLAN UDP source port is set on the sending side with a special hashing function that allows for load balancing of traffic by ECMP (equal cost multiple path) in the datacenter network. Destination Port is 4789. On the data plane, each VTEP capable device needs to have a forwarding table with each possible destination MAC address within the same L2 domain and the hypervisor hosting it. The VNI identifies the L2 domain within the DC.
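As a small sketch of what a VTEP looks like in practice (the VNI, IP addresses and interface names are placeholders), the same tunnel can be built either as a plain Linux VXLAN device or as an OVS tunnel port, which is how an SDN controller would typically program it:

# A VTEP as a plain Linux VXLAN device
ip link add vxlan100 type vxlan id 100 dstport 4789 local 192.0.2.1 dev eth0
ip link set vxlan100 up

# The same idea as an OVS tunnel port
ovs-vsctl add-port br0 vxlan0 -- set interface vxlan0 type=vxlan options:remote_ip=192.0.2.2 options:key=100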

More and more server NIC cards support VXLAN offload functionality, which improves the encapsulation/decapsulation performance.

All VTEPs (Virtual Tunnel End Points) in the VxLAN Control Plane need at least the IP connectivity. VTEP needs to act as the default gateway for all the subnetworks that its hosted VMs belong to. In order to do this, VTEP will be assigned a MAC address and an IP address within each of such subnetworks. The combination of the IP and MAC addresses corresponding to a given VM is known as EVPN prefix. When a packet is sent by a VM to its default gateway, because its final destination is an IP address in a different subnetwork, the VTEP will look into its EVPN route table, swap the destination MAC address (presently pointing to the default gateway) to the MAC address of the VM intended to receive the packet, and send the frame to the VTEP hosting the destination VM using the corresponding VXLAN tunnel.

BGP EVPN is an Address Family that can include both, IP and MAC address for a given end point. Forwarding tables on each hypervisor contain information about all VMs in all subnets (each subnet corresponds to a different EVPN instance). VXLAN tunnels exist to reach these subnets on all the hypervisors. Backhaul VPLS brings optimization and enhanced scaling for the number of EVPN MAC addresses and tunnels. With this optimization, each VRS receives only complete forwarding information related to subnets (EVPNs) locally hosted on itself. Each VRS is still aware of every VM in remote subnets, the hypervisor hosting it and its IP address (but not its MAC address). Consequently, when a VM wants to communicate with another VM in a remote EVPN, the VRS (acting as the default gateway) only has to do a route-table lookup to identify which hypervisor is hosting the relevant IP address. This way, it can use the VXLAN tunnel indicated by the backhaul VPLS to forward the packet. There is no need in this case to find the corresponding VPLS and to do an additional L2 FDB lookup to determine the destination MAC address, as would happen if the subnet were not remote.


The VRS is in the Underlay Network, and the OVS is in the Overlay Network. All hypervisors need at least one interface connected to the Underlay Network. You can also have the VTEP assigned to the ToR switch instead of the hypervisor, but the concepts don't change.


VSG - Virtual Services Gateway allows the interconnection between Physical and Virtual domains. It basically translates VLAN to VxLAN (VxLAN towards Nuage Overlay, and VLAN to Legacy Infrastructure). There are two Nuage versions (physical and virtual), and a version for the "White Boxes":

  • Software (VRS-G) which offers Network ports via Overlay (VxLAN) and access ports to the traditional network (VLAN)
  • Hardware, 7850 VSG is a 10/40G  Switch providing VTEP GW functionality (VTEP in Hardware).
  • Hardware VTEP on a White Box


A VSN, or Virtual Service Node, is composed of a VSC and a group of VRSs: the VSC acts as the control plane, and the VRSs are the forwarding elements on the hypervisors. The VSN provides the network operator with a unified view of all the elements being handled by it, making the HVs appear as line cards in a chassis when compared to a classic router. It provides a one-stop management and provisioning point for all the HVs under the VSN's control.



4. Don’t forget the Security: NFV and Service Chaining

Security Policies are defined at the Domain level and define From/To Zones and/or Subnets. It is important to understand the relative directions of security policies before implementing them. The easiest way to understand the directions is to look at them from the OVS point of view: INGRESS is traffic entering the OVS, and EGRESS is traffic going OUT of the OVS:
  • Ingress refers to the direction of traffic flow from the VM towards the network (or the OVS component).
  • Egress refers to the direction of traffic from the network (or the OVS component) towards the VM.

Policies have priorities, which define the order in which they are evaluated. They can be imported/exported between Domains, or to/from a file. Before you apply any policies, the defaults are:
  • By default all INGRESS traffic is dropped (INGRESS means from the VM to the OVS).
  • By default all EGRESS traffic is accepted (from the OVS to the VM).

When defining a Security Policy, it's important to keep in mind the Nuage mode of operation, shown in the diagram below.


At the time of creation, a Policy Group Type is assigned to each Security Policy:
  • Hardware, for host and bridge vPorts hosted on Nuage VSG/VSA gateways
  • Software, for VRS and VRS-G hosted vPorts, including VM, host and bridge vPorts

Important: When you use stateless ACLs, you need explicit rules for the "returning" traffic. For stateful, you just need one policy in one direction.

ACL Sandwich feature enables a network admin to define a supra-list that will drop specific traffic that should NEVER reach the VM. The end user who owns the domain instance can then combine ACL rules into ACLs defined on the domain instance level.

Logging can be enabled at the ACL entry level.

Service Chaining
VSP provides so called Forwarding Policies to control the redirection of packets. This is what later enables Service Chaining. In my opinion, Nuage has the most elegant implementation of Service Chaining of all SDN products out there. All is implemented through flow-based redirection.

Nuage supports physical and virtual L4-7 appliances (or clusters of appliances) as redirection targets, and it lets you create Advanced Redirection Policies, where you can redirect only the traffic destined to a certain TCP/UDP port.

How to sell SDN

The most important thing about presenting SDN to a potential customer, and about how you need to focus your presentation - and I cannot stress this enough - is that your entire speech needs to be adapted to your audience.

1. Networking and Security Department

What you need to know before you start planning the presentation:
Before we get to the point, you need to understand that the Networking guys do not want SDN. Within the Networking department you will easily distinguish two types of engineers:
- The ones who hate SDN, hate you for presenting it, and just want to continue doing things their own way.
- The ones who understand that unless they understand and learn SDN, the System guys will choose the product, learn it, and take care of networking themselves, making the Networking department obsolete. You should always direct your presentation to this group.

What's the most positive thing SDN brings to the table?

SDN is a concept of a network that is multi-tenant, that has a single point of control of the entire network, and most importantly - allows you to "consume" the network using APIs. This means that the networking department can give your developers, or cloud admins, the tools and teach them to consume the network. This way you avoid the usual delay while the networking department configures networking and security for new apps and services, and most importantly - the tenant concept allows them to use overlapping IPs, VLANs and names, without ever being able to compromise the stability of your network.

What will they want to know? Here is the day-to-day of a Network Admin:
- Something stops working.
- On average it takes around 10 microseconds before someone says "Hey, maybe it's a networking issue?"
- Regardless of whether your Network Admin has "more important stuff to do", he ends up having to verify the entire networking environment, because it's an "issue in a production environment" and everything rides on top of the network.
- The issue gets resolved. More often than not, Network Admins get no feedback about how it was solved.

Network Admins just want no one to shout at them because something isn't working correctly.




This just means that the only thing the Network Admins will demand from your SDN solution is a set of easy-to-use troubleshooting tools. Keep this in mind when preparing your presentation.


2. Systems/Cloud Department
What's the most positive thing SDN brings to the table?
Networking department is handling so many "critical production issues" that they hardly have any time to provision the networking for new services. Even when they have time, they have to take so much care just not to break something in the network while configuring new stuff. In the world where it takes us seconds to bring up a new instance of VM or a Container, the current Network model just won't do. System guys need a way to simply provision the networking without writing an essay to the Networking department detailing why their request needs to be prioritized. This is why it will be really easy to make these guys understand (and probably love) any SDN solution you might be presenting.

What will they want to know? This depends on the solution you're trying to position. Keep in mind that these guys will love "graphical" solutions, such as VMware NSX and Nokia Nuage, and since they have limited knowledge of Networking, it will be complicated to explain the advantages of a solution that handles the Physical + Virtual Network, such as Cisco ACI and OpenDaylight.


3. Software Developers
Developers have similar "needs" to the System guys: they need a way to simply provision and secure the communication flows. If you tell them that the solution you're presenting gives them the possibility to consume the Network using API calls, they're on board.

4. Mixed audience
This is probably the most complex audience you can possibly have when talking about SDN, because each of the departments will understand the concept in a different manner. Be sure that you can handle the open discussion; you have to be a true SDN Ninja to handle the "lost in translation" paradox that will occur. I strongly advise you to bring both Networking/SDN and Systems Experts to a presentation of this type, and make sure that YOUR experts agree on what SDN is before you let them approach the client as a team.


What are Cisco Cloud Center (CliQr) and UCS Director, how to choose/integrate?

Before we get into the details of each technology, and how you should choose which one best fits your environment, I would strongly advise you to sit down and think about what exactly you need and what your ideal target environment would be. While doing this, here are a few questions you need to ask yourself:

  • What do I want to offer, IaaS, PaaS, SaaS, or a combination of these?
  • Do you want to automate the Application Deployment or Infrastructure Deployment?
  • Are you really ready for automation? I strongly believe that once you choose your Platforms, you should stick to them, because everything can be done in each of these… it's just that some are more suitable for certain tasks/ways of use than others.


UCS Director is used for Infrastructure Automation and Management (yes, management as well!). UCS Director has a huge Task Library for Infrastructure Elements such as Cisco Nexus and ACI, UCS, NetApp, EMC, vCenter, VMware vSAN etc.


The main competitors of UCS Director are:

  • vRealize Suite (Automation, Orchestration) by VMware. I've seen very cool projects done with vRealize, but typically it's optimized for a mostly VMware environment.
  • Terraform by HashiCorp. Linux geeks tend to love this one, as it is purely command-line driven, and you can deploy your infrastructure directly from Code.
  • Ansible by Red Hat. You write your own Playbooks, and they are human-readable. Very flexible.


Why choose UCS Director? It really depends on your environment and what you want to do. In my opinion it's a perfect fit when you want to include the automation of the physical infrastructure in your Workflow and get unified support from Cisco. Out of the box, UCS Director has a bunch of Tasks already at your disposal (as you probably guessed, most Cisco products, such as ACI, Nexus, UCS etc., are already included). If you need to add tasks, there is a pretty nice community. Just check this one, the UCSD Workflow INDEX (UCSD Technical Content Index):
https://communities.cisco.com/docs/DOC-56419

TIP: If you are really interested in UCS Director, I strongly advise you to build your own Lab and test it before you make a purchase. Don't trust that PowerPoint, the stakes are too high. There is a built-in Evaluation License in UCS Director, and you can download it as an OVA or VMDK from Cisco.

Cisco Cloud Center (ex CliQr) is a CMP (Cloud Management Platform). It was a pretty pleasant surprise for me to see that Cisco is finally learning how to do Software products. In all fairness, most of the original code comes from the company they acquired (CliQr), but still… they also bought Insieme and turned it into ACI, and… well, you know how the ACI GUI is.



The main competitors of Cisco Cloud Center are:

  • CloudForms by Red Hat. While CloudForms is more flexible, it doesn't come with Libraries, so you will need to do most of the coding yourself.
  • vRealize Suite again, since it now supports Public Cloud.
  • RightScale, which purely follows the SaaS model. You cannot deploy RightScale in your environment; it's already hosted somewhere, and all you do is log in, add your cloud account and start managing it.
  • Others (CloudBolt, Oracle etc.).
  • Dell Multi-Cloud Manager (don’t use this one, sorry @Dell).


These sound similar, should I use UCS Director, Cloud Center, or both?

A short answer would be: UCS Director does Infrastructure, Cloud Center does Applications. This does not mean that UCS Director couldn't automate Application deployment, or that Cloud Center cannot do infrastructure. It means that both products are better suited to doing what they were designed to do. Now go back to the first paragraph and answer the questions; at this point you should have a clearer picture of which is the right product for you.


What if you need both, Application Deployment automation with Infrastructure modifications in accordance with the Application needs? In that case, you would use both: UCS Director as the Day 1 product, and Cloud Center for Application Deployment across Multiple Clouds. On top of both of these you would need an Orchestrator of Orchestrators. This is where you would place your Service Catalogue, which would then use the UCS Director and Cloud Center Northbound APIs to automate your Application Deployment, doing the Application Tier and Infrastructure deployments separately, as in the sketch below.
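As a rough sketch of what such an Orchestrator of Orchestrators could look like, here is a Python fragment where a Service Catalogue backend calls both northbound APIs over REST. The URLs, paths and payload fields are hypothetical placeholders, not the documented APIs of either product.

import requests

UCSD_URL = "https://ucsd.example.local"          # hypothetical UCS Director endpoint
CLOUDCENTER_URL = "https://cc.example.local"     # hypothetical Cloud Center endpoint

def deploy_service(tenant: str, app_profile: str) -> None:
    # Day 1: ask UCS Director to run the Infrastructure Workflow (networks, storage, compute).
    requests.post(
        f"{UCSD_URL}/api/workflows/deploy-infra",    # placeholder path
        json={"tenant": tenant},
        verify=False,
        timeout=60,
    )
    # Day 2: ask Cloud Center to deploy the Application Tiers on top of that infrastructure.
    requests.post(
        f"{CLOUDCENTER_URL}/api/deployments",        # placeholder path
        json={"tenant": tenant, "profile": app_profile},
        verify=False,
        timeout=60,
    )

# Example invocation from the Service Catalogue backend:
# deploy_service("acme", "three-tier-web")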


If you don't want to build your own Service Catalogue Web, Cisco has a product of this type called PSC (Prime Service Catalog). It's simple, but I'm not really sure how expensive it is… after all, it is Cisco.

How DevOps and Cloud raise the importance of System Integrator

System Integrators, buckle up, DevOps is coming, and if you play your cards right - your role is about to get crazy important.

Let me start this post by telling a story. It's a story that involves a stubborn customer, 3 big vendors and a Cloud. The reason I need to start this way is simple: the same scenario with different "players" has happened so many times in the last few years that someone should sum up what we've all learned (or haven't, in some cases). I guess this would be a great place for a Disclaimer, and I'll quote my favourite disclaimer ever, from South Park: all Customers and events in this post, even those based on real people, are entirely fictional.

The story starts with a Customer learning that Cloud is cool, and starting to want it. The problem is that there is no manual on Google on how to build a personalised private cloud. That's no problem: why not just promote (rename) your head systems engineer to a Cloud Architect and follow his ideas and his experience? "But… he's got no experience," you'll say, and you'd be right. What he does have is a lot of vendors at his disposal with fancy PowerPoint presentations explaining how cool and awesome OpenStack is. Everything is awesome (check out the LEGO Movie soundtrack, but be warned, it will stay in your head for days), now let's go build a cloud!



Now comes the tricky part. A majority of the bigger customers prefer working directly with the Vendors' Professional Services, in order to lead the deployment themselves and hold someone accountable for a potential lack of functionality or problems with the product. While there is logic to this philosophy, it applies more to legacy infrastructure, where there is no complexity of integration between different technologies. So… why doesn't it apply to the Cloud? In the past few years I have seen so many different scenarios where a customer followed this strategy and either rolled back the entire environment, or is still struggling to make a Lab work. For now, let's just state the 3 most obvious reasons you should not use this strategy:

  • "Lost in Translation" bug in the Integration phase: Just like your Network, Systems and Apps engineers don't really understand each when you try putting them in the same room and making them collaborate, the Vendors of different types also wont be able to easily collaborate. Don't think that you'll be able to lead this collaboration, you will most definitely end up with different vendors pointing a finger at each other when asked why the integration is not working.
  • Support: Each Vendor will support their own product, but no one will give you a support of the Integration, which is the most complex part. A Cloud environment is difficult to build, but easy to operate. If something goes wrong - it goes REALLY wrong, and you will need an expert who understands the integration of the components in depth. It's impossible to demand this from your Cloud Architect or a Lead Engineer. Once the environment was initially built, they most probably didn’t spend their afternoons reviewing the Plugins/Drivers/Manual code modifications done in the implementation phase, and they will not be able to troubleshoot anything.
  • Upgrades/Modifications: Now imagine the moment you realize that your OpenStack/SDN/Orchestrator/etc. is obsolete, and you need to Upgrade. How does Upgrading each of the components impact the stability of the entire system? You will basically go back to the 1st problem, each time you need to modify anything on your Cloud.

What should be your strategy when deploying the Cloud?

The answer is rather simple actually. You need a partner, most likely a System Integrator, with a strong partnership with all of the Vendors whose products you wish to include in your Cloud environment. Here are the main reasons to involve a System Integrator:

  • They also had the "Lost in Translation" problem, but most likely a long time ago. By now their different area specialists know how to talk to each other, and they can even help you teach your own employees how to do the same.
  • All the disputes between the different Vendors will be transparent to you. The System Integrator is more likely to figure out why the Integration isn't working and work with the Vendor to resolve the issue, giving you full transparency. They can even engineer custom code within the solution for you and give you support for it.


Conclusion

This is not an easy task for the System Integrator, but as soon as everyone starts understanding how the new system should work, it becomes so much easier to deploy a stable, fully supported and upgradable Cloud environment, without the kind of Engineering department that companies like Google, AWS and Facebook have managing their clouds.

Cisco ACI Unknown Unicast: Hardware Proxy vs Flooding Mode

Before we start, let's once again make sure we fully understand what a Bridge Domain is. A Bridge Domain can be compared to a giant distributed switch. Cisco ACI preserves the Layer 2 forwarding semantics even if the traffic is routed on the fabric: the TTL is not decremented for Layer 2 traffic, and the MAC addresses of the source and destination endpoints are preserved.

When you configure a Bridge Domain in ACI, you need to decide what to do with ARP packets and what to do with Unknown L2 Unicast traffic. You can basically:

  • Enable ARP Flooding, or not.
  • Choose between the two L2 Unknown Unicast modes: Flood and Hardware Proxy.




Hardware Proxy

By default, Layer 2 unknown unicast traffic is sent to the spine proxy. This behaviour is controlled by the hardware proxy option associated with a bridge domain: if the destination is not known, send the packet to the spine proxy; if the spine proxy also does not know the address, discard the packet (default mode).

The advantage of the hardware proxy mode is that no flooding occurs in the fabric. The potential disadvantage is that the fabric has to learn all the endpoint addresses.

With Cisco ACI this is not a concern for virtual and physical servers that are part of the fabric: the mapping database is built to scale to millions of endpoints. However, if the fabric had to learn all the IP addresses coming from the Internet, it clearly would not scale.


Flooding Mode

Alternatively, you can enable flooding mode: if the destination MAC address is not known, flood in the bridge domain. By default, ARP traffic is not flooded but sent directly to the destination endpoint; if you enable ARP flooding, ARP traffic is flooded as well. A good use case for enabling ARP flooding is when the Default Gateway resides outside of the ACI Fabric; this non-optimal design requires ARP Flooding to be enabled on the BD.

This mode of operation is equivalent to that of a regular Layer 2 switch, except that in Cisco ACI this traffic is transported in the fabric as a Layer 3 frame with all the benefits of Layer 2 multi-pathing, fast convergence, and so on.

Hardware proxy on one side, and unknown unicast plus ARP flooding on the other, are two opposite modes of operation. With hardware proxy disabled and without unknown unicast and ARP flooding, Layer 2 switching would not work.

This option does not have any impact on what the mapping database actually learns; the mapping database is always populated for Layer 2 entries regardless of this configuration.
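For reference, here is a rough sketch of how these two Bridge Domain settings could be pushed through the APIC REST API from Python. The fvBD attribute names (unkMacUcastAct, arpFlood) match the ACI object model as I remember it, but treat this as a sketch and verify against your APIC version; authentication/token handling is omitted for brevity.

import requests

APIC = "https://apic.example.local"   # placeholder APIC address
TENANT, BD = "Prod", "BD-Web"

# Hardware Proxy mode, no ARP flooding (the ACI default behaviour).
hardware_proxy = {"fvBD": {"attributes": {"name": BD,
                                          "unkMacUcastAct": "proxy",
                                          "arpFlood": "no"}}}

# Flooding mode with ARP flooding, e.g. when the Default Gateway sits outside the fabric.
flood = {"fvBD": {"attributes": {"name": BD,
                                 "unkMacUcastAct": "flood",
                                 "arpFlood": "yes"}}}

# Push the chosen configuration to the Bridge Domain object.
requests.post(f"{APIC}/api/mo/uni/tn-{TENANT}/BD-{BD}.json",
              json=flood, verify=False, timeout=30)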

What is NFVi or Cisco NFV Infrastructure, and where exactly does it "fit"?



First let's establish the difference between the NFV and the VNF:
  • VNF (Virtualized Network Function) refers to the implementation of a network function using software that is decoupled from the underlying hardware. It simply moves network functions out of dedicated hardware devices and into software. Cisco currently has around 90 VNFs ready to be implemented, mostly for the SP environment.
  • NFV (Network Functions Virtualization) represents the concept itself: running network functions in software, independent of any specific hardware platform.

This all simply means that we need the network functions virtualization (NFV) architecture to support the deterministic placement of virtualized network functions (VNFs).

Network Functions Virtualization is "the new black" in Networking and Network Security, and all of us Network Bloggers have been talking about it extensively for the past few years. What it basically means is that we are finding a way to virtualize the Functions of our Network/Network Security Elements, such as a Load Balancer or a Firewall. The concept is rather simple, and while the entire industry is wondering why we're not there yet, the Network Engineers (meaning the ones with a real understanding of how networking protocols work) are having a really hard time explaining why it can't all work as simply as the OpenStack enthusiasts expect it to.

Server Virtualization is not complicated: once you have a Hypervisor, you can create numerous Virtual Machines on a single Bare Metal server. Networking is much more complicated. In order to implement NFV, you need to have the Networking part underneath it completely handled and controlled; you need ALL-to-ALL connectivity provisioned in your underlay, so that you can simply "apply" the desired connectivity in accordance with what your VNF needs. This might be simple if we're talking about a couple of switches where we would just extend a big group of VLANs all over the place, but as soon as we get into a slightly more complicated Networking Architecture (as in any serious company's DC network), we add Spanning Tree, Routing, a VXLAN Control Plane (and all the other control planes that use MCAST) etc. If we don't have an SDN solution capable of handling both Physical and Virtual Network elements, we shouldn't even start thinking about NFV. It would be like trying to breathe in space: you know WHAT you want to do and which organs you need to activate, but there is simply no all-to-all connectivity between the elements, which would be the oxygen in this case. Therefore, SDN is the enabler for NFV, and the two concepts go hand in hand.

What Cisco did is come up with an alternative that allows them to offer an NFV solution using an open-source SDN solution instead of Cisco ACI. NFVi is a reference architecture which does not depend on the SDN Solution at all, and it's primarily made for Service Providers. NFVi is the Infrastructure component of Cisco's NFV Platform. A key part of Cisco NFVi is the Cisco Virtualized Infrastructure Manager (VIM).

If you have ever played with OpenStack, you know that we are talking about a platform that is pretty complex to deploy and operate. This is where VIM really shows its value. VIM takes care of the Installer and the Life Cycle management of all the NFVi Storage, Network and Compute components, and it fully integrates:
  • OpenStack Platform (Red Hat distribution)
  • CEPH (for reliable storage) 
  • All this on Cisco UCS (Unified Computing System)





There are so many different SDN and NFV ecosystems out there that it gets overwhelming for the end users, which is kind of why I wrote this post. NFVi is an Open Network Architecture compatible with any SP End-to-End Service Creation. There are a few Cisco solutions to keep in mind when thinking about the Service Provider:

  • WAE: the WAN Automation Engine that complements Cisco NSO (Network Services Orchestrator, enabled by tail-f) and Cisco's distribution of OpenDaylight.
  • VTS (Virtual Topology System) is a true controller, designed to be Open, and it works with other Vendors' Networking equipment. VTS only requires BGP EVPN in the underlay to be able to build the VXLAN overlay.
  • Mercury is Cisco's internal OpenStack platform specific to SPs, based on Red Hat OpenStack, made to deliver a successful, reliable and stable installation via GUI every time.

I could write an entire post about Cisco VTS (Virtual Topology System). It's basically an SDN Controller for the Service Provider Datacenter, a Hybrid (Physical and Virtual Overlay) Provisioning & Management System. In the context of NFVi, the diagram below will tell you what you need to know.




Where exactly does NFVi fit in then? It's quite simple actually. NFV Infrastructure is simply a tested and validated design that is, as Cisco claims, easily extensible and expandable. You could build a similar architecture yourself, or get a System Integrator to do it for you, but if you opt for NFVi, you get a Cisco label on the support contract. The following diagram shows the most common use cases of the NFVi Platform.



