Welcome to Mat's Cloud: 2015

SDN Wars: Cisco ACI vs VMware NSX

In the last few years, with the exponential growth of interest in the SDDC (Software Defined Data Center), many vendors have shown an interest, and some have even managed to engineer a more-or-less decent SDN (Software Defined Networking) solution. Some are experienced Networking Hardware vendors, while others are Startups trying to enter the big markets by riding this new trend. Cisco ACI and VMware NSX are the top two SDN solutions according to Gartner and various other sources (Network World, SDxCentral etc.).

If you have doubts regarding the concept of SDN, or the difference between SDN and Network Virtualization, check out my previous posts [Check out the Blog Map]:

Why do I consider myself to be the "right" person to analyse and compare these 2 SDN Solutions? Because I've worked a lot with both technologies, and I believe I can be objective:

I've worked a lot with both Cisco and VMware, and I'm a big fan of a great number of both vendors' solutions.
I'm a CCIE and a certified ACI Field Engineer, which kinda defines me as pro-Cisco (and pro-ACI).
I'm a VCIX-NV (VMware Network Virtualization Expert). If you don't know this certification, it's like a CCIE in NSX, so I'm also pro-NSX.

Before we get to the actual comparison, and the advantages of each of these solutions, make sure you at least know what they are and what basic concepts they follow. There are many Documents, Videos and Data sheets you could read, but I recommend you go through my previous posts and at least take a quick look at the components and how they interact.

Cisco ACI

VMware NSX

Now that we are experts in both of these solutions, it's time to compare what each one does, and how well it does it.

Micro Segmentation

The concepts of Distributed Firewall and Micro Segmentation have been around for a few years, and have proven to be a perfect weapon for introducing NSX to a new client.

A word about micro segmentation: according to recent statistical analysis, around 86% of Data Center traffic is East-West, meaning traffic that never "leaves" the Data Center. Given that we normally position our Firewalls as close to the CORE of the Network as possible (or as close to the WAN/Internet, depending on the architecture), we leave the internal Data Center traffic unprotected.

How do we secure the internal Data Center traffic then? We could use a Firewall in the north and maybe create a context for every Client/Tenant within the Data Center. But what happens if we have 10 Tenants? And what happens to the performance of our network if all the traffic flows have to travel Northbound to the Firewall and back?

Another solution would be adding a dedicated Data Center Firewall. This would improve performance, but the Tenant problem would remain: we would need a separate Context for each tenant. Given that in today's Data Center we are mostly talking about virtualized environments, one problem remains: when we need to allow communication between two VMs, the traffic originating from the first VM still needs to leave the Host to have the Routing/Firewall policy applied by a Physical or Virtual FW/LB/Router, and then come back into the Host and to the other VM.

Micro Segmentation solves these problems by applying all the L4-7 policies directly at the Kernel level. All the policies are "provisioned" by the Control Plane, so that the Data Plane is completely optimized: two VMs that are allowed to communicate always use the most optimal data path. This saves a lot of needless intra-DC traffic, as illustrated on the diagram below:

Both ACI and NSX support Micro Segmentation. NSX supports it at the Host (Hypervisor) level, and ACI supports it at the Hardware interface level of a Leaf, or at the Hypervisor level.
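To make the idea of kernel-level enforcement more concrete, here is a minimal Python sketch of a distributed firewall. It assumes a simplified model in which every host holds the same group-based rule table and checks traffic at the source VM's vNIC with first-match, default-deny semantics. The classes `Rule` and `VM`, the functions `evaluate` and `enforcement_point`, and the group names are all hypothetical; this is not the NSX or ACI API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    src_group: str           # security group of the source VM
    dst_group: str           # security group of the destination VM
    dst_port: Optional[int]  # None matches any destination port
    action: str              # "allow" or "deny"


@dataclass(frozen=True)
class VM:
    name: str
    host: str   # hypervisor the VM runs on
    group: str  # security group (micro-segment) the VM belongs to


def evaluate(rules, src: VM, dst: VM, dst_port: int) -> str:
    """First-match evaluation; anything not explicitly allowed is denied."""
    for rule in rules:
        if (rule.src_group == src.group
                and rule.dst_group == dst.group
                and rule.dst_port in (None, dst_port)):
            return rule.action
    return "deny"


def enforcement_point(src: VM) -> str:
    """In this model the policy is checked on the source VM's own host, so two
    VMs on the same host never hairpin through an external firewall."""
    return src.host


# Hypothetical three-tier rule table, pushed to every host by the control plane.
RULES = [
    Rule("web", "app", 8080, "allow"),
    Rule("app", "db", 3306, "allow"),
]

if __name__ == "__main__":
    web1 = VM("web1", "esxi-01", "web")
    app1 = VM("app1", "esxi-01", "app")
    db1 = VM("db1", "esxi-02", "db")
    print(evaluate(RULES, web1, app1, 8080), "on", enforcement_point(web1))  # allow on esxi-01
    print(evaluate(RULES, web1, db1, 3306))                                  # deny
```

Because the rule table is identical on every host, the decision for web1 to app1 is made on esxi-01 itself, and the permitted packet never leaves that host.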

This is where we can draw the first conclusion based on the possible scenarios. If both NSX and ACI support Micro Segmentation, and the client's only requirement is to protect the internal (East-West) Data Center traffic, NSX is the way to go, and ACI might be overkill. This is because when you propose ACI, you are proposing to replace the entire DC Networking Hardware, and the client might have just bought Nexus 5k/7k last year.

Partner Integration

Both Cisco and VMware have a long list of partners they've been collaborating with for years. This makes it quite easy to get them onboard, even more so for a technology as new and "cool" as SDN.

VMware NSX handles the integration on a Hypervisor level. This means that the Advanced Services are added in the following manner:

NSX Manager integrates with the other vendor's product manager.
A VM is added to each Host within the NSX domain of operation to handle the Services locally.
In the case of Palo Alto, Panorama (the manager of all Palo Alto Firewalls) communicates with the NSX Manager and, through the Cluster Controller, adds a small Palo Alto Firewall on each ESXi host, as shown in the diagram below. I'm a big fan of the NSX and Palo Alto integration; it's all just pretty well engineered.

VMware already has a big list of partners in its NSX ecosystem:

Cisco ACI handles the integration a bit differently. Keep in mind that in the ACI architecture, the APIC controller is the "brain" of the operation, or the Management Plane, and the ACI Fabric is the Control Plane. APIC handles all the commands towards the ACI Fabric, which makes the integration a bit easier to "digest". The other vendors' Advanced Services communicate directly with the APIC controller, and the APIC controller handles the "commands" that are later deployed to the objects (VMs or physical machines) via the ACI Fabric.

And here is the current ACI ecosystem:

Multi Data Center

Both NSX and ACI support the Multi-DC architecture, and it was introduced in 2015 (yes, for both technologies). The concept sounds different, but it's quite similar actually.

Workload Slicing (the Slicing Method) is used to spread the workload. The concept is to divide every Job into Slices, so that when a new controller is added, the slices can be redistributed. These slices are also called "Shards", which represent the pieces of the "sliced" workload. In a Multi-DC environment, each DC handles a certain number of these Shards, which distributes the Workload between the Data Centers.

In both architectures, 3 "Machines" perform the function of the SDN Controller (NSX Controller/APIC Controller), for redundancy and to avoid a split-brain of the management plane. Normally, in the case of two Data Centers, the first two Machines are in the first DC and the third is in the second Data Center. If we have 3 or more Data Centers, we can distribute the Controllers the way we like.

The other big advantage of the Shard concept is High Availability. Every Shard is replicated across the SDN Controllers (in ACI, each shard has a Primary and 2 Replicas), so no data is lost even if 1 or 2 of the SDN Controllers (APIC or NSX Controllers) die. Data loss begins only when we lose all 3 SDN Controllers.
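The replication arithmetic is easy to check with a small Python sketch. It assumes a simple round-robin placement (the hypothetical `place_shards` helper, not the actual APIC or NSX algorithm) and compares how many controller failures a 3-node cluster tolerates with two and with three copies per shard.

```python
from itertools import combinations

CONTROLLERS = ["ctrl-1", "ctrl-2", "ctrl-3"]  # hypothetical 3-node cluster


def place_shards(num_shards, controllers, copies):
    """Put each shard on `copies` distinct controllers, rotating the starting
    controller so that primaries are spread evenly (round-robin)."""
    n = len(controllers)
    if not 1 <= copies <= n:
        raise ValueError("copies must be between 1 and the number of controllers")
    return {
        shard: [controllers[(shard + i) % n] for i in range(copies)]
        for shard in range(num_shards)
    }


def lost_shards(placement, failed):
    """Shards that have no surviving copy once the `failed` controllers die."""
    failed = set(failed)
    return [s for s, owners in placement.items() if set(owners) <= failed]


def survives(copies, failures, num_shards=32):
    """True if no shard is lost for ANY combination of `failures` dead controllers."""
    placement = place_shards(num_shards, CONTROLLERS, copies)
    return all(not lost_shards(placement, dead)
               for dead in combinations(CONTROLLERS, failures))


if __name__ == "__main__":
    for copies in (2, 3):
        for failures in (1, 2, 3):
            print(f"{copies} copies, {failures} controller(s) down:",
                  "OK" if survives(copies, failures) else "DATA LOSS")
```

The check is exhaustive over every combination of failed controllers, so the result does not depend on which particular machines die.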

Conclusions

Both of these technologies bring a huge amount of innovation and OpEx reduction, but they are different. This means that depending on the environment, one of them will be a better fit, so they are both "Winners" in their own way.

NSX is the winner because:

The GUI is much better and more intuitive. The feeling is like we are building a Network out of Lego pieces.
Micro-Segmentation is easier to understand and implement.
When a client has a VMware-only environment, NSX has a "native", full integration with many different components, such as vRealize.

Where does the NSX fail?

You still need separate Network Admin(s) to take care of the physical network. NSX only takes care of the Networking within the Hosts.
When something "goes slow", and the problem seems to be a physical network, we're helpless, because the Network Admins will just see the VXLAN-encapsulated flows.

In what cases is NSX a better fit than ACI?

The primary objective is Automation, as vRA + NSX is an unbeatable combo.
The only requirement is intra-DC Security and a Distributed FW (Micro-Segmentation).
VMware-only environments with a relatively small and non-changing L2/L3 Fabric. In a case like this, ACI might be overkill.

ACI is the winner because:

It actually replaces your network, improving performance and making Troubleshooting faster and more efficient (check out Atomic Counters for Troubleshooting, and Flowlets as an LB optimization instead of ECMP within the fabric).
The concept of "Tenants" is perfectly implemented in the ACI architecture. Apps can be developed in the Lab tenant and then just "moved" to the Production environment, and the performance won't change because it's the same Infrastructure.
You can use it with any Hypervisor.
Cisco has well designed "Migration Paths" from the standard Data Center Architecture to ACI.

Where does the ACI fail?

The ACI architecture, including its many components, is really complex.
The GUI is far from intuitive.
Cisco is failing to send the correct message about what ACI really is, so you need to ignore all the Application talk, learn ACI, and then see for yourself what it really is.

In what cases is ACI a better fit than NSX?

Companies with more aggressive Data Center requirements, and many changes within a Data Center PHYSICAL network.
Application Developer companies that need to be fast in Network Service provisioning.
Small Service Providers that need to be competitive by performing the changes faster.

Cisco ACI: AVS and Hypervisor Integration

At this point I will assume that you have already read my previous posts about:
Cisco ACI Fundamentals.
Application Network Profiles, Contracts and Outside World connection.

If you did, then great. We may proceed with the "cool" stuff, such as ACI Virtual Switch and Hypervisor Integration.

AVS (Application Virtual Switch)

AVS (Application Virtual Switch) is the ACI version of the Nexus 1000v, or a Cisco alternative to the VMware vSphere VDS (Virtual Distributed Switch). If you are not familiar with these, they are virtual Switches that "live" on the Hypervisor, such as VMware ESXi (vSwitch), Hyper-V or KVM.

AVS also has VEM (Virtual Ethernet Modules) like the OVS (you may read about the OVS in my OVS introduction for Network Engineers), but instead of the VSM (Virtual Supervisor Module) it has the APIC Controller. It can be used instead of the VDS in the vSphere, or any other Compatible Hypervisor. It uses VLAN or VXLAN encapsulation, so - a pretty standard setup.

What is the key benefit of having a Virtual Switch, such as AVS, residing directly on the Hypervisor? Most of all, AVS is not just another Switch, it's a Remote Leaf, which means we can extend our ACI Architecture to the Hypervisor. What we get is the possibility of Micro-Segmentation, and huge Security and Performance improvements.

AVS can operate in the following modes:

Non-Switching mode (previously called FEX mode), which means that you extend the leaf into the Hypervisor. This way you move all the control of the Networking from the Hypervisor to the Leaf.
Local Switching (LS) mode, which enables Intra-EPG switching within the Host (traffic between different EPGs is sent to the Leaf). This is exactly what VDS is today.
Full Switching (FS) mode, which is still not available (it is apparently on the roadmap), and which will provide full L3 routing on the Hypervisor.

There are 2 sets of Multicast addresses: one for the AVS itself, and one Multicast Address per EPG.

Compared to other hypervisor-based virtual switches, AVS provides cross-consistency in features, management, and control through Application Policy Infrastructure Controller (APIC), rather than through hypervisor-specific management stations. As a key component of the overall ACI framework, AVS allows for intelligent policy enforcement and optimal traffic steering for virtual applications.

Key features include:

A purpose-built, virtual network edge for ACI fabric architecture.
Integration with the ACI management and orchestration platform to automate virtual network provisioning and application services deployments.
High performance and throughput.
Integrated visibility of both physical and virtual workloads and network paths.
Open APIs to extend the software-based control and orchestration of the virtual network fabric.

There are 3 standard AVS Topologies:

AVS host directly connected to N9K leaf switch.
AVS host connected to N9K leaf switch via FEX.
AVS host connected to N9K leaf switch via UCS FI.

I recommend that you focus on the official Cisco documentation for the AVS details, as things change quite often (see: Cisco ACI and Cisco AVS).

VMware integration

APIC integrates with VMware vCenter via its APIs, so from the APIC you can create the VDS and add VMs to a Port Group (an EPG, in ACI terms).

A VMM Domain is a Virtual Machine Manager domain, which basically means the Hypervisor Manager. The number of VLANs is limited to 4k, and VMM Domains work around this limitation: the same VLANs can be reused in each VMM Domain. You can, however, configure VMs from any of the VMM domains to belong to the same EPG.

Microsoft Integration

Windows Azure Pack is an Orchestration Layer that instructs System Center (Microsoft's Hypervisor manager, with an APIC Plugin installed), which in turn tells APIC what to do.

OPFlex runs between the vSwitch inside Hyper-V and the Fabric. The APIC Admin doesn't do much here: he does the Fabric Discovery and the configuration of the vPCs and the Port Profiles, while Azure takes care of System Center.

OPFlex

OPFlex is an Open Protocol designed by Cisco as an alternative to OpenFlow. It uses Declarative resolution, which means Push + Pull API support. It is used between the APIC and the Physical Leaf, and also between the Leaf and the “Virtual Leaf”, or the AVS (yes, the AVS can be seen as a Virtual Leaf).

The Policy Repository needs to be the APIC for now, but an OpenDaylight controller might also be able to take this role in the future. The Policy Element is the Leaf Switch. The Endpoint Registry is a database of the TEP mappings throughout the infrastructure, and it’s stored on the Spines.

Cisco ACI: Application Network Profiles, Contracts and ACI Connection to the Outside Network

By now you should know the following facts about ACI:

Cisco Nexus 9k Switches make up the ACI Fabric, which is the Control and the Data plane of the ACI Architecture.
The main components of the ACI Architecture are Bridge Domain (BD), EPG (End Point Group) and the Private Network.
VXLAN is the encapsulation mechanism that enables ACI remote L2 connectivity.

If you have any doubts about any of the "facts" on the list, you should read my previous post about the ACI Fundamentals: Components.

N9k can run in one of the two Operational Modes:
- NX-OS Mode (by default)
- ACI Mode

There are 3 types of chips in the 9k devices. You should be very careful when buying these switches because depending on the N9k models you buy, you might get only one or two of the possible ASIC chipsets:

T2 ASIC by Broadcom is the default chipset for a Nexus in standalone mode (NX-OS mode).
ALE – APIC Leaf Engine (ALE performs ACI leaf node functions when the Nexus 9500 switch is deployed as a leaf node in an ACI infrastructure).
ASE – APIC Spine Engine (when the 9k switches are deployed as Spine Switches).

Nexus 9000 supports Python in Interactive and Scripting mode. Python was chosen by Cisco because of its robust selection of Libraries. Data can be returned as XML or JSON rather than as plain command output. You can invoke a Python script from the CLI with a simple command:

switch# python bootflash:script.py

You can also use NX-API, which exposes the NX-OS CLI over HTTP/HTTPS and returns structured (JSON or XML) output, so you can drive the switch from Python using all the Libraries you normally use. Sample NX-API scripts can be downloaded from GitHub.
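As an illustration of the NX-API approach, the Python sketch below only builds an NX-API request with the standard library; it does not send it. It assumes NX-API has been enabled on the switch (`feature nxapi`) and uses the `ins_api` JSON message with the `cli_show` type, posted to the `/ins` endpoint. The host name, credentials and the helper names `build_nxapi_request` and `send` are hypothetical placeholders.

```python
import base64
import json
import urllib.request


def build_nxapi_request(switch, username, password, command):
    """Wrap one 'show' command in an NX-API ins_api message (JSON output)."""
    payload = {
        "ins_api": {
            "version": "1.0",
            "type": "cli_show",      # read-only show commands
            "chunk": "0",            # no chunked output
            "sid": "1",
            "input": command,
            "output_format": "json",
        }
    }
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"https://{switch}/ins",   # NX-API endpoint on the switch
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and decode the JSON reply (needs a reachable switch)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    req = build_nxapi_request("n9k-leaf1.example.com", "admin", "secret", "show version")
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```

Against a real switch, `send(req)` would return the structured reply. If the switch uses a self-signed certificate, you would also need to pass an `ssl` context to `urlopen`.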

Leaf Switching Tables: LST Station Table and GST Station Table

These two tables are found on the Leafs:

LST (Leaf Switching Table): All hosts attached to the Leaf
GST (Global Switching Table): Local cache of fabric endpoints, or all the endpoints that are reachable via Fabric on the N9K in ACI mode.

These tables are used for Bridge Domain ARP floods and Unknown Host requests, and they contain the MAC and IP routing tables. When ARP is flooded, it is carried as Multicast traffic within ACI and delivered everywhere the same BD is configured.
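A simplified Python model of how a leaf could use these two tables for a unicast destination is sketched below. It assumes lookups keyed only by MAC address (real hardware also keys on the BD/VRF), and it uses the unknown-unicast behaviour described in the Bridge Domain section: Spine Proxy by default, or flooding within the BD. `Leaf` and `forward_unicast` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    name: str
    lst: dict = field(default_factory=dict)  # MAC -> local port (hosts on this leaf)
    gst: dict = field(default_factory=dict)  # MAC -> remote leaf TEP (fabric cache)


def forward_unicast(leaf: Leaf, dst_mac: str, unknown_unicast: str = "proxy"):
    """Return (action, next hop) for a frame towards dst_mac."""
    if dst_mac in leaf.lst:                   # 1. directly attached host
        return ("local", leaf.lst[dst_mac])
    if dst_mac in leaf.gst:                   # 2. known remote endpoint: VXLAN to its TEP
        return ("vxlan", leaf.gst[dst_mac])
    if unknown_unicast == "proxy":            # 3a. ask the Spine Proxy
        return ("spine-proxy", None)
    return ("flood-bd", None)                 # 3b. flood within the Bridge Domain


if __name__ == "__main__":
    leaf = Leaf("leaf-101", lst={"aa:aa": "eth1/1"}, gst={"bb:bb": "10.0.0.34"})
    for mac in ("aa:aa", "bb:bb", "cc:cc"):
        print(mac, forward_unicast(leaf, mac))
```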

Sharing the Workload: The concept of SHARDS

A Shard is a unit of Data Management, very similar to the Slices concept in the NSX Architecture. Data is placed into shards; each shard has a Primary and 2 Replicas. If we, for example, want to use the ACI Multi-Data Center Architecture, this is where we will see the full added value of this concept: every task will be sliced into Shards, and each part of the Fabric will handle the Shards assigned to it. This improves performance drastically.

Application Network Profiles (ANP) and Contracts

ANP (Application Network Profile) is a very simple, but a VERY IMPORTANT concept within the ACI architecture. An ANP is a combination of EPGs and the Contracts that define the communication between them, where the Provider-Consumer relationship defines the connectivity in application terms. An EPG is a “child” of the Application Profile. Keep in mind that an EPG can provide, and consume, more than one Contract.

An Application Profile is essentially a logical grouping of policies. For example, you may create an Application Profile for “Network Services”; this Application Profile could contain standard services such as DNS, LDAP, TACACS, etc. Here is an example of the Application Profile Cisqueros_Basic, with two EPGs:

A Contract is a Policy Definition, and it consists of a number of Subjects (for example "Web Traffic"), each of which contains:

Filters (Example: UDP port 666).
Action (Permit, Deny, Redirect, Log, Copy, Mark…).
Label (additional optional identifier, for some additional capabilities).

Contracts can define the ONE-WAY (users consume Web services for example) or TWO-WAY communication (Web and App consume and provide the services between each other).

One thing that we Network Engineers often fail to understand is the actual difference between the Provider and the Consumer within a Contract. I think the example that "paints the picture" in the most logical way is an Application Server accessing a Data Base Server. The DB Server provides its data to another entity, and is therefore the Provider. We can define a Provider Contract on the DB Server which allows a group of Application Servers to access its Data Bases. The Application Server, on the other hand, is the Consumer of the DB, and it therefore requires a Consumer Contract towards the DB Server(s).

A Taboo is a group of Filters that DENIES the communication you need denied, and it's applied before any other filter, regardless of whether policy enforcement is on (Enforced, the default) or not.
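The Provider/Consumer, Subject/Filter and Taboo rules above can be summarised in a short Python sketch. It is a simplified model under stated assumptions, not the APIC implementation: only the consumer-to-provider direction of a flow is evaluated, Taboos are modelled as per-EPG sets of denied filters checked first, and with enforcement off everything not tabooed is permitted. All class, contract and EPG names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Filter:
    protocol: str  # "tcp" or "udp"
    port: int


@dataclass
class Subject:
    name: str
    filters: set
    action: str = "permit"  # Permit, Deny, Redirect, Log, ...


@dataclass
class Contract:
    name: str
    provider: str            # EPG offering the service (e.g. the DB servers)
    consumer: str            # EPG using the service (e.g. the App servers)
    subjects: list = field(default_factory=list)


def decide(src_epg, dst_epg, protocol, port, contracts, taboos=None, enforced=True):
    flow = Filter(protocol, port)
    # 1. Taboo filters on the destination EPG are checked before anything else.
    if flow in (taboos or {}).get(dst_epg, set()):
        return "deny"
    # 2. Unenforced: no contract is needed.
    if not enforced:
        return "permit"
    # 3. Consumer -> Provider traffic must match a Subject's filter.
    for contract in contracts:
        if contract.consumer == src_epg and contract.provider == dst_epg:
            for subject in contract.subjects:
                if flow in subject.filters:
                    return subject.action
    # 4. Enforced and no matching contract: the EPGs cannot talk.
    return "deny"


# The DB servers provide MySQL; the App servers consume it.
DB_ACCESS = Contract("DB_Access", provider="DB", consumer="App",
                     subjects=[Subject("MySQL", {Filter("tcp", 3306)})])

if __name__ == "__main__":
    print(decide("App", "DB", "tcp", 3306, [DB_ACCESS]))  # permit
    print(decide("Web", "DB", "tcp", 3306, [DB_ACCESS]))  # deny: Web is not a consumer
```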

Below is the Diagram of how an Application Network Profile could be implemented on the level of the entire Application Architecture:

Service Contract and Service Graph

Service Contract and Service Graph represent the way Cisco ACI integrates L4-L7 Services into the Contracts between the Client (Consumer) EPG and the Server (Provider) EPG. The basic way it works, using Policy-Based Redirection, is shown on the diagram below.

The Cisco APIC offers a graphical drag and drop GUI to easily create L4-L7 Service Graphs that specify network traffic routing; any of the L4-L7 ADC features available in the NetScaler device package can be included in a Service Graph definition, allowing comprehensive NetScaler integration with the Cisco APIC.

Once created, a Service Graph can be assigned to an Application Profile and contracted to a data center tenant, thereby defining the network traffic flow for that specific application and tenant.

L3 Routing in ACI

L3 Routing to External Networks can be done using iBGP or OSPF for now, but eBGP and EIGRP will be added soon (Dec2015). The routes learned from peering routers will be marked as “outside”. Routing decisions take into account that a single Data Plane is used with two Control Planes:

Inside Networks are used for Tenants and their Bridge Domains (BDs).
Outside Networks are associated with the Tenants on which the Peering with the External Routers is configured.

Route Redistribution is done on the Leaves. Right now MP-BGP is not configured by default, but it can be configured (Fabric -> Fabric Policies). If you configure it within the entire Fabric, the Spines take the role of BGP Route Reflectors, reflecting all the BGP prefixes to the Leafs.

If there is OSPF between the Leaf and the External Router, then the Leaf redistributes BGP into OSPF, and OSPF into BGP. Keep in mind that the OSPF area has to be configured as an NSSA.

For Routing Enthusiasts such as myself: Check out the following article regarding Connecting Application Centric Infrastructure (ACI) to Outside Layer 2 and 3 Networks.

Cisco ACI Fundamentals: ACI Components

Before we get deeper into ACI (Application Centric Infrastructure), Cisco's official SDN solution, we need to clarify a few terms that will be used:

SDN is a concept in which Networks are configured and defined using Software. You can read more about SDN and Network Virtualization in one of my previous posts.
APIC (Application Policy Infrastructure Controller) is the SDN controller that Cisco ACI architecture uses as the Management Plane.
Spine and Leaf is also known as the ACI Fabric. This architecture was explained in my VMware NSX introduction here. In the ACI world Spine and Leaf are the Cisco Nexus 9000 Series Switches (N9k) in the ACI mode, and they are the Control and the Data plane of the ACI.
VXLAN (Virtual eXtensible LAN) is the encapsulation technology on which most SDN solutions are based, because it lets hosts on different subnets, even on remote routed networks, see each other as if they were on the same L2 Segment. Read more about how VXLAN "works" in my previous post here.
TEP (VTEP) or VET is the Virtual Tunnel End Point. A VTEP is like an SVI on a Switch, because both are software-defined interfaces. When you deploy the APIC (ACI's SDN Controller), you use it to define the DHCP pool, and when this information is passed on to the Spine Switches, the Spines act as the DHCP server for all the VTEPs. By default the assigned address range is 10.0.0.0/16 (this range can be changed). These addresses are INTERNAL to the system.
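To illustrate the TEP pool, the Python sketch below hands out addresses from the default 10.0.0.0/16 range using the standard `ipaddress` module. The sequential, first-free allocation and the `TepPool` class are assumptions made for illustration; the real DHCP assignment order in the fabric may differ.

```python
import ipaddress


class TepPool:
    """Hypothetical TEP allocator: one stable address per fabric node."""

    def __init__(self, cidr="10.0.0.0/16"):
        self.network = ipaddress.ip_network(cidr)
        self._free = self.network.hosts()  # iterator; skips network/broadcast
        self.leases = {}                   # node name -> IPv4Address

    def assign(self, node):
        if node not in self.leases:       # a node keeps the TEP it already has
            try:
                self.leases[node] = next(self._free)
            except StopIteration:
                raise RuntimeError(f"TEP pool {self.network} exhausted") from None
        return self.leases[node]

    def is_internal(self, address):
        """TEP addresses are internal to the fabric: they live inside the pool."""
        return ipaddress.ip_address(address) in self.network


if __name__ == "__main__":
    pool = TepPool()
    for node in ("spine-201", "leaf-101", "leaf-102"):
        print(node, pool.assign(node))
```

Keeping a lease table means a node asking again gets the same TEP, which is the behaviour you want for stable tunnel endpoints.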

The APIC Controller simply works as the Management of the entire Fabric, and the Fabric itself does all the work, because the Fabric is the Control plane. If you unplug the APIC, your network keeps working exactly the same, because the policies are already deployed and the fabric itself is handling the control and the data plane.

Dealing with ARP is one of the biggest advantages that the new ACI Fabric provides, while all the other SDN solutions on the market simply ignore this part. Every time there is a change in the VMs, a Gratuitous ARP (GARP) is sent to the LEAF, and the information is forwarded to the SPINE using the COOP protocol. The COOP table of that SPINE is then updated and propagated to the other SPINEs. You have the option to enable ARP flooding, and in this case the ACI fabric treats ARP like a traditional Network. There is no interconnection between SPINEs and no interconnection between LEAFs.

How ACI applies VXLAN: in ACI, VXLAN is applied in a different way than, for example, in VMware NSX or any other VXLAN-enabled solution, because it has the TrustSec header "built" into the native VXLAN implementation. It's applied as an Overlay, much like an MPLS Label, with all the PUSH, POP and SWAP operations, where the VTEPs are the Endpoints where the VXLAN header is added. Why did they use VXLAN and not MPLS Labels? All the Virtual Switches use and understand VXLAN, and not MPLS. It's also similar to LISP, but there is a difference in some of the control bits.

IMPORTANT: In cases where FCoE, OTV or DTI is needed, the Nexus 9k does not meet the customer's needs, so we need to use the N5k or N7k series and exclude them from the ACI environment. FCoE and OTV are on the ACI Roadmap, so let's see what happens.

Why do we normally use Merchant Silicon? Basically because it's built to support standards-based capabilities. The Fabric in ACI must be a 40G Fabric. There are also Cisco custom ASICs specific to the Nexus 9K Switches only:

ALE – Application Leaf Engine
ASE – Application Spine Engine

There is another major difference between the "packet forwarding" on Leaf and Spine level:

Spines: FIB is the Major Forwarding Table, and the Nexus 9336 is the only switch that can assume this role (Jan2015).
Leafs: CAM is the Major Forwarding Table.

Restful API Calls

Restful API: If something you would like to have is missing from the ACI Web GUI, you can program it and integrate it using the API. You may pick the tool you want for this, but my favourite is POSTMAN, which can be found in the Chrome App Store. Using POSTMAN you can GET (capture) the API calls that are used for some operation, and then modify and repeat them using POST. Each GUI action can be captured as API code and then be modified and executed. APIC supports:

Northbound APIs (REST API and Python SDK called “Cobra”).
Southbound APIs (OPFlex and Dev. package).

Every time the GUI contacts the ACI Infrastructure, it uses the REST API. CLI, GUI and SDK all actually use the REST API, and the payloads are XML or JSON. On the diagram below there is an Object Browser called Visore (viewer, in Italian), but at this moment (Apr2015) it's only available to Cisco employees.
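As a minimal sketch of such REST calls, the Python code below builds (but does not send) three APIC requests with the standard library: a login to `/api/aaaLogin.json`, a class query for all `fvTenant` objects, and a POST to `/api/mo/uni.json` that creates a Tenant. The APIC address, credentials, tenant name and helper functions are hypothetical; the session token is assumed to come from the login reply and to be passed back in the `APIC-cookie` cookie.

```python
import json
import urllib.request

APIC = "https://apic.example.com"  # hypothetical APIC address


def _request(path, body=None, token=None):
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Cookie"] = f"APIC-cookie={token}"  # session token from aaaLogin
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(f"{APIC}{path}", data=data, headers=headers,
                                  method="POST" if body is not None else "GET")


def login_request(username, password):
    return _request("/api/aaaLogin.json",
                    {"aaaUser": {"attributes": {"name": username, "pwd": password}}})


def list_tenants_request(token):
    # Class query: every object of class fvTenant in the MIT.
    return _request("/api/class/fvTenant.json", token=token)


def create_tenant_request(token, tenant):
    # Managed-object POST under the policy universe ("uni").
    return _request("/api/mo/uni.json",
                    {"fvTenant": {"attributes": {"name": tenant}}}, token=token)


def token_from_login_reply(reply):
    """Extract the session token from a decoded aaaLogin JSON reply."""
    return reply["imdata"][0]["aaaLogin"]["attributes"]["token"]


if __name__ == "__main__":
    for req in (login_request("admin", "secret"),
                list_tenants_request("TOKEN"),
                create_tenant_request("TOKEN", "Cisqueros")):
        print(req.get_method(), req.full_url)
```

The JSON bodies have the same shape as the payloads you would capture from the GUI, which is why "capture, modify, replay" works so well with the APIC.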

ACI has a very programmable Object Model based on the MIT (not the University, the Management Information Tree), which is used as a Data Base where every branch represents a functional area and every node is a managed object. Each managed object has a CLASS and a globally unique Distinguished Name (DN), formed from its parent's name and its own relative name. For example, polUni is where all the policies are, and compUni is the Compute Universe.
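The way a Distinguished Name is built from relative names can be shown with a few lines of Python. The prefixes below (`tn-`, `ctx-`, `BD-`, `ap-`, `epg-` under the policy universe `uni`) follow the usual ACI naming, but the helper functions and the tenant name `Cisqueros` are hypothetical. The naive `/` split in `parent_dn` only works for simple RNs without bracketed values such as interface paths.

```python
# Relative-name (RN) prefixes of a few common ACI classes.
RN_PREFIX = {
    "fvTenant": "tn-",   # Tenant
    "fvCtx": "ctx-",     # Private Network (VRF)
    "fvBD": "BD-",       # Bridge Domain
    "fvAp": "ap-",       # Application Profile
    "fvAEPg": "epg-",    # End Point Group
}


def rn(cls, name):
    """Relative name of one managed object."""
    return RN_PREFIX[cls] + name


def dn(*path):
    """DN = parent DN + '/' + RN, starting from the policy universe 'uni'."""
    result = "uni"
    for cls, name in path:
        result = f"{result}/{rn(cls, name)}"
    return result


def parent_dn(child_dn):
    """Drop the last RN (simple DNs only)."""
    return child_dn.rsplit("/", 1)[0]


if __name__ == "__main__":
    web = dn(("fvTenant", "Cisqueros"), ("fvAp", "Cisqueros_Basic"), ("fvAEPg", "Web"))
    print(web)              # uni/tn-Cisqueros/ap-Cisqueros_Basic/epg-Web
    print(parent_dn(web))   # uni/tn-Cisqueros/ap-Cisqueros_Basic
```

The same DN string is what appears in managed-object REST URLs, for example `/api/mo/uni/tn-Cisqueros.json`.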

There are 3 basic components of ACI:

ACI Fabric, which are basically the N9k Leaf and Spine Switches.
APIC (Application Policy Infrastructure Controller), the controller cluster that runs the system.
Logical Model designed in accordance with the concept of ANP – Application Network Profiles (a concept that will be explained later in the post).

All of this is based on Linux, and we can get anywhere within this Architecture using the CLI.

The main idea is that we need to create the Logical Model to support the application. Two hosts should talk if we want them to, and should not talk if we don't want them to, regardless of whether they're on a different Subnet, a different IP Network, a different VRF (Private Network in the ACI architecture) or the same one.

ECMP (Equal Cost Multi-Path routing) is the forwarding technique that brings Routing and load balancing to the Access Layer and eliminates STP. The concept is related to the Spine-Leaf Architecture (also called Clos architecture).
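Per-flow ECMP load balancing can be sketched in Python as hashing each flow's 5-tuple and using the result to pick one of the equal-cost uplinks. Every packet of a flow then follows the same path (no reordering), while different flows spread across the Spines. CRC32 is an arbitrary choice for illustration; switch ASICs use their own hardware hash functions. The `pick_spine` helper and the spine names are hypothetical.

```python
import zlib


def pick_spine(flow, spines):
    """flow = (src_ip, dst_ip, protocol, src_port, dst_port)."""
    key = "|".join(str(part) for part in flow).encode()
    return spines[zlib.crc32(key) % len(spines)]


SPINES = ["spine-201", "spine-202", "spine-203", "spine-204"]

if __name__ == "__main__":
    flow = ("10.1.1.10", "10.2.2.20", 6, 49152, 443)
    print(pick_spine(flow, SPINES) == pick_spine(flow, SPINES))  # True: per-flow stickiness
    used = {pick_spine(("10.1.1.10", "10.2.2.20", 6, p, 443), SPINES)
            for p in range(49152, 49352)}
    print(f"{len(used)} of {len(SPINES)} spines used by 200 flows")
```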

End Point Group (EPG)

An EPG is a group of objects that use the same policy (something like a VMware vSwitch Port Group; it can be whatever you want it to be, for example based on Security, QoS, or L4-L7 services). It is also a Security Zone, and we use the EPG as the Policy Instantiation point. We group the End Points (Servers, for example, or any other grouping characteristic) that need the same security policy applied. The AEP (Attachable Entity Profile) links the Physical Interfaces to the EPGs: the AEP provisions a VLAN pool on a Leaf Switch, and the EPGs enable VLANs on the port.

Domains, VMM or Physical, are tied to pools of VLANs, and then to Attachable Entity Profiles (AEPs). By combining a pool of VLANs with a Domain, you are fundamentally defining which VLANs are allowed to be used by the Fabric for each Domain. The AEP is the final piece, and for now we will simply say that it is the glue that takes all of these “back end” Fabric configurations and attaches them to an interface.

Just as in a traditional L2 Network a VLAN is isolated from everything else and represents a Security Group, the EPG represents the Security Group within ACI.

TIP: It is really important to understand that VLANs in the ACI architecture have only local significance on the LEAF. The same EPG can be mapped to 2 different VLANs on two different LEAFs.
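The local significance of the VLAN can be modelled as a lookup keyed by the pair (leaf, VLAN) rather than by the VLAN alone. The Python sketch below uses a hypothetical static mapping, EPG names and `classify` helper to show the same EPG reached through different VLANs on different leafs, and the same VLAN ID meaning different EPGs on different leafs.

```python
# (leaf, encapsulation VLAN) -> EPG. The VLAN ID is only meaningful on its leaf.
ENCAP_TO_EPG = {
    ("leaf-101", 100): "Web",
    ("leaf-102", 200): "Web",  # same EPG, different VLAN on another leaf
    ("leaf-102", 100): "DB",   # VLAN 100 reused on leaf-102 for another EPG
}


def classify(leaf, vlan):
    """Return the EPG for a frame arriving on `leaf` tagged with `vlan`."""
    try:
        return ENCAP_TO_EPG[(leaf, vlan)]
    except KeyError:
        raise LookupError(f"VLAN {vlan} is not mapped to an EPG on {leaf}") from None


if __name__ == "__main__":
    for leaf, vlan in ENCAP_TO_EPG:
        print(leaf, vlan, "->", classify(leaf, vlan))
```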

TIP: vCenter (or any other Hypervisor manager) communicates with the APIC via APIs, and that is how the APIC gets all the Virtual Machine information, such as their Virtual Ports (Port Groups on the vSwitch).

TIP: Any time you create an EPG, you HAVE to associate it to a BD, and vice versa.

Extension of the EPG is done manually. We need to assign the VLAN to an interface and map it to the EPG. Within the APIC you would need to statically assign the EPG to the interface. This is used when you have VMs that are directly connected to the Leaf, and you don't really have a way to place them into a certain EPG. You basically define the Interface they are attached to as the EPG you want to place them in.

Extension of the BD is a bit more complicated, because we need to create a VLAN for each Subnet that the Bridge Domain contains. That is solved by creating a Bridge Outside for each Subnet, which will then be sent within a Trunk port towards the next switch down the path (Client switch connected to the Leaf).

The APIC Policy can be applied to an End Point via the following 3 steps:

End Point attaches to fabric.
APIC detects End Point and derives a source EPG.
APIC pushes the required policy to Leaf switch.
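These three steps can be sketched in Python as below. The `Apic` class, the contract tuples and the EPG names are hypothetical; the sketch assumes that the EPG is derived from the (leaf, VLAN) pair the endpoint was seen on, and that only the contracts involving that EPG are pushed to that leaf.

```python
from collections import defaultdict


class Apic:
    """Toy controller: classify attaching endpoints and push matching contracts."""

    def __init__(self, encap_to_epg, contracts):
        self.encap_to_epg = encap_to_epg        # (leaf, vlan) -> EPG
        self.contracts = contracts              # (consumer EPG, provider EPG, contract name)
        self.endpoints = {}                     # MAC -> (leaf, EPG)
        self.leaf_policies = defaultdict(set)   # leaf -> contract names deployed there

    def endpoint_attached(self, leaf, vlan, mac):
        # Step 1: the End Point attaches to the fabric on (leaf, vlan).
        # Step 2: derive its source EPG from where it was seen.
        epg = self.encap_to_epg[(leaf, vlan)]
        self.endpoints[mac] = (leaf, epg)
        # Step 3: push only the contracts that involve this EPG to that leaf.
        for consumer, provider, name in self.contracts:
            if epg in (consumer, provider):
                self.leaf_policies[leaf].add(name)
        return epg


if __name__ == "__main__":
    apic = Apic({("leaf-101", 10): "App", ("leaf-102", 20): "DB"},
                [("App", "DB", "db-access"), ("Web", "App", "app-access")])
    apic.endpoint_attached("leaf-102", 20, "00:50:56:00:00:01")
    print(dict(apic.leaf_policies))  # {'leaf-102': {'db-access'}}
```

A leaf with no attached endpoints of a given EPG never receives that EPG's contracts, which keeps the policy tables on each leaf small.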

ACI Contract

ACI Contract is a policy between EPGs, and it basically defines the relations/actions between them. These Contracts are the Policies that are sent to APIC, which then distributes them into the Data Path, or the Fabric.

Tenants

A Tenant is a logical separator for a Customer or a Group. The pre-configured Tenants are:

Common – Policies that can be accessed by all tenants.
Infra – VXLAN overlay infrastructure configuration (Private L3, Bridge Domain).
Mgmt – Inband and OOB configuration of fabric nodes.

Private Network (Context)

A Private Network is basically a VRF. This is what used to be called a Context before the official ACI launch in June 2014. A Private Network is a Child of a Tenant, and the default policy is Enforce (meaning that 2 EPGs in the same Private Network cannot talk to each other unless a Contract allows it).

The built-in multi-tenant support in ACI means that a separate Tenant may be a better solution than simply using a VRF, but it depends on your needs. Less intuitively, the Private Network is where routing protocol timers are configured; this is important for the configuration of routed connections within the VRF to routers outside of the fabric (L3 Outs).

Private Network can contain multiple Bridge-Domains for separation of layer-2 domains within a VRF.

Bridge Domain (BD)

This is one of the hardest concepts to explain to Network engineers, because it resembles a VLAN, but it can contain various Subnets. In ACI, VLANs don't exist inside the Network, only on the Ports. A Bridge Domain is basically a Subnet Container: when you define a unicast IP address/L3 Network on it, it's automatically placed in the routing table. The MAC address must be unique inside the Bridge Domain, the same way the IP address needs to be unique within the VRF. A BD belongs to ONLY ONE Tenant, but a Tenant can, of course, have various BDs. A BD may have more than one subnet.
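The two uniqueness rules can be expressed as two lookups keyed differently: MAC addresses by (BD, MAC) and IP addresses by (VRF, IP). The Python sketch below is a hypothetical `EndpointTable` that simply rejects a conflicting endpoint; a real fabric also has to handle endpoint moves, which this sketch ignores. BD and VRF names are illustrative.

```python
class EndpointTable:
    """Hypothetical endpoint table enforcing MAC-per-BD and IP-per-VRF uniqueness."""

    def __init__(self, bd_to_vrf):
        self.bd_to_vrf = bd_to_vrf  # Bridge Domain -> Private Network (VRF)
        self.macs = {}              # (BD, MAC) -> endpoint name
        self.ips = {}               # (VRF, IP) -> endpoint name

    def learn(self, name, bd, mac, ip):
        vrf = self.bd_to_vrf[bd]
        owner = self.macs.get((bd, mac))
        if owner not in (None, name):
            raise ValueError(f"MAC {mac} already used by {owner} in BD {bd}")
        owner = self.ips.get((vrf, ip))
        if owner not in (None, name):
            raise ValueError(f"IP {ip} already used by {owner} in VRF {vrf}")
        self.macs[(bd, mac)] = name
        self.ips[(vrf, ip)] = name


if __name__ == "__main__":
    table = EndpointTable({"BD-Web": "Prod", "BD-App": "Prod", "BD-Lab": "Lab"})
    table.learn("web1", "BD-Web", "00:00:00:00:00:01", "10.0.0.10")
    table.learn("lab1", "BD-Lab", "00:00:00:00:00:03", "10.0.0.10")  # other VRF: OK
    try:
        table.learn("app2", "BD-App", "00:00:00:00:00:02", "10.0.0.10")
    except ValueError as err:
        print(err)  # IP 10.0.0.10 already used by web1 in VRF Prod
```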

In the following Screenshot you can see the parameters you need to define when defining a new BD. Private Network is hierarchically above the Bridge Domain, so you should associate your BD with Private Network, as it ties the L2 domain to the parent L3 domain essentially.

L2 Unknown Unicast traffic is either flooded within the BD or sent to the Spine Proxy (the default), while Unknown Multicast is set to flood.

The logical model of all the elements is shown below. A TENANT has “child” Private Networks, and each Private Network can have one or more “child” Bridge Domains, each related to one or more EPG-s.
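The same hierarchy can be sketched as plain Python dataclasses, together with the two uniqueness rules from the Bridge Domain section: a MAC must be unique inside the BD, and an IP must be unique inside the VRF. This is an illustrative model using my own naming. It is not the APIC object model, where these objects are classes such as `fvTenant`, `fvCtx`, `fvBD` and `fvAEPg`. Here, "a BD belongs to only one Tenant" is enforced only by the tree structure itself.

```python
from dataclasses import dataclass, field


@dataclass
class BridgeDomain:
    """L2 domain: may hold several subnets and relate to several EPGs."""
    name: str
    subnets: list = field(default_factory=list)
    epgs: list = field(default_factory=list)
    macs: set = field(default_factory=set)

    def learn_mac(self, mac: str) -> None:
        mac = mac.lower()
        # A MAC address must be unique inside the Bridge Domain.
        if mac in self.macs:
            raise ValueError(f"duplicate MAC {mac} in BD {self.name}")
        self.macs.add(mac)


@dataclass
class PrivateNetwork:
    """VRF: parent of one or more Bridge Domains."""
    name: str
    bridge_domains: dict = field(default_factory=dict)
    ips: set = field(default_factory=set)

    def add_bd(self, bd: BridgeDomain) -> BridgeDomain:
        self.bridge_domains[bd.name] = bd
        return bd

    def learn_ip(self, ip: str) -> None:
        # An IP address must be unique inside the VRF, across all of its BDs.
        if ip in self.ips:
            raise ValueError(f"duplicate IP {ip} in Private Network {self.name}")
        self.ips.add(ip)


@dataclass
class Tenant:
    """Top of the hierarchy: a customer or a group."""
    name: str
    private_networks: dict = field(default_factory=dict)

    def add_private_network(self, pn: PrivateNetwork) -> PrivateNetwork:
        self.private_networks[pn.name] = pn
        return pn


tenant = Tenant("Customer-A")
pn = tenant.add_private_network(PrivateNetwork("PN-Prod"))
bd = pn.add_bd(BridgeDomain("BD-Web", subnets=["10.1.1.1/24", "10.1.2.1/24"], epgs=["EPG-Web"]))
bd.learn_mac("00:50:56:AA:BB:01")
pn.learn_ip("10.1.1.10")
```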

IMPORTANT: One of the star features of ACI is the Proxy Database, or Endpoint Repository, which shows you all the details about any IP/MAC on the network: when it was connected, where exactly, and where it's been before. It's a big historical database that lets you see, for example, what happened to any machine at any point in time.
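As a rough mental model of the questions such a history can answer, here is a hypothetical in-memory Python sketch. Endpoint events are stored per MAC in time order, and a binary search answers "where was this MAC at time T?". The event fields (leaf, port) and the class names are assumptions for illustration. This is not how the APIC stores or exposes its endpoint data.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointEvent:
    timestamp: int  # e.g. epoch seconds
    mac: str
    ip: str
    leaf: str
    port: str


class EndpointHistory:
    """Hypothetical stand-in for an endpoint history database."""

    def __init__(self):
        self._events = {}  # mac -> list of events sorted by timestamp

    def record(self, event: EndpointEvent) -> None:
        events = self._events.setdefault(event.mac, [])
        keys = [e.timestamp for e in events]
        events.insert(bisect.bisect_right(keys, event.timestamp), event)

    def location_at(self, mac: str, when: int):
        """Return the last event for `mac` at or before `when`, or None if unknown then."""
        events = self._events.get(mac, [])
        keys = [e.timestamp for e in events]
        idx = bisect.bisect_right(keys, when)
        return events[idx - 1] if idx else None

    def moves(self, mac: str) -> list:
        """Every (timestamp, leaf, port) where this MAC has been seen, oldest first."""
        return [(e.timestamp, e.leaf, e.port) for e in self._events.get(mac, [])]


history = EndpointHistory()
history.record(EndpointEvent(100, "00:50:56:aa:bb:01", "10.1.1.10", "leaf-101", "eth1/5"))
history.record(EndpointEvent(200, "00:50:56:aa:bb:01", "10.1.1.10", "leaf-102", "eth1/7"))
print(history.location_at("00:50:56:aa:bb:01", 150).leaf)  # leaf-101
```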

Can OpenStack Neutron really control the Physical Network?

This is a question I've been hearing a lot when we present OpenStack to a new client, mostly from the guys who control the networking infrastructure. So, can the OpenStack Neutron module really control and configure the physical network? The answer might disappoint you: it depends! One thing is for sure: there is no better way to make a group of people put on their poker faces than to try to explain how OpenStack Neutron works to a Networking Operations team.

There are 3 of us doing the technical part of the OpenStack presentation:

OpenStack Architect. Typically a young fella, enthusiastic about stuff, who gives the impression that he completely ignores how a Data Center is traditionally defined; his answer to almost every question is "OpenStack will control that too!"
Virtualization Engineer. Seen as open-minded by the traditional Mainframe experts, and completely ignored by the OpenStack guy.
Network Engineer (me, in our case). Seen as a dinosaur and a progress-stopper by both the OpenStack and the Virtualization guys.

This is what happens to us in 100% of the cases. We start the presentation. The entire networking department has their laptops open, and they pay zero attention to us. The architects and the open-minded bosses are the ones who want to understand it, and they ask many questions throughout the presentation. Once we're almost done, one of the networkers "wakes up". This is where we enter the storm of crossed looks and weird facial expressions, and we "get the chance" to repeat around half of the presentation for the networking guys. It all ends with a single question on everyone's mind: Can OpenStack Neutron really control the Physical Network?

To answer this, let's start with ML2. ML2 (Modular Layer 2) is the OpenStack plugin designed to enable Neutron to, among other things, control the physical switches. This can be done using manual API calls to the switches, or with a vendor-designed, ML2-compatible driver for the particular switch model.
Before getting deeper into ML2, here are some popular plug-ins:

Open vSwitch
Cisco UCS/Nexus
Linux Bridge
Arista
Ryu OpenFlow Controller
NEC OpenFlow

ML2 Plugin vs Mechanism Driver: the ML2 plugin works with the existing agents. There is an ML2 agent for each L2 agent (Linux Bridge, Open vSwitch and Hyper-V). In the future, all these agents should be replaced by a single Modular Agent.
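The split between the ML2 plugin and its mechanism drivers is essentially a fan-out. The plugin records the network state, then calls every configured driver so that each one can program its own backend (OVS, a Nexus switch, and so on). Below is a self-contained Python sketch of that pattern, using hypothetical class names. Neutron's real base class is `MechanismDriver` (historically in `neutron.plugins.ml2.driver_api`, today in `neutron_lib.plugins.ml2.api`), and its methods follow the same precommit/postcommit naming. This sketch does not import or reproduce that class.

```python
class MechanismDriver:
    """Hypothetical base class mirroring ML2's precommit/postcommit split."""

    def create_network_precommit(self, network: dict) -> None:
        pass  # runs before the DB commit: validate, don't talk to devices

    def create_network_postcommit(self, network: dict) -> None:
        pass  # runs after the DB commit: push configuration to the backend


class OvsDriver(MechanismDriver):
    def __init__(self, log: list):
        self.log = log

    def create_network_postcommit(self, network: dict) -> None:
        self.log.append(f"ovs: network {network['name']} ready for local ports")


class NexusDriver(MechanismDriver):
    def __init__(self, log: list):
        self.log = log

    def create_network_precommit(self, network: dict) -> None:
        if network["network_type"] == "vlan" and not 1 <= network["segmentation_id"] <= 4094:
            raise ValueError("invalid VLAN id")

    def create_network_postcommit(self, network: dict) -> None:
        self.log.append(f"nexus: vlan {network['segmentation_id']} configured")


class Ml2Plugin:
    """Calls every driver in order, around a (simulated) database commit."""

    def __init__(self, drivers: list):
        self.drivers = drivers
        self.db = {}

    def create_network(self, network: dict) -> dict:
        for driver in self.drivers:
            driver.create_network_precommit(network)
        self.db[network["name"]] = network  # stands in for the DB transaction
        for driver in self.drivers:
            driver.create_network_postcommit(network)
        return network


log = []
plugin = Ml2Plugin([OvsDriver(log), NexusDriver(log)])
plugin.create_network({"name": "tenant-net", "network_type": "vlan", "segmentation_id": 100})
print(log)
```

The point of the two phases is that a bad request (here, an out-of-range VLAN) is rejected before anything is stored or pushed to a switch.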

Cisco Nexus driver for OpenStack Neutron allows customers to easily build their infrastructure-as-a-service (IaaS) networks. It is capable of configuring Layer 2 tenant networks on the physical Nexus switches using either VLAN or VXLAN networks.

Note: This driver supports the VLAN network type for Cisco Nexus models 3000 – 9000 and the VXLAN overlay network type for the Cisco Nexus 3100 and 9000 switches only.

Cisco Nexus Mechanism Driver (MD) for the Modular Layer 2 (ML2) plugin can configure VLANs on Cisco Nexus switches through OpenStack Neutron. The Cisco Nexus MD provides a driver interface to communicate with Cisco Nexus switches. The driver uses the standard Network Configuration Protocol (Netconf) interface to send configuration requests to program the switches.
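To give network engineers a feel for what ends up on the switch, here is a hedged Python sketch. It only renders the kind of NX-OS configuration a VLAN-based ML2 driver needs: create the VLAN, then allow it on the host-facing trunk. The function name, the VLAN name and the interfaces are hypothetical. The actual Cisco Nexus MD builds its own NETCONF requests, and nothing here connects to a switch.

```python
def render_vlan_config(vlan_id: int, network_name: str, interfaces: list) -> str:
    """Render NX-OS CLI lines that create a VLAN and add it to trunk interfaces (hypothetical helper)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN id {vlan_id} out of range")
    lines = [f"vlan {vlan_id}", f"  name {network_name}"]
    for intf in interfaces:
        lines += [
            f"interface {intf}",
            "  switchport mode trunk",
            # 'add' extends the allowed VLAN list instead of replacing it.
            f"  switchport trunk allowed vlan add {vlan_id}",
        ]
    return "\n".join(lines)


print(render_vlan_config(100, "tenant-net", ["ethernet1/10", "port-channel20"]))
```

Using `allowed vlan add` matters here: without `add`, the command would overwrite the VLANs already allowed on the trunk, which is exactly the kind of side effect a networking team worries about when OpenStack touches their switches.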

The Cisco mechanism driver also supports multi-homed hosts in a vPC setup, as long as the interconnection requirements are fulfilled: the data interfaces on the host must be bonded, and this bonded interface must be attached to the external bridge.

Each of these modules exposes an API so that the Tenant can “talk” to it. Cisco switches, for example, are connected to the Neutron module via the plug-in, which enables OpenStack to communicate with and configure them. There is a Neutron driver for the Nexus switches (ML2 driver), and thanks to this driver the switches can be configured from OpenStack. This way Nova resources are saved, because routing is offloaded onto the switch.

Using all these drivers and plug-ins, OpenStack Neutron can manage connectivity and configure networking within the physical infrastructure. It can add/remove/extend VLANs and manage 802.1q trunk ports and Port-Channels. The question is: what happens in a bigger network, can this "scale"? And the answer is: NO! Not yet, at least. Yes, you can provision all the VLANs you want, and even extend them, if you have just a few switches and there is no need for some of the "advanced" control or security protocols. But what happens with Spanning-Tree, and what controls the routing? What if you have a few OSPF areas in the "middle", and you need to advertise a newly configured network? What happens to load balancing between the VLANs, or is 1 switch always the Root Bridge of all the VLANs created by OpenStack?

Conclusion:
There is a way for OpenStack to provision all the networking it needs, but to make it "scalable" (meaning: not just a PoC in a greenfield) we need a Controlled Fabric. It can be an SDN, such as Cisco ACI or VMware NSX (almost), or it can be a client's networking team that just assigns a group of VLANs for OpenStack to use. This might change in the future, but for now, always consider OpenStack + SDN.

Open Virtual Switch (OVS) Deep Dive: How L2 Agent "wires" a new VM

The basics of the OVS (Open Virtual Switch) and OpenStack Neutron module were described in my previous post. Time to get a bit deeper into the OVS.

Apart from CPU and memory, a Virtual Machine (VM) needs connectivity. The L2 Agent (OVS in this case, or an external L2 agent) is used to connect the VM to the physical port. OVS resides on the hypervisor of each OpenStack node.

To understand exactly how the L2 Agent works, and how it provides VM connectivity to the “outside world”, we first need to get a bit “deeper” into the Linux-y nature of OVS, and understand all the bridge types, what they are used for and how they interconnect. This might look a bit complicated in the beginning, especially if you come from a traditional networking background.

These are the OVS Bridge Types:

br-int (Integration Bridge): All the VMs use their VIFs (Virtual Interfaces) to connect to the Integration Bridge.
br-eth (Ethernet Bridge): The OVS Ethernet Bridge is the entity that lets us decide whether to use the default VLAN or tag the VLAN before the traffic goes out.
br-tun (Tunnel Bridge): Where the headers are added depending on the tunnel type (in case there is a tunnel, such as GRE).
br-ex (External Bridge): Used for the interconnection with the external network or, from the OpenStack point of view, the Public Network. Keep in mind that this Public Network isn't necessarily an actual public network (with a public IP address); it's just outside of the OpenStack environment.
veth (virtual Ethernet): Used to interconnect the “bridges” (OVS-to-OVS bridge or Linux-to-OVS bridge).
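The bridges above can be created and wired by hand with `ovs-vsctl`, which helps when you are trying to read what the OVS agent has built on a node. The sketch below only returns the commands and does not run them. `br-int`, `br-tun`, `patch-int` and `patch-tun` follow the usual Neutron naming. The physical bridge `br-eth1`, the NIC `eth1` and the `int-`/`phy-` port names are assumptions. The bridges are linked with OVS patch ports; veth pairs, as mentioned in the list, are the other way to interconnect bridges.

```python
def ovs_wiring_commands(phys_bridge: str = "br-eth1", phys_nic: str = "eth1") -> list:
    """Return (not execute) ovs-vsctl commands that create the bridges and interconnect them."""

    def patch(bridge: str, port: str, peer: str) -> str:
        # One end of an OVS patch port pair; options:peer names the port on the other bridge.
        return (f"ovs-vsctl add-port {bridge} {port} -- "
                f"set interface {port} type=patch options:peer={peer}")

    return [
        "ovs-vsctl --may-exist add-br br-int",          # integration bridge: VM VIFs land here
        "ovs-vsctl --may-exist add-br br-tun",          # tunnel bridge: GRE/VXLAN ports
        f"ovs-vsctl --may-exist add-br {phys_bridge}",  # ethernet bridge towards the physical network
        f"ovs-vsctl --may-exist add-port {phys_bridge} {phys_nic}",
        patch("br-int", "patch-tun", "patch-int"),
        patch("br-tun", "patch-int", "patch-tun"),
        patch("br-int", f"int-{phys_bridge}", f"phy-{phys_bridge}"),
        patch(phys_bridge, f"phy-{phys_bridge}", f"int-{phys_bridge}"),
    ]


for command in ovs_wiring_commands():
    print(command)
```

On a live node, `ovs-vsctl show` lists the bridges and ports that actually exist, so you can compare them against this picture.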

The diagram below gives a better representation of what actually happens when 2 VMs on different hypervisors communicate. Notice that they are in the same subnet (10.5.5.0/24), and focus on all the elements of their interconnection.
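One detail the diagram does not show is how the tagging works inside each host. br-int tags a tenant network with a *local* VLAN that only has meaning on that hypervisor. br-tun then swaps that local VLAN for the network's global tunnel ID before encapsulating. The Python sketch below models that translation for two hosts. The class, the host names, the allocation order and the tunnel IDs (0x5A, 0x99) are made up for illustration.

```python
class Host:
    """Hypothetical hypervisor: br-int uses host-local VLANs, br-tun uses global tunnel IDs."""

    def __init__(self, name: str):
        self.name = name
        self.local_vlans = {}  # tunnel ID -> local VLAN, meaningful on this host only
        self._next_vlan = 1

    def local_vlan_for(self, tunnel_id: int) -> int:
        # Allocate a local VLAN the first time this host sees a network.
        if tunnel_id not in self.local_vlans:
            self.local_vlans[tunnel_id] = self._next_vlan
            self._next_vlan += 1
        return self.local_vlans[tunnel_id]

    def send(self, frame: str, tunnel_id: int) -> dict:
        # br-int tags the VM's frame with the local VLAN...
        self.local_vlan_for(tunnel_id)
        # ...and br-tun strips that tag and sets the tunnel ID before encapsulating.
        return {"frame": frame, "tunnel_id": tunnel_id}

    def receive(self, packet: dict) -> dict:
        # br-tun decapsulates and maps the tunnel ID to *this* host's local VLAN;
        # br-int then delivers the frame to the VM port in that VLAN.
        return {"frame": packet["frame"], "local_vlan": self.local_vlan_for(packet["tunnel_id"])}


compute1, compute2 = Host("compute-1"), Host("compute-2")
compute2.local_vlan_for(0x99)  # another network already uses local VLAN 1 on compute-2
packet = compute1.send("10.5.5.3 -> 10.5.5.4", tunnel_id=0x5A)
delivered = compute2.receive(packet)
print(compute1.local_vlans[0x5A], packet["tunnel_id"], delivered["local_vlan"])  # 1 90 2
```

The same subnet ends up in local VLAN 1 on one host and local VLAN 2 on the other. Only the tunnel ID (90 here) is shared end to end, which is why local VLAN numbers never need to be coordinated across hypervisors.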

All this sounds simple enough, but how does it go over the Physical network? We are missing 2 parts of the Puzzle:

A way to do overlay tunneling over our physical network (overcoming the limitation of L2 connectivity by transporting L2 frames over an L3 network); the “candidate” protocols are VXLAN, GRE and STT.
A way to let OpenStack Neutron configure the physical network.
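For the first missing piece, VXLAN is the easiest candidate to picture. The original L2 frame is carried inside UDP (destination port 4789) and is preceded by an 8-byte VXLAN header. That header's 24-bit VNI identifies the segment, which allows about 16 million segments instead of 4094 VLANs. Below is a minimal Python sketch of building and parsing that header as laid out in RFC 7348. The function names are hypothetical, and GRE and STT use different headers that this sketch does not cover.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned VXLAN destination port (RFC 7348)
VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: flags (8 bits) | reserved (24) | VNI (24) | reserved (8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set")
    return vni_word >> 8


header = vxlan_header(5000)
print(header.hex())  # 0800000000138800
```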

OpenStack Neutron and OVS (Open Virtual Switch) translated to the Network Engineers language

Introduction to Open Virtual Switch (OVS)

IaaS (Infrastructure as a Service) is provided by a group of different, interconnected services. OpenStack is an operating system that makes IaaS possible by controlling the “pools” of Compute, Storage and Networking within a Data Center using the Dashboard (later we'll discuss more about what the Dashboard really is).

NaaS (Network as a Service) is the part we will mainly focus on in this post. NaaS is what OpenStack brings to networking: it is in charge of configuring all the network elements (L2, L3 and network security) using APIs (Application Programming Interfaces). Users use NaaS as the interface that allows them to add/configure/delete all the network elements, such as Routers, Load Balancers and Firewalls.

Neutron is the OpenStack module in charge of networking. Neutron works through its plug-ins, which are used for different external mechanisms, such as:

Open vSwitch (OVS), or external L2 Agents.
SDN Controllers, such as VMware NSX, Cisco ACI, Alcatel Nuage etc.

All of you who know networking will stop here, make a "Poker face" and say: "Wait... what?". This brings us to the KEY point of this post: OpenStack's weak point is the network. Without an additional SDN controller, it isn't capable of controlling a physical network infrastructure. Well, at least not one that exceeds a few switches.

Neutron consists of the following Logical Elements:

neutron-server: Accepts the API calls, and forwards them to the corresponding Neutron Plugins. Neutron-server is a Python daemon that exposes the OpenStack Networking API and passes tenant requests to a suite of plug-ins for additional processing. Neutron brings Networking as a Service (NaaS). This means that the user gets the interface to configure the network and provision security (Add/Remove Routers, FWs, LBs etc.), without worrying about the technology underneath.
DataBase: Stores the current state of different Plugins.
Message Queue: The place where the calls between the neutron-server and the agents are queued.
Plugins and Agents: In charge of executing different tasks, like plug/unplug the ports, manage the IP addressing etc.
L2 Agent (OVS or the External Agent): Manages the Layer 2 Connectivity of every Node (Network and Compute). It resides on the Hypervisor. L2 agent communicates all the connectivity changes to the neutron-server.
DHCP Agent: DHCP Service for the Tenant Networks.
L3 Agent (neutron-l3-agent): L3/NAT Connectivity (Floating IP). It resides on the Network Node, and uses the IP Namespaces.
Advanced Services: Services, such as FW, LB etc.
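To see what "accepts the API calls" means in practice, here is a hedged Python sketch of a single call to neutron-server's REST API that creates a network. It only builds the request with the standard library and does not send it. The endpoint `http://controller:9696` (9696 is Neutron's default API port), the token placeholder and the network name are assumptions. In a real deployment, the token would come from Keystone.

```python
import json
import urllib.request


def build_create_network_request(endpoint: str, token: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a Neutron v2.0 'create network' API call."""
    body = {"network": {"name": name, "admin_state_up": True}}
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/v2.0/networks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


req = build_create_network_request("http://controller:9696", "<keystone-token>", "tenant-net")
print(req.get_method(), req.full_url)  # POST http://controller:9696/v2.0/networks
```

Passing `req` to `urllib.request.urlopen` would deliver it to neutron-server. From there, the request travels through the plug-in, the database and the message queue described in the list above.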

Physically, Neutron is deployed as 3 isolated systems. In a Basic Deployment, we will have the Neutron Server connected to a Database. The integration to the L2, L3 and the DHCP agent, as well as to the Advanced Services (FWs, LBs) will be done through a Message Queue.

Controller Node: runs the Neutron API Server, so all Horizon and CLI API calls “land” here. The Controller is the principal Neutron server.
Compute Node: where all the VMs run. These VMs need L2 connectivity, so each Compute Node needs to run the Layer 2 Agent.
Network Node: runs all the network services agents.

*There is an L2 Agent on every Compute and Network Node.

Layer 2 Agents run on the hypervisor (there is an L2 Agent on each host); they monitor when devices are added/removed and communicate these changes to the Neutron Server. When a new VM is created, the following happens:
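One way to picture the add/remove monitoring described above is a polling loop that compares two snapshots of the ports on br-int. In the hypothetical Python sketch below, `notify` stands in for the RPC message the real agent sends to neutron-server over the message queue, and the tap port names are made up. The actual OVS agent logic is considerably more involved.

```python
def scan_ports(previous: set, current: set) -> dict:
    """Compare two snapshots of the ports on br-int and report what changed."""
    return {"added": current - previous, "removed": previous - current}


def agent_iteration(previous: set, current: set, notify) -> set:
    """One polling round of a hypothetical L2 agent: detect changes, tell neutron-server."""
    changes = scan_ports(previous, current)
    for port in sorted(changes["added"]):
        notify("port_up", port)    # a new VM's tap port appeared: wire it, then report it
    for port in sorted(changes["removed"]):
        notify("port_down", port)  # a VM's tap port disappeared: report it
    return current                 # becomes `previous` for the next round


events = []
state = agent_iteration({"tap-aaa"}, {"tap-aaa", "tap-bbb"}, lambda e, p: events.append((e, p)))
state = agent_iteration(state, {"tap-bbb"}, lambda e, p: events.append((e, p)))
print(events)  # [('port_up', 'tap-bbb'), ('port_down', 'tap-aaa')]
```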

Layer 3 Agents don't run on the hypervisor like the L2 agent, but on the separate Network Node, and they use IP Namespaces. The L3 agent provides an isolated copy of the network stack, so you can re-use IP addresses, with the Tenant marking the isolation between them. For now, the L3 Agent only supports static routes.

Namespace: In Linux, Network Namespaces are used for routing, and the L3 Agent relies on the L2 Agent to populate the “cache”. A namespace isolates a group of resources at the kernel level, which enables a multi-tenant environment from the network point of view. Only the instances within the same network namespace can communicate with each other, even if the instances are spread across OpenStack compute nodes.
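The reason overlapping tenant IPs are harmless is that every lookup happens inside one namespace's own routing table. The hypothetical Python sketch below models this. Each "namespace" is just an object with its own routes and a longest-prefix-match lookup. The namespace names, gateways and interface labels are made up for illustration. Neutron itself names router namespaces `qrouter-<router-id>`, and you can inspect a real one with `ip netns exec qrouter-<router-id> ip route`.

```python
import ipaddress


class Namespace:
    """Hypothetical stand-in for a Linux network namespace with its own routing table."""

    def __init__(self, name: str):
        self.name = name
        self.routes = []  # list of (network, next_hop)

    def add_route(self, prefix: str, next_hop: str) -> None:
        self.routes.append((ipaddress.ip_network(prefix), next_hop))

    def lookup(self, destination: str):
        """Longest-prefix match, evaluated only against this namespace's own table."""
        dst = ipaddress.ip_address(destination)
        matches = [route for route in self.routes if dst in route[0]]
        if not matches:
            return None
        return max(matches, key=lambda route: route[0].prefixlen)[1]


tenant_a = Namespace("qrouter-tenant-a")
tenant_a.add_route("10.0.0.0/24", "qr-a (connected)")
tenant_a.add_route("0.0.0.0/0", "192.0.2.1")

tenant_b = Namespace("qrouter-tenant-b")
tenant_b.add_route("10.0.0.0/24", "qr-b (connected)")  # same subnet, different tenant
tenant_b.add_route("0.0.0.0/0", "198.51.100.1")

print(tenant_a.lookup("10.0.0.5"), "|", tenant_b.lookup("10.0.0.5"))
```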

Here is an example of how to deploy OpenStack (Juno release) networking. In this example there is a separate API node, but in most deployments this node is actually integrated with the Controller node:

[Integrate NSX with PaloAlto] Solve OVF Import Certificate problem using the OVFTool

In my next post I'll be focusing on the NSX and Palo Alto integration, and all the improvements it brings to Micro Segmentation. For now, let's just focus on importing the Palo Alto virtual FW VM (NSX version) into the existing vSphere environment.

VMware Environment Details:

ESXi 6.0 on a Physical Host + 5 Nested ESXi 6 (deployed in my Demo Center, as explained here)

vSphere 6.0 Managing Compute and Management Clusters

NSX Version 6.2

Palo Alto 7.0.1, Model PAN-PA-VM-1000-HV-E60 (Features: Threat Prevention, BrightCloud, URL Filtering, PAN-DB URL Filtering, GlobalProtect Gateway, GlobalProtect Portal, PA-VM, Premium Support, WildFire License).

IMPORTANT: You will need to be a Palo Alto partner, as their permission is required in order to download their products.

What is OVFTool, and why did I need it?

OVFTool is a multi-use VMware command-line tool for various OVA/OVF file operations. I found it really handy on this occasion, while trying to deploy the Palo Alto NSX version of the virtual FW into an existing vSphere 6 environment with NSX 6.2 deployed. The issue was that there was no way to deploy the .OVF due to the certificate error presented below. The original 3 files in the PA7.0.1 folder are the .MF, .OVF and .VMDK files, all with the same name (PA-VM-NSX-7.0.1.*).

I tried talking to Palo Alto support, and they proposed signing the .OVF manually, due to a possible corruption of the .MF file. Basically, sometimes when you try to deploy an OVA/OVF, the manifest file (.mf) will be missing or corrupt. In this case you will need to sign the file "manually". Before you're able to sign the .OVF, you will need two files: file.PEM and file.MF.

Before you start, you will need to download the OVFTool. To do this, you will need a valid VMware username/password.

Before you start "playing around", I strongly suggest reading a bit about the tool, and the operations you can perform with it, in the official VMware OVF Tool User's Guide.

Create a PEM file

To sign a package, you need a public/private key pair and a certificate that wraps the public key. The private key and the certificate (which includes the public key) are stored in a .pem file.

The following OpenSSL command creates a .pem file:

> openssl req -x509 -nodes -sha1 -days 365 -newkey rsa:1024 -keyout x509_for_PA.pem -out x509_for_PA.pem

You will be prompted for the standard X.509 certificate details while doing this. Check that the .PEM file has been successfully created:

MJ-MacPro:VMware OVF Tool iCloud-MJ$ ls | grep pem
x509_for_PA.pem
MJ-MacPro:VMware OVF Tool iCloud-MJ$ openssl x509 -text -noout -in x509_for_PA.pem
Certificate:
    Data:
        Version: 3 (0x2)
        Serial Number:
            f6:a0:f3:72:e5:5f:0b:bf
        Signature Algorithm: sha1WithRSAEncryption
        Issuer: C=es, ST=Madrid, L=Madrid, O=Logicalis, CN=Logicalis/emailAddress=mateja.jovanovic@es.logicalis.com
        Validity
            Not Before: Oct 20 09:38:14 2015 GMT
            Not After : Oct 19 09:38:14 2016 GMT
        Subject: C=es, ST=Madrid, L=Madrid, O=Logicalis, CN=Logicalis/emailAddress=mateja.jovanovic@es.logicalis.com
        Subject Public Key Info:
            Public Key Algorithm: rsaEncryption
            RSA Public Key: (1024 bit)
                Modulus (1024 bit):
                    00:c4:38:e0:75:5f:34:73:44:e7:fe:9b:35:e5:4b:
                    11:ab:d9:41:e9:e2:d4:cd:fa:f3:d9:e4:04:3b:72:
                    d2:33:a1:b6:f7:99:8d:c2:00:04:07:13:0b:14:d5:
                    3e:cb:ea:7d:b7:3b:5d:d4:82:1d:da:78:09:52:cd:
                    be:7e:cf:01:a0:0e:db:ef:c7:01:74:9e:88:2d:7c:
                    3a:7f:db:3f:a7:f5:7d:38:41:36:ff:55:46:16:d2:
                    76:3d:3a:2d:8d:a7:d4:03:25:d0:31:03:8d:d8:57:
                    d3:5b:6a:e2:db:2f:c6:19:8c:36:bf:b0:e6:c0:f5:
                    8b:c6:67:59:39:ec:83:b9:bb
                Exponent: 65537 (0x10001)
        X509v3 extensions:
            X509v3 Subject Key Identifier:
                71:FD:B9:D9:67:46:0B:2D:47:1D:A9:CF:02:9A:B8:E0:80:87:8A:B9
            X509v3 Authority Key Identifier:
                keyid:71:FD:B9:D9:67:46:0B:2D:47:1D:A9:CF:02:9A:B8:E0:80:87:8A:B9
                DirName:/C=es/ST=Madrid/L=Madrid/O=Logicalis/CN=Logicalis/emailAddress=mateja.jovanovic@es.logicalis.com
                serial:F6:A0:F3:72:E5:5F:0B:BF
            X509v3 Basic Constraints:
                CA:TRUE
    Signature Algorithm: sha1WithRSAEncryption
        27:14:fc:7d:b5:9f:63:1d:08:84:1e:13:b4:9d:85:58:a5:77:
        8a:fa:a9:34:76:4e:a4:91:7e:98:0f:a8:54:2d:a5:1d:cf:5d:
        b7:8c:7c:42:a6:18:da:b4:38:a8:4f:8a:df:c6:c3:92:a5:22:
        e1:40:90:5f:04:97:b4:c2:79:97:5e:1a:74:c1:6f:b6:a4:0f:
        cd:b2:7e:f3:cb:79:5b:ac:71:bb:56:00:8d:7f:58:89:4a:f3:
        f3:b9:dc:a4:5b:ce:09:ad:4b:2e:a4:81:9e:c8:a7:81:11:ec:
        b7:21:8d:58:9e:b2:03:f2:de:fb:84:7e:ac:f7:2e:d3:f6:25:
        9a:53
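The signing step needs both the private key and the certificate from that single .pem file, so a quick sanity check can save you an ovftool round-trip. Below is a minimal Python sketch of such a check, using hypothetical helper names. It only looks at the PEM armor lines and does not parse or validate the certificate itself. Depending on the OpenSSL version, the key block may be labelled `RSA PRIVATE KEY` or `PRIVATE KEY`, so any label ending in `PRIVATE KEY` is accepted.

```python
import re

# Matches PEM armor lines such as "-----BEGIN CERTIFICATE-----" and captures the label.
PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def pem_labels(text: str) -> list:
    """Return the labels of all PEM blocks found in `text`, in order."""
    return PEM_BLOCK.findall(text)


def has_key_and_cert(labels: list) -> bool:
    """True if there is at least one private key block and one certificate block."""
    has_key = any(label.endswith("PRIVATE KEY") for label in labels)
    return has_key and "CERTIFICATE" in labels


def check_pem_file(path: str) -> bool:
    """Read a .pem file (e.g. x509_for_PA.pem) and report whether it holds a key and a cert."""
    with open(path, encoding="ascii") as fh:
        return has_key_and_cert(pem_labels(fh.read()))
```

For example, `check_pem_file("x509_for_PA.pem")` should return `True` for a file produced by the `openssl req ... -keyout x509_for_PA.pem -out x509_for_PA.pem` command above, since both outputs go to the same file.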

Create a Manifest (.MF) file

To create the manifest file, run the following command for all files to be signed:

openssl sha1 *.vmdk *.ovf > Final-Signed-VM.mf
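You may want to double-check what `openssl sha1` produced, or regenerate the manifest on a machine without OpenSSL. The Python sketch below writes the same `SHA1(file)= digest` lines and then re-checks them. The function names are hypothetical. The sketch assumes the manifest only needs SHA-1 entries for the .vmdk and .ovf files, which is exactly what the command above hashes. It does not replicate ovftool's own validation logic. Still, a digest mismatch reported by `verify_manifest` is the kind of problem that could lead to a "manifest does not validate" message.

```python
import hashlib
import re
from pathlib import Path

# One manifest line, as printed by `openssl sha1 <file>`: "SHA1(<name>)= <40 hex digits>"
MANIFEST_LINE = re.compile(r"^SHA1\((?P<name>.+)\)= (?P<digest>[0-9a-fA-F]{40})$")


def sha1_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so multi-GB VMDKs don't have to fit in memory."""
    digest = hashlib.sha1()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder: Path, manifest_name: str) -> Path:
    """Mimic `openssl sha1 *.vmdk *.ovf > <manifest_name>` inside `folder`."""
    files = sorted(folder.glob("*.vmdk")) + sorted(folder.glob("*.ovf"))
    text = "".join(f"SHA1({f.name})= {sha1_of(f)}\n" for f in files)
    manifest = folder / manifest_name
    manifest.write_bytes(text.encode("utf-8"))  # bytes: keep "\n" line endings on Windows too
    return manifest


def verify_manifest(manifest: Path) -> list:
    """Return the entries that fail to validate; an empty list means every digest matches."""
    failures = []
    for line in manifest.read_text(encoding="utf-8").splitlines():
        match = MANIFEST_LINE.match(line.strip())
        if match is None:
            failures.append(line)  # malformed line
            continue
        target = manifest.parent / match["name"]
        if not target.exists() or sha1_of(target) != match["digest"].lower():
            failures.append(match["name"])
    return failures
```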

Once you've created the .MF and .PEM, you can proceed to sign the OVF file using OVFTool. I had the files in the C:\PA7 folder, but to avoid copy-pasting the entire path, I simply copied them to the folder where OVFTool.exe is (C:\Program Files\VMware\VMware OVF Tool in a Windows environment, /Applications/VMware OVF Tool on a MacBook).

You may continue the procedure on Linux/Mac; the OVFTool commands are exactly the same. I switched to a Windows environment due to Fusion library errors (details at the end of this post).

Sign the OVF using the OVFTool

The final step is to execute the OVFTool command in order to create the new, signed OVF:

ovftool --privateKey="x509_for_PA.pem" PA-VM-NSX-7.0.1.ovf Final-Signed-VM.ovf

TIP: OVFTool options are case-sensitive, so beware of capitalization errors in your command (--privatekey instead of --privateKey):

C:\Program Files\VMware\VMware OVF Tool>ovftool --privatekey="x509_for_PA.pem" PA-VM-NSX-7.0.1.ovf Final-Signed-VM.ovf
Error: Unknown option: 'privatekey'
Completed with errors
C:\Program Files\VMware\VMware OVF Tool>

C:\Program Files\VMware\VMware OVF Tool>ovftool --privateKey="x509_for_PA.pem" PA-VM-NSX-7.0.1.ovf Final-Signed-VM.ovf
Opening OVF source: PA-VM-NSX-7.0.1.ovf
The manifest does not validate
Error: Invalid manifest file (line: 1)
Completed with errors

C:\Program Files\VMware\VMware OVF Tool>ovftool --privateKey="x509_for_PA.pem" PA-VM-NSX-7.0.1.ovf Final-Signed-VM.ovf
Opening OVF source: PA-VM-NSX-7.0.1.ovf
The manifest validates
Opening OVF target: Final-Signed-VM.ovf
Writing OVF package: Final-Signed-VM.ovf
Transfer Completed
OPENSSL_Uplink(000007FEEDE66000,08): no OPENSSL_Applink
C:\Program Files\VMware\VMware OVF Tool>

Now we copy the files BACK to the original folder (C:\PA7). The content is displayed below.

C:\PA7>dir
 Volume in drive C has no label.
 Volume Serial Number is B416-28D0

 Directory of C:\PA7

20/10/2015  12:13    <DIR>          .
20/10/2015  12:13    <DIR>          ..
20/10/2015  12:11     1,552,252,928 Final-Signed-VM-disk1.vmdk
20/10/2015  12:11                 0 Final-Signed-VM.cert.tmp
20/10/2015  12:11               121 Final-Signed-VM.mf
20/10/2015  12:11            10,256 Final-Signed-VM.ovf
               4 File(s)  1,552,263,305 bytes
               2 Dir(s)   6,033,895,424 bytes free

You will now be able to deploy the signed .OVF package to your vSphere.

Note: As you probably noticed, I created the .PEM and .MF on my MacBook, and then passed the files to a Windows VM because of a few Fusion library errors I'd been getting.

Error Details (if someone is interested):

VMware Fusion unrecoverable error: (vthread-4), SSLLoadSharedLibraries: Failed to load OpenSSL libraries. libdir is /Applications/VMware OVF Tool/lib A log file is available in "/var/root/Library/Logs/VMware/vmware-ovftool-16747.log".