Which SDN solution is the right one for me? NSX vs ACI vs Nuage vs Contrail

This is a question I've been getting A LOT in the last few years, and even though it sounds rather simple, it somehow gets really complex to convince all the parties (Developers, Systems/Virtualization and Network engineers, and the CEO/CTO) that the solution you're proposing is a perfect fit. There are 2 simple explanations for this:

  • A so-called "language barrier" between the different departments.
  • SDN vendors being way too aggressive in pushing their solutions into environments where they don't fit [understandable when you consider how much money they've invested in SDN, and how much fear and hesitation new clients feel when considering the migration of their production network to SDN].

What I want to do in this post is help you get a more objective, vendor-neutral picture of the SDN solutions out there, and of the environments each of them should be considered for.
*If you're not sure you understand the difference between Underlay and Overlay, please refer to my previous posts.


There are 2 types of SDN solutions at the moment:
  1. SDN as an Overlay (VMware NSX, Nokia Nuage and OpenContrail)
  2. Underlay and Overlay controlled via APIs (Cisco ACI and OpenDayLight)

SDN as an Overlay solutions tend to be much easier to understand, and more graphical and user-friendly. This is because they only handle the Overlay of the network, completely ignoring the physical network underneath and treating it as a "commodity". Even though NSX and Nuage are both great solutions, and there are environments where they would definitely be the SDN solution I recommend, there is a pretty serious conceptual problem with this approach, especially if your network isn't 100% virtual and your physical topology has more than a few switches.

Systems and Virtualization engineers tend to love this kind of solution, due to 2 factors:
  • They don't have a deep understanding of Networking protocols.
  • They kinda get the impression that they will handle both the Compute and the Networking environment in the Data Center, pushing out the Networking department [kinda true, if you ignore the fact that you actually end up with 2 departments handling your network: the Systems guys taking care of the Overlay and the Networking guys taking care of the physical infrastructure].


Network engineers tend to dislike this kind of solution, due to 2 factors:
  • They lose visibility of what's going on in their Network.
  • They know that when things don't work, or when there is a performance issue, the CEO will knock on their door, and they will have no idea what to do or where to look.

Why isn't SDN as an Overlay as great as that PowerPoint made it sound?

Let me try to explain why SDN as an Overlay should not be considered for environments whose Physical Network Topology has more than just a few switches. Bear with me here, because the explanation might seem a bit complex at first.

Virtualization is based on optimising the use of physical resources, in order to get better performance out of the same hardware. This concept should apply to both Server Virtualization and Network Virtualization. Now imagine software that handles Server Virtualization "as an Overlay", treating the Physical servers as a "commodity". For example, let's imagine 10 physical Servers with 16GB of RAM, 4 Cores and 512GB of SSD each, and let's say that we need to provision 100 VMs, each with 8GB of RAM and 2 Cores. Our Virtualization Software, having no visibility or control of the Physical Servers, will just randomly place these machines on the physical infrastructure. As a result, some of our physical servers will end up with 20+ VMs and start having performance issues due to the insane oversubscription, while others will run at less than 20% capacity with just a few VMs.
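The server example can be reproduced with a toy Python sketch. It only counts VMs per host (it ignores the RAM, core and SSD figures) and compares placing each VM on a uniformly random host, as a scheduler with no view of the hardware would, against a simple least-loaded placement that can see how busy each host is. The function names and the seed are illustrative assumptions, not any hypervisor's actual scheduler.

```python
import random

NUM_HOSTS = 10
NUM_VMS = 100

def random_placement(num_vms, num_hosts, seed=42):
    """'Overlay-only' scheduler: every VM goes to a random host, blind to host load."""
    rng = random.Random(seed)
    load = [0] * num_hosts
    for _ in range(num_vms):
        load[rng.randrange(num_hosts)] += 1
    return load

def least_loaded_placement(num_vms, num_hosts):
    """Load-aware scheduler: every VM goes to the host currently running the fewest VMs."""
    load = [0] * num_hosts
    for _ in range(num_vms):
        host = load.index(min(load))  # first host with the lowest VM count
        load[host] += 1
    return load

if __name__ == "__main__":
    for name, load in (("random", random_placement(NUM_VMS, NUM_HOSTS)),
                       ("least-loaded", least_loaded_placement(NUM_VMS, NUM_HOSTS))):
        print(f"{name:>12}: VMs per host = {load}, max = {max(load)}, min = {min(load)}")
```

The exact numbers from the random placement depend on the seed, but they will typically come out uneven, while the load-aware placement puts exactly 10 VMs on every host.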



While this seems pretty easy to understand, most Systems departments have trouble seeing that exactly the same thing happens to our Network when we treat our SDN as an Overlay only. Yes, RAM and number of Cores are concepts far easier to understand than Switch Throughput, IP Flows and Interface Buffer Capacity, but the concept is the same. If we provision our applications to run over our network while ignoring the Physical Network, then even if that IP network is redundant and highly available, some of our Links will have high drop counts while others carry almost no traffic, and some of our Switches will run at 99% CPU while others stay under 10% (these figures come from real SDN implementations). What can we do? We have two options: either we over-provision our Network Infrastructure and spend way more money than planned, or we suffer the performance issues and blame the guys who take care of the Physical Network.



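If after this paragraph you still don't understand why your traffic wouldn't be magically balanced across the Physical Network, but would saturate a single group of Links and Switches instead, that's yet another sign that you should probably involve your Networking experts in the decision-making process. Let's face it: the Overlay is based on VxLAN, and VxLAN is basically a tunnel between two VTEPs. To the Physical Network, everything inside that tunnel looks like just a handful of IP flows. What happens to an IP flow in an IP Network? It's routed via the best IP path, a decision made locally based on each router's routing table (and, where equal-cost paths exist, on a hash of the packet headers). This means that unless your VTEPs add per-flow entropy to the outer headers (RFC 7348 recommends deriving the outer UDP source port from a hash of the inner packet) and your underlay's hashing actually uses it, ALL the traffic between any two Hypervisors will go through the same links and the same Network devices. And even then, the underlay balances flows, not bandwidth.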
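To make the flow-hashing argument concrete, here is a minimal Python sketch of how an underlay router with equal-cost paths might place VxLAN traffic between two hypervisors. Everything in it is a hypothetical assumption: the four spine paths, the VTEP and VM addresses, and CRC32 as a stand-in for a vendor-specific hardware hash. The only details taken from the VxLAN spec (RFC 7348) are the UDP destination port 4789 and the recommended 49152-65535 outer source-port range.

```python
import zlib
from collections import Counter

# Hypothetical underlay: 4 equal-cost paths between the leaf switches of two hypervisors.
PATHS = ["spine1", "spine2", "spine3", "spine4"]
VXLAN_UDP_PORT = 4789  # IANA-assigned VxLAN destination port (RFC 7348)

def ecmp_pick(src_ip, dst_ip, src_port, dst_port, proto=17, paths=PATHS):
    """Choose a path by hashing the OUTER 5-tuple, the only headers the underlay sees.
    CRC32 stands in for a vendor-specific hardware hash (assumption)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return paths[zlib.crc32(key) % len(paths)]

def outer_src_port(inner_flow, entropy=True):
    """Outer UDP source port chosen by the sending VTEP.
    entropy=True: hash of the inner flow, mapped into 49152-65535 as RFC 7348 recommends.
    entropy=False: one fixed port, so every inner flow looks identical to the underlay."""
    if not entropy:
        return 49152
    return 49152 + zlib.crc32(repr(inner_flow).encode()) % 16384

def links_used(inner_flows, entropy, vtep_a="10.0.0.1", vtep_b="10.0.0.2"):
    """Count how many inner (VM-to-VM) flows land on each underlay path."""
    usage = Counter()
    for flow in inner_flows:
        sport = outer_src_port(flow, entropy)
        usage[ecmp_pick(vtep_a, vtep_b, sport, VXLAN_UDP_PORT)] += 1
    return usage

# 1000 hypothetical VM-to-VM TCP flows between the same two hypervisors.
INNER_FLOWS = [("192.168.1.10", "192.168.2.20", 6, 30000 + i, 443) for i in range(1000)]

if __name__ == "__main__":
    print("fixed source port: ", dict(links_used(INNER_FLOWS, entropy=False)))
    print("hashed source port:", dict(links_used(INNER_FLOWS, entropy=True)))
```

With a fixed outer source port, all 1000 VM-to-VM flows land on a single spine path; with a hashed source port they spread across the paths. Even then, the hash balances the number of flows, not the bytes they carry: a few large flows that hash onto the same path will still saturate it while its neighbours stay idle, and an Overlay-only controller has no say in where they go.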

The worst part is that none of these problems will show up in the Demo/PoC environment, as there we are mostly testing functionality. The problems get more and more serious as we add more applications and Network load and try to scale up the environment. In fact, every wrongly chosen SDN solution I've seen ended with complete frustration on the client's side and a rollback to the Legacy network, at least until SDN is "more mature". No... there are mature SDN solutions; you were just convinced too easily and chose the wrong one.

Conclusion

Before I get to my recommendations on which solution is the perfect one for you, there is one thing I have to point out, even though most of my clients are trying to avoid it: every SDN solution is a vendor lock-in. Some of them lock you in with their Hardware, some with Software and Licences, and some with Support (including Upgrades and additional engineering when adding or upgrading other components in your Data Center).

To sum all this up, here is a simple list of recommendations to help you decide which SDN solution you should consider.

Is VMware NSX a perfect fit for my environment?

If your environment is 100% virtual and 100% VMware (or on the path to becoming 100% virtual in the next few years), and your Data Center Network Topology is rather simple and made up entirely of high-end, high-throughput Network Devices, NSX is the way to go! With vRealize Network Insight you'll be able to get a basic picture of what's going on in the Physical Network and, as VMware puts it, do "Performance optimization across overlay and underlay", and NSX micro-segmentation just works perfectly. Also keep in mind that Cisco and VMware are the two companies with the greatest number of experts, so you don't have to worry about product support.

*There's a multi-hypervisor version of NSX, called NSX Transformers (previously known as NSX-mh). At the moment (December 2016) this is not something you should consider, as it has a very limited set of functionalities, and there is no way to get your hands on it (not even as a VMware employee or partner).

Is Nokia Nuage a perfect fit for my environment?

If you have a multi-hypervisor, 100% virtual environment (or are on the path to becoming 100% virtual in the next few years), and your Data Center Network Topology is rather simple and made up entirely of high-end, high-throughput Network Devices, Nuage might be the way to go. Within the Nuage VSP (Virtualized Services Platform) there is a product called Nuage VSAP (Virtualized Services Assurance Platform). Keep in mind that VSAP can give you a basic overview of what's going on in your physical network, but it is more of a Monitoring than a Network Management platform. On the Nuage web page you will find that if, for example, a physical link goes down, the triggered action would be sending an email to the Networking department or similar.

If you have many Branch Offices, you should definitely consider Nuage, as the Nuage Networks Virtualized Network Services (VNS) solution can literally extend your VxLANs (and therefore your applications) in a matter of hours using a simple Physical or Virtual device.

Also worth mentioning: the Nuage GUI is simply awesome, fast and intuitive. Your SDN admins will appreciate this (at least during the migration process, until you move to an all-API Data Center environment).

Is Cisco ACI a perfect fit for my environment?

ACI is definitely one of my favourites on the market, and probably the only one that gives you full control of the Overlay and the Underlay as a single Network, out of the box and with a defined Support model. The problem is that the only switch supporting Cisco ACI is the Cisco Nexus 9k. So if you have a serious Network Topology and you're planning to renew your Switches (or you already have a significant number of Nexus 9ks), ACI is definitely the way to go. It lets you control your network (Physical and Virtual) from a single controller, and the Troubleshooting tools are just INSANE. You can even run a traceroute that covers the Overlay, the Underlay and Security, with graphical output.

Is OpenDayLight a perfect fit for my environment?

OpenDayLight is an open source solution, which means that if you don't already have a big team of motivated R&D Network Engineers, you should go for one of the distributions offered by a major vendor, such as Ericsson, Huawei, NEC, HP, etc.

The advantage of OpenDayLight is its flexibility: it has numerous projects that you can choose to use or not in your environment. This allows you to build a custom solution that is a perfect fit, handling both the Overlay and the Underlay with open source projects. There is, again, the issue of handling the Physical infrastructure with half-engineered protocols such as OpenFlow and OVSDB, but a good system integrator can overcome this, and I've seen it happen.

The disadvantage is that this kind of solution requires a great number of engineering hours, and an update of a certain component in your hardware may require re-engineering part of your SDN solution. There is also the question of customer support, given that the only one who knows the details of the personalised solution your system integrator of choice implemented is that integrator itself.
