Just married: IBM and RedHat. What does this mean for Cisco and VMware Multi-cloud offer?

As per yesterdays announcement, IBM is acquiring Red Hat in deal valued at $34 billion (more about this here). This is another one in a row of deals I did not expect to happen:

  • Oracle acquired Sun Microsystems
  • Microsoft acquired GitHub
  • Dell acquired VMware

How disruptive can a Purple Hat really be? VMware survived being acquired by Dell quite well... will RedHat have the same luck, or not? What I know for sure is that the RedHat employees are panicking right now...

Sure, 3k billion is a big sum, but also a bold move by IBM on the conquest to the Multi-Cloud market. Combined we're looking at (to name a few):

  • Ansible for the Automation
  • OpenShift, as the best of breed PaaS based on Kubernetes
  • CloudForms as a potential CMP (I wonder how this will work out...)
  • Watson for all AI/Machine Learning related
  • IBM Cloud as a Public Cloud platform

Is this a winner combo? Or do other Hybrid Cloud promoters, like Cisco and VMware have equally good lock-in-free proposals?

As a Hybrid Cloud and DevOps advocate, and a European CTO, I've had the experience to "casually chat" to many European companies about their Cloud strategy. Two things are evident:

  • The buyer is changing, Multi-Cloud is an APPLICATION strategy, not the infrastructure strategy (read more about this here).
  • Companies don't really know who to trust, as what they're being told by various vendors and providers is not really coherent. This makes is pretty difficult to actually build a Cloud strategy (don't get me started on CEOs who'll just tell you "We've adopted Cloud First", and actually think they have a cloud strategy).

Due to all this:
- IBM and RedHat, as software companies, will be able to get to the Application market.
- Neither of the two can do Infrastructure as well as VMware & Cisco.

How important is this? Very! And here is why.

Cisco has:

  • Cloud Center, a true application oriented micro-services ready CMP, Public Cloud and Automation Tool agnostic, equipped with the right Benchmarking and Brokering tools, that integrates quite well with the infrastructure, and workflow visibility platforms.
  • ACI and Tetration, that enable the implementation of coherent and consistent Network and Security Policy Model across multiple private and public clouds, along with the workload visibility.
  • HyperFlex and CCP, providing enterprise production-ready, lock-in-free Kubernetes solution on a Hyper Converged infrastructure.
  • AppDynamics and Turbonomic, a true DevOps combo for the Day 2 we're all fearing in the Cloud, letting the application architects model their post-installation architecture, and monitor the performance of each element, latency between different elements, and assure the optimal user experience.

VMware has:

  • vRealize Automation, the best of breed Automation and Orchestration Hybrid Cloud ready platform.
  • PKS and VKE, KaaS platforms that provide the enterprise production-ready Kubernetes solution, with a fully prepared Operations component, in both - private and public cloud.
  • Wavefront, application visibility tool running on Containers, designed with Cloud applications and Micro Services in mind, with just insane performance.
  • NSX, including the full SDN stack in both, Data Center and Cloud, with probably the best API (both documentation and usage wise).
  • Partnership with AWS, Azure, GCP and IBM, to leverage the most demanded Hybrid Cloud use cases in a "validated design" fashion.

What does this all mean?

Multi-cloud is still a space that, based on Gartner and IDC, over 90% of Companies are looking at. Big companies are making their moves... so just grab your popcorn, and observe. It's going to be a fun ride!

Why SDN isn't where we thought it would be

The SDN hype started a few years ago. Everyone was talking about it as the next big thing, and it all made so much sense. I started exploring SDN while Nicira and Insieme were just two startups, and got even deeper into it when they were bought by VMware and Cisco and rose as ACI and NSX.

SDN makes perfect sense. A single point of management and operations of the entire data center network, micro Segmentation as an embedded feature, REST API support for automation, possibility to move the workloads between Sites without having to reconfigure the security policy, and a bunch of others. It truly is a missing piece, arriving a bit too late. So… why hasn't the same happened like when we started using server virtualization? Why isn't everyone implementing these technologies, and celebrating the benefits while singing their favourite tune?

In my opinion, two reasons: misleading PowerPoints and vendors with the wrong go to market strategy.

Misleading PowerPoints

Networking tends to be more complex then Compute and Storage in the Data Center. You have a group of independent network devices that need to transfer an insane number of packets between different points, with zero latency, and no time to talk to each other and coordinate the decisions. When you introduce Automation into the equation, it all gets really interesting. With SDN we introduced overlay, and managed to somehow make all this easier. Where is the problem then?

Automation is an awesome concept. If you automate, you will improve the delivery times, and always end up with the same results. Automation is not new… it's been here since the 70s, and even though the execution premises have changed, one thing stays the same:
- If you automate, you will save a lot of time and resources.
- To create automation, you need a lot of knowledge, experience and a lot of time and effort.

The truth about misleading PowerPoints lies in the second point. Everyone rushed to explain to their customers how their SDN has an API, and how you can automate everything in an instance, I saw bonus hungry AMs and SEs singing the songs to the customers about how they can use the automation tool of choice. "There's an API so you're good bro!" Unfortunately, this is far from the truth. Yes, SDN supports automation of your network, but it takes a lot of hard work to set it up right, and if you sold something to the customer without setting their expectations right… well, he will be disappointed.

What is the truth? Both ACI and NSX are mature solutions, but the SDN is no longer a group of independent switches, it needs to be integrated in the wider ecosystem, and it makes all the difference who integrates all this in your Data Center. If the customers were prepared for this from the beginning, I think we'd all bee seeing a whole lot more SDNs.

Vendor Strategy

I'll talk about 2 big ones here - VMware and Cisco. Have you noticed how these two vendors have the same number of production references in each moment? Like there is some kind of secret synchronisation behind the curtain. Ever wondered why that is?

The truth is that both, ACI and NSX, are great products. Yes, GREAT! It's also true that a surprisingly small number of "SDN experts" out there understands HOW and WHY these products need to be introduced in the data center ecosystem, so a majority of the happy SDN customers that Cisco and VMware are referencing are kind of fake, meaning - yes, they are using the product, and yes, it's in production, but it is not used as SDN. Sure, Cisco has DevNet, and VMware has VMware Code, and these are both great initiatives, but they still lack a critical mass… both of them do. [if you don’t know what these are, I STRONGLY recommend that you stop reading this post, and go check out both these websites, they are AWESOME].

What is Cisco's mistake?

Cisco counts on their traditional big partners to deliver ACI. These guys can sell networking to a networking department, they get BGP and VxLAN, they can build the fabric in what Cisco brutally named "a networking mode", and they can train the networking department to use ACI. That’s it. So… what about automation, IAC (infrastructure as code), what about the developers who are actually the true buyer here, and they just need to provision some secure communication for their code? Well, I'm afraid there's nothing here for them, because neither a Cisco's networking partner not the customers networking department are able to configure and prepare the ACI for what these guys really need SDN for. Customer simply isn't getting what they paid for, and they are pretty vocal about it in the social networks, so the product gets bad marketing.

And yes, there are companies out there (such as mine) who are able to implement ACI as a part of a Software Defined ecosystem and help customer build automation around it, but Cisco somehow still isn's seeing the difference, and is still promoting same old networking partners to the customers to implement their ACI. Oh well… let's hope Cisco starts to understanding this before it's late.

What is VMware's mistake?

NSX is an entirely different story. The problem isn't VMware's strategy, but rather - the buyer. SDN is still networking, so the buyer is a Networking Department, but… Networking guys don’t know VMware, they know Cisco and Juniper. On the other hand there are System Admins who are desperate to gain control over network and not depend on the slow networking departments, but… they lack advanced networking knowledge. So NSX, being a brilliant product as it is, ended up in no mans land. VMware did everything to promote NSX to Network experts, if you're a CCIE, like me, you can actually do NSX cert exams without doing the training, and NSX is easy to learn and understand, but still, not enough hype around it among network admins. So, what happened? Well, for now there are many implementations of NSX used the way System and Security experts are able to promote and manage it, Micro Segmentation with some basic networking, but not even close to the NSX full potential, and again - not used as an SDN.

What about the other SDN vendors? 

There are a few worth mentioning: Nokia Nuage, Juniper Contrail, some distributions of OpenDayLight (HP, Dell, Huawei, Ericsson, NEC, etc.). Two things are happening with these guys:

  • Due to all mentioned above, the Customers are under the wrong impression that not even ACI and NSX are fully mature and stable solutions… If Cisco and VMware aren't able to invest what it takes and make it stable, what do you expect from the others?
  • In one moment all these guys made huge investments in their technology, and there was still no sales to support the investment, so - they lowered the prices and started selling the solutions that weren't yet mature. This caused customers dissatisfaction, and the rumor on the market that SDN just "isn't there yet". They can still recover… as long as they actually invest in product development and engineering skills, and let product sell itself. 

What should we expect in the next 2-4 years?

SDN is here to stay, even more so with IoT and Containers with a whole set of new micro Segmentation and Network automation requirements. It just takes it longer then anticipated to find it's place. I think the customers are slowly starting to get the non-planned effort to actually move from installing the SDN product - to using it as a Software Defined technology, which is good, so if you're considering SDN as a potential career path - add some automation and programming skills, and you're on the right track.

Migrate HyperFlex Cluster to a new vCenter

Get ready to have your mind blown. One of the easiest procedures I've encountered. You just need to follow these 3 steps, to migrate the entire HyperFlex vSphere Cluster with all its hosts from vCenter 1 to vCenter 2.

Before you start:

- Your environment might be different. I'm not responsible if something goes wrong, you're welcome to look for the official guides. I've tested it to migrate from vCenter 6.0 to 6.7 in August 2018.
- VDS WILL NOT be migrated automatically, BUT - you can Export it into ZIP from the old vCenter, and import into the new one AFTER you've done all these steps, and the Uplinks will be automatically mapped. Be sure to include all the configuration, portgroups and all, both when you export and import.

Step 1.

Deploy your vCenter Server Appliance. I'll asume you're setting the standard username, administrator@vsphere.local

Step 2.

Create both Datacenter and Cluster in the empty new vCenter. For the ease of migration, use the same names. Connect all ESXi hosts from HyperFlex to the new Cluster. Just accept re-assigning of the licence, and wait to see the new host as Connected.

Step 3. 

Re-register the Cluster to a new vCenter. I recommend that you observe the new vCenter in the background, so that you can follow the progress. To do this you need to SSD into your HyperFlex, and execute the following command (set your own parameters, of course):

stcli cluster reregister --vcenter-cluster CLUSTER_NAME --vcenter-datacenter DATACENTER_NAME --vcenter-password 'NEW_vCENTER_PASSWORD' --vcenter-url NEW_vCENTER_IP --vcenter-user administrator@vsphere.local

You will get this message:

Reregister StorFS cluster with a new vCenter ... [this is where you wait for approx 10 minutes]
Cluster reregistration with new vCenter succeeded

Additional Step:

If you are using VDS, this is when you need to import them to the new vCenter.

And - you're done! Let me know in the comments if it worked as easy as this.

Install PowerCLI on Mac, start using PowerNSX

This is something I've been wanting to publish for a while, and finally my Mac got formatted (no questions will be taken at this point...) and I had to re-install it all, and I just couldn't find the instructions on how to do it without just having to read pages and pages of disclaimers and stuff...

Why PowerCLI? Cause it's a simplest way to automate your vCenter tasks, via the command line, fast and furious. Sure, one day a working vCenter web plugin will come, but who knows when...

Why PowerNSX? Same... but for the NSX admins. Trust me, my life got so much better the day I stopped depending on vCenter Web GUI.

How do I install and start using it? Simple. Just follow this 5 Steps guide...

Step 1: Install PowerShell

Make sure you have GitHub:

# git

Clone the PowerShell installation package from GitHub:
# git clone --recursive https://github.com/PowerShell/PowerShell

Once you got it, enter the Folder, and install the package (you'll be asked for a Password a few times):
Submodule path 'src/libpsl-native/test/googletest': checked out 'c99458533a9b4c743ed51537e25989ea55944908'

MatBook-Pro:~ mjovanovic$ cd /Users/mjovanovic/PowerShell
MatBook-Pro:PowerShell mjovanovic$ ./tools/install-powershell.sh

Get-PowerShell Core MASTER Installer Version 1.1.1
Installs PowerShell Core and Optional The Development Environment

Run "pwsh" to start a PowerShell session.
*** NOTE: Run your regular package manager update cycle to update PowerShell Core
*** Install Complete

MatBook-Pro:PowerShell mjovanovic$ pwsh
PowerShell v6.0.2
Copyright (c) Microsoft Corporation. All rights reserved.

Type 'help' to get help.

PS /Users/mjovanovic/PowerShell>

You're in the PowerShell!!!

Step 2: Install PowerCLI

Now lets procede with the PowerCLI. More details on this Link, if you happen to need more details... but basically all you need is the following command: https://blogs.vmware.com/PowerCLI/2018/03/installing-powercli-10-0-0-macos.html

PS /Users/mjovanovic/PowerShell> Install-Module -Name VMware.PowerCLI -Scope CurrentUser

Untrusted repository
You are installing the modules from an untrusted repository. If you trust this repository, change its InstallationPolicy value by running the Set-PSRepository cmdlet. Are you sure you want to install
the modules from 'PSGallery'?
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "N"): Y

Step 3: Install PowerNSX Modules

Ok, so now we just need to install the PowerNSX Modules:

PS /Users/mjovanovic/PowerShell> Find-Module PowerNSX | Install-Module -scope CurrentUser

Untrusted repository
You are installing the modules from an untrusted repository. If you trust this repository, change its InstallationPolicy value by running the Set-PSRepository cmdlet. Are you sure you want to install
the modules from 'https://www.powershellgallery.com/api/v2/'?
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "N"): Y

Step 3.1: Resolve the Certificate Error:
If you tried to connect to your vCenter now, you´d get this error:
Connect-VIServer : 06/08/2018 18:32:43 Connect-VIServer Error: Invalid server certificate. Use Set-PowerCLIConfiguration to set the value for the InvalidCertificateAction option to Ignore to ignore the certificate errors for this connection.

Before Logging in to your vCenter, to avoid the Certificate problems (which you will most definitely have), first use, You need to set the Certificate Errors to FALSE:

PS /Users/mjovanovic/PowerShell> set-PowerCLIConfiguration -InvalidCertificateAction Ignore

Perform operation?
Performing operation 'Update PowerCLI configuration.'?
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"): Y

PS /Users/mjovanovic/PowerShell>

Step 4: Log into the NSX Manager and vCenter

Now you are GOOD TO DO, you can Log in to your NSX, and to the vCenter:

PS /Users/mjovanovic/PowerShell> Connect-NsxServer -NsxServer -Username admin -Password M4TSCL0UD

PowerNSX requires a PowerCLI connection to the vCenter server NSX is registered against for proper operation.
Automatically create PowerCLI connection to
[Y] Yes  [N] No  [?] Help (default is "Y"): Y

WARNING: Enter credentials for vCenter

PowerShell credential request
Enter your credentials.
User: administrator@vsphere.local
Password for user administrator@vsphere.local: **************

Version             : 6.4.0
BuildNumber         : 7564187
Credential          : System.Management.Automation.PSCredential
Server              :
Port                : 443
Protocol            : https
UriPrefix           :
ValidateCertificate : False
VIConnection        :
DebugLogging        : False
DebugLogfile        : \PowerNSXLog-admin@

Step 5: Start using PowerNSX

You can do so many things here! I recommend this Guide to get you started:

How I passed Google Certified Professional Cloud Architect Exam

After a few months of heavy preps, I managed to pass the exam. I got the electronic certificate, and supposedly I'll get a Cloud Architect Hoodie! Yeah, I'm gonna wear it :)

The exam is every bit as difficult as advertised. I did A LOTS of Hands On in the Google Cloud Platform (the 300$ that Google gives you to play around comes in quite handy), without it I don't think it's possible to pass, bunch of questions have commands to choose from, and a heavy focus on App Development and Linux Commands. If you want to know how I prepared, check out my previous posts:

  1. Why I decided to become a Certified Cloud Architect, why Google Cloud, and how I want to prepare
  2. Introduction to Big Data and Hadoop
  3. Google Cloud - Compute Options (IaaS, PaaS, CaaS/KaaS)
  4. Google Cloud - Storage and Big Data Options
  5. Google Cloud - Networking and Security Options

Stay tuned, my Cloud is about to get much more DevOps-y in 2018!

Public Cloud Networking and Security: VPCs, Interconnection to Cloud, Load Balancing

I'm so happy to finally be here, at the Networking part of the Public Cloud!!! I know, there are more important parts of Cloud then Networks, but SDN is my true love, and we should give it all the attention it deserves.

IMPORTANT: In this post I will be heavily focusing on Google Cloud Platform. The concepts described here apply to ANY Public Cloud. Yes, specifics may vary, and in my opinion GCP is a bit superior to AWS and Azure at this moment, but if you understand how this one works - you'll easily get all the others.

Virtual Private Cloud (VPC)

VPC (Virtual Private Cloud) provide global scalable and flexible networking. This is an actual Software Defined Network provided by Google. Project can have up to 5 VPC - Virtual Private Networks. VPC can be global, and contains subnets and uses a private IP space. Subnets are regional. The network that you are provided with VPC is:

  • Private
  • Secure
  • Managed
  • Scalable
  • Can contain up to 7000 VMs

Once you create the VPC, you have a cross-region RFC1918 IP Space network, using Googles private Network underneath. It uses the Global internal DNS, load balancing, firewalls, routes, and you can scale rapidly with global L7 Load Balancers. Subnets within VPC can only exist within Region/Zone, you can't extend a Subnet over your entire VPC.

VPC Networks can be provisioned in:
  • Auto Mode, where the Subnet(s) is set (automatically assigned) in every region. Firewall rules and routes are preconfigured.
  • Custom mode, where we have to manually configure the subnets.

IP Routing and Firewalling

Routes are defined for the networks to which they apply, and you can use them if you want to apply the route only for the Instances with a certain "instance tag" (If you don't specify the TAG, the route applies to all the instances).

When you use the Routes to/from the Internet, you have 2 options:

Project can contain various VPCs (Google allows you to create up to 5 VPCs per project). VPCs also have Multi Tenancy. All the resources in GCP belong to some VCP.  Routing and Forwarding must be configured to allow traffic within VPC, and with the outside world. You also need to configure the Firewall Rules.

VPCs are GLOBAL, meaning the Resources can span anywhere around the world. Even so, instances from different regions CANNOT BE IN THE SAME SUBNETAn instance needs to be in the same region as a reserved static IP address. The zone in the region doesn't matter.

Firewall Rules can be based on the Source IP (ingress) or Destination IP (Egress) There are DEFAULT "allow egress" and "deny ingress" rules, which are pre-configured for you, with the minimum priority (65535). This means that if you configure the new FW rules with the lower number/higher priority, these will be taken into account, instead of the default ones. GCP Firewall rules are STATEFUL. You can also use TAGs and Service Accounts (something@developer.blabla.com for example) to configure the Firewall rules, and this is probably THE BIGGEST advantage of the Cloud Firewall, because you can do Micro Segmentation in a native way. Once you create a Firewall Rule, a TAG is created, so the next time you create an instance, and apply that rule, it will not create it again, just attach the TAG to your instance.

There are 2 types of IP addresses in VPC:
- External, in the Public IP space
- Internal, in the Private IP space

VPCs can communicate to each other using a Public IP space (External networks visible on the Internet). External IP can also be ephemeral (change every 24 hours) or static. VMs don't know what their external IP is. IMPORTANT: If you RESERVE an External IP in order to configure it as STATIC, and not use it for an Instance or a LB - you will be charged for it! Once you assign it - it's for free.

When you work with Containers - containers need to focus on the Application or Service. They don't need to do their own routing, it simplifies the traffic management.

Can I use a single RFC 1918 space within few GCP Projects?

Yes, using a Shared VPC - Networks can be shared across Regions, Projects etc. If you have different Departments that need to work on the same Network resources, you'd create two separate projects for them, give the access only to the project they work on, and use a single Shared VPC for the Network resources they all need to access.

Google Infrastructure

Google's network infrastructure has three distinct elements:
  • Core data centers (central circule), used for the Computation and Backend storage.
  • Edge Points of Presence (PoPs), Edge Points of Presence (PoPs) are where we connect Google's network to the rest of the internet via peering. We are present on over 90 internet exchanges and at over 100 interconnection facilities around the world.
  • Edge caching and services nodes (Google Global Cache, or GGC). Our edge nodes (called Google Global Cache, or GGC) represent the tier of Google's infrastructure closest to our users. With our edge nodes, network operators and internet service providers deploy Google-supplied servers inside their network.

CDN (Content Delivery Network) is also worth mentioning. It's enabled by Edge Cache Sites (Edge PoPs, or the light green circule above), the places where the online content can be delivered closer to the users for faster response times. It works with Load Balancing, and the Content is CACHED in 80+ Edge Cache Sites around the globe. unlike most CDNs, your site gets a single IP address that works everywhere, combining global performance with easy management — no regional DNS required. For more information check out the official Google docs.

Connecting your environment to GCP (Cloud Interconnect)

While this may change in the future, a VPN hosted on GCP does not allow for client connections. However, connecting a VPC to an on-premises VPN (not hosted on GCP) is not an issue.

There are 3 ways you can connect your Data Center to GCP:
  • Cloud VPN/IPsec VPN, as in a standard Site to Site VPN IPsec session (supports IKEv1 and v2). Supports up to 1,5-3 Gbps per tunnel, but you can set up various to increase performance. You can also use this option to connect different VPCs to each other, or your VPC to other Public Cloud. Cloud Router is not required for Cloud VPN, but it does make things a lot easier, by introducing the Dynamic Routing between your DC and GCP, that supports BGP. When using static routes, any new subnet on the peer end must be added to the tunnel options on the Cloud VPN gateway options.
  • Dedicated Interconnect, used if you don’t want to go via Internet, and you can meet Google in one of Dedicated Interconnect points of presence. You would be using Google Edge Location (you can connect into it Directly, or via Carrier), with Google Peering Edge (PE) device to which your physical Router (CE) connects [you need to be in the supported location - Madrid is included]. This is not cheap, currently around 1700$ per 10Gbps link, 80GB Max!
  • Direct Peering/Carrier Peering, which Google does not charge for, but also - there is no SLA. Peering is a private connection directly into Google Cloud. It's available in more locations then Dedicated Interconnect, and it can be done directly with Google (Direct Peering) if you can meet Google's direct peering requirements (Requires you to have a connection in a colocation facility, either directly or through a carrier provided wave service), or via a Carrier (Carrier Peering).

And, as always, Google provides a Choice Chart if you're not sure which option is for you:

How do I transfer my data from my Data Center to GCP?

When transferring your content into the cloud, you would use the "gsutil" command line tool, and have in mind:
  • Parallel uploads (-o, plus you need to set the parameters) are for breaking up larger files into pieces for faster uploads.
  • Multi-threaded uploads (-m) are for large numbers of smaller files.  If you have bunch of small files, you should group together and compress.
  • You can add multiple Cloud VPNs to reduce the transfer time.
  • gsutil by default will by default occupy the entire bandwidth. There are tools to optimize this. When it fails, gsutil will retry by default.
  • For ongoing automated transfers, use a cron job.

Google Transfer Appliance is a new thing, probably not in the exam, it allows you to copy all your data, ship it to google, and they will load it to the Cloud for you.

Load Balancing in GCP

One of the most important parts of Google Cloud, because it enables the Elasticity, much needed in the cloud, by providing the Auto Scaling for the Managed Instance Groups.

Have in mind that the Load Balancing services for the GCE and GKE work in a different ways, but basically they achieve the same thing - Auto Scaling. Here is how this works:
  • In GCE there is a managed group of instances generated from the same template (Managed Instance Group). By enabling a Load Balancing service, you're getting a Global URL for your Instance Group, that includes the Health check service launched from the Balancer to the Instances, which is the base trigger of the Auto Scaling.
  • In GKE you'd have a Kubernetes Cluster, and the entire Elastic operation of your containers is one of the signature functionalities of the Kubernetes Cluster, so you don't have to worry about configuring any of this manually.

Let's get deeper into the types of the Load Balancing (LB) service in GCP. Have in mind that you should always have in mind the ISO-OSI model, and if you can provide the LB service on the high level - go for it! This means that if you can do a HTTPS Balancing, rather go for that then SSL. If you can't go HTTPS - go for SSL. If your traffic is not encrypted - sure, go for TCP. Only if NONE of this works for you, you should settle for the simple Network LB Service.

IMPORTANT: Whenever you are using one of the encrypted LB Services (HTTPS, SSL/TLS), the Encryption terminates on the Load Balancer, and then the proper Load Balancer established a separate encrypted tunnel to each of the Active Instances.

There are 2 types of Load Balancing on GCP:
  1. EXTERNAL Load Balancing, for an access from the OUTSIDE (Internet)
    1. GLOBAL Load Balancing:
    • HTTP/HTTPS Load Balancing
    • SSL Proxy Load Balancing
    • TCP Proxy Load Balancing
    1. REGIONAL Load Balancing:
    • Network Load Balancer (notice that the Network Load Balancer is NOT Global, only available in a single region)
  2. INTERNAL, for the inter-tier access (example - web servers accessing Data Bases)

Google Cloud Platform (GCP) - How do I choose among the Storage and Big Data options?

Storage options are extremely important when using a GCP, performance and price wise. I will do a bit of a non-standard approach for this post. I will first cover the potential use cases, explain the Hadoop/Standard DB you would use in each case, and then the GCP option for the same use case. Once that part is done, I will go a bit deeper into each of GCP Storage and Big Data technologies. This post will therefore have 2 parts, and an "added value" Anex:
  1. Which option fits to my use case?
  2. Technical details on GCP Storage and Big Data technologies
  3. Added Value: Object Versioning and Life Cycle management

1. Which option fits to my use case?

Before we get into the use cases, let's make sure we understand the layers of abstraction of Storage. Block Storage is a typical storage carried out by applications, data stored in cylinders, UNSTRUCTURED DATA WITH NO ABSTRACTION. When you can refer to data using a physical address - you're using Block Storage. You would normally need some abstraction to use the storage, it would be rather difficult to reference your data by blocks. File Storage is a possible abstraction, and it means you are referring to data using a logical address. In order to do this, we will need some kind of layer on top of our blocks, an intelligence to make sure that our blocks underneath are properly organized and stored in the disks, so that we don't get the corrupt data.

Let's now focus on the use cases, and a single question - what kind of data do you need to store?

If you're using Mobile, the you will be using a slightly different data structures:

Let's now get a bit deeper into each of the Use Cases, and see what Google Cloud can offer.
  1. If you need Block Storage for your compute VMs/instances, you would obviously be using a Googles IaaS option called Compute Engine (GCE), and you would create the Disks using:
    • Persistent disks (Standard or SSD)
    • Local SSD
  1. If you need to store an unstructured data, or "Blobs", as Azure calls it, such as Video, Images and similar Multimedia Files - what you need is a Cloud Storage.
  2. If you need your BI guys to access your Big Data using an SQL like interface, you'll use a BigQuery, a Hive-like Google product. This applies to cases 3 (SQL interface required), and 7 (OLAP/Data Warehouse).
  3. To store the NoSQL Documents like HTML/XML, that have a characteristic pattern, you should use DataStore.
  4. For columnar NoSQL data, that requires fast scanning, use BigTable (GCP equivalent of HBase).
  5. For Transactional Processing, or OLTP , you should use Cloud SQL (if you prefer open source) or Cloud Spanner (if you need less latency, and horizontal scaling).
  6. Same like 3.
  7. Cloud Storage for Firebase is great for Security when you are doing Mobile.
  8. Firebase Realtime DB is great for fast random access with mobile SDK. This is a NoSQL database, and it remains available even when you're offline.

2. Technical details on GCP Storage and Big Data technologies

Storage - Google Cloud Storage

Google Cloud Storage is created in the form of BUCKETS, that are globally unique, identified by NAME, more or less like a DNS. Buckets are STANDALONE, not tied to any Compute or other resources.

TIP: If you want to use Cloud Storage with a web site, have in mind that you need a Domain Verification (adding a meta-tag, uploading a special HTML file or directly via the Search Console).

There are 4 types of Bucket Storage Classes. You need to be really careful to choose the most optimal Class for your Use Case, because the ones that are designed not used frequently are the ones where you'll be charged per access.  You CAN CHANGE a Buckets Storage class. The files stored in the Bucket are called OBJECTS, the Objects can have the Class which is same or "lower" then the Bucket, and if you change the Bucket storage class - the Objects will retain their storage class. The Bucket Storage Classes are:
  • Multi-regional, for frequent access from anywhere around the world. It's used for "Hot Objects", such as Web Content, it has a 99,95% availability, and it's Geo-redundant. It's pretty expensive, 0.026/GB/Month.
  • Regional, frequent access from one region, with 99,9% availability, appropriate for storing data used by Cloud Engine instances. Regilnal class has performance for data intensive computations, unlike multi-regional.
  • Nearline - access once at month at max, with 99% availability, costing 0.01/GB/month with a 30 day minimum duration, but it's got ACCESS CHARGES. It can be used for data Backup, DR or similar.
  • Coldline - access once a year at max, with same throughput and latency, for 0.007/GB/month with a 90 day minimum duration, so you would be able to retrieve your backup super fast, but you would get a bit higher bill.. At least your business wouldn’t suffer.

We can get a data IN and OUT of Cloud Storage using:
  • XML and JSON APIs
  • Command Line (gsutil - a command line tool for storage manipulation)
  • GSP Console (web)
  • Client SDK

You can use TRANSFER SERVICE in order to get your date INTO the Cloud Storage (not out!), from AWS S3, http/https, etc. This tool won't let you get the data out. Basically you would use:
  • gsutil when copying files for the first time from on premise.
  • Transfer Service when transferring from AWS etc.

Cloud Storage is not like Hadoop in the architecture sense, mostly because a HDFS architecture requires a Name Node, which you need to access A LOT, and this would increase your bill. You can read more about Hadoop and it's Ecosystem in my previous post, here.

When should I use it?

When you want to store UNSTRUCTURED data.

Storage - Cloud SQL and Google Spanner

These are both relational databases, super structured data. Cloud Spanner offers ACID++, meaning it's perfect for OLTP. It would, however, be too slow and too many checks for Analytics/BI (OLAP), because OLTP needs strict write consistency, OLAP does not. Cloud Spanner is Google proprietary, and it offers horizontal scaling, like bigger data sets.

*ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc.

When should I use it?

OLTP (Transactional) Applications.

Storage - BigTable (Hbase equivalent)

BigTable is used for FAST scanning of SEQUENTIAL key values with LOW latency (unlike Datastore, which would be used for non-sequential data). Bigtable is a columnar database, good for sparse data (meaning - missing fields in the table), because similar data is stored next to each other. ACID properties apply only on the ROW level.

What is columnar Data Base? Unlike RDBMS, it is not normalised, and it is perfect for Sparse data (tables with bunch of missing values, because the Columns are converted into rows in the Columnar data store, and the Null value columns are simply not converted. Easy.). Columnar DBs are also great for the data structures with the Dynamic Attributes because we can add new columns without changing the schema.

Bigtable is sensitive to hot spotting.

When should I use it?

Low Latency, SEQUENTIAL data.

Storage - Cloud Datastore (has similarities to MongoDB)

This is much simpler data store then BigTable, similar to MongoDB and CouchDB. It's a key-value structure, like structured data, designed to store documents, and it should not be used for OLTP or OLAP but instead for fast lookup on keys (needle in the haystack type of situation, lookup for non sequential keys).  Datastore is similar to RDBMS in that they both use indices for fast lookups. The difference is that DataStore query execution time depends on the size of returned result, so it will take the same time if you're querying a dataset of 10 rows or 10.000 rows.

IMPORTANT: Don’t use DataStore for Write intensive data, because the indices are fast to read, but slow to write.

When should I use it?

Low Latency, NON-SEQUENTIAL data (mostly Documents that need to be searched really quickly, like XML or HTML, that has a characteristic patterns, to which Datastore is performing INDEXING). It's perfect for SCALING of a HIARARCHICAL documents with Key/Value data. Don't use DataStore if you're using OLTP (Cloud Spanner is a better. choice) or OLAP/Warehousing (BigQuery is a better choice). Don't use for unstructured data (Cloud Storage is better here). It's good for Multi Tenancy (think of HTML, and how the schema can be used to separate data).

Big Data - Dataproc

Dataproc is a GCP managed Hadoop + Spark (every machine in the Cluster includes Hadoop, Hive, Spark and Pig. You need at lease 1 master and 2 workers, and other workers can be Preemptable VMs). Dataproc uses Google Cloud Storage instead of HDFS, simply because the Hadoop Name Node would consume a lot of GCE resources.

When should I use it?

Dataproc allows you to move your existing Hadoop to the Cloud seamlessly.

Big Data - Dataflow

In charge of transformation of data, similar to Apache Spark in Hadoop ecosystem. Dataflow is based on Apache Beam, and it models the flow (PIPELINE) of data and transforms it as needed. Transform takes one or more Pcollections as input, and produces an output Pcollection.

Apache Beam uses the I/O Source and Sink terminology, to represent the original data, and the data after the transformation.

When should I use it?

Whenever you have one data format on the Source, and you need to deliver it in a different format, as a Backend you would use something like Apache Spark or Dataflow.

Big Data - BigQuery

BigQuery is not designed for the low latency use, but it is VERY fast comparing to Hive. It's not as fast as Bigtable and Datastore which are actually preferred for low latency. BigQuery is great for OLAP, but it cannot be used for transactional processing (OLTP).

When should I use it?

If you need a Data Warehouse if your application is OLAP/BA or if you require an SQL interface on top of Big Data.

Big Data - Pub/Sub

Pub/Sub (Publisher/Subscriber) is a messaging transport system. It can be defined as messaging Middleware. The subscribers subscribe to the TOPIC that the publisher publishes, after which the Subscriber sends an ACK to the "Subscription", and the message is deleted from the source. This message stream is called the QUEUE. Message = Data + Attributes (key value pairs). There are two types of subscribers:
  • PUSH Subscriber, where the Apps make HTTPS request to googleapis.com
  • PULL Subscriber, where the Web Hook endpoints able to accept POST requests over HTTPS

When should I use it?

Perfect for applications such as Oder Processing, Event Notifications, Logging to multiple systems, or maybe Streaming data from various Sensors (typical for IoT).

Big Data - Datalab

Datalab is an environment where you can execute notebooks. It's basically a Jupyter or iPhython for notebooks for running code. Notebooks are better the text files for Code, because they include Code, Documentation (markdown) and Results. Notebooks are stored in Google Cloud Storage.

When should I use it?

When you want to use Notebooks for your code.

Need some help choosing?

If it's still not clear which is the best option for you, Google also made a complete Decision Tree, exactly like in the case of "Compute".

3. Added Value: Object Versioning and Lifecycle Management

Object Versioning

By default in Google Cloud Storage If you delete a file in a Bucket, the older file is deleted, and you can't get it back. When you ENABLE Object Versioning on a Bucket (can only be enabled per bucket), the previous versions are ARCHIVED, and can be RETRIEVED later.

When versioning is enabled, you can perform different actions, for example - use an older file and override the LIVE version, or similar.

Object Lifecycle Management

To avoid the archived version creating a chaos in some point of time, it's recommendable to implement some kind of Lifecycle Management. The previous versions of the file maintain their own ACL permissions, which may be different then the LIVE one.

Object Lifecycle Management can turn on the TTL. You can create CONDITIONS or RULES to base your Object Versioning. This can get much more granular, because you have:
  • Conditions are criteria that must be met before the action is taken. These are: Object age, Date of Creation, If it's currently LIVE, Match a Storage Class, and Number of Newer Versions.
  • Rules
  • Actions, you can DELETE or Set another Storage Class.

This way you can get pretty imaginative, and for example delete all objects older then 1 year, or perhaps if a Rule is triggered and conditions are met - change the Class of the Object from, for example, Regional to Nearline etc.

Google Cloud Platform (GCP) - How do I choose among the Compute options? IaaS, PaaS, CaaS/KaaS?

Google has made their Cloud Platform (GCP) so that you can host your application any way your business requires. When we talk about the traditional Data Center, we tend to distinguish 3 types of "resources":
  • Compute
  • Storage
  • Networking and Security

In each of these 3 areas, GCP offers you plenty of options. Don't be naive though, you will need to know the options quite good in order to optimize your performance and costs. In this 3-part Blog Post I will go into each of these 3 in detail, and hopefully help you with your decision.

Let's start with the Compute options. There are 3 options to choose from. You can go for the Google App Engine, or a PaaS option, focus on the code and let Google handle everything else, use a GCP to simply deploy your VMs (or Instances as they call it) the way you like, or you can choose the Containers. My idea is to try and explain each of the options in a bit more details. If this is something you'd be interested in - keep reading.

What are IaaS and PaaS?

Let's start with a simple question: What are IaaS (Infrastructure as a Service), and PaaS (Platform as a Service) and how are they different from a traditional On-premise/Data Center model? Back to basics - what does our application need in order to run? Let's start from the bottom of the Application Stack:
  • Networking, to be reached, and to reach data it requires to operate. We need Switches, Routers, etc.
  • Storage, to store data. We need Disks, Storage Cabins, SAN switches.
  • Servers, to store the compute loads. Physical Servers, with RAM, CPUs etc.
  • Virtualization, to optimize the usage of the Physical Resources by using the VMs (Virtual Machines).
  • Operating System
  • Middleware
  • Runtime
  • Data
  • Applications

In the On Premise architecture, it is on us to (over-) provision and manage all these resources. Wouldn't it be great if someone would provision and manage some of the "basic" layers for us, so that we could focus on the part that actually matters to our business? THIS is what it's all about. I like how AWS defines this - in IaaS, the Cloud Provider takes care of all the heavy lifting, or as they call it - Undifferentiated Services, while you handle the services on top, that make your business different from your competitors.

Now, check out the following diagram, to see what exactly is managed by the Cloud Provider, and what is managed by You, in the case of IaaS, PaaS and SaaS.

*In some examples out there, in IaaS OS is partially managed by You. This pretty much depends on the model that Cloud Provider is offering.

What Compute options does GCP offer?

There are Compute options for hosting your applications in Google Cloud. You can use one of those, or Mix and Match:
  1. Google Cloud Functions (currently in Beta). It's a Serverless environment for building and connecting other cloud services. Very simple, very single purpose functions, written in JavaScript, executed in Node.js. Cloud Function executes a response to a TRIGGER event. Functions are not exactly a Compute option, but they do match this use case, so I'll just keep them here.
  2. Google App Engine (GAE), is the PaaS option, serverless and ops free. It's a flexible, zero ops, serverless platform for highly available apps. You would choose GAE if you ONLY want to focus on writing code. It can be used for the Web sites, Mobile apps, or gaming backend, and also for IoT. Google App Engine is a MANAGED SERVICE, meaning - you NEVER need to worry about the infrastructure, it's invisible to you. There are 2 available environments: Standard (predefined Runtime) and Flexible (configurable Runtime). We will get into these in more detail.
  3. Google Kubernetes Engine (GKE), is the CaaS/KaaS (Containers/Kubernetes as a Service) option, clusters of machines running Kubernetes and hosting containers. Containers are the auto-contained services, containing all the Libraries and Dependencies, so that you don't have to worry about the Operating System at all. GKE engine allows you to focus on the Applications, not on the OS. You should use it to increase velocity and improve operability by separating the application from the OS. Ideal for Hybrid applications.
  4. Google Compute Engine (GCE) is the IaaS option, fully controllable down to the OS. We're talking about Instances of VMs. You should use it if you have a very specific requirements from your operating system, or if you need to use the GPUs (yes, this is the only option that let's you add Graphical Processing Units for intensive compute tasks to your Compute resources).

There is also a fifth option called Firebase, and it's specific for Mobile, but I won't go into that right now. Instead, let's focus on each of the four options mentioned above. Each of these options can be used for any application, and it's on you to choose the one that best fits, each one has their Pros and Cons. Yes, you can mix them in the same application! Check out the following diagram, to get a clearer picture:

Google Cloud Launcher

Before we get into more detail about the Compute options, I'd like to cover the Cloud Launcher, one of my favorite tools in the GCP. Google Cloud Launcher can help you set up an easy app, such as WordPress or LAMP stack, in a few minutes. You can customize your application, because you will have full control of your instances. You will also know more or less how much everything will cost before you deploy it all. Remember this for now, because I will be mentioning the Launcher later.

Google Cloud Functions

Floating, serverless execution environments, for building and connecting the cloud environments. You would be writing simple, single-purpose functions. When an event that is being watched is fired - Cloud Function is triggered. You can run it in any standard Node.js runtime. This would be a perfect option for the coders that like to write their applications in functions.

Google Application Engine (GAE) - PaaS

PaaS option is a perfect option if you just want to focus on your code, and you trust Google to manage your entire infrastructure, including the Operating System. It tends to be very popular with SW, mobile and Web developers. If you prefer to pay per use, and not per allocation, you might prefer PaaS (No-Ops) to IaaS (DevOps) option. Also, there's no vendor lock-in, you can easily move your Apps to another platform because everything is built on the Open Source tools. App Engine is REAGIONAL, and Google makes sure that you have the HA using different (availability) zones within the region.

Can you use GAE in Multiple Regions? You cannot change the region. Your app will be served from the region you chose when creating the app. Anyone can use the app, but users closer to the selected region will have lower latency. More details: https://cloud.google.com/appengine/docs/locations

App Engine supports ONLY HTTP/S.

GAE is super easy to use. You will basically need to create a new Folder, store your files in there, and execute the command "gcloud app deploy". That's it!

There are two environments, depending whether you can customize an OS:
  1. Standard (deployed in Containers), preconfigured with one of the several available runtimes (specific versions of Java 7, Python 2, Go, PHP). Each runtime includes the standard Libraries. Basically this is a container - Serverless. Your code is running in a Sandboxed environment.
  2. Flexible (deployed in VM Instances, based on GCE), that you can customize into a non standard environment, and you can use Java 8, Python 3.x, .NET, also supporting Node.js, Ruby, C#. This is not a container, it's a VM of a compute instance, and you are charged based on the usage of the VM instance (CPU, memory, disk usage) that's been provisioned for you. Unlike on GCE, the instances are automatically managed for you, meaning - regional placement, updates, patches and all (root SSH disabled by default, but can be enabled).

IMPORTANT: Scale up time is measured in seconds in Standard environment, and in minutes in the Flexible environment, simply because the containers are much faster then the VM instances.

Google Compute Engine (GCE) - IaaS

Google Compute Engine should be used when you need IaaS, for example, you need to tune your Load Balancing and Scaling. When you create a VM instance (each instance needs to belong to a Project, and a Project can have up to 5 VPC - Virtual Private Networks), you need to choose the Machine Type, a Zone, an Operating System (Linux and Windows Server are available, you get root access and SSH/RDP enabled). You can choose one of the following Machine Types, but have in mind that in order to later change it you need to stop the instance, change it, and then turn it back on:
  • Standard
  • High memory
  • High CPU
  • Shared core (small, non resource intensive)

Compute Engine instances are pay-per-allocation. When the instance is running, it is charged at an per-second rate whether it is being used or not. I'd also like to use this section to clarify a few important concepts related to GCE:
  • What is a PREEMPTABLE instance?
  • How does Google Maintenance affect your workloads?
  • How do I automate instance creation?
  • What Disks can I assign to my Instance?
  • Which VMs and Images are available for me, and can I qualify for discounts?

What's a Preemptable VM instance?

A type of VM instance that can be deleted with 30 second notification time, once the SOFT OFF signal is sent (best practice: you need a SHUT DOWN SCRIPT, able to shut the instance off and do all the clean-up in less then 30 seconds). It's much cheaper, of course, because it can be deleted AT ANY TIME (at least once every 24 hours). It can, for example, be used for the fault tolerant applications.

How does Google Maintenance affect your workloads?

Google can shut down your machine for maintenance. You can configure what to do in this case, migrate or terminate. This is your call, as it directly depends on the nature of your application, and whether they are Cloud Native (instances treated as a Cattle, rather then as Pets. Confused? Read my previous post for clarification).

Live Migration allows an instance to be up and running, even in the maintenance state, or during a HW or SW update, failed HW, network and power grid maintenance etc. The instance is moved to another host in the same zone. VM gets a notification that it needs to be evicted. A new VM is selected for migration, and the connection is AUTHENTICATED between the old and the new VM.

When a Live Migration is executed there are 3 stages:
  1. Pre-migration brownout: VM executing on the source, when most of the state is sent from source to target. The time depends on the memory that needs to be copied and similar.
  2. Blackout: a brief moment when none of the VMs are running.
  3. Post-migration brownout: a VM is running on the destination/Target Host, but the source VM is still not killed, ready to provide support if needed.

  • Preemptable instances cannot be live migrated.
  • Live migration cannot be used for the VMs with GPUs.
  • Instances with the local SSD can be live migrated.


How do I automate instance creation?

To AUTOMATE the instance creation, you can use the gcloud command line. One of the options is for example to assign a LABEL to instances you want to group (called Instance Group) in order to monitor or automate. You can get the exact script to, for example, create an instance, from the graphical interface, just look for the API and command line equivalents. Yes, this is awesome, you can literally get an API for any graphical interface action you take. Automation made easy, good job Google!

DevOps tools are also available (GCP equivalents for some), which is great if you have strong DevOps skills in the house:
  • Compute Engine Management using Puppet, Chef, Ansible.
  • Automated Image Builds with Jenkins, Packer and Kubernetes.
  • Distributed Load Testing with Kubernetes.
  • Continuous Delivery with Travis CI.
  • Managing Deployments using Spinnaker.

What Disks can I assign to my Instance?

You also have loads of Storage options for your instances. I won't go into the storage options here in detail, but to create a Disk for your VM instance you have 4 options:
  • Cloud Storage Bucket, as the cheapest option.
  • Standard persistent disks (64 TB).
  • SSD persistent disks (64 TB).
  • Local SSD (3 TB), actually attached to the instance, in the same Server.

Which VMs and Images are available for me, and how do I qualify for discounts?

Images help you instantiate new VMs with the OS already installed. There are Standard and Premium Images, depending whether you need some kind of license, like for RedHat Enterprise Linux or MS Windows. You should have in mind that you have 2 possibilities to get your image ready to launch:
  • Startup Script, that you need to write in order for it to download your dependencies, and prepare everything. It needs to always bring the VM in the same state, regardless how many times you execute it.
  • Baking is a more efficient way to create an image in order to faster provision an instance, much more efficient then a Startup script. You would start from the Premium image, and create a Custom instance (sort of a Template, if you will). Baking takes much shorter to provision an instance then a Startup disk. Everything is included into the "baking image". Version management and rollbacks are much easier, you can just rollback an image as a whole.

Check out this link about Google Cloud pricing for more details.

In the image lifecycle the possible statuses are: CURRENT, DEPRECATED (can still be used and launched), OBSOLETE (cannot be launched) and DELETED (cannot be used). This should give you some idea about how you would be managing your instance versions.

  • Snapshots can only be accessed within the same project.
  • All machines are charged for at least 1 minute. After that, a per-second payment is applied. The more you use the VM, the more discount you get.

Before we get to the possible discounts, you first need to choose your machine type correctly, to optimize the cost and the performance:
  1. Pre-defined
  2. Custom: You can specify the number of vCPUs and Memory. You would start with one of the pre-defined, and if you see that your CPU or memory are under-utilized, customize it.
  1. Shared-core is another option, meant for small, non resource intensive applications, that require BURSTING.
  1. High Memory Machines: more memory per vCPU, 6.5GB per Core.
  2. High CPU Machines: more vCPU per unit of memory, 1.8GB per Core

Google offers a few types of discount/price optimization, among others:
  1. Sustained use, when you use a VM for a long period of time
  2. Committed use, that you can purchase in 1 year or 3 year contract, and you get a good price.
  3. Rightsizing is a feature recommends which size of the VMs to run after analyzing your application behavior. This is a brand new feature, and it relies to the Stackdriver collected information from the past 8 days.

Google Containers/Kubernetes Engine (GKE) - CaaS/KaaS

If you have lots of dependencies, you would of course benefit most using the Containers. Container is a light weight standalone executable package that includes everything needed to run it: code, runtime, system tools, system libraries, settings. Containers de-couple the Application from the Operating System, and they can reliably run on different environments. Different containers run on a same Kernel, as presented in the picture below, taken from the Dockers web page:

Container vs VM

A VM contains an entire operating system packaged along with the application. A container ONLY runs an OS Kernel and nothing else, it contains the Application and the essential Libraries, Binary files etc., and it can easily be moved from one Physical or Virtual machine that has the Kubernetes engine, to another. Containers are much faster, as there is no OS to boot, and they are much smaller in size.

To be precise, using Containers/Dockers we can achieve:
  • Process isolation
  • Namespace isolation
  • Own Network Interface
  • Own Filesystem

Meanwhile, when we say a Micro service, that simply means that one container = one process.

What is Kubernetes?

Kubernetes is an open source Container Manager, originally created by Google for it´s internal use. Kubernetes automates Deployment, Scaling and Management. This means that using Kubernetes you can:
  • Rollout new features seamlessly
  • Auto scale your application
  • Run your application in the Hybrid environment, as long as you have the Kubernetes Engine in your VMs.

Why is Kubernetes so important here? Because Google Kubernetes Engine uses Kubernetes as a Container Management engine.

Let's first check out the important components of the Kubernetes architecture:
  • A Container Cluster has one supervising machine running Kubernetes (Master Endpoint, or Master Instance works like Hadoop Cluster Manager). Kubernetes Master manages the cluster, and it's your single point of management of the Kubernetes Cluster.
  • Master Instance will be in touch with a number of individual machines using a software called Kubelet, each running Docker.
  • Each individual machine running Kubelet is known as a Node Instance.
  • Pod is a smallest deployable unit, a group of 1 or more containers in a Node. Inside each Pod in every Node Instance, Containers are running. Pod has it's settings in a Template.
  • Replication Controller ensures that specific number of Pod replicas are running across Nodes.
  • Services are the abstraction layer that decouples the frontend clients to the backend pods. They define the LOGICAL set of pods across nodes and the way of accessing them. Load Balancing is one of the Services, creating an IP and a port as a connection point to our Pods.
  • Label is a METADATA with semantic meaning. It's used for selecting and grouping the objects.
  • scheduler is in charge of scheduling pods onto nodes. Basically it works like this: You create a pod,
scheduler notices that the new pod you created doesn’t have a node assigned to it, and assigns a node to the pod. It’s not responsible for actually running the pod – that’s the kubelet’s job. So it basically just needs to make sure every pod has a node assigned to it.
  • kubectl is a CLI tool for Kubernetes.

Google Cloud Engine includes the following components, most clarified in the Kubernetes architecture:
  • Container Cluster, includes a Kubernetes Master and Compute Engine instances where Kubernetes are running, managing all the components with Kubernetes Master.
  • Kubernetes Master, as a single point of management of the cluster.
  • Pods, as groups of containers.
  • Nodes, as individual Compute Instances.
  • Replication Controller, ensuring the defined number of Pods are always available.
  • Services, decoupling a frontend client from the backend Pods, providing a Load Balancer with a single URL to access your Backend.
  • Container Registry is the image repository, so that you can deploy container images

Why GKE, and not Kubernetes on GCE?

This all depends on what exactly are your needs. You can use CaaS by Google (GKE), which is easier out of the box, and Google would manage the entire "Undifferentiated" application stack, up to Containers. You can also build your own Container management on top of Googles IaaS (GCE), for example if you need GPUs, or you have some specific OS needs, or maybe a non-Kubernetes container solution, or if you are migrating your existing on premise Container solution.

Before you make a decision to, for example, run Kubernetes directly without something like GKE on top of it, I strongly recommend you to investigate the following GitHub link, on implementing Kubernetes without the pre-defined scripts: https://github.com/kelseyhightower/kubernetes-the-hard-way

If you use containers, the best way would be to use DevOps methodology, and Jenkins for CI/CD. You can use Stackdriver for logging and monitoring.

Storage options for GKE are the same like with the GCE, but Container disks are ephemeral (lasting for a very short time), so if you do want your data not ephemeral, you need to use an abstraction called gcePersistentDisk.

When would you use GKE instead of GAE?

GAE only supports HTTP/HTTPS, so if you need to use any other protocol - you would go for CaaS rather then App Engine.  Also, if you are using a Multicloud environment, GAE only works on GCP. App Engine doesn't use Kubernetes, so if you want to use Kubernetes - you would also rather go for GKE.

Interesting fact: Pokemon GO was deployed on GKE (50x more users connected then expected), while Super Mario Run (launched at 150 countries at the same time) was deployed on the GAE.

Need some help choosing?

If it's still not clear which is the best option for you, Google also made a complete Decision Tree.

Most Popular Posts