Google recognized the complexity of Kubernetes, so it developed the "Autopilot" mode.

The new GKE mode is more expensive and less flexible, but easier and safer
Autopilot in GKE manages pods for you

Two things are well known about Kubernetes. First, it is widely regarded as the best tool for the mission-critical task of container orchestration. Second, its complexity is a barrier to adoption and a common cause of errors. Even Google, the inventor and main promoter of Kubernetes, admits it.

To simplify the deployment and management of clusters, the company has given all GKE customers access to Autopilot, a service that Google has long been using in its own Borg clusters. It performs automatic resource configuration based on machine learning.

“Despite 6 years of progress, Kubernetes is still incredibly complex,” said Drew Bradstock, head of product for Google Kubernetes Engine (GKE), in an interview with The Register. “In recent years, we've seen many companies adopt Kubernetes, but then run into difficulties.”

GKE is a Kubernetes platform that runs primarily on the Google Cloud Platform (GCP). It is also available on other clouds or on-premises as part of Anthos.

Autopilot is a new GKE operating mode. It is more automated and preconfigured, in order to reduce the operational cost of managing clusters and to optimize clusters for production and high availability.

Using Autopilot in Google's own infrastructure (source)

Kubernetes has the concepts of clusters (a collection of physical or virtual servers), nodes (individual servers), pods (the smallest deployable unit, representing one or more containers on a node), and the containers themselves. Standard GKE is fully managed at the cluster level. Autopilot extends this to nodes and pods.

The easiest way to understand the features and limitations of Autopilot is to read the system description. Note the "pre-configured" parameters, which cannot be changed.

Comparison of Autopilot and Standard Modes

Basically, this is another way of reserving and managing GKE resources, one that sacrifices flexibility for convenience. Since Google manages most of the configuration, it guarantees a higher uptime of 99.9% for Autopilot pods deployed across multiple zones (see the SLA).

Regions in Google Cloud are made up of three or more zones. Placing all resources in one zone is less reliable than spreading them across several zones, and spreading them across several regions gives the maximum fault tolerance. Autopilot clusters are always regional rather than zonal: this is more reliable, but more expensive.

Another limitation of Autopilot is the preinstalled Linux operating system, "optimized for containers", with containerd. There is no way to use Linux with Docker, or Windows Server. The maximum number of pods per node is 32, not 110 as in standard GKE.
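
To get a feel for what the lower pod density means, here is a rough sketch that computes the minimum number of nodes implied by the per-node pod limit alone. The limits of 32 and 110 come from the text above; the function name and the 500-pod workload are illustrative assumptions, and in practice CPU and memory requests usually constrain the node count before the pod limit does.

```python
import math

# Per-node pod limits quoted in the article.
AUTOPILOT_MAX_PODS_PER_NODE = 32
STANDARD_MAX_PODS_PER_NODE = 110


def min_nodes_for_pods(pod_count: int, max_pods_per_node: int) -> int:
    """Lower bound on the node count implied by the pod limit alone.

    Hypothetical helper: it ignores CPU, memory and scheduling constraints.
    """
    if pod_count < 0:
        raise ValueError("pod_count must be non-negative")
    if max_pods_per_node <= 0:
        raise ValueError("max_pods_per_node must be positive")
    return math.ceil(pod_count / max_pods_per_node)


if __name__ == "__main__":
    pods = 500  # hypothetical workload size
    print(min_nodes_for_pods(pods, AUTOPILOT_MAX_PODS_PER_NODE))  # 16
    print(min_nodes_for_pods(pods, STANDARD_MAX_PODS_PER_NODE))   # 5
```

On Autopilot the nodes are provisioned for you, so this number mostly matters as an explanation of why the same workload may be spread over more nodes than on a standard cluster.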

There is no SSH access to the nodes: Autopilot nodes are locked down. GPU and TPU (Tensor Processing Unit) support is not available, although it is planned for the future. “Ditching SSH was a tough decision,” says Bradstock. Of course, this limits the control options, but Bradstock said the decision was based on research that showed a high rate of critical errors in cluster configuration.

The pricing model is different too. You are billed not for compute instances (virtual machines) but for the CPU, memory and storage actually used by all pods, plus $0.10 per hour for each Autopilot cluster, the same fee as for standard GKE.
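
The billing model described here can be sketched as a simple formula: the sum of per-pod resource charges plus the flat cluster fee. Only the $0.10 per hour cluster fee comes from the text; the unit rates in the usage example are made-up placeholders rather than Google's actual prices, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

CLUSTER_FEE_PER_HOUR = 0.10  # USD per cluster per hour, as quoted in the article


@dataclass(frozen=True)
class PodResources:
    """Resources billed for a single pod (hypothetical structure)."""
    vcpu: float
    memory_gib: float
    storage_gib: float


@dataclass(frozen=True)
class Rates:
    """Per-hour unit prices in USD. Placeholders, NOT real GKE prices."""
    per_vcpu: float
    per_memory_gib: float
    per_storage_gib: float


def autopilot_hourly_cost(pods: Iterable[PodResources], rates: Rates) -> float:
    """Sum the per-pod resource charges and add the flat cluster fee."""
    pod_cost = sum(
        p.vcpu * rates.per_vcpu
        + p.memory_gib * rates.per_memory_gib
        + p.storage_gib * rates.per_storage_gib
        for p in pods
    )
    return pod_cost + CLUSTER_FEE_PER_HOUR


if __name__ == "__main__":
    # Assumed rates and workload, for illustration only.
    rates = Rates(per_vcpu=0.04, per_memory_gib=0.005, per_storage_gib=0.0001)
    pods = [PodResources(vcpu=0.5, memory_gib=2.0, storage_gib=1.0)] * 3
    print(f"{autopilot_hourly_cost(pods, rates):.4f} USD per hour")
```

The point of the sketch is that the number of virtual machines never appears in the formula: the bill depends only on what the pods consume and on the number of clusters.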

The obvious question is which will be more expensive, a standard cluster or Autopilot. The answer is not easy. Since this is in some ways a premium service, Autopilot is more expensive than a carefully optimized standard GKE deployment. "There is a premium over a regular GKE," Bradstock said, "because we provide not only functionality, but full SRE (Site Reliability Engineering) support and SLA guarantees."

However, Autopilot can be cheaper than a poorly sized standard GKE deployment whose nodes are not fully loaded, which is a common situation because it is difficult to estimate the right specification for compute instances.
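
The trade-off can be illustrated with a back-of-the-envelope comparison. This is a minimal sketch under strong assumptions: a single resource (vCPU), a standard cluster billed for everything provisioned, Autopilot billed only for what the pods use at a higher unit price, and the per-cluster fee ignored because both modes pay it. The rates are invented for the example and are not real prices.

```python
def break_even_utilization(standard_rate: float, autopilot_rate: float) -> float:
    """Utilization below which Autopilot is cheaper than a standard cluster.

    Standard cost  = provisioned * standard_rate
    Autopilot cost = used * autopilot_rate
    Autopilot wins when used / provisioned < standard_rate / autopilot_rate.
    """
    if standard_rate <= 0 or autopilot_rate <= 0:
        raise ValueError("rates must be positive")
    return min(1.0, standard_rate / autopilot_rate)


def cheaper_mode(provisioned_vcpu: float, used_vcpu: float,
                 standard_rate: float, autopilot_rate: float) -> str:
    """Return which mode costs less per hour under the assumptions above."""
    standard_cost = provisioned_vcpu * standard_rate
    autopilot_cost = used_vcpu * autopilot_rate
    return "autopilot" if autopilot_cost < standard_cost else "standard"


if __name__ == "__main__":
    # Hypothetical unit prices: Autopilot charges twice as much per vCPU-hour.
    standard_rate, autopilot_rate = 0.03, 0.06
    print(break_even_utilization(standard_rate, autopilot_rate))
    # A 16-vCPU cluster that is mostly idle vs. one that is well packed.
    print(cheaper_mode(16, 6, standard_rate, autopilot_rate))   # autopilot
    print(cheaper_mode(16, 12, standard_rate, autopilot_rate))  # standard
```

With these assumed prices, a standard cluster has to stay above 50% utilization to beat Autopilot, which matches the article's point that a carefully optimized deployment is cheaper while an underloaded one is not.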

Cumulative distribution function (CDF) of unused memory and occupied machines for 5000 tasks after turning on Autopilot in Google's own infrastructure (source)

Reduced out-of-memory (OOM) errors and unused memory share for 500 tasks after enabling Autopilot in Google's own infrastructure (source)

Why not just use Cloud Run, which runs container workloads without any cluster, node or pod configuration, even on GKE? “Cloud Run is a great environment for developers: one application can go from zero to 1000 instances and back down to zero, that's what the cloud is for,” explains Bradstock. “Autopilot makes life easier for people who want to use Kubernetes, want to see and control everything, want to use third-party scripts, want to build their own platform.”

A separate issue is compatibility with existing add-ons and third-party tools for Kubernetes. Some of them are not yet compatible with Autopilot, but others already work, such as Datadog monitoring. DaemonSets are also supported; many tools use this feature to run daemons on all nodes.

The preset configuration for storage, compute and networking meant giving up some flexibility and dropping some integrations. “But we definitely want a third-party ecosystem to run on [Autopilot],” Bradstock says.

With the launch of Autopilot, the range of options for running Kubernetes in the Google cloud expands. The trade-off is not only higher cost and less flexibility, but also potential confusion among DevOps teams on the ground. However, the main logic is that businesses are better off focusing on their core business rather than on services that a provider can run for them.

Google's engineering has a much better reputation than its customer service. Developer Kevin Lin recently described what the process of applying for startup bonuses looks like at AWS and at Google.

Google proved to be a slow and ineffective organization that ended up referring the client to a third-party partner. “The first conversation was all about how much money I plan to spend on Google (as opposed to the call with Amazon, where they wanted to help me get the service up and running). Google Cloud has really good ergonomics and world-class engineers, but a terrible reputation for customer service,” he said.

This is further proof that good engineers are not the only important factor when choosing a cloud.
KlauS 14 march 2021, 18:58