為什麼 EKS worker node 可以自動加入 EKS cluster(二)?

接續前一篇,我們了解了 EKS node group 定義,並知道了 EKS worker node 使用了 EKS optimized Amazon Linux AMI 內預先設置好的 bootstrap 設定 container runtime、kubelet 等設定。

Kubernetes 為確保 Control Plane 及 worker node 溝通安全性,於 Kubernetes 1.4 導入了 certificate request 及 signing API 機制 [1]。當然 EKS worker node 也皆是基於原生 Kubernetes 架構實現,EKS worker node 使用 kubelet TLS bootstrapping 流程與 API server 進行溝通。

故本文將繼續 EKS worker node 如何讓 kubelet 自動化完成 TLS bootstrap 流程,自動加入 EKS cluster。

kubelet bootstrap 初始化步驟

根據 Kubernetes TLS bootstrapping [2] 文件,kubelet 在 bootstrap 初始化步驟如下:

  1. kubelet 啟動。
  2. kubelet 查看是否有相應 kubeconfig 設定檔。
  3. kueblet 檢視是否有對應 bootstrap-kubeconfig 設定檔。
  4. 基於 bootstrap 設定檔,取得 API server endpoint 及有限制權限 token。
  5. 透過上述 token 認證(authenticates)與 API server 建立連線。
  6. kubelet 使用建立有限制權限 credentials 建立並取得 Certificate Signing Requests(CSR) [3]。
  7. kubelet 建立 CSR,並將 singerName 設定為 kubernetes.io/kube-apiserver-client-kubelet。
  8. CSR 被核准。可以透過以下兩種方式:
  • kube-controller-manager 自動核准 CSR
  • 透過外部流程或是人為核准,透過 kubectl 或是 Kubernetes API 方式核准 CSR
  1. kubelet 所需要的憑證(certificate)被建立。
  2. 憑證(certificate)被發送給 kubelet。
  3. kubelet 取得該憑證(certificate)。
  4. kubelet 建立 kubeconfig,其中包含金鑰和已簽名的憑證。
  5. kubelet 開始正常執行
  6. (選擇性設定)kubelet 在憑證接近於過期時間將會自動請求更新憑證
  7. 更新憑證會被核准並發證。

驗證 EKS 設定

在設定上,為達到自動化核准發證流程,我們必須設定以下元件,並需要 Kubernetes Certificate Authority (CA):

  • kube-apiserver
  • kube-controller-manager
  • kubelet

kubelet

首先,透過 systemctl 命令查看 kubelet systemd unit 設定檔:

[ec2-user@ip-192-168-65-212 ~]$ systemctl cat kubelet
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Documentation=https://github.com/kubernetes/kubernetes
After=docker.service iptables-restore.service
Requires=docker.service

[Service]
ExecStartPre=/sbin/iptables -P FORWARD ACCEPT -w 5
ExecStart=/usr/bin/kubelet --cloud-provider aws \
    --config /etc/kubernetes/kubelet/kubelet-config.json \
    --kubeconfig /var/lib/kubelet/kubeconfig \
    --container-runtime docker \
    --network-plugin cni $KUBELET_ARGS $KUBELET_EXTRA_ARGS

Restart=always
RestartSec=5
KillMode=process

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/kubelet.service.d/10-kubelet-args.conf
[Service]
Environment='KUBELET_ARGS=--node-ip=192.168.65.212 --pod-infra-container-image=602401143452.dkr.ecr.eu-west-1.amazonaws.com/eks/pause:3.5 --v=2'
# /etc/systemd/system/kubelet.service.d/30-kubelet-extra-args.conf
[Service]
Environment='KUBELET_EXTRA_ARGS=--node-labels=eks.amazonaws.com/sourceLaunchTemplateVersion=1,alpha.eksctl.io/nodegroup-name=ng1-public-ssh,alpha.eksctl.io/cluster-name=ironman-2022,eks.amazonaws.com/nodegroup-image=ami-0ec9e1727a24fb788,eks.a

可以觀察到設定了:

  • kubeconfig:/var/lib/kubelet/kubeconfig 使用了 kubelet username 並透過 aws-iam-authenticator 取得 cluster token。
[ec2-user@ip-192-168-65-212 ~]$ cat /var/lib/kubelet/kubeconfig
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://A8E7A39CAEBEF6AA9250DFA9366FDFA2.gr7.eu-west-1.eks.amazonaws.com
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubelet
  name: kubelet
current-context: kubelet
users:
- name: kubelet
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: /usr/bin/aws-iam-authenticator
      args:
        - "token"
        - "-i"
        - "ironman-2022"
        - --region
        - "eu-west-1"

此 kubeconfig 也再次驗證了 EKS optimized Amazon Linux AMI 預設安裝了 aws-iam-authenticator [4] 以提供 EKS worker node 上 kubelet 進行驗證(authenticate)流程。

  • kubelet config:/etc/kubernetes/kubelet/kubelet-config.json
[ec2-user@ip-192-168-90-19 ~]$ cat /etc/kubernetes/kubelet/kubelet-config.json
{
  "kind": "KubeletConfiguration",
  "apiVersion": "kubelet.config.k8s.io/v1beta1",
  "address": "0.0.0.0",
  "authentication": {
    "anonymous": {
      "enabled": false
    },
    "webhook": {
      "cacheTTL": "2m0s",
      "enabled": true
    },
    "x509": {
      "clientCAFile": "/etc/kubernetes/pki/ca.crt"
    }
  },
  "authorization": {
    "mode": "Webhook",
    "webhook": {
      "cacheAuthorizedTTL": "5m0s",
      "cacheUnauthorizedTTL": "30s"
    }
  },
  "clusterDomain": "cluster.local",
  "hairpinMode": "hairpin-veth",
  "readOnlyPort": 0,
  "cgroupDriver": "cgroupfs",
  "cgroupRoot": "/",
  "featureGates": {
    "RotateKubeletServerCertificate": true
  },
  "protectKernelDefaults": true,
  "serializeImagePulls": false,
  "serverTLSBootstrap": true,
  "tlsCipherSuites": [
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305",
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_RSA_WITH_AES_128_GCM_SHA256"
  ],
  "clusterDNS": [
    "10.100.0.10"
  ],
  "evictionHard": {
    "memory.available": "100Mi",
    "nodefs.available": "10%",
    "nodefs.inodesFree": "5%"
  },
  "kubeReserved": {
    "cpu": "70m",
    "ephemeral-storage": "1Gi",
    "memory": "574Mi"
  }
}

在這些 kubelet flag [5] 其中我們關注與 TLS Bootstrap 有關 flag:

  • serverTLSBootstrap: true:此將允許來自 certificates.k8s.io API
    給予 kubelet 使用 kubelet serving certificates。然而此為一個 已知安全性限制 [6],此類型 certificate 無法透過 kube-controller-manager - kubernetes.io/kubelet-serving 自動化核准 CSR,則需要仰賴使用者或是第三方 controller。
  • featureGates.RotateKubeletServerCertificate: ture:kubelet 在 bootstrapping 自身 client credentials 後請求 serving certificate 並自動替換憑證。

kube-apiserver

The kube-apiserver 需要以下三點條件才能啟用 TLS bootstrapping:

  • 設別 client certificate 的 CA
  • 驗證(authenticate) bootstrapping kubelet 並設定於 system: bootstrappers group
  • 授權(authorize) bootstrapping kubelet 建立 CSR

其中驗證流程,由上述提及透過 EKS optimized Amazon Linux AMI 所定義 bootstrap.sh script 定義 kubeconfig 使用  aws-iam-authenticator 。而 kubelet 在經由 API server 認證後,由 RBAC system: node role 和 Node authorizer [7] 將允許節點建立和讀取 CSRs。因此我們可以在 EKS 預設 eks: node-bootstrapper role 上檢視:

  • 給予 kubelet 權限提交 CSR
$ kubectl get clusterrole eks:node-bootstrapper -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"labels":{"eks.amazonaws.com/component":"node"},"name":"eks:node-bootstrapper"},"rules":[{"apiGroups":["certificates.k8s.io"],"resources":["certificatesigningrequests/selfnodeserver"],"verbs":["create"]}]}
  creationTimestamp: "2022-09-14T09:46:17Z"
  labels:
    eks.amazonaws.com/component: node
  name: eks:node-bootstrapper
  resourceVersion: "283"
  uid: eb23d8fe-dfdf-4f01-aba7-72ca32b52ad7
rules:
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests/selfnodeserver
  verbs:
  - create
  • cluster role eks: node-bootstrapper 綁定 system: bootstrapperssystem: nodes group
$ kubectl get clusterrolebindings eks:node-bootstrapper -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"labels":{"eks.amazonaws.com/component":"node"},"name":"eks:node-bootstrapper"},"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"eks:node-bootstrapper"},"subjects":[{"apiGroup":"rbac.authorization.k8s.io","kind":"Group","name":"system:bootstrappers"},{"apiGroup":"rbac.authorization.k8s.io","kind":"Group","name":"system:nodes"}]}
  creationTimestamp: "2022-09-14T09:46:16Z"
  labels:
    eks.amazonaws.com/component: node
  name: eks:node-bootstrapper
  resourceVersion: "282"
  uid: 867196bc-b84a-410d-8d4b-cbf52d840108
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: eks:node-bootstrapper
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:bootstrappers
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
  • 透過 CloudWatch Logs insight syntax 檢視 kube-apiserver logs 一樣可以查看到 --authorization-mode 使用了 Node 及 RBAC 兩種授權模式。
filter @logStream not like /^kube-apiserver-audit/
 | filter @logStream like /^kube-apiserver-/
 | fields @timestamp, @message
 | sort @timestamp asc
 | filter @message like "--authorization-mode"
 | limit 10000
I0914 09:46:06.295709 10 flags.go:59] FLAG: --authorization-mode="[Node,RBAC]"

kube-controller-manager

EKS 使用了 kubelet serving certificates 方式,此方式僅能透過第三方 controller 方式核准 CSR。此部分,我們也可以再次使用 CloudWatch Logs insight syntax 語法檢視此 flag 可以觀察到停用原生 Kubernetes csrsigning controller。

filter @logStream not like /^kube-controller-manager/
 | fields @timestamp, @message
 | sort @timestamp asc
 | filter @message like "--controller"
 | limit 10000
I0914 09:51:40.684692 11 flags.go:59] FLAG: --controllers="[*,-csrsigning]"

實際啟用 EKS 節點

我們透過 eksctl scale 命令 scale up 一個 node 來查看檢視相應元件設定,及 TLS bootstrap 流程:

$ eksctl scale ng --nodes=3 --name=ng1-public-ssh --cluster=ironman-2022
2022-09-19 09:44:52 [ℹ]  scaling nodegroup "ng1-public-ssh" in cluster ironman-2022
2022-09-19 09:44:53 [ℹ]  waiting for scaling of nodegroup "ng1-public-ssh" to complete
2022-09-19 09:46:31 [ℹ]  nodegroup successfully scaled

透過 kubectl get CSR 可以先觀察到 由 node system: node: ip-192-168-65-212.eu-west-1.compute.internal 作為 requestor 發起 CSR。約 11 秒後,此 CSR 被 Approved。接續,CSR 被 Issued 給 node。


$ kubectl get node
NAME                                           STATUS     ROLES    AGE     VERSION
...
ip-192-168-65-212.eu-west-1.compute.internal   NotReady   <none>   0s      v1.22.12-eks-ba74326

$ kubectl get csr
NAME        AGE   SIGNERNAME                      REQUESTOR                                                  REQUESTEDDURATION   CONDITION
csr-kdkll   11s   kubernetes.io/kubelet-serving   system:node:ip-192-168-65-212.eu-west-1.compute.internal   <none>              Approved

$ kubectl get csr
NAME        AGE   SIGNERNAME                      REQUESTOR                                                  REQUESTEDDURATION   CONDITION
csr-kdkll   16s   kubernetes.io/kubelet-serving   system:node:ip-192-168-65-212.eu-west-1.compute.internal   <none>              Approved,Issued

$ kubectl get node
NAME                                                STATUS   ROLES    AGE     VERSION
node/ip-192-168-18-254.eu-west-1.compute.internal   Ready    <none>   4d17h   v1.22.12-eks-ba74326
node/ip-192-168-40-16.eu-west-1.compute.internal    Ready    <none>   19h     v1.22.12-eks-ba74326
node/ip-192-168-65-212.eu-west-1.compute.internal   Ready    <none>   43s     v1.22.12-eks-ba74326
$ kubectl describe csr csr-kdkll
Name:               csr-kdkll
Labels:             <none>
Annotations:        <none>
CreationTimestamp:  Mon, 19 Sep 2022 09:46:59 +0000
Requesting User:    system:node:ip-192-168-65-212.eu-west-1.compute.internal
Signer:             kubernetes.io/kubelet-serving
Status:             Approved,Issued
Subject:
  Common Name:    system:node:ip-192-168-65-212.eu-west-1.compute.internal
  Serial Number:
  Organization:   system:nodes
Subject Alternative Names:
         DNS Names:     ec2-52-211-162-59.eu-west-1.compute.amazonaws.com
                        ip-192-168-65-212.eu-west-1.compute.internal
         IP Addresses:  192.168.65.212
                        52.211.162.59
Events:  <none>

由於並未能查看到 EKS 是否有使用第三方 controller 進行核准(approve)CSR。我們透過 CloudWatch Logs insight syntax 語法檢視 kube-apiserver-audit 日誌內記錄 CSR csr-kdkll 流程變化。

filter @logStream like /^kube-apiserver-audit/
 | fields @timestamp, @message
 | sort @timestamp asc
 | filter @message like 'csr-kdkll'
 | limit 10000
  • 2022-09-19 09:46:59 UTC+0:kubelet/v1.22.12 (linux/amd64) kubernetes/1fc8914 使用了建立了 CSR  csr-kdkll
  • user.username:`system: node: ip-192-168-65-212.eu-west-1.compute.internal
  • user.uidaws-iam-authenticator:111111111111: AROAYFMQSNSE3QYOZUIO6
  • responseObject.spec.signerNamekubernetes.io/kubelet-serving
  • responseObject.spec.usages
  • digital signature
  • key encipherment
  • server auth
  • 2022-09-19 09:46:59 UTC+0:eks: certificate-controller 更新 CSR 並核准(approve)CSR
  • user.usernameeks: certificate-controller
  • responseObject.status.conditions.0.messageAuto approving self kubelet server certificate after SubjectAccessReview.
  • responseObject.status.conditions.0.reasonAutoApproved

由上述可以了解此 CSR 符合 Kubernetes signers signerName kubernetes.io/kubelet-serving 而不被 kube-controller-manager 自動核准,此 CSR 則是由 EKS eks: certificate-controller 來自動核准。

而經由 CSR 核准並發證於 node 上。我們也可以於 /var/lib/kubelet/pki/ 目錄查看到 kubelet-server-current.pem 憑證。

[ec2-user@ip-192-168-65-212 ~]$ journalctl -u kubelet | grep "certificate signing request"
Sep 19 09:47:00 ip-192-168-65-212.eu-west-1.compute.internal kubelet[3347]: I0919 09:47:00.509710    3347 csr.go:262] certificate signing request csr-kdkll is approved, waiting to be issued
Sep 19 09:47:14 ip-192-168-65-212.eu-west-1.compute.internal kubelet[3347]: I0919 09:47:14.707915    3347 csr.go:258] certificate signing request csr-kdkll is issued
[ec2-user@ip-192-168-65-212 ~]$ sudo ls -al /var/lib/kubelet/pki/
total 4
drwxr-xr-x 2 root root   86 Sep 14 16:39 .
drwxr-xr-x 8 root root  182 Sep 14 16:39 ..
-rw------- 1 root root 1370 Sep 14 16:39 kubelet-server-2022-09-14-16-39-07.pem
lrwxrwxrwx 1 root root   59 Sep 14 16:39 kubelet-server-current.pem -> /var/lib/kubelet/pki/kubelet-server-2022-09-14-16-39-07.pem

總結

比對原生 Kubernetes 文件及 EKS 上 kube-apiserver、 kube-controller-manager 及 kubelet 元件設定後,我們可以確定 EKS 於 Control Plane 端設置了 eks: certificate-controller 提供自動核准(Auto-Aprroved)CSR 並 issue CSR 給予 kubelet 使用。在取得 CSR 後,node 則可以順利加入 EKS cluster。


上述資訊透過 EKS 所提供 Logs 來驗證上游 Kubernetes 運作原理,倘若上述內文有所錯誤,隨時可以留言或是私訊我。

參考文件

  1. Add proposal for kubelet TLS bootstrap - https://github.com/kubernetes/kubernetes/pull/20439/files
  2. TLS bootstrapping - https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/
  3. Certificate Signing Requests - https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/
  4. AWS IAM Authenticator for Kubernetes - https://github.com/kubernetes-sigs/aws-iam-authenticator
  5. kubelet - https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/
  6. design: reduce scope of node on node object w.r.t ip #1982 - https://github.com/kubernetes/community/pull/1982
  7. Using Node Authorization - https://kubernetes.io/docs/reference/access-authn-authz/node/
  8. https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/#certificate-rotation