Kubernetes pod security policies
Welcome to this new blog post!
- Introduction
- Enforcing policies
- Using kustomize to manage policies
- Rolling the policy changes out
- Multiple matching policies
- Conclusion
Introduction
Pod security policies are cluster level resources.
The Google cloud docs have some basic, human friendly documentation. A psp is a way to enforce certain policies that a pod needs to comply with before it is allowed to be scheduled to run on the cluster - on a create or an update operation (and perhaps a restart of the pod?). Essentially, it is a type of validating admission controller.
I should mention that I found it helpful (later on) to think about pod security policies as a way to “control” various attributes of a pod. Hence, the pod spec is worth referring to alongside.
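To make that concrete, here is a hypothetical pod spec (purely for illustration) with comments pointing at the psp fields that control each attribute:
apiVersion: v1
kind: Pod
metadata:
  name: example            # hypothetical pod, for illustration only
spec:
  hostNetwork: false       # controlled by hostNetwork in the psp
  containers:
  - name: app
    image: nginx
    securityContext:
      runAsUser: 1000                   # controlled by the psp runAsUser rule
      allowPrivilegeEscalation: false   # controlled by allowPrivilegeEscalation in the psp
      capabilities:
        drop: ["ALL"]                   # controlled by requiredDropCapabilities
  volumes:
  - name: config
    configMap:             # the volume *type* is controlled by volumes in the psp
      name: app-config     # hypothetical ConfigMap name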
The summarized version of how pod security policies are enforced in practice is:
- Cluster admin creates a policy (psp)
- Cluster admin creates a cluster role allowing usage of the policy
- Cluster admin creates a cluster role binding assigning subjects to the above role, hence allowing usage of the policy
On an AWS EKS cluster, we can see there is an existing policy already defined:
$ kubectl describe psp
Name:  eks.privileged

Settings:
  Allow Privileged:             true
  Allow Privilege Escalation:   true
  Default Add Capabilities:     <none>
  Required Drop Capabilities:   <none>
  Allowed Capabilities:         *
  Allowed Volume Types:         *
  Allow Host Network:           true
  Allow Host Ports:             0-65535
  Allow Host PID:               true
  Allow Host IPC:               true
  Read Only Root Filesystem:    false
  SELinux Context Strategy: RunAsAny
    User:    <none>
    Role:    <none>
    Type:    <none>
    Level:   <none>
  Run As User Strategy: RunAsAny
    Ranges:  <none>
  FSGroup Strategy: RunAsAny
    Ranges:  <none>
  Supplemental Groups Strategy: RunAsAny
    Ranges:  <none>
The granular permissions are documented here, but the above policy essentially allows pods to be created with all the permissions available.
We also have an associated cluster role binding:
$ kubectl describe clusterrolebinding eks:podsecuritypolicy:authenticated
Name:         eks:podsecuritypolicy:authenticated
Labels:       eks.amazonaws.com/component=pod-security-policy
              kubernetes.io/cluster-service=true
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{"kubernetes.io/description":"Allow all...
              kubernetes.io/description: Allow all authenticated users to create privileged pods.
Role:
  Kind:  ClusterRole
  Name:  eks:podsecuritypolicy:privileged
Subjects:
  Kind   Name                  Namespace
  ----   ----                  ---------
  Group  system:authenticated
The details are documented in the EKS documentation above, but essentially the above role binding allows all authenticated users (group: system:authenticated) to make use of the above policy - or, put differently, any authenticated user is allowed to run privileged pods with no policy enforced. Now, if we check which policy a pod is running with, it will show that it is using the eks.privileged policy:
$ kubectl -n <my-ns> get pod xledger-api-79c745d7d7-ng2j2 -o jsonpath='{.metadata.annotations.kubernetes\.io\/psp}'
eks.privileged
Now, the reason we have the default pod security policy and the binding is that, if you have the admission controller enabled, there must be a pod security policy defined in your cluster that allows a pod to be scheduled for running. If there were no default policy, no pod would be “admitted” by the cluster.
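As an aside, the admission controller itself is enabled via a kube-apiserver flag. EKS takes care of this for you on supported versions; on a self-managed control plane it would be enabled with something along these lines (other flags omitted):
kube-apiserver --enable-admission-plugins=NodeRestriction,PodSecurityPolicy ...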
Enforcing policies
So, let’s say we want to make things better. One way to do so would be to define workload specific policies and a default restricted policy. The workload specific policies would allow certain privileged access, but not all of it, and they would be explicitly granted by making use of service accounts. The default, however, would be the restricted policy. Let’s first look at the restricted policy, which will apply to all authenticated “users”:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  # This is redundant with non-root + disallow privilege escalation,
  # but we can provide it for defense in depth.
  requiredDropCapabilities:
  - ALL
  # Allow core volume types.
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  # Assume that persistentVolumes set up by the cluster admin are safe to use.
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  readOnlyRootFilesystem: false
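Once only this restricted policy applies to a workload (see the roll-out section below), a pod that asks for more than the policy allows should be rejected at admission. A quick way to sanity-check that (the pod name and image below are arbitrary) is to try creating a pod like this and confirm that it fails with an error along the lines of “unable to validate against any pod security policy”:
apiVersion: v1
kind: Pod
metadata:
  name: psp-test          # throwaway pod, for testing only
spec:
  containers:
  - name: test
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      privileged: true    # disallowed by the restricted policy above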
To come up with the workload specific policies, we need to first figure out what kind of privileged access we need to allow them to have. We will need to make sure that the custom policies we enforce account for the permissions that these pods need. kube-psp-advisor is a useful tool that helps us here. The inspect sub-command can examine your cluster and generate pod security policies as well as grants for those policies. Thus, a starting point would be to examine each namespace of your cluster where you have workloads and run:
$ kubectl-advise-psp inspect --grant -n <your namespace>
Once you have generated the policies for all the workloads, you will quickly see that for each workload, we will create a:
- Pod Security Policy
- Cluster Role
- Cluster Role Binding
Hence, to minimize duplication, we can make use of kustomize.
Using kustomize to manage policies
We can use kustomize base and overlays in the following manner to manage the various policies:
.
├── base
│ ├── kustomization.yaml
│ ├── kustomizeconfig.yaml
│ ├── psp.yaml
│ ├── rolebinding.yaml
│ └── role.yaml
├── overlays
│ ├── aws-node
│ ├── calico-node
│ ├── calico-typha-autoscaler
│ ├── coredns
│ ├── fluent-bit
│ ├── ingress-controllers
│ ├── restricted
..
Let’s look at the base/psp.yaml:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: 'docker/default,runtime/default'
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'runtime/default'
  labels:
    kubernetes.io/cluster-service: "true"
We don’t define any policy at all here, but just define the PodSecurityPolicy resource.
Let’s look at base/role.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-default
  labels:
    kubernetes.io/cluster-service: "true"
    eks.amazonaws.com/component: pod-security-policy
rules:
- apiGroups:
  - policy
  resourceNames:
  - default
  resources:
  - podsecuritypolicies
  verbs:
  - use
The above ClusterRole allows using the default pod security policy.
Tying the above role and psp resources together is the ClusterRoleBinding in base/rolebinding.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-default
  labels:
    kubernetes.io/cluster-service: "true"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: psp-default
The cluster role binding above doesn’t specify any subjects; each overlay will add those.
Before we look at the overlays, let’s look at the kustomization.yaml:
resources:
- psp.yaml
- role.yaml
- rolebinding.yaml
configurations:
- kustomizeconfig.yaml
The interesting bit here for our purpose is the kustomizeconfig.yaml file:
nameReference:
- kind: PodSecurityPolicy
  fieldSpecs:
  - path: rules/resourceNames
    kind: ClusterRole
The nameReference transformer, which I originally learned about from this issue, allows us to use the name of one resource in another resource. If you look at the base configuration above, you may have been wondering how we refer to the pod security policy (kind: PodSecurityPolicy) generated in an overlay from the cluster role (kind: ClusterRole) for that overlay. nameReference allows us to do just that. In plain terms, the above nameReference transformer substitutes the reference in rules/resourceNames of the ClusterRole with the name of the PodSecurityPolicy generated in that specific overlay.
The result is that an overlay directory looks like this:
restricted
├── kustomization.yaml
└── restricted.yaml
The kustomization.yaml file has the following contents:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
namePrefix: restricted-
patches:
- restricted.yaml
The namePrefix here is used to indicate the specific workload we are generating the policy for.
The restricted.yaml file defines the overlay for the PodSecurityPolicy and the ClusterRoleBinding:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default
spec:
  privileged: false
  # Required to prevent escalations to root.
  allowPrivilegeEscalation: false
  # This is redundant with non-root + disallow privilege escalation,
  # but we can provide it for defense in depth.
  requiredDropCapabilities:
  - ALL
  # Allow core volume types.
  volumes:
  - 'configMap'
  - 'emptyDir'
  - 'projected'
  - 'secret'
  # Assume that persistentVolumes set up by the cluster admin are safe to use.
  - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    # This policy assumes the nodes are using AppArmor rather than SELinux.
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  fsGroup:
    rule: 'MustRunAs'
    ranges:
    # Forbid adding the root group.
    - min: 1
      max: 65535
  readOnlyRootFilesystem: false
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-default
subjects:
- kind: Group
  apiGroup: rbac.authorization.k8s.io
  name: system:authenticated
Note that we don’t need to define the ClusterRole in the overlay at all. If we look at the ClusterRole definition in the base/role.yaml file above, we will see that it only needs a reference to the PodSecurityPolicy name that will be generated. The nameReference transformer takes care of that.
With the above overlay, when we run kustomize build, we get the following:
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default,runtime/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: runtime/default
  labels:
    kubernetes.io/cluster-service: "true"
  name: restricted-default
spec:
  allowPrivilegeEscalation: false
  fsGroup:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  hostIPC: false
  hostNetwork: false
  hostPID: false
  privileged: false
  readOnlyRootFilesystem: false
  requiredDropCapabilities:
  - ALL
  runAsUser:
    rule: MustRunAsNonRoot
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    ranges:
    - max: 65535
      min: 1
    rule: MustRunAs
  volumes:
  - configMap
  - emptyDir
  - projected
  - secret
  - persistentVolumeClaim
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    eks.amazonaws.com/component: pod-security-policy
    kubernetes.io/cluster-service: "true"
  name: restricted-psp-default
rules:
- apiGroups:
  - policy
  resourceNames:
  - restricted-default
  resources:
  - podsecuritypolicies
  verbs:
  - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    kubernetes.io/cluster-service: "true"
  name: restricted-psp-default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: restricted-psp-default
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:authenticated
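A workload-specific overlay (say, the fluent-bit directory from the tree above) follows exactly the same pattern; the main difference is that its ClusterRoleBinding subject is the workload’s service account rather than a group, and its policy allows just the extra privileges that workload needs. A rough sketch of such a patch file, assuming a fluent-bit service account in a logging namespace (the names and allowed volumes are illustrative; generate the real spec with kube-psp-advisor):
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: default
spec:
  privileged: false
  allowPrivilegeEscalation: false
  # fluent-bit tails container logs from the node, so hostPath is allowed here
  volumes:
  - 'configMap'
  - 'secret'
  - 'hostPath'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: false
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: psp-default
subjects:
- kind: ServiceAccount
  name: fluent-bit      # the workload's service account (illustrative)
  namespace: logging    # illustrative namespace
The corresponding kustomization.yaml would set namePrefix: fluent-bit-, so the generated resources come out as fluent-bit-default, fluent-bit-psp-default, and so on.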
Rolling the policy changes out
Once we have written our policies and applied them to the cluster, they will not affect any of the currently running workloads until a pod is killed and hence restarted (or otherwise recreated).
Hence, to “switch over” the current workloads to use the policies we created, we will need to do the following (see the sketch after this list):
- Remove the existing default ClusterRoleBinding
- Restart the existing workloads - kubectl rollout restart really helps here
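A minimal sketch of that switch-over on an EKS cluster; the namespace, deployment and pod names are placeholders:
# remove the permissive default binding so eks.privileged is no longer granted to every authenticated user
$ kubectl delete clusterrolebinding eks:podsecuritypolicy:authenticated

# restart workloads so their pods get re-admitted against the new policies
$ kubectl -n <my-ns> rollout restart deployment <my-deployment>

# verify which policy the restarted pods picked up
$ kubectl -n <my-ns> get pod <pod-name> -o jsonpath='{.metadata.annotations.kubernetes\.io\/psp}'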
This step is prone to cause interruptions if the policy has not been set correctly or if there are multiple matching policies (see the next section). Hence, exercise caution. In my experience, kube-psp-advisor really helped here.
Multiple matching policies
To summarize how a pod creation operation and the pod security policy admission controller interact:
- A pod creation request is received
- An attempt is made to find a matching policy for the pod
- If a matching policy is found, a check is made on whether the pod (via its service account) or the requesting user is allowed to use that policy
- If the above check passes, the pod is “admitted”, else “rejected”.
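The check in step 3 is just the RBAC use verb on the podsecuritypolicies resource, so kubectl auth can-i is a handy way to verify a grant before relying on it (the namespace and service account below are placeholders):
$ kubectl auth can-i use podsecuritypolicy/restricted-default \
    --as=system:serviceaccount:<my-ns>:<my-service-account>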
Now, what happens if we have multiple matching policies in step 2? The kubernetes documentation on this topic has changed between releases, but it illustrates another aspect of pod security policies - mutating and non-mutating. We have established that each pod has to have a pod security policy applied. Now, the policy that matches a pod may provide defaults for fields that the pod itself does not set; in that scenario, those default values are attached to the pod at admission time, which makes it a “mutating” pod security policy for that pod. However, if the pod can be validated against the policy without any changes, the policy applies as is and hence acts as a “non-mutating” pod security policy.
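For illustration, a policy like the following sketch can mutate pods: a pod that does not set a seccomp profile, or does not drop ALL capabilities itself, would have those defaults injected at admission time (the policy name is made up, and the behaviour noted in the comments is my reading of the psp documentation):
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: defaulting-example   # illustrative name
  annotations:
    # pods that don't set a seccomp profile get this one injected - a mutation
    seccomp.security.alpha.kubernetes.io/defaultProfileName: 'runtime/default'
spec:
  privileged: false
  # pods that don't drop ALL themselves get it added to their drop list - a mutation
  requiredDropCapabilities:
  - ALL
  runAsUser:
    rule: 'RunAsAny'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  volumes:
  - '*'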
For kubernetes 1.14, this is what the documentation says will happen when there are multiple matching policies:
- If any policies successfully validate the pod without altering it, they are used.
- If it is a pod creation request, then the first valid policy in alphabetical order is used.
- Otherwise, if it is a pod update request, an error is returned, because pod mutations are disallowed during update operations
(1) above is really confusing, and hence it has been fixed in the docs for a while; the current wording is:
- PodSecurityPolicies which allow the pod as-is, without changing defaults or mutating the pod, are preferred. The order of these non-mutating PodSecurityPolicies doesn’t matter.
- If the pod must be defaulted or mutated, the first PodSecurityPolicy (ordered by name) to allow the pod is selected.
Note: During update operations (during which mutations to pod specs are disallowed) only non-mutating PodSecurityPolicies are used to validate the pod.
The logic is implemented in the Kubernetes source code here. I would like to add one more point, which matches the source code: even if a mutating pod security policy matches, the admission controller will prefer a non-mutating policy if one exists. This is still subject to the permission check, which happens after a matching policy is found.
Conclusion
Pod security policies are a great way to enforce compliance on your workloads as a cluster admin. Some links to resources which I made use of while working on this are: