Using Gatekeeper in Kubernetes

Introduction

Gatekeeper allows a Kubernetes administrator to implement policies for ensuring compliance and best practices in their cluster. It is a validating admission controller built on Open Policy Agent (OPA), and the policies are written in the Rego language. Gatekeeper embraces Kubernetes-native concepts such as Custom Resource Definitions (CRDs), so the policies themselves are managed as Kubernetes resources. The GKE docs on this topic are a good place to learn more.

Before we dive deep into Gatekeeper itself, let’s first familiarize ourselves with the Rego language. One point worth noting is that Rego and OPA can be used for policy enforcement beyond Kubernetes; however, we are going to focus on Kubernetes objects.

Writing our first policy

Let’s look at a policy which will fail if the namespace of an object is default:

package k8svalidnamespace
        
violation[{"msg": msg, "details": {}}] {
  value := input.review.object.metadata.namespace
  value == "default"
  msg := sprintf("Namespace should not be default: %v", [value])
}

The first line of this policy defines a namespace or package for the policy. Each policy must reside in a package.

Next, we define a violation block which “returns” two objects, “msg” and “details”, to the calling framework. If you are coming to Gatekeeper from the OPA documentation, you will notice that OPA has deny blocks, whereas Gatekeeper has violation blocks. I am not sure why, but this was changed in Gatekeeper a while back. This is the “entrypoint” for a rule, as per the OPA constraint framework guide.

The statements inside this block, i.e. inside the {}, are Rego expressions.

The expression value := input.review.object.metadata.namespace assigns the value of input.review.object.metadata.namespace to the variable value. The input object contains the entire JSON object that Gatekeeper provides to the policy when evaluating it.

Next, we check whether the value of this variable is “default” using value == "default". Only if this condition evaluates to true is the policy violated. If we have more than one conditional statement, all the comparisons must evaluate to true for the rule to be violated (see the next example below).

In the final line of the policy, we use the sprintf function to construct an error message, which is stored in the msg variable and hence automatically “returned”.

Given the above policy and an input document, let’s test it out in the Rego playground.

For reference, the input is:

{
    "kind": "AdmissionReview",
    "parameters": {},
    "review": {
        "kind": {
            "kind": "Pod",
            "version": "v1"
        },
        "object": {
            "metadata": {
                "name": "myapp",
                "namespace": "default"
            },
            "spec": {
                "containers": []
            }
        }
    }
}

The output you will see is:

{
    "violation": [
        {
            "details": {},
            "msg": "Namespace should not be default: default"
        }
    ]
}
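
If you prefer the command line, you can reproduce this locally with the opa CLI (a sketch, assuming the policy is saved as policy.rego and the input document as input.json):

opa eval --format pretty -d policy.rego -i input.json "data.k8svalidnamespace.violation"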

Policy with two conditions in a rule

Let’s now say that in addition to checking whether the namespace is default, we also want to check whether the namespace is an empty string. In other words, we want the policy to be violated if the namespace is either empty or default. Here’s the first version of the policy, which doesn’t work as expected:

package k8svalidnamespace
        
violation[{"msg": msg, "details": {}}] {          
  value := input.review.object.metadata.namespace          
  value == ""
  value == "default"
  msg := sprintf("Namespace should not be default: %v", [value])
}

I wrote this version in a hurry and I don’t know what I was expecting. Someone in the Open Policy Agent Slack then pointed me to the issue. Even so, we can use the above incorrect policy to understand a bit more about how policy evaluation works. Given the same input as the first policy, evaluation stops at the expression value == "": it evaluates to false, so the rule is not violated and we see no violations.

In addition, consider the following input document:

{
    "kind": "AdmissionReview",
    "parameters": {},
    "review": {
        "kind": {
            "kind": "Pod",
            "version": "v1"
        },
        "object": {
            "metadata": {
                "name": "myapp",
                "namespace": ""
            },
            "spec": {
                "containers": []
            }
        }
    }
}

When we evaluate the policy above with the above input document, the first comparison (value == "") evaluates to true, but the second comparison (value == "default") evaluates to false. Hence, the policy isn’t violated - not what we wanted.

As the last case, let’s consider an input document with no namespace defined at all:

{
    "kind": "AdmissionReview",
    "parameters": {},
    "review": {
        "kind": {
            "kind": "Pod",
            "version": "v1"
        },
        "object": {
            "metadata": {
                "name": "myapp"                
            },
            "spec": {
                "containers": []
            }
        }
    }
}

When given this input document, the policy produces no violation at all, and no Rego magic is involved: referencing a field that doesn’t exist makes the expression undefined, evaluation of the rule body stops at that point, and a rule that is undefined reports no violation.
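
A minimal snippet (hypothetical package name and rules, for the Rego playground) illustrating this behavior:

package undefineddemo

# undefined (no result) when the namespace field is missing,
# because the comparison never receives a value
ns_is_default {
  input.review.object.metadata.namespace == "default"
}

# "not" succeeds when its operand is undefined or false,
# which is how the next section detects a missing namespace
ns_missing {
  not input.review.object.metadata.namespace
}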

OR rules

Let’s now write the correct version of the policy, which reports a violation if the namespace is undefined, an empty string, or default:

package k8svalidnamespace
     
violation[{"msg": msg, "details": {}}] {
  not input.review.object.metadata.namespace
  msg := "Namespace should not be unspecified"          
}
        
violation[{"msg": msg, "details": {}}] {
  value := input.review.object.metadata.namespace
  count(value) == 0
  msg := sprintf("Namespace should not be empty: %v", [value])          
}
        
violation[{"msg": msg, "details": {}}] {
  value := input.review.object.metadata.namespace
  value == "default"
  msg := sprintf("Namespace should not be default: %v", [value])          
}

We have three violation blocks in the above policy, each containing one conditional expression. The policy reports a violation if any of the violation blocks evaluates to true.

Invalid input - Unspecified namespace

Let’s consider an input document with no namespace specified:

{
    "kind": "AdmissionReview",
    "parameters": {},
    "review": {
        "kind": {
            "kind": "Pod",
            "version": "v1"
        },
        "object": {
            "metadata": {
                "name": "myapp"
            },
            "spec": {
                "containers": []
            }
        }
    }
}

When the above policy is evaluated against this input document, the first rule evaluates to true and we have a violation. The other rules report no violation - not because the first rule fired, but because the object doesn’t have the namespace field, so their rule bodies are undefined.

Invalid input - Empty namespace

Let’s now consider the following input document:

{
    "kind": "AdmissionReview",
    "parameters": {},
    "review": {
        "kind": {
            "kind": "Pod",
            "version": "v1"
        },
        "object": {
            "metadata": {
                "name": "myapp",
                "namespace": ""
            },
            "spec": {
                "containers": []
            }
        }
    }
}

For this input, only the second rule is violated: the namespace is defined, so the first rule doesn’t fire, and it isn’t “default”, so the third rule doesn’t fire either.

Invalid input - default namespace

Now, consider the input document as:

{
    "kind": "AdmissionReview",
    "parameters": {},
    "review": {
        "kind": {
            "kind": "Pod",
            "version": "v1"
        },
        "object": {
            "metadata": {
                "name": "myapp",
                "namespace": "default"
            },
            "spec": {
                "containers": []
            }
        }
    }
}

For this input document, only the last rule is violated and we get a violation from the policy.

Valid input

Now, consider the following input document:

{
    "kind": "AdmissionReview",
    "parameters": {},
    "review": {
        "kind": {
            "kind": "Pod",
            "version": "v1"
        },
        "object": {
            "metadata": {
                "name": "myapp",
                "namespace": "default1"
            },
            "spec": {
                "containers": []
            }
        }
    }
}

For the above input, the policy will report no violations.

A more complicated policy

Let’s now write a policy to ensure that only containers from certain repositories are allowed to run on the cluster:

package k8sallowedrepos

violation[{"msg": msg}] {
  container := input.review.object.spec.containers[_]
  satisfied := [good | repo = input.parameters.repos[_] ; good = startswith(container.image, repo)]
  not any(satisfied)
  msg := sprintf("container <%v> has an invalid image repo <%v>, allowed repos are %v", [container.name, container.image, input.parameters.repos])
}

The first line of the violation block is:

container := input.review.object.spec.containers[_]

The _ index makes this an iteration: the rule body is evaluated once for each element of the containers array, with the container variable bound to that element. To learn more about the special _ index, see the documentation.
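
A tiny illustration (hypothetical package and input shape) of how _ drives iteration:

package iterdemo

# with input.containers = [{"name": "a"}, {"name": "b"}],
# the rule body runs once per element and this partial set
# rule produces names == {"a", "b"}
names[name] {
  name := input.containers[_].name
}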

The second line of the violation block is:

satisfied := [good | repo = input.parameters.repos[_] ; good = startswith(container.image, repo)]

The above line is an example of a comprehension. For the container currently bound by the rule, it executes the following pseudocode:

For each repo in the list of allowed repos
  Check whether container.image starts with repo
  Append the result (true or false) to the array "satisfied"
End For

The result of the above is an array satisfied with the same number of elements as the number of allowed repos in the input.parameters.repos object, with each value being true or false.

The third line of the violation block is our condition, not any(satisfied). any(satisfied) evaluates to true if any value in the satisfied list is true, and false otherwise. It’s really important to note here that the entire rule body is “executed” once for each item in the containers array. For example, with container.image set to "nginx" and the three allowed repos from the input below, the comprehension yields satisfied = [false, false, false], so not any(satisfied) holds and the rule fires.
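
The same comprehension can also be written with an explicit assignment in the body, which some find easier to read (a stylistic alternative, not part of the original policy):

satisfied := [s | repo := input.parameters.repos[_]; s := startswith(container.image, repo)]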

Hence, given the following input document:

{
  "kind": "AdmissionReview",
  "parameters": {
    "repos": [      
      "quay.io/calico",      
      "k8s.gcr.io",
      "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni"
    ]
  },
  "review": {
    "kind": {
      "kind": "Pod",
      "version": "v1"
    },
    "object": {
      "spec": {
        "containers": [
          {
            "image": "amazon-k8s-cni",
            "name": "mysql-backend"
          },
          {
            "image": "nginx",
            "name": "nginx-frontend"
          }          
        ]
      }
    }
  }
}

We will see the following as the output: (Rego playground link)

{
    "violation": [
        {
            "msg": "container <mysql-backend> has an invalid image repo <amazon-k8s-cni>, allowed repos are [\"quay.io/calico\", \"k8s.gcr.io\", \"602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni\"]"
        },
        {
            "msg": "container <nginx-frontend> has an invalid image repo <nginx>, allowed repos are [\"quay.io/calico\", \"k8s.gcr.io\", \"602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni\"]"
        }
    ]
}

Rego Unanswered Questions

I am still trying to get my head around Rego. Here are some questions I have:

  1. The difference between “=” and “:=”
  2. A lot more than I can write here; hopefully this list will be updated.

Setting up Gatekeeper

Install Gatekeeper as per the instructions here. The following resources are created:

ClusterRole:

        - gatekeeper-manager-role from gatekeeper.yaml

ClusterRoleBinding:

        - gatekeeper-manager-rolebinding from gatekeeper.yaml

CustomResourceDefinition:

        - configs.config.gatekeeper.sh from gatekeeper.yaml
        - constrainttemplates.templates.gatekeeper.sh from gatekeeper.yaml

Deployment:

        - gatekeeper-controller-manager in gatekeeper-system from gatekeeper.yaml

Namespace:

        - gatekeeper-system from gatekeeper.yaml

Role:

        - gatekeeper-manager-role in gatekeeper-system from gatekeeper.yaml

RoleBinding:

        - gatekeeper-manager-rolebinding in gatekeeper-system from gatekeeper.yaml

Secret:

        - gatekeeper-webhook-server-cert in gatekeeper-system from gatekeeper.yaml

Service:

        - gatekeeper-webhook-service in gatekeeper-system from gatekeeper.yaml

ServiceAccount:

        - gatekeeper-admin in gatekeeper-system from gatekeeper.yaml

ValidatingWebhookConfiguration:

        - gatekeeper-validating-webhook-configuration from gatekeeper.yaml

In addition, you may need to create a sync configuration for replicating cluster data into OPA.
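
A minimal example of such a sync configuration, syncing Namespace objects (which kinds you need to replicate depends on your policies):

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: gatekeeper-system
spec:
  sync:
    syncOnly:
      - group: ""
        version: "v1"
        kind: "Namespace"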

Creating a constraint template

Now that we have the Gatekeeper components installed, the first concept we need to learn is that of a ConstraintTemplate - which lays down the schema of the parameters as well as the policy itself in the Rego language.

The ConstraintTemplate kind is used to create a new constraint template; the template below defines a new constraint kind named K8sRequiredLabels:

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
        listKind: K8sRequiredLabelsList
        plural: k8srequiredlabels
        singular: k8srequiredlabels
      validation:
        # Schema for the `parameters` field
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }

Once we create the above constraint template, we can list it using kubectl:

$ kubectl get constrainttemplates.templates.gatekeeper.sh
NAME                AGE
k8srequiredlabels   99s

Creating a constraint

Let’s now define a constraint using the constraint template we just created (kind: K8sRequiredLabels):

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-gk
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["gatekeeper"]

Let’s create the constraint:

$ kubectl apply -f required_labels.yaml 
k8srequiredlabels.constraints.gatekeeper.sh/ns-must-have-gk created

We can use kubectl get to fetch constraints of this template type:

$ kubectl get k8srequiredlabels.constraints.gatekeeper.sh
NAME              AGE
ns-must-have-gk   77s

Testing the constraint

Let’s now test this constraint by creating a namespace without the label:

apiVersion: v1
kind: Namespace
metadata:
  name: test

If we now run kubectl apply on the above definition, we will get:

$ kubectl apply -f ns.yaml 
Error from server ([denied by ns-must-have-gk] you must provide labels: {"gatekeeper"}): error when creating "ns.yaml": admission webhook "validation.gatekeeper.sh" denied the request: [denied by ns-must-have-gk] you must provide labels: {"gatekeeper"}
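
A namespace that carries the required label should be admitted - the policy only checks label keys, so any value works:

apiVersion: v1
kind: Namespace
metadata:
  name: test
  labels:
    gatekeeper: "true"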

Audit

Gatekeeper by default has audit functionality: it periodically evaluates the constraints and stores the audit results in each constraint’s status field. For this purpose, Gatekeeper queries the Kubernetes API for the resources that your constraint specifies and validates them against the constraint.

Here’s an example:

$ kubectl get k8srequiredlabels.constraints.gatekeeper.sh -o yaml                                                    
apiVersion: v1
items:
- apiVersion: constraints.gatekeeper.sh/v1beta1
  kind: K8sRequiredLabels
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"constraints.gatekeeper.sh/v1beta1","kind":"K8sRequiredLabels","metadata":{"annotations":{},"name":"ns-must-have-gk"},"spec":{"match":{"kinds":[{"apiGroups":[""],"kinds":["Namespace"]}]},"parameters":{"labels":["gatekeeper"]}}}
    creationTimestamp: "2020-05-21T04:21:17Z"
    generation: 1
    name: ns-must-have-gk
    resourceVersion: "1722780"
    selfLink: /apis/constraints.gatekeeper.sh/v1beta1/k8srequiredlabels/ns-must-have-gk
    uid: 640dee9f-8f3e-4f3a-9716-599f54cbd18b
  spec:
    match:
      kinds:
      - apiGroups:
        - ""
        kinds:
        - Namespace
    parameters:
      labels:
      - gatekeeper
  status:
    auditTimestamp: "2020-05-21T04:40:17Z"
    byPod:
    - enforced: true
      id: gatekeeper-controller-manager-55bfb4d454-w6424
      observedGeneration: 1
    totalViolations: 7
    violations:
    - enforcementAction: deny
      kind: Namespace
      message: 'you must provide labels: {"gatekeeper"}'
      name: default
    - enforcementAction: deny
      kind: Namespace
      message: 'you must provide labels: {"gatekeeper"}'
      name: gatekeeper-system
    - enforcementAction: deny
      kind: Namespace
      message: 'you must provide labels: {"gatekeeper"}'
      name: gitlab
    - enforcementAction: deny
      kind: Namespace
      message: 'you must provide labels: {"gatekeeper"}'
      name: kube-node-lease
    - enforcementAction: deny
      kind: Namespace
      message: 'you must provide labels: {"gatekeeper"}'
      name: kube-public
    - enforcementAction: deny
      kind: Namespace
      message: 'you must provide labels: {"gatekeeper"}'
      name: kube-system
    - enforcementAction: deny
      kind: Namespace
      message: 'you must provide labels: {"gatekeeper"}'
      name: logging
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

The above shows us the audit results on all the existing namespaces.

Rego playground and gatekeeper policies

To test a Gatekeeper policy on the Rego playground, copy the entire Rego policy from the rego field of the constraint template above. Now, for the input, we need to have an object like this:

{
  "kind": "AdmissionReview",
  "parameters": {
    "cpu": "300m",
    "memory": "2Gi"
  },
  "review": {
    "kind": {
      "kind": "Pod",
      "version": "v1"
    },
    "object": {
      "spec": {
        "containers": [
          {
            "image": "quay.io/calico/nginx",
            "name": "nginx-frontend",
            "resources": {
              "limits": {
                "cpu": "290m"
                
              }
            }
          },
          {
            "image": "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni",
            "name": "mysql-backend",
            "resources": {
              "limits": {
                "cpu": "400m",
                "memory": "1Gi"
              }
            }
          }
        ]
      }
    }
  }
}

The above object is available to your Rego code as the input object.
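
For example, here is a simplified sketch of a policy that could consume this input - it only checks that each container declares CPU and memory limits at all, sidestepping quantity parsing (the package name and messages are hypothetical):

package k8scontainerlimits

violation[{"msg": msg}] {
  container := input.review.object.spec.containers[_]
  not container.resources.limits.memory
  msg := sprintf("container <%v> has no memory limit", [container.name])
}

violation[{"msg": msg}] {
  container := input.review.object.spec.containers[_]
  not container.resources.limits.cpu
  msg := sprintf("container <%v> has no CPU limit", [container.name])
}

With the input above, nginx-frontend would violate the memory rule, while mysql-backend declares both limits and passes.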

Gatekeeper constraint library

The gatekeeper library contains a few examples of constraint templates and constraints to enforce in your cluster.

Pod security policies

In a previous post, I discussed using pod security policies to enforce compliance and restrictions in a cluster. We can do the same by making use of Gatekeeper constraints. The repository has a few examples here.

Dry run mode

For any constraint, we can add enforcementAction: dryrun to the spec to enforce it in audit mode for both existing and new resources. This will not disallow non-conformant resources, which can be especially useful when rolling out constraints to an environment with existing workloads.

Example constraint spec:

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sValidNamespace
metadata:
  name: namespace-must-be-valid
spec:
  enforcementAction: dryrun
  ..

For constraints created with the enforcement action set to dryrun, we can then find the audit results in the output of kubectl describe, like so:

kubectl describe k8svalidnamespace.constraints.gatekeeper.sh namespace-must-be-valid
Name:         namespace-must-be-valid
Namespace:    
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"constraints.gatekeeper.sh/v1beta1","kind":"K8sValidNamespace","metadata":{"annotations":{},"name":"namespace-must-be-valid"...
API Version:  constraints.gatekeeper.sh/v1beta1
Kind:         K8sValidNamespace
Metadata:
  Creation Timestamp:  2020-06-03T01:49:24Z
  Generation:          1
  Resource Version:    3421798
  Self Link:           /apis/constraints.gatekeeper.sh/v1beta1/k8svalidnamespace/namespace-must-be-valid
  UID:                 d9c171b2-9451-4a45-98c7-24a2d4e8a3e4
Spec:
  Enforcement Action:  dryrun
  Match:
    Kinds:
      API Groups:
        
      Kinds:
        ConfigMap
        CronJob
        DaemonSet
        Deployment
        Job
        NetworkPolicy
        PodDisruptionBudget
        Role
        RoleBinding
        StatefulSet
        Service
        Secret
        ServiceAccount
      API Groups:
        extensions
        networking.k8s.io
      Kinds:
        Ingress
Status:
  Audit Timestamp:  2020-06-03T04:05:45Z
  By Pod:
    Enforced:             true
    Id:                   gatekeeper-controller-manager-ff7c87585-h7cjh
    Observed Generation:  1
  Total Violations:       3
  Violations:
    Enforcement Action:  dryrun
    Kind:                Secret
    Message:             Namespace should not be default: default
    Name:                default-token-9xvts
    Namespace:           default
    Enforcement Action:  dryrun
    Kind:                ServiceAccount
    Message:             Namespace should not be default: default
    Name:                default
    Namespace:           default
    Enforcement Action:  dryrun
    Kind:                Service
    Message:             Namespace should not be default: default
    Name:                kubernetes
    Namespace:           default
Events:                  <none>

The Violations section above lists all the violations of the constraint that were found.

Monitoring and Alerting

Gatekeeper exports several Prometheus metrics covering various aspects of its behavior. If you have an existing Prometheus setup in your cluster, all you need to do is add the following annotations to Gatekeeper’s controller-manager deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gatekeeper-controller-manager
  namespace: gatekeeper-system
spec:
  ..
  template:
    metadata:
      annotations:
        prometheus.io/port: "8888"
        prometheus.io/scrape: "true"
        ..

Some of the key metrics to monitor are described below.

The enforcement_action label is available on the gatekeeper_constraints and gatekeeper_violations metrics and can have a value of dryrun, active, or error.

The status label is available on the gatekeeper_constraint_templates metric and can take a value of active or error.

The request_count metric has a label, admission_status, which is useful for understanding the distribution of allowed and denied requests.

Metrics related to sync/data replication are available as of the v3.1.0-beta.9 release.

All the available metrics are documented here.

Some useful Prometheus alerts can be built on these metrics. As a sketch (using the metric names described above; the threshold and duration are placeholders), here is a rule that fires when any constraint template fails to ingest:
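
groups:
  - name: gatekeeper
    rules:
      - alert: GatekeeperConstraintTemplateError
        # any constraint template in the error state
        expr: gatekeeper_constraint_templates{status="error"} > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: One or more Gatekeeper constraint templates failed to ingest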

Learn more