Notes on Kubernetes

Introduction

This in-progress page lists some of my findings while working with Kubernetes.

EKS cluster setup

You may also find this guide from spacelift.io useful.

This section covers findings that are relevant when working with an AWS EKS cluster.

Terraform configuration for master

This is based on the tutorial from the Terraform folks here. Unlike the tutorial, though, I assume that you already have the VPC and subnets you want to set up your EKS master in.

First up, the master. There are three main categories of AWS resources we will need to create:

IAM role

resource "aws_iam_role" "cluster" {
  name = "eks-cluster-${var.environment}"

  assume_role_policy = <<POLICY
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "eks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
POLICY
}

resource "aws_iam_role_policy_attachment" "cluster-AmazonEKSClusterPolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = "${aws_iam_role.cluster.name}"
}

resource "aws_iam_role_policy_attachment" "cluster-AmazonEKSServicePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
  role       = "${aws_iam_role.cluster.name}"
}

Security group and rules

resource "aws_security_group" "cluster" {
  name        = "eks-cluster-${var.environment}"
  description = "Cluster communication with worker nodes"
  vpc_id      = "${var.vpc_id}"

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "eks-cluster-${var.environment}"
  }
}

resource "aws_security_group_rule" "cluster-ingress-workstation-https" {
  cidr_blocks       = "${var.ssh_access_ip}"
  description       = "Allow local workstation to communicate with the cluster API Server"
  from_port         = 443
  protocol          = "tcp"
  security_group_id = "${aws_security_group.cluster.id}"
  to_port           = 443
  type              = "ingress"
}

EKS master

We create an EKS master with the following key attributes: all control plane log types enabled, a private API endpoint only (no public endpoint access), the security group created above, and placement in your existing private subnets.

These private subnet IDs must have a tag - kubernetes.io/cluster/<your cluster name>: shared - where the cluster name is the same as the one you use in your Terraform configuration.

The following Terraform configuration will create the EKS master:

resource "aws_eks_cluster" "cluster" {
  name            = "${var.cluster_name}"
  role_arn        = "${aws_iam_role.cluster.arn}"

  enabled_cluster_log_types = [
      "api","audit","authenticator","controllerManager","scheduler",
  ]

  vpc_config {
    endpoint_private_access = true
    endpoint_public_access = false
    security_group_ids = ["${aws_security_group.cluster.id}"]
    subnet_ids         = ["${var.private_subnet_ids}"]
  }

  depends_on = [
    "aws_iam_role_policy_attachment.cluster-AmazonEKSClusterPolicy",
    "aws_iam_role_policy_attachment.cluster-AmazonEKSServicePolicy",
  ]
}

Terraform configuration for nodes

Public subnet tagging

Public subnets will need the following key-value pairs as tags:

kubernetes.io/cluster/<cluster-name>: shared 
kubernetes.io/role/elb: 1

This is so that public load balancers can be created for services and/or ingress controllers.
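
If the subnets already exist, one way to add these tags is with the AWS CLI; the subnet ID and cluster name below are placeholders:

aws ec2 create-tags \
  --resources subnet-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/myclustername,Value=shared \
         Key=kubernetes.io/role/elb,Value=1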

EKS private master and DNS resolution

In my setup, the master was private (along with all the nodes residing in private subnets). Right off the bat, I ran into the issue of the master hostname not resolving from my local workstation, even when I was connected to a VPN that was peered with the VPC the master was running in. This issue is described here. The solution I ended up with was to get the IP address of the master from the network interface attached to it and then make an entry for that IP in the local /etc/hosts file.
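
A sketch of how I'd look up that IP address, assuming the control plane's network interfaces carry the description "Amazon EKS <cluster name>":

aws ec2 describe-network-interfaces \
  --filters "Name=description,Values=Amazon EKS myclustername" \
  --query "NetworkInterfaces[].PrivateIpAddress" \
  --output text

# then map the cluster endpoint hostname to one of the returned IPs in /etc/hosts:
# <returned IP>  <your cluster endpoint hostname>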

Authentication and Authorization

The first step to accessing the cluster is authenticating yourself; the second is authorization: whether, based on the credentials you authenticated with, you are allowed to perform the operation you are attempting. For EKS clusters, using AWS IAM is the most straightforward approach for authentication. The user who sets up the EKS cluster is automatically given access to the cluster as a member of the system:masters Kubernetes group, and the authentication setup in kubeconfig looks as follows:

- name: mycluster-admin
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      args:
      - token
      - -i
      - myclustername

For the user who created the cluster, there is no further configuration required.

Getting cluster data

To be able to make API requests, we also need two more key pieces of information: the cluster endpoint and the certificate authority data.
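
Both can be fetched with the AWS CLI, or you can let the CLI write the kubeconfig entry for you:

aws eks describe-cluster --name myclustername \
  --query "{endpoint: cluster.endpoint, ca: cluster.certificateAuthority.data}"

# or generate the kubeconfig entry directly:
aws eks update-kubeconfig --name myclustername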

A complete ~/.kube/config file for admin access for the cluster creator will look as follows:

apiVersion: v1
current-context: ""
clusters:
- cluster:
    certificate-authority-data: foobar 
    server: https://adasd.yl4.eu-central-1.eks.amazonaws.com
  name: myclustername
contexts:
- context:
    cluster: myclustername
    namespace: default
    user: admin
  name: admin
kind: Config
users:
- name: admin
  user:
    exec:
      apiVersion: client.authentication.k8s.io/v1alpha1
      command: aws-iam-authenticator
      args:
      - token
      - -i
      - myclustername

Worker node joining

Once you have configured the above kubeconfig correctly, if you run kubectl get nodes you will see that no nodes have joined the cluster. That is because we first need to update a special ConfigMap to allow the nodes to authenticate to the cluster:

apiVersion: v1
data:
  mapRoles: |
    - rolearn: arn:aws:iam::AWS-ACCOUNT-ID:role/myrole
      username: system:node:{{EC2PrivateDNSName}}
      groups:
      - system:bootstrappers
      - system:nodes
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system

The mapRoles array lists all the IAM roles that we want to allow to authenticate to the cluster. We add each role to the Kubernetes groups system:bootstrappers and system:nodes, and we have to add the IAM roles of all the nodes in our cluster to this ConfigMap. Once we apply this manifest, the nodes should show as Ready when you run kubectl get nodes again.
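
Assuming the manifest above is saved as aws-auth.yaml (the filename is just an example), applying it and watching the nodes join looks like this:

kubectl apply -f aws-auth.yaml
kubectl get nodes --watch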

Adding other admins and managing users

This is discussed in a separate post.

Persistent volumes

When you create a PersistentVolumeClaim, an EBS volume is dynamically provisioned for you in AWS.
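
A minimal PersistentVolumeClaim sketch; the name, namespace and size are placeholders, and this assumes an EBS-backed StorageClass is the cluster default:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
  namespace: my-namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi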

Topology-aware volume provisioning: https://kubernetes.io/blog/2018/10/11/topology-aware-volume-provisioning-in-kubernetes/
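
For topology-aware provisioning, the key setting is volumeBindingMode: WaitForFirstConsumer, which delays creating the EBS volume until a pod is scheduled so the volume lands in the right availability zone. A sketch (the StorageClass name is arbitrary):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-topology-aware
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer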

Secret management

Nginx Ingress with SSL throughout

The following specification enables Nginx ingress with SSL to your backend as well:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: api-ingress
  namespace: mynamespace
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  tls:
  - hosts:
    - "myhost.dns.com"
    secretName: myhost-tls
  rules:
    - host: myhost.dns.com
      http:
        paths:
        - path: /
          backend:
            serviceName: xledger
            servicePort: 443

However, when running the above behind an AWS ELB, I also had to tweak the Nginx ingress controller's ConfigMap:

kind: ConfigMap
apiVersion: v1
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
data:
  use-proxy-protocol: "false"
  use-forwarded-headers: "true"
  proxy-real-ip-cidr: "0.0.0.0/0" # restrict this to the IP addresses of ELB
  ssl-ciphers: "ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA"
  ssl-protocols: "TLSv1 TLSv1.1 TLSv1.2"

The key parts that I struggled with were having to set ssl-ciphers and ssl-protocols. Without those, connections from the ELB would just hang and eventually give me a 408. For reference, here's the service-l7.yaml I used:

kind: Service
apiVersion: v1
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
  labels:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "your certificate arn"
    service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "https"
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "https"
    service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "60"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
  ports:
   - name: https
     port: 443
     targetPort: 443

Jobs

Jobs are useful for running one-off tasks, such as database migrations. Here's a sample spec:

apiVersion: batch/v1
kind: Job
metadata:
  name: my-job-name
  namespace: my-namespace
spec:
  template:
    spec:
      containers:
      - name: my-job-name
        image: myproject/job
        args:
        - bash
        - -c
        - /migrate.sh
        env:
          - name: ENVIRONMENT
            value: qa
          - name: TOKEN
            valueFrom:
              secretKeyRef:
                name: secret-token
                key: token
      nodeSelector:
        nodegroup: "services"
        environment: "qa"
      restartPolicy: Never
  backoffLimit: 4
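
To run the job and follow it (assuming the manifest above is saved as migrate-job.yaml):

kubectl apply -f migrate-job.yaml
kubectl -n my-namespace logs -f job/my-job-name
kubectl -n my-namespace get job my-job-name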

Cron jobs

Cron jobs are useful for running tasks on a schedule:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: cron
  namespace: my-namespace
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cron
            image: myproject/cron-job
            args:
            - bash
            - -c
            - /schedule.sh
            env:
              - name: ENVIRONMENT
                value: qa
              - name: TOKEN
                valueFrom:
                  secretKeyRef:
                    name: secret-token
                    key: token
          restartPolicy: OnFailure
          nodeSelector:
            nodegroup: "services"
            environment: "qa"

Accessing internal network services

kubectl exec allows us to exec into a pod and run arbitrary commands inside it. However, let's say we wanted to run a graphical database client locally and connect it to a database pod; kubectl exec doesn't help there. kubectl port-forward does: we can set up a port forward from port XXXX on our local workstation to the DB port and we are done. However, things get complicated when we are using network policies (and we should be). In this particular case, a network policy was set up for the database pod to allow only ingress traffic from within the namespace, so when we try to access the DB pod via port forwarding, it doesn't work.
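
For reference, the port-forward itself is just the following; the pod name, namespace and ports here are placeholders:

kubectl -n my-namespace port-forward pod/my-db-0 5432:5432
# the local database client then connects to localhost:5432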

In such a case, we can make use of the ipBlock object in the policy definition, for example:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: db-policy
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: my-app
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              project: my-app
        - ipBlock:
            cidr: 10.0.56.0/21 # Trusted subnet
      ports:
        - protocol: TCP
          port: 5432

The ingress section defines two selectors for from - one based on namespace and the other based on ipBlock.

It's worth noting that since I am using an AWS EKS cluster, the CNI plugin assigns pod IP addresses from the specified subnet IPv4 ranges, so this solution may not be applicable to other Kubernetes setups.

The nice thing about using kubectl port-forward here is that both authentication and authorization are enforced before you can even reach the DB pod, since we can allow or disallow port-forwarding via a custom Role:

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: team-access-services
rules:
- apiGroups: [""]
  resources: ["pods/exec", "pods/portforward"]
  verbs: ["create"]
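
The Role is then granted via a RoleBinding; the namespace and group name below are examples and would depend on how your users and groups are mapped:

kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: team-access-services
  namespace: services
subjects:
- kind: Group
  name: team-services
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: team-access-services
  apiGroup: rbac.authorization.k8s.io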

Pod security policies

I have written about this in a separate blog post.

Exposing StatefulSets and Headless services

Let’s say we have a StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus
      project: monitoring
  serviceName: prometheus
  template:
    metadata:
      labels:
        app.kubernetes.io/name: prometheus
        project: monitoring
    spec:
      containers:
      - args:
        - /bin/prometheus/prometheus
        - --config.file=/etc/prometheus.yml
        - --storage.tsdb.path=/data
        env:
        - name: ENVIRONMENT
          value: non-production
        image: <your image>
        imagePullPolicy: Always
       ..
        name: prometheus-infra
        ports:
        - containerPort: 9090
        ..
---

It is exposed via a Headless service:


apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - port: 9090
  selector:
    app.kubernetes.io/name: prometheus
    project: monitoring
---

To expose it via an ingress controller as an ingress object:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: <ingress class>
    nginx.ingress.kubernetes.io/rewrite-target: /
  name: prometheus
  namespace: monitoring
spec:
  rules:
  - host: <your host name>
    http:
      paths:
      - backend:
          serviceName: prometheus
          servicePort: 9090
        path: /
        
        

Gatekeeper

Dedicated post here

Miscellaneous

Pods in pending state

https://kubernetes.io/docs/tasks/debug-application-cluster/debug-pod-replication-controller/
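
A quick first check is to look at the pod's events; the pod name and namespace are placeholders:

kubectl -n my-namespace describe pod my-pod
kubectl -n my-namespace get events --sort-by=.metadata.creationTimestamp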