Building Docker Images with Kaniko Pushing to Amazon Elastic Container Registry (ECR)

To push to Amazon Elastic Container Registry (ECR) we can either create a secret with AWS credentials or run with the more secure IAM node instance roles.

When running on EKS each worker node has an IAM role (NodeInstanceRole), and we need to add to it the IAM permissions to pull from and push to ECR. These permissions are grouped in the arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser managed policy, which can be attached to the node instance role.
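
The policy can be attached with the AWS CLI. A minimal sketch, assuming a role named eksctl-kaniko-nodegroup-NodeInstanceRole (a placeholder; look up the actual role name of your node group):

# attach the ECR power user managed policy to the node instance role
aws iam attach-role-policy \
    --role-name eksctl-kaniko-nodegroup-NodeInstanceRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPowerUser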

When using instance roles we no longer need a secret, but we still need to configure kaniko to authenticate to AWS by using a config.json containing just { "credsStore": "ecr-login" }, mounted in /kaniko/.docker/.
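
Instead of defining the ConfigMap inline as in the manifest below, it can also be created directly from the command line. A one-liner sketch, using the same docker-config name the pod spec expects:

kubectl create configmap docker-config \
    --from-literal=config.json='{ "credsStore": "ecr-login" }'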

We also need to create the ECR repository beforehand, and, if using caching, another one for the cache.

ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
REPOSITORY=kanikorepo
REGION=us-east-1
# create the repository to push to
aws ecr create-repository --repository-name ${REPOSITORY}/kaniko-demo --region ${REGION}
# when using cache we need another repository for it
aws ecr create-repository --repository-name ${REPOSITORY}/kaniko-demo/cache --region ${REGION}

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-eks
spec:
  restartPolicy: Never
  containers:
  - name: kaniko
    image: gcr.io/kaniko-project/executor:v1.0.0
    imagePullPolicy: Always
    args: ["--dockerfile=Dockerfile",
            "--context=git://github.com/carlossg/kaniko-demo.git",
            "--destination=${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/${REPOSITORY}/kaniko-demo:latest",
            "--cache=true"]
    volumeMounts:
      - name: docker-config
        mountPath: /kaniko/.docker/
    resources:
      limits:
        cpu: 1
        memory: 1Gi
  volumes:
    - name: docker-config
      configMap:
        name: docker-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: docker-config
data:
  config.json: |-
    { "credsStore": "ecr-login" }
EOF

Building Docker Images with Kaniko Pushing to Azure Container Registry (ACR)

To push to Azure Container Registry (ACR) we can create an admin password for the ACR registry and use the standard Docker registry method, or we can use a token. We use that token to craft both the standard Docker config file at /kaniko/.docker/config.json and the ACR-specific file used by the Docker ACR credential helper at /kaniko/.docker/acr/config.json. ACR supports caching, so kaniko will push the intermediate layers to ${REGISTRY_NAME}.azurecr.io/kaniko-demo/cache:_some_large_uuid_ to be reused in subsequent builds.

RESOURCE_GROUP=kaniko-demo
REGISTRY_NAME=kaniko-demo
LOCATION=eastus
az login
# Create the resource group
az group create --name $RESOURCE_GROUP -l $LOCATION
# Create the ACR registry
az acr create --resource-group $RESOURCE_GROUP --name $REGISTRY_NAME --sku Basic
# If we want to enable password based authentication
# az acr update -n $REGISTRY_NAME --admin-enabled true

# Get the token
token=$(az acr login --name $REGISTRY_NAME --expose-token | jq -r '.accessToken')

And to build the image with kaniko:

git clone https://github.com/carlossg/kaniko-demo.git
cd kaniko-demo

cat << EOF > config.json
{
  "auths": {
    "${REGISTRY_NAME}.azurecr.io": {}
  },
  "credsStore": "acr"
}
EOF
cat << EOF > config-acr.json
{
  "auths": {
    "${REGISTRY_NAME}.azurecr.io": {
      "identitytoken": "${token}"
    }
  }
}
EOF
docker run \
    -v `pwd`/config.json:/kaniko/.docker/config.json:ro \
    -v `pwd`/config-acr.json:/kaniko/.docker/acr/config.json:ro \
    -v `pwd`:/workspace \
    gcr.io/kaniko-project/executor:v1.0.0 \
    --destination $REGISTRY_NAME.azurecr.io/kaniko-demo:kaniko-docker \
    --cache

In Kubernetes

If you want to create a new Kubernetes cluster:

az aks create --resource-group $RESOURCE_GROUP \
    --name AKSKanikoCluster \
    --generate-ssh-keys \
    --node-count 2
az aks get-credentials --resource-group $RESOURCE_GROUP --name AKSKanikoCluster --admin

In Kubernetes we need to mount both the Docker config file and the ACR config file containing the token.

token=$(az acr login --name $REGISTRY_NAME --expose-token | jq -r '.accessToken')
cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-aks
spec:
  restartPolicy: Never
  containers:
  - name: kaniko
    image: gcr.io/kaniko-project/executor:v1.0.0
    imagePullPolicy: Always
    args: ["--dockerfile=Dockerfile",
            "--context=git://github.com/carlossg/kaniko-demo.git",
            "--destination=${REGISTRY_NAME}.azurecr.io/kaniko-demo:latest",
            "--cache=true"]
    volumeMounts:
    - name: docker-config
      mountPath: /kaniko/.docker/
    - name: docker-acr-config
      mountPath: /kaniko/.docker/acr/
    resources:
      limits:
        cpu: 1
        memory: 1Gi
  volumes:
  - name: docker-config
    configMap:
      name: docker-config
  - name: docker-acr-config
    secret:
      name: kaniko-secret
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: docker-config
data:
  config.json: |-
    {
      "auths": {
        "${REGISTRY_NAME}.azurecr.io": {}
      },
      "credsStore": "acr"
    }
---
apiVersion: v1
kind: Secret
metadata:
  name: kaniko-secret
stringData:
  config.json: |-
    {
      "auths": {
        "${REGISTRY_NAME}.azurecr.io": {
          "identitytoken": "${token}"
        }
      }
    }
EOF

Building Docker Images with Kaniko Pushing to Google Container Registry (GCR)

To push to Google Container Registry (GCR) we need to log in to Google Cloud and mount our local $HOME/.config/gcloud, containing our credentials, into the kaniko container so it can push to GCR. GCR supports caching, so kaniko will push the intermediate layers to gcr.io/$PROJECT/kaniko-demo/cache:_some_large_uuid_ to be reused in subsequent builds.

git clone https://github.com/carlossg/kaniko-demo.git
cd kaniko-demo

gcloud auth application-default login # get the Google Cloud credentials
PROJECT=$(gcloud config get-value project 2> /dev/null) # Your Google Cloud project id
docker run \
    -v $HOME/.config/gcloud:/root/.config/gcloud:ro \
    -v `pwd`:/workspace \
    gcr.io/kaniko-project/executor:v1.0.0 \
    --destination gcr.io/$PROJECT/kaniko-demo:kaniko-docker \
    --cache

kaniko can cache layers created by RUN commands in a remote repository. Before executing a command, kaniko checks the cache for the layer. If it exists, kaniko will pull and extract the cached layer instead of executing the command. If not, kaniko will execute the command and then push the newly created layer to the cache.
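
When no cache repository is given kaniko infers one from the --destination (kaniko-demo/cache here), but it can also be set explicitly with the --cache-repo flag. A sketch of the same build with an explicit cache repository:

docker run \
    -v $HOME/.config/gcloud:/root/.config/gcloud:ro \
    -v `pwd`:/workspace \
    gcr.io/kaniko-project/executor:v1.0.0 \
    --destination gcr.io/$PROJECT/kaniko-demo:kaniko-docker \
    --cache \
    --cache-repo gcr.io/$PROJECT/kaniko-demo/cache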

We can see in the output how kaniko uploads the intermediate layers to the cache.

INFO[0001] Resolved base name golang to build-env
INFO[0001] Retrieving image manifest golang
INFO[0001] Retrieving image golang
INFO[0004] Retrieving image manifest golang
INFO[0004] Retrieving image golang
INFO[0006] No base image, nothing to extract
INFO[0006] Built cross stage deps: map[0:[/src/bin/kaniko-demo]]
INFO[0006] Retrieving image manifest golang
INFO[0006] Retrieving image golang
INFO[0008] Retrieving image manifest golang
INFO[0008] Retrieving image golang
INFO[0010] Executing 0 build triggers
INFO[0010] Using files from context: [/workspace]
INFO[0011] Checking for cached layer gcr.io/api-project-642841493686/kaniko-demo/cache:0ab16b2e8a90e3820282b9f1ef6faf5b9a083e1fbfe8a445c36abcca00236b4f...
INFO[0011] No cached layer found for cmd RUN cd /src && make
INFO[0011] Unpacking rootfs as cmd ADD . /src requires it.
INFO[0051] Using files from context: [/workspace]
INFO[0051] ADD . /src
INFO[0051] Taking snapshot of files...
INFO[0051] RUN cd /src && make
INFO[0051] Taking snapshot of full filesystem...
INFO[0061] cmd: /bin/sh
INFO[0061] args: [-c cd /src && make]
INFO[0061] Running: [/bin/sh -c cd /src && make]
CGO_ENABLED=0 go build -ldflags '' -o bin/kaniko-demo main.go
INFO[0065] Taking snapshot of full filesystem...
INFO[0070] Pushing layer gcr.io/api-project-642841493686/kaniko-demo/cache:0ab16b2e8a90e3820282b9f1ef6faf5b9a083e1fbfe8a445c36abcca00236b4f to cache now
INFO[0144] Saving file src/bin/kaniko-demo for later use
INFO[0144] Deleting filesystem...
INFO[0145] No base image, nothing to extract
INFO[0145] Executing 0 build triggers
INFO[0145] cmd: EXPOSE
INFO[0145] Adding exposed port: 8080/tcp
INFO[0145] Checking for cached layer gcr.io/api-project-642841493686/kaniko-demo/cache:6ec16d3475b976bd7cbd41b74000c5d2543bdc2a35a635907415a0995784676d...
INFO[0146] No cached layer found for cmd COPY --from=build-env /src/bin/kaniko-demo /
INFO[0146] Unpacking rootfs as cmd COPY --from=build-env /src/bin/kaniko-demo / requires it.
INFO[0146] EXPOSE 8080
INFO[0146] cmd: EXPOSE
INFO[0146] Adding exposed port: 8080/tcp
INFO[0146] No files changed in this command, skipping snapshotting.
INFO[0146] ENTRYPOINT ["/kaniko-demo"]
INFO[0146] No files changed in this command, skipping snapshotting.
INFO[0146] COPY --from=build-env /src/bin/kaniko-demo /
INFO[0146] Taking snapshot of files...
INFO[0146] Pushing layer gcr.io/api-project-642841493686/kaniko-demo/cache:6ec16d3475b976bd7cbd41b74000c5d2543bdc2a35a635907415a0995784676d to cache now

If we run kaniko twice we can see how the cached layers are pulled instead of rebuilt.

INFO[0001] Resolved base name golang to build-env
INFO[0001] Retrieving image manifest golang
INFO[0001] Retrieving image golang
INFO[0004] Retrieving image manifest golang
INFO[0004] Retrieving image golang
INFO[0006] No base image, nothing to extract
INFO[0006] Built cross stage deps: map[0:[/src/bin/kaniko-demo]]
INFO[0006] Retrieving image manifest golang
INFO[0006] Retrieving image golang
INFO[0008] Retrieving image manifest golang
INFO[0008] Retrieving image golang
INFO[0010] Executing 0 build triggers
INFO[0010] Using files from context: [/workspace]
INFO[0010] Checking for cached layer gcr.io/api-project-642841493686/kaniko-demo/cache:0ab16b2e8a90e3820282b9f1ef6faf5b9a083e1fbfe8a445c36abcca00236b4f...
INFO[0012] Using caching version of cmd: RUN cd /src && make
INFO[0012] Unpacking rootfs as cmd ADD . /src requires it.
INFO[0049] Using files from context: [/workspace]
INFO[0049] ADD . /src
INFO[0049] Taking snapshot of files...
INFO[0049] RUN cd /src && make
INFO[0049] Found cached layer, extracting to filesystem
INFO[0051] Saving file src/bin/kaniko-demo for later use
INFO[0051] Deleting filesystem...
INFO[0052] No base image, nothing to extract
INFO[0052] Executing 0 build triggers
INFO[0052] cmd: EXPOSE
INFO[0052] Adding exposed port: 8080/tcp
INFO[0052] Checking for cached layer gcr.io/api-project-642841493686/kaniko-demo/cache:6ec16d3475b976bd7cbd41b74000c5d2543bdc2a35a635907415a0995784676d...
INFO[0054] Using caching version of cmd: COPY --from=build-env /src/bin/kaniko-demo /
INFO[0054] Skipping unpacking as no commands require it.
INFO[0054] EXPOSE 8080
INFO[0054] cmd: EXPOSE
INFO[0054] Adding exposed port: 8080/tcp
INFO[0054] No files changed in this command, skipping snapshotting.
INFO[0054] ENTRYPOINT ["/kaniko-demo"]
INFO[0054] No files changed in this command, skipping snapshotting.
INFO[0054] COPY --from=build-env /src/bin/kaniko-demo /
INFO[0054] Found cached layer, extracting to filesystem

In Kubernetes

To push to GCR we can use a service account and mount it as a Kubernetes secret, but when running on Google Kubernetes Engine (GKE) it is more convenient and safer to use the node pool service account.

When creating the GKE node pool the default configuration only includes read-only access to the Storage API, and we need full access in order to push to GCR. This needs to be changed under Add a new node pool – Security – Access scopes – Set access for each API – Storage – Full. Note that the scopes cannot be changed once the node pool has been created.
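
From the command line the scope can be set when the node pool is created. A sketch, with placeholder cluster and pool names:

# create a node pool whose nodes can push to GCR
gcloud container node-pools create kaniko-pool \
    --cluster my-cluster \
    --scopes storage-full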

If the nodes have the correct service account with full storage access scope then we do not need to do anything extra on our kaniko pod, as it will be able to push to GCR just fine.

PROJECT=$(gcloud config get-value project 2> /dev/null)

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-gcr
spec:
  restartPolicy: Never
  containers:
  - name: kaniko
    image: gcr.io/kaniko-project/executor:v1.0.0
    imagePullPolicy: Always
    args: ["--dockerfile=Dockerfile",
            "--context=git://github.com/carlossg/kaniko-demo.git",
            "--destination=gcr.io/${PROJECT}/kaniko-demo:latest",
            "--cache=true"]
    resources:
      limits:
        cpu: 1
        memory: 1Gi
EOF

Building Docker Images with Kaniko Pushing to Docker Registries

We can build a Docker image with kaniko and push it to Docker Hub or any other standard Docker registry.

Running kaniko from a Docker daemon does not provide much advantage over just running a docker build, but it is useful for testing or validation. It also helps understand how kaniko works and how it supports the different registries and authentication mechanisms.

git clone https://github.com/carlossg/kaniko-demo.git
cd kaniko-demo
# if you just want to test the build, no pushing
docker run \
    -v `pwd`:/workspace gcr.io/kaniko-project/executor:v1.0.0 \
    --no-push

Building by itself is not very useful, so we want to push to a remote Docker registry.

To push to Docker Hub or any other username-and-password Docker registry we need to mount the Docker config.json file that contains the credentials. Caching will not work for Docker Hub as it does not support repositories with more than 2 path sections (acme/myimage/cache), but it will work in Artifactory and possibly other registry implementations.

DOCKER_USERNAME=[...]
DOCKER_PASSWORD=[...]
AUTH=$(echo -n "${DOCKER_USERNAME}:${DOCKER_PASSWORD}" | base64)
cat << EOF > config.json
{
    "auths": {
        "https://index.docker.io/v1/": {
            "auth": "${AUTH}"
        }
    }
}
EOF
docker run \
    -v `pwd`/config.json:/kaniko/.docker/config.json:ro \
    -v `pwd`:/workspace \
    gcr.io/kaniko-project/executor:v1.0.0 \
    --destination $DOCKER_USERNAME/kaniko-demo:kaniko-docker

In Kubernetes

In Kubernetes we can manually create a pod that will do our Docker image build. We need to provide the build context, containing the same files that we would put in the directory used when building a Docker image with a Docker daemon. It should contain the Dockerfile and any other files used to build the image, i.e. those referenced in COPY commands.

As build context we can use multiple sources (see the upload sketch after this list):

  • GCS Bucket (as a tar.gz file)
    • gs://kaniko-bucket/path/to/context.tar.gz
  • S3 Bucket (as a tar.gz file)
    • s3://kaniko-bucket/path/to/context.tar.gz
  • Azure Blob Storage (as a tar.gz file)
    • https://myaccount.blob.core.windows.net/container/path/to/context.tar.gz
  • Local Directory, mounted in the /workspace dir as shown above
    • dir:///workspace
  • Git Repository
    • git://github.com/acme/myproject.git#refs/heads/mybranch
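
For the bucket-based contexts the directory is compressed and uploaded first. A sketch for GCS, assuming the gs://kaniko-bucket bucket from the list above:

# compress the build context and upload it to the bucket
tar -C . -zcvf context.tar.gz .
gsutil cp context.tar.gz gs://kaniko-bucket/path/to/context.tar.gz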

Depending on where we want to push to, we will also need to create the corresponding secrets and config maps.

We are going to show examples building from a git repository as it will be the most typical use case.

Deploying to Docker Hub or a Docker registry

We will need the Docker registry credentials in a config.json file, the same way that we need them to pull images from a private registry in Kubernetes.

DOCKER_USERNAME=[...]
DOCKER_PASSWORD=[...]
DOCKER_SERVER=https://index.docker.io/v1/
kubectl create secret docker-registry regcred \
    --docker-server=${DOCKER_SERVER} \
    --docker-username=${DOCKER_USERNAME} \
    --docker-password=${DOCKER_PASSWORD}

cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: kaniko-docker
spec:
  restartPolicy: Never
  containers:
  - name: kaniko
    image: gcr.io/kaniko-project/executor:v1.0.0
    imagePullPolicy: Always
    args: ["--dockerfile=Dockerfile",
            "--context=git://github.com/carlossg/kaniko-demo.git",
            "--destination=${DOCKER_USERNAME}/kaniko-demo"]
    volumeMounts:
      - name: docker-config
        mountPath: /kaniko/.docker
    resources:
      limits:
        cpu: 1
        memory: 1Gi
  volumes:
  - name: docker-config
    projected:
      sources:
      - secret:
          name: regcred
          items:
            - key: .dockerconfigjson
              path: config.json
EOF

Building Docker Images with Kaniko

This is the first post in a series about kaniko.

kaniko is a tool to build container images from a Dockerfile, similar to docker build, but without needing a Docker daemon. kaniko builds the images inside a container, executing the Dockerfile commands in userspace, so it allows us to build the images in standard Kubernetes clusters.

This means that in a containerized environment, be it a Kubernetes cluster, a Jenkins agent running in Docker, or any other container scheduler, we no longer need to use Docker in Docker nor do the build in the host system by mounting the Docker socket, simplifying and improving the security of container image builds.

Still, kaniko does not make it safe to run untrusted container image builds; it relies on the security features of the container runtime. If you have a minimal base image that doesn’t require permissions to unpack, and your Dockerfile doesn’t execute any commands as the root user, you can run kaniko without root permissions.

kaniko builds the container image inside a container, so it needs a way to get the build context (the directory where the Dockerfile and any other files that we want to copy into the container are) and to push the resulting image to a registry.

The build context can be a compressed tar in a Google Cloud Storage or AWS S3 bucket, a local directory inside the kaniko container that we need to mount ourselves, or a git repository.

kaniko can be run in Docker, Kubernetes, Google Cloud Build (sending our image build to Google Cloud), or gVisor. gVisor is an OCI sandbox runtime that provides a virtualized container environment. It provides an additional security boundary for our container image builds.
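
For example, if Docker has the gVisor runtime installed and registered as runsc, a build can be sandboxed just by selecting that runtime. A sketch, reusing the --no-push test build shown earlier:

docker run \
    --runtime=runsc \
    -v `pwd`:/workspace \
    gcr.io/kaniko-project/executor:v1.0.0 \
    --no-push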

Images can be pushed to any standard Docker registry, and Google GCR and AWS ECR are also directly supported.

With Docker daemon image builds (docker build) we have caching: each layer generated by RUN commands in the Dockerfile is kept and reused if the commands don’t change. In kaniko, because the image builds happen inside a container that is gone after the build, we lose anything built locally. To solve this, kaniko can push the intermediate layers resulting from RUN commands to the remote registry when using the --cache flag.

In this series I will be covering using kaniko with several container registries.

Deploying Kubernetes Apps into Alibaba Cloud Container Service

Alibaba Cloud has a managed Kubernetes service called Alibaba Cloud Container Service. As with other distributions of Kubernetes there are some quirks to using it. I have documented the issues I’ve found when trying to run Jenkins X there.

Alibaba Cloud has several options to run Kubernetes:

  • Dedicated Kubernetes: You must create three Master nodes and one or multiple Worker nodes for the cluster
  • Managed Kubernetes: You only need to create Worker nodes for the cluster, and Alibaba Cloud Container Service for Kubernetes creates and manages Master nodes for the cluster
  • Multi-AZ Kubernetes
  • Serverless Kubernetes (beta): You are charged for the resources used by container instances. The amount of used resources is measured according to resource usage duration (in seconds).

You can run in multiple regions across the globe; however, to run in the mainland China regions you need a Chinese id or business id. When running there you also have to face the issues of running behind The Great Firewall of China, which is currently blocking some Google services, such as access to Google Container Registry, where some Docker images are hosted. Docker Hub and Google Cloud Storage are not blocked.

Creating a Kubernetes Cluster

Alibaba requires several things in order to create a Kubernetes cluster, so it is easier to do it through the web UI the first time.

The following services need to be activated: Container Service, Resource Orchestration Service (ROS), RAM, and Auto Scaling service; the Container Service roles also need to be created.

If we want to use the command line we can install the aliyun cli. I have added all the steps needed below in case you want to use it.

brew install aliyun-cli
aliyun configure
REGION=ap-southeast-1

The clusters need to be created in a VPC, so one needs to be created first, with VSwitches for each zone to be used.

aliyun vpc CreateVpc \
    --VpcName jx \
    --Description "Jenkins X" \
    --RegionId ${REGION} \
    --CidrBlock 172.16.0.0/12

{
    "ResourceGroupId": "rg-acfmv2nomuaaaaa",
    "RequestId": "2E795E99-AD73-4EA7-8BF5-F6F391000000",
    "RouteTableId": "vtb-t4nesimu804j33p4aaaaa",
    "VRouterId": "vrt-t4n2w07mdra52kakaaaaa",
    "VpcId": "vpc-t4nszyte14vie746aaaaa"
}

VPC=vpc-t4nszyte14vie746aaaaa

aliyun vpc CreateVSwitch \
    --VSwitchName jx \
    --VpcId ${VPC} \
    --RegionId ${REGION} \
    --ZoneId ${REGION}a \
    --Description "Jenkins X" \
    --CidrBlock 172.16.0.0/24

{
    "RequestId": "89D9AB1F-B4AB-4B4B-8CAA-F68F84417502",
    "VSwitchId": "vsw-t4n7uxycbwgtg14maaaaa"
}

VSWITCH=vsw-t4n7uxycbwgtg14maaaaa

Next, a keypair (or password) is needed for the cluster instances.

aliyun ecs ImportKeyPair \
    --KeyPairName jx \
    --RegionId ${REGION} \
    --PublicKeyBody "$(cat ~/.ssh/id_rsa.pub)"

The last step is to create the cluster using the just created VPC, VSwitch and Keypair. It’s important to select the option Expose API Server with EIP (public_slb in the API json) to be able to connect to the API from the internet.

cat << EOF > cluster.json
{
    "name": "jx-rocks",
    "cluster_type": "ManagedKubernetes",
    "disable_rollback": true,
    "timeout_mins": 60,
    "region_id": "${REGION}",
    "zoneid": "${REGION}a",
    "snat_entry": true,
    "cloud_monitor_flags": false,
    "public_slb": true,
    "worker_instance_type": "ecs.c4.xlarge",
    "num_of_nodes": 3,
    "worker_system_disk_category": "cloud_efficiency",
    "worker_system_disk_size": 120,
    "worker_instance_charge_type": "PostPaid",
    "vpcid": "${VPC}",
    "vswitchid": "${VSWITCH}",
    "container_cidr": "172.20.0.0/16",
    "service_cidr": "172.21.0.0/20",
    "key_pair": "jx"
}
EOF

aliyun cs POST /clusters \
    --header "Content-Type=application/json" \
    --body "$(cat cluster.json)"

{
    "cluster_id": "cb643152f97ae4e44980f6199f298f223",
    "request_id": "0C1E16F8-6A9E-4726-AF6E-A8F37CDDC50C",
    "task_id": "T-5cd93cf5b8ff804bb40000e1",
    "instanceId": "cb643152f97ae4e44980f6199f298f223"
}

CLUSTER=cb643152f97ae4e44980f6199f298f223

We can now download the kubectl configuration with:

aliyun cs GET /k8s/${CLUSTER}/user_config | jq -r .config > ~/.kube/config-alibaba
export KUBECONFIG=$KUBECONFIG:~/.kube/config-alibaba
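
A quick sanity check that the downloaded configuration works, using the file we just wrote:

kubectl --kubeconfig ~/.kube/config-alibaba get nodes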

Another detail before being able to install applications that use PersistentVolumeClaims is to configure a default storage class. There are several volume options that can be listed with kubectl get storageclass.

NAME                          PROVISIONER     AGE
alicloud-disk-available       alicloud/disk   44h
alicloud-disk-common          alicloud/disk   44h
alicloud-disk-efficiency      alicloud/disk   44h
alicloud-disk-ssd             alicloud/disk   44h

Each of them matches the following cloud disks:

  • alicloud-disk-common: basic cloud disk (minimum size 5GiB). Only available in some zones (us-west-1a, cn-beijing-b,…)
  • alicloud-disk-efficiency: high-efficiency cloud disk, ultra disk (minimum size 20GiB).
  • alicloud-disk-ssd: SSD disk (minimum size 20GiB).
  • alicloud-disk-available: provides a highly available option: it first attempts to create a high-efficiency cloud disk; if the zone’s efficient cloud disk resources are sold out, it tries to create an SSD disk; and if SSDs are sold out, it tries to create a common cloud disk.

To set SSDs as the default:

kubectl patch storageclass alicloud-disk-ssd \
    -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class":"true"}}}'

NOTE: Alibaba cloud disks must be larger than 5GiB (basic) or 20GiB (SSD and Ultra), so we will need to configure any service that is deployed with PVCs to request at least that size or the PersistentVolume provisioning will fail.
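
For example, a minimal PVC sketch that satisfies the SSD minimum (the name is a placeholder):

cat << EOF | kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: alicloud-disk-ssd
  resources:
    requests:
      storage: 20Gi
EOF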

You can continue reading about installing Jenkins X on Alibaba Cloud as an example.

Progressive Delivery with Jenkins X: Automatic Canary Deployments


This is the third post in a Progressive Delivery series; see the previous ones.

Progressive Delivery is used by Netflix, Facebook and others to reduce the risk of deployments. But you can now adopt it when using Jenkins X.

Progressive Delivery is the next step after Continuous Delivery: new versions are deployed to a subset of users and evaluated in terms of correctness and performance before being rolled out to all users, and rolled back if they don’t match some key metrics.

In particular we focused on Canary releases and made them really easy to adopt in your Jenkins X applications. Canary releases consist of sending a small percentage of traffic to the new version of your application and validating there are no errors before rolling it out to the rest of the users. Facebook does it this way, delivering new versions first to internal employees, then to a small percentage of users, then to everybody else, but you don’t need to be Facebook to take advantage of it!


You can read more on Canaries at Martin Fowler’s website.

Jenkins X

If you already have an application in Jenkins X you know that you can promote it to the “production” environment with jx promote myapp --version 1.0 --env production. But the new version can also be automatically and gradually rolled out to a percentage of users while checking that it is not failing. If it does fail, the application is automatically rolled back. No human intervention at all during the process.

NOTE: this new functionality is very recent and a number of these steps will not be needed in the future as they will also be automated by Jenkins X.

As the first step three Jenkins X addons need to be installed:

  • Istio: a service mesh that allows us to manage traffic to our services.
  • Prometheus: the most popular monitoring system in Kubernetes.
  • Flagger: a project that uses Istio to automate canarying and rollbacks using metrics from Prometheus.

The addons can be installed (using a recent version of the jx cli) with

jx create addon istio
jx create addon prometheus
jx create addon flagger

This will enable Istio in the jx-production namespace for metrics gathering.

Now get the IP of the Istio ingress gateway and point a wildcard domain to it (e.g. *.example.com), so we can use it to route multiple services based on host names. The Istio ingress provides the routing capabilities needed for Canary releases (traffic shifting) that the traditional Kubernetes ingress objects do not support.

kubectl -n istio-system get service istio-ingressgateway \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}'

The cluster is configured, and it’s time to configure our application. Add a canary.yaml to your helm chart, under charts/myapp/templates.

{{- if eq .Release.Namespace "jx-production" }}
{{- if .Values.canary.enable }}
apiVersion: flagger.app/v1alpha2
kind: Canary
metadata:
  name: {{ template "fullname" . }}
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "fullname" . }}
  progressDeadlineSeconds: 60
  service:
    port: {{.Values.service.internalPort}}
{{- if .Values.canary.service.gateways }}
    gateways:
{{ toYaml .Values.canary.service.gateways | indent 4 }}
{{- end }}
{{- if .Values.canary.service.hosts }}
    hosts:
{{ toYaml .Values.canary.service.hosts | indent 4 }}
{{- end }}
  canaryAnalysis:
    interval: {{ .Values.canary.canaryAnalysis.interval }}
    threshold: {{ .Values.canary.canaryAnalysis.threshold }}
    maxWeight: {{ .Values.canary.canaryAnalysis.maxWeight }}
    stepWeight: {{ .Values.canary.canaryAnalysis.stepWeight }}
{{- if .Values.canary.canaryAnalysis.metrics }}
    metrics:
{{ toYaml .Values.canary.canaryAnalysis.metrics | indent 4 }}
{{- end }}
{{- end }}
{{- end }}

Then append to the charts/myapp/values.yaml the following, changing myapp.example.com to your host name or names:

canary:
  enable: true
  service:
    # Istio virtual service host names
    hosts:
    - myapp.example.com
    gateways:
    - jx-gateway.istio-system.svc.cluster.local
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 60s
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: istio_requests_total
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 60s
    - name: istio_request_duration_seconds_bucket
      # maximum req duration P99
      # milliseconds
      threshold: 500
      interval: 60s

Soon, both the canary.yaml and values.yaml changes won’t be needed when you create your app from one of the Jenkins X quickstarts, as they will be Canary enabled by default.

That’s it! Now when the app is promoted to the production environment with jx promote myapp --version 1.0 --env production it will do a Canary rollout. Note that the first time it is promoted it will not do a Canary, as it needs data from a previous version to compare to, but it will work from the second promotion on.

With the configuration in the values.yaml file above it would look like:

  • minute 1: send 10% of the traffic to the new version
  • minute 2: send 20% of the traffic to the new version
  • minute 3: send 30% of the traffic to the new version
  • minute 4: send 40% of the traffic to the new version
  • minute 5: send 100% of the traffic to the new version

If the metrics we have configured (request duration over 500 milliseconds or more than 1% of responses returning 500 errors) fail, Flagger will note that failure, and if it is repeated 5 times it will roll back the release, sending 100% of the traffic to the old version.

To get the Canary events run

$ kubectl -n jx-production get events --watch \
  --field-selector involvedObject.kind=Canary
LAST SEEN   FIRST SEEN   COUNT   NAME                                                  KIND     SUBOBJECT   TYPE     REASON   SOURCE    MESSAGE
23m         10d          7       jx-production-myapp.1584d8fbf5c306ee   Canary               Normal   Synced   flagger   New revision detected! Scaling up jx-production-myapp.jx-production
22m         10d          8       jx-production-myapp.1584d89a36d2e2f2   Canary               Normal   Synced   flagger   Starting canary analysis for jx-production-myapp.jx-production
22m         10d          8       jx-production-myapp.1584d89a38592636   Canary               Normal   Synced   flagger   Advance jx-production-myapp.jx-production canary weight 10
21m         10d          7       jx-production-myapp.1584d917ed63f6ec   Canary               Normal   Synced   flagger   Advance jx-production-myapp.jx-production canary weight 20
20m         10d          7       jx-production-myapp.1584d925d801faa0   Canary               Normal   Synced   flagger   Advance jx-production-myapp.jx-production canary weight 30
19m         10d          7       jx-production-myapp.1584d933da5f218e   Canary               Normal   Synced   flagger   Advance jx-production-myapp.jx-production canary weight 40
18m         10d          6       jx-production-myapp.1584d941d4cb21e8   Canary               Normal   Synced   flagger   Advance jx-production-myapp.jx-production canary weight 50
18m         10d          6       jx-production-myapp.1584d941d4cbc55b   Canary               Normal   Synced   flagger   Copying jx-production-myapp.jx-production template spec to jx-production-myapp-primary.jx-production
17m         10d          6       jx-production-myapp.1584d94fd1218ebc   Canary               Normal   Synced   flagger   Promotion completed! Scaling down jx-production-myapp.jx-production

Dashboard

Flagger includes a Grafana dashboard for visualization purposes; it is not needed for the Canary releases themselves. It can be accessed locally using Kubernetes port forwarding:

kubectl --namespace istio-system port-forward deploy/flagger-grafana 3000

Then accessing http://localhost:3000 using admin/admin, selecting the canary-analysis dashboard and setting

  • namespace: jx-production
  • primary: jx-production-myapp-primary
  • canary: jx-production-myapp

would provide us with a view of different metrics (cpu, memory, request duration, response errors,…) of the incumbent and new versions side by side.

Caveats

Note that by default Istio will prevent access from your pods to the outside of the cluster (a behavior that is expected to change in Istio 1.1). Learn how to control the Istio egress traffic.
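
For instance, external hosts can be whitelisted with a ServiceEntry. A sketch, assuming our pods need to reach api.example.com (a placeholder) over HTTPS:

cat << EOF | kubectl create -f -
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: external-api
  namespace: jx-production
spec:
  hosts:
  - api.example.com
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS
EOF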

If a rollback happens automatically because the metrics fail, the Jenkins X GitOps repository for the production environment becomes out of date, still referencing the new version instead of the old one. This is planned to be fixed in upcoming releases.

Progressive Delivery with Jenkins X


This is the second post in a Progressive Delivery series, see the first one, Progressive Delivery in Kubernetes: Blue-Green and Canary Deployments.

I have evaluated three Progressive Delivery options for Canary and Blue-Green deployments with Jenkins X, using my Croc Hunter example project.

  • Shipper enables blue-green and multi-cluster deployments for the Helm charts built by Jenkins X, but has limitations on the contents of the chart. You could do blue-green between staging and production environments.
  • Istio allows sending a percentage of the traffic to staging or preview environments by just creating a VirtualService.
  • Flagger builds on top of Istio and adds canary deployments, with automated rollout and rollback based on metrics. Jenkins X promotions to the production environment can automatically be canary-enabled for a graceful rollout by creating a Canary object.

Find the example code for Shipper, Istio and Flagger.

Shipper

Because Shipper has multiple limitations on the Helm charts created, I had to make some changes to the app. Also, Jenkins X only builds the Helm package from master, so we can’t do rollouts of PRs, only of the master branch.

The app label can’t include the release name, i.e. app: {{ template "fullname" . }} won’t work; we need something like app: {{ .Values.appLabel }}.

App rollout failed with the Jenkins X generated charts due to a generated templates/release.yaml, probably a conflict with the jenkins.io/releases CRD.

Chart croc-hunter-jenkinsx-0.0.58 failed to render:
could not decode manifest: no kind "Release" is registered for version "jenkins.io/v1"

We just need to change jx step changelog to jx step changelog --generate-yaml=false so the file is not generated.

In multi-cluster mode, it needs to use public URLs for both chartmuseum and the Docker registry in the Shipper application YAML so the other clusters can find the management cluster services to download the charts.

Istio

We can create this VirtualService to send 1% of the traffic to a Jenkins X preview environment (for PR number 35), for all requests coming to the Ingress Gateway for host croc-hunter.istio.example.com:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
 name: croc-hunter-jenkinsx
 namespace: jx-production
spec:
 gateways:
 - public-gateway.istio-system.svc.cluster.local
 - mesh
 hosts:
 - croc-hunter.istio.example.com
 http:
 - route:
   - destination:
       host: croc-hunter-jenkinsx.jx-production.svc.cluster.local
       port:
         number: 80
     weight: 99
   - destination:
       host: croc-hunter-jenkinsx.jx-carlossg-croc-hunter-jenkinsx-serverless-pr-35.svc.cluster.local
       port:
         number: 80
     weight: 1

Flagger

We can create a Canary object for the chart deployed by Jenkins X in the jx-production namespace, and all new Jenkins X promotions to jx-production will automatically be rolled out 10% at a time and automatically rolled back if anything fails.

apiVersion: flagger.app/v1alpha2
kind: Canary
metadata:
  # canary name must match deployment name
  name: jx-production-croc-hunter-jenkinsx
  namespace: jx-production
spec:
  # deployment reference
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jx-production-croc-hunter-jenkinsx
  # HPA reference (optional)
  # autoscalerRef:
  #   apiVersion: autoscaling/v2beta1
  #   kind: HorizontalPodAutoscaler
  #   name: jx-production-croc-hunter-jenkinsx
  # the maximum time in seconds for the canary deployment
  # to make progress before it is rolled back (default 600s)
  progressDeadlineSeconds: 60
  service:
    # container port
    port: 8080
    # Istio gateways (optional)
    gateways:
    - public-gateway.istio-system.svc.cluster.local
    # Istio virtual service host names (optional)
    hosts:
    - croc-hunter.istio.example.com
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 15s
    # max number of failed metric checks before rollback
    threshold: 5
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 50
    # canary increment step
    # percentage (0-100)
    stepWeight: 10
    metrics:
    - name: istio_requests_total
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      threshold: 99
      interval: 1m
    - name: istio_request_duration_seconds_bucket
      # maximum req duration P99
      # milliseconds
      threshold: 500
      interval: 30s

Progressive Delivery in Kubernetes: Blue-Green and Canary Deployments

Progressive Delivery is the next step after Continuous Delivery: new versions are deployed to a subset of users and evaluated in terms of correctness and performance before being rolled out to all users, and rolled back if they don’t match some key metrics.

There are some interesting projects that make this easier in Kubernetes, and I’m going to talk about three of them that I took for a spin with a Jenkins X example project: Shipper, Istio and Flagger.

Shipper

Shipper is a project from booking.com extending Kubernetes to add sophisticated rollout strategies and multi-cluster orchestration (docs). It supports deployments from one to multiple clusters, and allows multi-region deployments.

Shipper is installed with a CLI, shipperctl, which pushes the configuration of the different clusters to manage. Note this issue with GKE contexts.

Shipper uses Helm packages for deployment, but they are not installed with Helm, so they won’t show in helm list. Also, deployments must be version apps/v1 or Shipper will not edit the deployment to add the right labels and replica count.

Rollouts with Shipper are all about transitioning from an old Release, the incumbent, to a new Release, the contender. This is achieved by creating a new Application object that defines the n stages that the deployment goes through. For example, for a 3-step process:

  1. Staging: Deploy the new version to one pod, with no traffic
  2. 50/50: Deploy the new version to 50% of the pods and 50% of the traffic
  3. Full on: Deploy the new version to all the pods and all the traffic

strategy:
  steps:
  - name: staging
    capacity:
      contender: 1
      incumbent: 100
    traffic:
      contender: 0
      incumbent: 100
  - name: 50/50
    capacity:
      contender: 50
      incumbent: 50
    traffic:
      contender: 50
      incumbent: 50
  - name: full on
    capacity:
      contender: 100
      incumbent: 0
    traffic:
      contender: 100
      incumbent: 0
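
The strategy is embedded in the Application object’s template. A skeleton sketch based on Shipper’s v1alpha1 API; the name, chart repository and region are placeholders:

cat << EOF | kubectl create -f -
apiVersion: shipper.booking.com/v1alpha1
kind: Application
metadata:
  name: myapp
spec:
  revisionHistoryLimit: 3
  template:
    chart:
      name: myapp
      version: "0.0.1"
      repoUrl: https://charts.example.com
    clusterRequirements:
      regions:
      - name: us-east1
    values:
      replicaCount: 3
    strategy:
      steps:
      - name: staging
        capacity:
          contender: 1
          incumbent: 100
        traffic:
          contender: 0
          incumbent: 100
      # ... the 50/50 and full on steps follow as in the listing above
EOF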

If a step in the release does not send traffic to the pods, they can be accessed with kubectl port-forward, e.g. kubectl port-forward mypod 8080:8080, which is useful for testing before users can see the new version.

Shipper supports the concept of multiple clusters, but treats all clusters the same way, only using regions and filtering by capabilities (set in the cluster object), so there’s no option to have dev, staging and prod clusters with just one Application object. But we could have two Application objects:

  • myapp-staging deploys to region “staging”
  • myapp deploys to other regions

In GKE you can easily configure a multi-cluster ingress that will expose the service running in multiple clusters and serve from the cluster closest to your location.

Limitations

The main limitations in Shipper:

  • Chart restrictions: The Chart must have exactly one Deployment object. The name of the Deployment should be templated with {{.Release.Name}}. The Deployment object should have apiVersion: apps/v1 (see the sketch after this list).
  • Pod-based traffic shifting: there is no way to have fine-grained traffic routing, e.g. sending 1% of the traffic to the new version; it is based on the number of pods running.
  • New Pods don’t get traffic if Shipper is not working.
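
A minimal chart Deployment template satisfying these restrictions might look like this sketch (appLabel and the image values are placeholder chart values):

cat << 'EOF' > templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
  labels:
    app: {{ .Values.appLabel }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Values.appLabel }}
  template:
    metadata:
      labels:
        app: {{ .Values.appLabel }}
    spec:
      containers:
      - name: app
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
EOF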

Istio

Istio is not a deployment tool but a service mesh. However, it is interesting here because it has become very popular and allows traffic management, for example sending a percentage of the traffic to a different service, among other advanced networking features.

In GKE it can be installed by just checking the box to enable Istio in the cluster configuration. In other clusters it can be installed manually or with Helm.

With Istio we can create a Gateway that processes all external traffic through the Ingress Gateway and create VirtualServices that manage the routing to our services. In order to do that, just find the ingress gateway IP address and configure a wildcard DNS for it. Then create the Gateway that will route all external traffic through the Ingress Gateway:

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
 name: public-gateway
 namespace: istio-system
spec:
 selector:
   istio: ingressgateway
 servers:
 - port:
     number: 80
     name: http
     protocol: HTTP
   hosts:
   - "*"

Istio does not manage the app lifecycle, just the networking. We can create a VirtualService to send 1% of the traffic to the service deployed in a pull request or in the master branch, for all requests coming to the Ingress Gateway.

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
 name: croc-hunter-jenkinsx
 namespace: jx-production
spec:
 gateways:
 - public-gateway.istio-system.svc.cluster.local
 - mesh
 hosts:
 - croc-hunter.istio.example.org
 http:
 - route:
   - destination:
       host: croc-hunter-jenkinsx.jx-production.svc.cluster.local
       port:
         number: 80
     weight: 99
   - destination:
       host: croc-hunter-jenkinsx.jx-staging.svc.cluster.local
       port:
         number: 80
     weight: 1

Flagger

Flagger is a project sponsored by WeaveWorks that uses Istio to automate canarying and rollbacks driven by metrics from Prometheus. It goes beyond what Istio provides, automating progressive rollouts and rollbacks based on those metrics.

Flagger requires Istio installed with Prometheus and Servicegraph plus some system configuration, as well as the installation of the Flagger controller itself. It also offers a Grafana dashboard to monitor the deployment progress.


The deployment rollout is defined by a Canary object that generates primary and canary Deployment objects. When the Deployment is edited, for instance to use a new image version, the Flagger controller will shift the load from 0% to 50% with 10% increases every minute, then it will shift to the new deployment or roll back if metrics such as response errors or request duration fail.

Comparison

This table summarizes the strengths and weaknesses of both Shipper and Flagger in terms of a few Progressive Delivery features.

Feature | Shipper | Flagger
Traffic routing | Bare k8s balancing as % of pods | Advanced traffic routing with Istio (% of requests)
Deployment progress UI | No | Grafana dashboard
Deployments supported | Helm charts with strong limitations | Any deployment
Multi-cluster deployment | Yes | No
Canary or blue/green in different namespace (i.e. jx-staging and jx-production) | No | No, but the VirtualService could be manually edited to do it
Canary or blue/green in different cluster | Yes, but with a hack, using a new Application and linking to a new “region” | Maybe, with Istio multi-cluster?
Automated rollout | No, operator must manually go through the steps | Yes, 10% traffic increase every minute, configurable
Automated rollback | No, operator must detect errors and manually go through the steps | Yes, based on Prometheus metrics
Requirements | None | Istio, Prometheus
Alerts | None | Slack

To sum up, I see Shipper’s value in multi-cluster management and simplicity, not requiring anything other than Kubernetes, but it comes with some serious limitations.

Flagger really goes the extra mile, automating the rollout and rollback and giving fine-grained control over traffic, at a higher complexity cost with all the extra services needed (Istio, Prometheus).

Find the example code for Shipper, Istio and Flagger.

Jenkins Kubernetes Plugin: 2018 in Review

Last year has been quite prolific for the Jenkins Kubernetes Plugin, with 55 releases and lots of external contributions!

In 2019 there will be a push for Serverless Jenkins and, with that, a shift to make agents work better in a Kubernetes environment, with no persistent JNLP connections. You can watch my Jenkins X and Serverless Jenkins demo at KubeCon.

Main changes in the Kubernetes plugin in 2018:

  • Allow creating Pod templates from yaml. This allows setting all possible fields in the Kubernetes API using yaml
  • Add yamlFile option for Declarative agent to read yaml definition from a different file
  • Support multiple containers in declarative pipeline
  • Support passing kubeconfig file as credentials using secretFile credentials
  • Show pod logs and events in the Jenkins node page
  • Add optional usage restriction for a Kubernetes cloud using folder properties
  • Add Pod Retention policies to keep pods around on failure
  • Validate label and container names with regex
  • Add option to apply caps only on alive pods
  • Split credentials classes into new plugin kubernetes-credentials

Full Changelog

2018-12-31 kubernetes-1.14.2
2018-12-24 kubernetes-1.14.1
2018-12-19 kubernetes-1.14.0
2018-12-19 kubernetes-1.13.9
2018-12-13 kubernetes-1.13.8
2018-11-30 kubernetes-1.13.7
2018-11-23 kubernetes-1.13.6
2018-10-31 kubernetes-1.13.5
2018-10-30 kubernetes-1.13.4
2018-10-30 kubernetes-1.13.3
2018-10-24 kubernetes-1.13.2
2018-10-23 kubernetes-1.13.1
2018-10-19 kubernetes-1.13.0
2018-10-17 kubernetes-1.12.9
2018-10-17 kubernetes-1.12.8
2018-10-11 kubernetes-1.12.7
2018-09-07 kubernetes-1.12.6
2018-09-07 kubernetes-1.12.5
2018-08-28 kubernetes-1.12.4
2018-08-09 kubernetes-1.12.3
2018-08-07 kubernetes-1.12.2
2018-08-06 kubernetes-1.12.1
2018-07-31 kubernetes-1.12.0
2018-07-31 kubernetes-1.11.0
2018-07-23 kubernetes-1.10.2
2018-07-16 kubernetes-1.10.1
2018-07-11 kubernetes-1.10.0
2018-07-11 kubernetes-1.9.3
2018-06-26 kubernetes-1.9.2
2018-06-26 kubernetes-1.9.1
2018-06-26 kubernetes-1.9.0
2018-06-22 kubernetes-1.8.4
2018-06-22 kubernetes-1.8.3
2018-06-19 kubernetes-1.8.2
2018-06-13 kubernetes-1.8.1
2018-06-13 kubernetes-1.8.0
2018-05-30 kubernetes-1.7.1
2018-05-30 kubernetes-1.7.0
2018-05-29 kubernetes-1.6.4
2018-05-25 kubernetes-1.6.3
2018-05-23 kubernetes-1.6.2
2018-05-22 kubernetes-1.6.1
2018-04-25 kubernetes-1.6.0
2018-04-16 kubernetes-1.5.2
2018-04-09 kubernetes-1.5.1
2018-04-01 kubernetes-1.5
2018-03-28 kubernetes-1.4.1
2018-03-21 kubernetes-1.4
2018-03-16 kubernetes-1.3.3
2018-03-07 kubernetes-1.3.2
2018-02-21 kubernetes-1.3.1
2018-02-21 kubernetes-1.3
2018-02-16 kubernetes-1.2.1
2018-02-02 kubernetes-1.2
2018-01-29 kubernetes-1.1.4
2018-01-10 kubernetes-1.1.3