Distroless as the final container base image

Mohammad Badruzzaman
6 min read · Nov 2, 2023

Container images play a crucial role in microservices and Kubernetes ecosystems. They provide a lightweight, portable, and reproducible way to package and deploy microservices. In this post, we will explore how distroless container images can:

  • Enhance the security and efficiency of your Kubernetes cluster and cloud network
  • Reduce data transfer costs in your cloud network
  • Possibly improve the SLA of the CI system as well

Application developers usually build their applications on top of many external libraries and dependencies, and finally ship them as container images to a Kubernetes cluster. The size and security of those images depend mainly on how carefully developers import external dependencies and how carefully they build the image. If your company has no standard best practices for creating container images, then, depending on the knowledge of whoever builds the image, a lot of garbage can end up in it.

Suppose a container image contains 100 MiB of overhead on average. Then 200,000 containers cause ~19 TiB of overhead traffic when their images are pulled from the image registry. If you use spot instances or spot VMs as Kubernetes workers, the cloud provider can reclaim them, which causes your pods to be rescheduled. In staging and development environments, misconfiguration and errors also send pods into frequent restart loops. Assuming an average of 1 restart per container per day (across production, staging, and dev environments) and imagePullPolicy set to Always in your Kubernetes pod manifests, roughly 6,960 TiB (~19.07 TiB × 365) of unnecessary data per year may travel from the container image registry to the k8s clusters. Around ⅔ of that overhead traffic contributes to inter-AZ data transfer cost on the cloud platform, and all of it puts unnecessary load on the cloud VPC network. In a real scenario, a medium-sized company with 500–1000 applications may end up with more than 200k containers in its k8s clusters, and the average overhead per container image can exceed 100 MiB.

Moreover, the CI system has to upload container images to the image registry, which increases build pipeline execution time, adds load on the network, and possibly incurs inter-AZ or inter-region data transfer costs, depending on where the CI system and the container registries are located. In short, an extra 100 MiB of overhead that looks negligible to the developer who builds the image may not be negligible for platform maintainers. It is not feasible to get rid of this overhead completely, but there are always good practices for reducing it as much as possible.
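
The traffic estimate above is plain multiplication, so it is easy to redo with your own numbers. The following is a minimal sketch, not a sizing tool: the function name overheadTiB is made up for illustration, and the inputs (200,000 containers, 100 MiB of overhead, one pull per container per day, a ⅔ cross-AZ share) are the assumptions stated in the text. With these inputs the yearly result lands in the same range as the figure quoted above.

```go
package main

import "fmt"

// Number of MiB in one TiB.
const mibPerTiB = 1024 * 1024

// overheadTiB (hypothetical helper) returns the overhead traffic in TiB when
// `containers` images, each carrying `overheadMiB` of unnecessary data, are
// pulled `pullsPerDay` times per day for `days` days.
func overheadTiB(containers int, overheadMiB, pullsPerDay float64, days int) float64 {
	return float64(containers) * overheadMiB * pullsPerDay * float64(days) / mibPerTiB
}

func main() {
	// Inputs taken from the example in the text.
	once := overheadTiB(200_000, 100, 1, 1)
	yearly := overheadTiB(200_000, 100, 1, 365)

	fmt.Printf("every image pulled once:        %.1f TiB\n", once)
	fmt.Printf("one restart per day for a year: %.0f TiB\n", yearly)
	// Share of the traffic that crosses availability zones, per the text.
	fmt.Printf("inter-AZ share (2/3):           %.0f TiB\n", yearly*2/3)
}
```

Changing the container count or the per-image overhead shows quickly how sensitive the yearly figure is to either one.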

Restricting what’s in your runtime container to precisely what’s necessary for your app is one of the best practices for container image creation. Distroless container images can help a lot to create a minimalistic and secure container image. Distroless images are available for both compiled and interpreted programming languages.

Inspecting the layers of the static distroless image gcr.io/distroless/static-debian12 reveals an irony: although it is called a distroless image, it is actually a Debian distro, just stripped down to the bones.

Now we will see an example of how much distroless can reduce the size of a container image for a simple Go application. The following main.go file contains the code for a simple Go application:

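package main

import (
	"net/http"
	"os"

	"github.com/gin-gonic/gin"
)

func main() {
	r := gin.Default()

	// Respond on "/" with the hostname of the container serving the request.
	r.GET("/", func(c *gin.Context) {
		host, _ := os.Hostname()
		c.JSON(http.StatusOK, gin.H{"hostname": host})
	})

	// Run listens on :8080 unless the PORT environment variable is set.
	r.Run()
}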

Now create the container image from the following Dockerfile.

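# Build stage: compile the Go binary with the full Go toolchain.
FROM golang:1.21 AS build

WORKDIR /go/src/app
COPY ./main.go .

# Create the module, resolve dependencies, and vet the code.
RUN go mod init github.com/mydistroless && \
    go mod tidy && \
    go vet -v

# CGO_ENABLED=0 produces a statically linked binary for the static base image.
RUN CGO_ENABLED=0 go build -o /go/bin/app

# Now copy the go binary into the base image.
FROM gcr.io/distroless/static-debian12

COPY --from=build /go/bin/app /
ENTRYPOINT ["/app"]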

Command to build the image:

$ docker buildx build -t distroless-go .

The final application image size is just 12.4 MB.

$ docker images
REPOSITORY      TAG      IMAGE ID       CREATED       SIZE
distroless-go   latest   72f49e7b09f2   3 weeks ago   12.4MB

Now let’s use the official Golang Alpine 3.18 image as the final base image instead, by replacing gcr.io/distroless/static-debian12 with golang:1.21-alpine3.18 in the above Dockerfile. This time the image size is 232 MB.

From these two image sizes, it is clear how much network overhead we can reduce by using distroless images.
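
To put a number on that difference, the two sizes reported above can be compared directly. This is a small sketch under stated assumptions: the helper name reduction is made up for illustration, the sizes are the uncompressed values shown by docker images in this post, and the 200,000 containers are borrowed from the earlier traffic example, so the result is an illustration rather than a measurement of registry traffic.

```go
package main

import "fmt"

// reduction (hypothetical helper) returns how much smaller the small image is
// than the large one, in MB and as a percentage of the large image.
func reduction(largeMB, smallMB float64) (savedMB, savedPct float64) {
	savedMB = largeMB - smallMB
	savedPct = savedMB / largeMB * 100
	return savedMB, savedPct
}

func main() {
	// Image sizes reported in this post.
	const alpineBased, distroless = 232.0, 12.4
	// Container count borrowed from the earlier traffic example.
	const containers = 200_000

	savedMB, savedPct := reduction(alpineBased, distroless)
	fmt.Printf("saved per image: %.1f MB (%.1f%%)\n", savedMB, savedPct)
	fmt.Printf("saved when %d containers pull once: %.1f TB\n",
		containers, savedMB*containers/1e6)
}
```

For these two images the distroless variant comes out roughly 95% smaller.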

In a good production system, access should be restricted (even for application developers), and developers should rely on a proper observability setup to debug production issues instead of opening a shell inside production pods. Distroless images contain no shell, which enhances the security of the applications that use them but also makes those applications harder to debug. Distroless has a solution for this problem: you can use the :debug tag to get a busybox shell inside your container for debugging.

NOTE: For distroless images, the ENTRYPOINT must be specified in vector form (Docker’s exec form, a JSON array). Otherwise the container runtime prefixes the command with a shell, which the image does not contain.

This works with distroless images:

ENTRYPOINT ["myapp"]

But this does not work:

ENTRYPOINT "myapp"
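
The difference between the two forms comes down to the argument vector that is actually executed. The sketch below only models that behavior and is not how any container runtime is implemented: the function names are made up for illustration, and the /bin/sh -c wrapping is Docker's documented default for the shell form on Linux.

```go
package main

import (
	"fmt"
	"os"
)

// execFormArgv models the vector (exec) form: the arguments are executed
// exactly as written, without any shell in between.
func execFormArgv(argv ...string) []string {
	return argv
}

// shellFormArgv models the shell form: the whole command string is handed
// to /bin/sh -c, so a shell binary must exist in the image.
func shellFormArgv(command string) []string {
	return []string{"/bin/sh", "-c", command}
}

func main() {
	fmt.Println("vector form runs:", execFormArgv("myapp"))
	fmt.Println("shell form runs: ", shellFormArgv("myapp"))

	// In a distroless image this check fails, which is why the shell form
	// cannot start the application there.
	if _, err := os.Stat("/bin/sh"); err != nil {
		fmt.Println("no /bin/sh found: the shell form would not start")
	}
}
```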

Although distroless images can be used with Docker image build tooling, the images themselves are mainly built with Bazel. Bazel is a fast build tool that supports Java, C++, Android, iOS, Go, and a wide variety of other language platforms. With advanced local and distributed caching, optimized dependency analysis, and parallel execution, Bazel creates fast and incremental builds. It is widely adopted and trusted by Google, Square, Asana, and other well-known companies.

In summary, the advantages of distroless base images are:

  • They contain only the application and its runtime dependencies, with no package managers, shells, or other extra programs.
  • They restrict the container runtime to precisely what is necessary, following the best practice employed by Google and other tech giants that have used containers in production for many years.
  • Distroless images are very small. The smallest distroless image, gcr.io/distroless/static-debian12, is around 2 MiB. That’s less than 30% of the size of Alpine (~7 MiB), and less than 2% of the size of Debian (124 MiB).
  • As there is no shell available, the attack surface of the runtime environment is reduced.
  • Reduced image size results in faster application start time because less time is spent pulling images.
  • Reduced image size shortens build pipeline execution time in CI, improves the SLA of the CI system, and reduces the cost of maintaining it.

Distroless is not the only initiative to reduce container image size. Other solutions already exist, but they have limitations as well:

  • Scratch: As the smallest image, scratch could have been a possible choice for the base image, but it has the following limitations:
    - Scratch containers lack proper user management.
    - Scratch containers lack important folders (/tmp, /home, /var).
    - Scratch containers lack CA certificates.
    - Scratch containers lack timezone information.
    - Scratch does not have any security features or patching mechanisms.
  • Chainguard images: Chainguard is another suitable alternative. It maintains a set of secure container images that can be used in production environments and builds them with apko.
  • Ko: Ko is another initiative to create an efficient container image builder, but it can build container images only for Go applications.
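
The items missing from scratch can be checked from inside a running container with a small probe. This is a minimal sketch under stated assumptions: the paths follow the usual Debian layout and are my own selection for illustration, not an official list from scratch or distroless, and the helper missing is hypothetical. Built as a static binary and copied into different base images, it shows which of these pieces each base provides.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// item pairs a human-readable description with the path that provides it.
type item struct{ what, path string }

// expected lists pieces that scratch does not provide.
// Assumption: paths follow the usual Debian layout.
var expected = []item{
	{"user database", "/etc/passwd"},
	{"temporary directory", "/tmp"},
	{"CA certificates", "/etc/ssl/certs/ca-certificates.crt"},
	{"timezone database", "/usr/share/zoneinfo"},
}

// missing returns the entries of items whose path does not exist under root.
func missing(root string, items []item) []item {
	var out []item
	for _, it := range items {
		if _, err := os.Stat(filepath.Join(root, it.path)); err != nil {
			out = append(out, it)
		}
	}
	return out
}

func main() {
	gaps := missing("/", expected)
	for _, it := range gaps {
		fmt.Printf("missing %s (%s)\n", it.what, it.path)
	}
	if len(gaps) == 0 {
		fmt.Println("all expected files and directories are present")
	}
}
```

The root parameter exists only so the check can be pointed at an unpacked image filesystem instead of the live container.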

Considering all the advantages offered by distroless images, it is clear that they can help improve the security and reliability of a Kubernetes cluster and reduce cloud data transfer costs. However, it would be nice to hear other perspectives as well. Please feel free to share your experience in the comment section if you have already adopted distroless images. I would also be happy to learn about any other solution that currently provides more benefit than distroless images.
