ADR-0017: Enable container autodiscovery to scan container images from private registries
Status: | DRAFT ADR-0017 |
Date: | 2022-09-13 |
Author(s): | Simon Hülkenberg simon.huelkenberg@iteratec.com |
Context
The secureCodeBox offers an autodiscovery
feature which automatically creates scheduled scans when new services
or containers
(in pods) are created. The container autodiscovery will start a Trivy container scan for every distinct container in a given namespace. Currently, it is not possible to scan container images that require imagePullSecrets
to download the image. The container autodiscovery will create a scheduled scan but the scan itself will fail. Because of that limitation it is necessary to make the container autodiscovery aware of secrets to enable Trivy to download the container images.
Problem regarding imagePullSecretes
It is possible to configure multiple imagePullSecrets
in a pods yaml file as seen in this Stackoverflow post. Kubernetes will then try all available credentials until one succeeds. The container autodiscovery needs to do this too because Trivy does not support multiple imagePullSecrets
at once. Additionally one can only pass Trivy imagePullSecrets
as an environment variable whose name depends on the location of the image (see Trivy docs). For google cloud credentials one as to provide a json file path as an environment variable for example, but for dockerhub username and password as environment variable is sufficient. So Trivy configuration changes quite significantly with different platforms.
Another problem is the fact that imagePullSecrets
might change over time. It must be ensured that Trivy has access to the newest imagePullSecrets
when the scan starts.
Possible solutions
In general the easiest solution to provide Trivy with the required credentials are kubernetes secretes that are mounted as environment variables. The following solutions use this idea although the secrets are created through different approaches.
Using an initContainer and a sidecar
This option does not alter the behavior of the container autodiscovery at all. All additional logic is moved to an initContainer
and a sidecar
container.
The initContainer
reads the imagePullSecrets
of the pod that houses the container to be scanned. The initContainer
must determine which secret is the correct one for the scanned container image. It will then create a temporary secret following a predefined naming scheme (technically the Trivy pod will be created while the secret that is mounted as an environment variable doesn't exit yet, thus the temporary secret name must be predefined). After the initContainer
created the secret the Trivy container and the sidecar
will be started. The sidecar will continuously check if the main scan container exited (similar to the lurker) and then delete the temporary secret that was used to pass the credentials to Trivy to clean up after the scan to avoid cluttering of the namespace.
Pros:
- The autodiscovery logic will not change.
- No secrets that clutter the namespace
Cons:
- By default, a pod does not have sufficient permissions to create/delete secrets, so the pod needs to be assigned a
serviceAccount
with those permissions. - Logic split up over multiple containers, solution a bit hacky
- Edge Case: Changing the host platform of the container (example: AWS -> GCP) would break the scan because the required environment variables for Trivy must change (as described above). The parameters of the scheduled scan must be changed but the existing autodiscovery operator would lack the logic to do this (no read permissions on secrets so it would notice the difference).
Proof of concept
kind: ServiceAccount
metadata:
name: internal-kubectl
---
apiVersion: v1
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: modify-secrets
rules:
- apiGroups: [""]
resources:
- secrets
verbs:
- get
- create
- list
- delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: modify-secrets-to-service-account
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: modify-secrets
subjects:
- kind: ServiceAccount
name: internal-kubectl
---
apiVersion: batch/v1
kind: Job
metadata:
name: secret-test
spec:
template:
metadata:
name: secret-test-pod
spec:
serviceAccountName: internal-kubectl
restartPolicy: Never
volumes:
- name: shared-volume
emptyDir: {}
initContainers:
- command:
- sh
- -c
- pacman -Sy && pacman -S --noconfirm kubectl && kubectl create secret generic
some-image-pull-secret --from-literal=username=im_a_secret!
image: archlinux
name: init-secret
containers:
- command:
- sh
- -c
- echo secret is $env_test_secret && sleep 10 && touch /shared-volume/shutdown
env:
- name: env_test_secret
valueFrom:
secretKeyRef:
key: username
name: some-image-pull-secret
image: archlinux
name: imagine-this-is-Trivy
volumeMounts:
- name: shared-volume
mountPath: /shared-volume
- command:
- sh
- -c
- pacman -Sy && pacman -S --noconfirm kubectl && while [ ! -f /shared-volume/shutdown ]; do echo shutdownfile not found && sleep 2; done; kubectl delete secrets some-image-pull-secret
image: archlinux
name: secret-deletion-on-stop-sidecar
volumeMounts:
- name: shared-volume
mountPath: /shared-volume
Using an initContainer and use ownerReferences for the created secretes
Using the same initContainer logic described above but setting the ownerReference of the created secrets to the job/pod of the scan would result in similar behavior without the need of a sidecar.
Pros:
- No need for a sidecar while keeping the namespace clean of temporary secrets
Using a separate operator
Instead of creating the secrets for Trivy at scan runtime all secrets could be created upon pod creation. The new proposed operator would not listen on secrets but on pods (not every secret in a namespace is automatically used as an imagePullSecret
so it doesn't make sense to subscribe to secrets). When a pod gets created the new operator would check if it has imagePullSecrets
defined and if so it would check if the specific container was already processed. If not then it would check which imagePullSecret
belongs to which container and then create the Trivy secrets. Similar to the container autodiscovery this new operator would only create a secret for Trivy for every distinct secret. When a pod gets updated the new operator will check if the imagePullSecrets
changed. This approach will clutter the namespace quite substantially because the Trivy secrets exist continuously until the secret is not needed anymore (pod that uses the imagePullSecrets
gets deleted). To be clear: The scheduled scans are still created by the existing autodiscovery. The existing autodiscovery will check if the new operator created a secret for the specific container it wants to scan. If a secret for the container was created the existing autodiscovery would mount this secret. There needs to be a way to tell the existing autodiscovery which pod was processed by the new operator. One could use pod annotations to mark pods as processed (processed := secret was created by the new operator).
Pros:
- Existing autodiscovery does need permissions to read secrets
- It would be possible to only install the existing autodiscovery in case one doesn't like that the secret operator needs read permissions on secrets (Container Autodiscovery and new secret operator could be separate helm charts).
- One could alter the roles of the service account of the new secret operator to only enable it in certain namespaces while using the existing autodiscovery in every namespace (both could use different serviceAccounts).
Cons:
- Namespace gets cluttered by secrets created by the new operator
- The new secret operator needs permissions to read, create and delete secrets
- Edge Case: Changing the host platform of the container (example: AWS -> GCP) would break the scan because the required environment variables for Trivy must change (as described above) The parameters of the scheduled scan must be changed but the existing autodiscovery operator would lack the logic to do this (no read permissions on secrets so it would notice the difference).
Modifying the container autodiscovery
The container autodiscovery itself could test which imagePullSecret
belongs to which container and then create a scheduled scan that mounts the correct secret directly (without creating new intermediate secrets). With each pod update the autodiscovery would check if the imagePullSecrets
changed (by checking which secret the underlying scheduled scan mounts) and if the imagePullSecret
changed it would alter the scheduled scan accordingly.
Pros:
- No initContainers or additional operators
- Secrets will be mounted directly without creating intermediate secrets, Namespace will not be cluttered.
- One could alter the roles of the service account of the container autodiscovery operator to only allow read permission on secrets in certain namespaces while using the existing autodiscovery in every namespace.
Cons:
- Per default the container autodiscovery has read permissions on secrets. This can only be changed after installation by altering the clusterRole.