Audit Logging in RKE2 Kubernetes
Although it is an important topic (especially in multi-tenant clusters), Audit Logging is a rather underdocumented feature of Kubernetes. Compared to other Kubernetes functionality, the entry hurdle is quite high: first, you need to hand-write the policy’s YAML, and second, you need to manually alter your control plane’s configuration, because auditing cannot be configured using kubectl.
Audit Policy
First things first, we need to tell Kubernetes which events are actually relevant for us. I often see people enable logging for virtually everything, and I urge you not to do that. Kubernetes Audit Logs tend to get huge in no time: even if you’re not actively doing anything with your cluster, a lot is going on in the background, and even node-to-node communication may be logged.
I’m pretty sure that 90% of the Kubernetes administrators out there simply want to know which human user did what, and when. That’s exactly the question we’re going to answer in this article.
My personal approach when it comes to Audit Logging in Kubernetes is to log literally nothing but the things I’m specifically interested in. Then, if I see that I don’t have enough information, I add more logging step by step. So our starting point for the Audit Policy is a single rule that drops every event.
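A minimal sketch of such a starting policy, assuming the audit.k8s.io/v1 policy API, could look like this:

```yaml
# Sketch of a "log nothing" audit policy (assumes the audit.k8s.io/v1 API).
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Catch-all rule: a rule without any selectors matches every request,
  # and level None means the event is not logged.
  - level: None
```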
You might have guessed it: That policy doesn’t do anything and it won’t log anything. And that’s a good thing, because now we can simply add the things that are interesting for us.
For the next step, you need to know who your users are or which group they belong to. Unless you only have a very small and static number of users, I recommend scoping your logging to groups instead of individual users.
So now we add a rule that logs anything our human users are doing.
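Extended by one rule, the policy might look like the following sketch. The group name cluster-admin is the one discussed in this article; replace it with whatever group your human users actually belong to.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log everything done by members of the cluster-admin group,
  # including the request body (but not the response body).
  - level: Request
    userGroups:
      - cluster-admin
  # Catch-all rule: drop everything that was not matched above.
  - level: None
```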
This will now log any actions of users who are members of the cluster-admin group at the Request level. That means that the audit log will only contain the payload the user sent to the Kubernetes API, not the response body the user receives. If you need that body as well, you can simply adjust the level from Request to RequestResponse.
The rule has been put above the last rule, which drops everything. That’s very important, because once an event matches a policy rule, evaluation of the policy stops. So if you switched the positions of the rules, the new rule wouldn’t do anything, because our “bail-out rule” would apply first and evaluation would stop at that point.
If you’re using OIDC Authentication with group name prefixing, you can, for example, use oidc:my-group in the userGroups array to match against a certain group that comes from your authentication provider. Please note that, unfortunately, it’s not possible to use wildcards in the group names or user names.
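As an illustration, a rule matching OIDC groups could be sketched as follows. The prefix oidc: and the group names are assumptions that depend on how your API server’s OIDC group prefix and your identity provider are configured. Because wildcards are not supported, every group has to be listed explicitly.

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Group names must match exactly, including the prefix.
  - level: Request
    userGroups:
      - "oidc:my-group"
      - "oidc:my-other-group"  # hypothetical second group, listed explicitly
  # Catch-all rule: drop everything else.
  - level: None
```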
Now, that’s a good starting point, but there’s one more thing you should keep in mind. We now log anything that comes from our human users at the Request level, and “anything” includes requests to edit or create secrets. As those requests are logged including the request body, the secret’s contents would end up in our audit log. That’s obviously bad practice, so we’ll add another rule to make sure those requests are only logged at the Metadata level, which includes neither the request nor the response body.
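A sketch of the resulting policy, assuming Secrets live in the core API group (written as an empty string) and that the secrets rule is meant to apply to all requests regardless of user:

```yaml
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Most specific rule first: requests touching Secrets are logged
  # at Metadata level only, so no request or response body is stored.
  - level: Metadata
    resources:
      - group: ""            # core API group
        resources: ["secrets"]
  # Everything else done by our human users, including the request body.
  - level: Request
    userGroups:
      - cluster-admin
  # Catch-all rule: drop everything that was not matched above.
  - level: None
```

If you only care about Secret access by your human users, adding the same userGroups list to the first rule would narrow it accordingly.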
Again, we put the rule above the other ones. The primary mantra for audit policy rules is: from most specific to least specific. This new rule matches all requests going to endpoints that handle the Secret resource and sets their log level to Metadata, thus effectively preventing the contents of the accessed secrets from leaking into our audit logs.
Save the audit policy on each control plane node under /etc/rancher/rke2/audit.yaml.
Reference: Log Levels
For reference, these are all four possible log levels you can assign to an audit policy rule.
Log Level | Description |
---|---|
None | Don’t log the event at all |
Metadata | All basic request information (e.g. Timestamp, User, Accessed Resource, etc.) |
Request | Metadata plus the request body sent by the client |
RequestResponse | Request plus the response body sent back to the client |
Control Plane Configuration
Now that you have your audit log policy ready to go on all control plane nodes under /etc/rancher/rke2/audit.yaml, it’s time to actually configure Kubernetes to use those rules.
The kube-apiserver component of the control plane is responsible for receiving and processing all API requests, including those from kubectl. In RKE2 there are no direct audit logging arguments you can use, but luckily you can modify the kube-apiserver command line arguments.
To do so, open your RKE2 server configuration file (by default /etc/rancher/rke2/config.yaml) with an editor of your choice.
Then add the arguments you want RKE2 to start the kube-apiserver binary with.
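A sketch of what this could look like, assuming RKE2’s kube-apiserver-arg option and the paths used in this article. The retention values are placeholders, not recommendations.

```yaml
# Excerpt of the RKE2 server configuration file (sketch).
kube-apiserver-arg:
  - "audit-policy-file=/etc/rancher/rke2/audit.yaml"
  - "audit-log-path=/var/log/rke2/audit.log"
  - "audit-log-maxage=30"    # days; placeholder value
  - "audit-log-maxsize=200"  # megabytes, no suffix; placeholder value
```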
What’s happening here?
audit-policy-file: The path to the audit policy YAML file. RKE2 is intelligent enough to make sure that this file is automatically mounted into the kube-apiserver’s Pod, so there’s no need to worry about any additional mounts for it.
audit-log-path: The path the audit log file is written to. This is an important one that we come back to below.
audit-log-maxage: The maximum number of days Kubernetes keeps old audit log files before deleting them.
audit-log-maxsize: The maximum size in megabytes an audit log file may reach before it is rotated. Don’t add a suffix, the value is always in megabytes.
Now, there’s one thing we need to highlight again: audit-log-path. As RKE2 runs the control plane components as Pods in Kubernetes itself, the kube-apiserver runs in a container. That means the path the audit log is written to is interpreted from within that container. And while RKE2 is smart enough to make sure that any configuration files are properly mapped as volumes, that doesn’t apply to the audit log path.
So in order to actually write the audit log to the disk of your control plane node, you need to add an extra volume mount to the kube-apiserver Pod, which can be done by adding yet another configuration parameter to the RKE2 config file.
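Assuming RKE2’s kube-apiserver-extra-mount option and the /var/log/rke2 directory used in this article, the additional entry could be sketched like this:

```yaml
# Excerpt of the RKE2 server configuration file (sketch).
# Format is "<path on the node>:<path inside the container>".
kube-apiserver-extra-mount:
  - "/var/log/rke2:/var/log/rke2"
```

Using the same path on both sides keeps the audit-log-path value meaningful on the node as well as inside the container.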
That’s the final RKE2 server config adjustment you have to make to get audit logging up and running. To activate the changes, create the /var/log/rke2 directory and restart the RKE2 server on your control plane nodes, one by one (on systemd-based installations, typically with systemctl restart rke2-server).
Afterward, you should see a new audit.log file in /var/log/rke2/ containing all audit events in JSON format. Check the file to see whether any unwanted events are being logged (or whether anything is missing) and adjust your audit policy accordingly.