johanneskueber.com

Running a Rootless NetBird Routing Peer Inside Kubernetes

2026-05-25

This article documents how to run a NetBird client inside a Kubernetes cluster as a routing peer (rootless, with an in-memory state volume, a shell-based readiness probe, and all capabilities dropped) so that remote mesh members can reach cluster-internal services over the encrypted overlay, without requiring kernel WireGuard in the Pod and without turning every workload into a mesh peer.


1. Why a NetBird peer in the cluster

NetBird is a WireGuard-based identity mesh. Each peer holds keys, exchanges them via the management plane, and routes encrypted traffic to other peers. The default deployment runs the client on a host (Linux VM, MacBook, etc.) and uses the host’s kernel WireGuard.

Kubernetes clusters are not “a host” in this sense. Using kernel WireGuard inside a Pod requires CAP_NET_ADMIN and host-network mode, which is rarely acceptable on a production cluster. NetBird’s rootless image avoids this: it ships a userspace WireGuard implementation (wireguard-go), runs as a non-root user, works with every capability dropped, and talks to the same management plane as the kernel client.

The pattern that emerges: a Deployment runs the rootless NetBird client, joins the mesh with a routing-peer setup key, and announces a subset of cluster subnets into the mesh. Remote peers — a developer laptop, an on-call SRE, another cluster — reach cluster-internal services through this peer over the encrypted overlay, without those services being exposed to the public internet. The traffic direction is mesh → cluster: no cluster Pod uses this peer for egress. Giving a workload Pod outbound mesh access is a different pattern with different capability requirements (see §5).

Why not the netbird-operator

The obvious alternative is the netbird-operator — a Kubernetes operator that owns peer lifecycle, provisions the Deployment, manages the setup-key Secret, and registers peers with the management plane. I deliberately do not use it here. The project is young, the CRD surface and reconciliation behavior are still moving, and the extra moving parts (an operator Pod, RBAC for peer registration, a leader-election lease, CRD-version migrations on upgrade) are not justified for a setup that fits in two manifests. A plain Deployment plus a setup-key Secret is auditable at a glance and survives an operator-breaking release without intervention.


2. The Deployment

apiVersion: apps/v1
kind: Deployment
metadata: { name: netbird-routing-peer, namespace: netbird }
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector: { matchLabels: { app: netbird } }
  template:
    metadata: { labels: { app: netbird } }
    spec:
      terminationGracePeriodSeconds: 20
      containers:
        - name: netbird
          image: netbirdio/netbird:0.71.2-rootless@sha256:6148...
          securityContext:
            privileged: false
            runAsNonRoot: true
            runAsUser: 65532
            allowPrivilegeEscalation: false
            capabilities:
              drop: ["ALL"]
          env:
            - { name: USER, value: netbird }
            - { name: HOME, value: /var/lib/netbird }
          volumeMounts:
            - { name: netbird-data, mountPath: /var/lib/netbird }

          readinessProbe:
            exec:
              command:
                - sh
                - -ec
                - |
                  out="$(netbird status 2>/dev/null || true)"
                  echo "$out" | grep -Eq '^Management:\s+Connected' &&
                  echo "$out" | grep -Eq '^Signal:\s+Connected'
            periodSeconds: 5
            timeoutSeconds: 2
            failureThreshold: 3

      volumes:
        - name: netbird-data
          emptyDir: { medium: Memory }

Field reference:

  • image: netbirdio/netbird:...-rootless — the rootless variant. The default image runs as root and binds kernel WireGuard; the rootless variant runs wireguard-go in userspace. Pin to a digest because rolling tags occasionally regress on the rootless path.
  • securityContext.capabilities.drop: ["ALL"] — the NetBird process handles both the userspace WireGuard tunnel and the mesh → cluster forwarding without touching the kernel routing table or iptables, so no CAP_NET_ADMIN is required. Dropping every capability removes the most common privilege-escalation surface. The unrelated mesh-access sidecar pattern (giving a workload Pod outbound mesh access) does need elevated caps — see §5.
  • runAsNonRoot: true + runAsUser: 65532 — the nobody user inside the upstream rootless image.
  • env.USER=netbird, env.HOME=/var/lib/netbird — NetBird stores its state (private key, peer cache, config) under $HOME/.netbird. Setting HOME explicitly prevents fallback to /.
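  • volumes.netbird-data.emptyDir.medium: Memory puts the state in tmpfs, so the private key never touches disk. The volume lives exactly as long as the Pod: a container restart keeps the state, while a replacement Pod starts empty and re-authenticates with the management plane via the setup key. Because the setup key marks these peers as ephemeral (see §4), the management plane cleans up the records left behind by replaced Pods.
  • strategy.rollingUpdate.maxUnavailable: 0 together with maxSurge: 1 means a rollout starts one new Pod first and terminates an old one only after the new Pod passes its readiness probe, so the full replica count stays ready throughout.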
  • terminationGracePeriodSeconds: 20 — enough time for NetBird to deregister from the management plane on SIGTERM. Going below 10 s causes the peer list to fill with zombie peers.
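The rollout and probe settings reduce to simple arithmetic. The POSIX shell sketch below derives the bounds from the values in this manifest. rollout_bounds and readiness_window are hypothetical helper names; maxUnavailable and maxSurge are assumed to be absolute numbers rather than percentages, and the readiness figure ignores probe timeouts, so treat it as an approximation.

```shell
#!/bin/sh
# Hypothetical helpers: derive rollout and readiness bounds from manifest values.

# Prints "<min ready Pods> <max total Pods>" during a RollingUpdate.
# Assumes maxUnavailable and maxSurge are absolute numbers, not percentages.
rollout_bounds() {
  replicas=$1 max_unavailable=$2 max_surge=$3
  echo "$(( replicas - max_unavailable )) $(( replicas + max_surge ))"
}

# Approximate seconds of consecutive failed probes before a Pod is marked
# not-ready: periodSeconds x failureThreshold (probe timeouts ignored).
readiness_window() {
  period=$1 threshold=$2
  echo $(( period * threshold ))
}

# Values from the Deployment: replicas 2, maxUnavailable 0, maxSurge 1,
# periodSeconds 5, failureThreshold 3.
echo "rollout (min ready, max Pods): $(rollout_bounds 2 0 1)"
echo "not-ready after ~$(readiness_window 5 3)s of failed probes"
```

With the manifest's values this prints 2 3 for the rollout bounds and roughly 15 s for the readiness window, the figure §3 relies on.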

3. The readiness probe is shell, not HTTP

NetBird’s client does not expose an HTTP health endpoint. The CLI prints a status block to stdout; the probe parses it:

out="$(netbird status 2>/dev/null || true)"
echo "$out" | grep -Eq '^Management:\s+Connected' &&
echo "$out" | grep -Eq '^Signal:\s+Connected'

Both conditions must hold:

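  • Management: Connected means the control plane is reachable, the peer is authenticated, and its peer list is current.
  • Signal: Connected means the signal service is reachable. Peers exchange connection candidates through it before setting up a direct or relayed tunnel, so while it is down, remote peers cannot establish new connections to this one.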

A peer that is authenticated but not reachable (firewall, IPv6-only with broken v4) reports Management: Connected + Signal: Disconnected and should not receive traffic — which is exactly what failureThreshold: 3 + periodSeconds: 5 enforces (a Pod is removed from endpoints after roughly 15 seconds of degraded signal).

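The shell construction is deliberate. The script runs under sh -e, so a failing netbird status inside the command substitution would abort it at the assignment; || true lets it continue, and the verdict comes solely from the two greps. During startup or after a transient CLI error, out is empty, neither pattern matches, and the probe reports not-ready. The ^ anchor makes each grep -Eq match only at the start of a line, so only the top-level Management and Signal lines count, not indented or embedded occurrences of the same text.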
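To test the probe logic without a cluster, the same two checks can be wrapped in a function that reads the status text from stdin. A sketch: nb_ready is a hypothetical name, the sample input contains only the two lines the probe inspects (not a full captured status block), and the pattern spells the whitespace class as POSIX [[:space:]] instead of \s so it behaves the same under any grep.

```shell
#!/bin/sh
# Hypothetical helper: same checks as the readiness probe, but the status
# text comes from stdin so it can be exercised with canned input.
nb_ready() {
  out="$(cat)"
  # Both lines must start with the field name and report Connected.
  printf '%s\n' "$out" | grep -Eq '^Management:[[:space:]]+Connected' &&
    printf '%s\n' "$out" | grep -Eq '^Signal:[[:space:]]+Connected'
}

# Healthy: both planes connected -> exit status 0.
printf 'Management: Connected\nSignal: Connected\n' | nb_ready && echo ready

# Authenticated but signal down -> non-zero exit status.
printf 'Management: Connected\nSignal: Disconnected\n' | nb_ready || echo not-ready

# Empty output (the CLI failed and `|| true` swallowed it) -> not ready.
printf '' | nb_ready || echo not-ready
```

The second case is the "authenticated but not reachable" state described above; the third is what the probe sees during startup.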


4. Setup keys and identity

The Pod authenticates to the NetBird management plane via a setup key. The key is a reusable, ephemeral, auto-group-assigning token issued by the management plane. In OpenTofu:

resource "netbird_setup_key" "routing_peer" {
  name                   = "routing-peers-edge"
  expiry_seconds         = 0
  type                   = "reusable"
  allow_extra_dns_labels = true
  auto_groups            = [netbird_group.edge.id, netbird_group.edge_peers.id]
  ephemeral              = true
  revoked                = false
  usage_limit            = 0
}

Field reference:

  • type: "reusable" — multiple Pods (and Pod restarts) consume the same key. The alternative one-off would require regenerating the key after every restart.
  • expiry_seconds: 0 / usage_limit: 0 — both unlimited. Reusable keys for production peers; tighter limits for one-off provisioning.
  • auto_groups — every Pod that joins via this key is automatically added to the listed NetBird groups. This is the identity attach point: ACLs reference these groups, not individual peers.
  • ephemeral: true — peers using this key are marked for automatic cleanup after disconnection. Pairs with the tmpfs state volume: the peer record vanishes from the management plane when the Pod terminates.
  • allow_extra_dns_labels: true — permits peers to advertise DNS labels beyond their hostname. Used when the peer publishes service subdomains into the mesh.

The setup key reaches the Pod through a Kubernetes Secret, exposed to the container as the NB_SETUP_KEY environment variable (omitted from the manifest above for brevity).
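A sketch of the omitted wiring, assuming a Secret named netbird-setup-key in the netbird namespace used in §6. The Secret name and the render_* helper names are hypothetical. The helpers print a Secret manifest and the matching env entry, which uses secretKeyRef so the key itself never appears in the Deployment.

```shell
#!/bin/sh
# Hypothetical helpers that render the Secret and the container env entry
# the Deployment omits. Secret name and namespace are assumptions.

# Usage: render_setup_key_secret <secret-name> <namespace> <setup-key>
render_setup_key_secret() {
  name=$1 ns=$2 key=$3
  cat <<EOF
apiVersion: v1
kind: Secret
metadata: { name: $name, namespace: $ns }
type: Opaque
stringData:
  NB_SETUP_KEY: "$key"
EOF
}

# Usage: render_setup_key_env <secret-name>
# Prints one entry for the netbird container's env list.
render_setup_key_env() {
  cat <<EOF
- name: NB_SETUP_KEY
  valueFrom:
    secretKeyRef: { name: $1, key: NB_SETUP_KEY }
EOF
}
```

Piping the first helper into kubectl apply -f - creates the Secret; the env entry sits next to USER and HOME in the container spec.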


5. Routing peer vs. mesh-access sidecar

The manifest above is a routing peer: it terminates the WireGuard tunnel in userspace and forwards mesh → cluster traffic through the NetBird process itself. No kernel-level NAT, iptables, or route-table changes are involved, so drop: ["ALL"] is sufficient and the rootless image works without modification.

The opposite direction — giving a workload Pod outbound access to the mesh via a sidecar container — is a different pattern. To redirect the Pod’s own traffic into the WireGuard tunnel, the sidecar must install iptables rules or rewrite the Pod’s route table. Both require CAP_NET_ADMIN (the iptables path also wants CAP_NET_RAW). With runAsNonRoot: true + drop: ["ALL"], neither is available, so the rootless image is the wrong building block for that use case. Either run the non-rootless NetBird image with the elevated caps, or attach the Pod to a CNI-managed mesh that handles the redirect outside the workload Pod’s container.

| | Routing peer (this article) | Mesh-access sidecar |
|---|---|---|
| Traffic direction | mesh → cluster | Pod → mesh |
| What forwards the packet | NetBird userspace process | Kernel, via iptables or route rules installed by the sidecar |
| Required capabilities | none | CAP_NET_ADMIN (+ CAP_NET_RAW for the iptables path) |
| Works with the rootless image | yes | no |
| Typical use case | Reach cluster-internal services from a developer laptop or another cluster | Give a specific Pod access to a mesh-only resource without making the cluster a peer for everything |

This article only covers the routing-peer column. Don’t copy this manifest as the basis of a mesh-access sidecar — the cap-drop block will keep the sidecar from doing anything useful, and the failure mode (mesh appears up; Pod traffic still uses the cluster default route) is easy to misdiagnose.
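One way to confirm which column a running container belongs to is to read its effective capability mask from /proc. The sketch below decodes the CapEff line of /proc/<pid>/status and checks bit 12 (CAP_NET_ADMIN) and bit 13 (CAP_NET_RAW), numbered as in linux/capability.h. cap_report is a hypothetical helper, and the usage line assumes the image ships cat.

```shell
#!/bin/sh
# Hypothetical helper: decode the CapEff (effective capabilities) line of a
# /proc/<pid>/status dump read from stdin and name the bits §5 discusses.
# Bit numbers follow linux/capability.h: CAP_NET_ADMIN = 12, CAP_NET_RAW = 13.
cap_report() {
  hex=$(sed -n 's/^CapEff:[[:space:]]*\([0-9a-fA-F][0-9a-fA-F]*\).*/\1/p')
  if [ -z "$hex" ]; then
    echo "no CapEff line found" >&2
    return 2
  fi
  # Assumes 64-bit shell arithmetic (bash and dash both provide it).
  caps=$(( 0x$hex ))
  if [ "$caps" -eq 0 ]; then
    echo none
    return 0
  fi
  found=""
  if [ $(( (caps >> 12) & 1 )) -eq 1 ]; then found="$found CAP_NET_ADMIN"; fi
  if [ $(( (caps >> 13) & 1 )) -eq 1 ]; then found="$found CAP_NET_RAW"; fi
  [ -n "$found" ] || found=" other capabilities only"
  echo "nonzero:$found"
}
```

Run against the routing peer with kubectl exec -n netbird deploy/netbird-routing-peer -- cat /proc/1/status | cap_report; for the manifest in §2 it should print none, whereas a working mesh-access sidecar would report CAP_NET_ADMIN.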


6. Verifying the result

Pod ready, peer joined:

kubectl get pod -n netbird -l app=netbird
# NAME                                READY   STATUS    RESTARTS   AGE
# netbird-routing-peer-7f9c...-abc    1/1     Running   0          2m

kubectl exec -n netbird deploy/netbird-routing-peer -- netbird status
# Management: Connected
# Signal:     Connected
# Peers:      12/12

Peers: 12/12 means all twelve known mesh peers are connected, either directly (NAT traversal assisted by STUN) or through a relay. A low ratio such as 2/12 usually points to firewall or NAT problems; netbird status -d prints the per-peer details.
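For scripted checks, the ratio can be extracted and compared. A sketch, assuming the layout shown above (a line starting with Peers followed by connected/total); peers_check is a hypothetical helper, and its pattern deliberately tolerates any non-digit text between the label and the numbers.

```shell
#!/bin/sh
# Hypothetical helper: read `netbird status` output on stdin and compare the
# connected and total peer counts from the first line starting with "Peers".
peers_check() {
  ratio=$(sed -n 's/^Peers[^0-9]*\([0-9][0-9]*\)\/\([0-9][0-9]*\).*/\1 \2/p' | head -n 1)
  if [ -z "$ratio" ]; then
    echo "no Peers line found" >&2
    return 2
  fi
  set -- $ratio   # $1 = connected, $2 = total
  if [ "$1" -eq "$2" ]; then
    echo "ok $1/$2"
  else
    echo "degraded $1/$2"
    return 1
  fi
}
```

Usage: kubectl exec -n netbird deploy/netbird-routing-peer -- netbird status | peers_check. The non-zero exit status on a partial ratio makes it usable in CI or alerting scripts.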

Peer visible in management plane:

curl -s -H "Authorization: Token $NETBIRD_TOKEN" \
  https://api.netbird.io/api/peers | \
  jq '.[] | select(.hostname | startswith("netbird-routing-peer"))'

The Pod’s hostname (netbird-routing-peer-...) appears with connected: true, group membership matching the setup-key’s auto_groups, and the peer’s mesh IP.

Traffic reaches a mesh peer:

kubectl exec -n netbird deploy/netbird-routing-peer -- \
  curl -s http://<remote-mesh-ip>:8080/healthz

Reach a service on another mesh peer by mesh IP. If this succeeds and the same curl from a non-NetBird Pod fails, the mesh path is functional and the boundary is correctly drawn.
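The two-sided check can be scripted so both halves always run together. A sketch: verify_boundary is a hypothetical helper, the control Pod (any Pod without NetBird that has curl) and its namespace are assumptions passed as arguments, and curl's --fail flag is added so HTTP error responses also count as failures.

```shell
#!/bin/sh
# Hypothetical helper: run the same HTTP check from the routing peer and from
# a Pod without NetBird, then interpret the pair of results.
# Usage: verify_boundary <url> <control-namespace> <control-pod>
verify_boundary() {
  url=$1 ctrl_ns=$2 ctrl_pod=$3
  # --fail turns HTTP >= 400 into a non-zero exit; --max-time bounds hangs.
  if kubectl exec -n netbird deploy/netbird-routing-peer -- \
      curl -fsS -o /dev/null --max-time 5 "$url"; then
    mesh=ok
  else
    mesh=fail
  fi
  if kubectl exec -n "$ctrl_ns" "$ctrl_pod" -- \
      curl -fsS -o /dev/null --max-time 5 "$url"; then
    ctrl=ok
  else
    ctrl=fail
  fi
  case "$mesh/$ctrl" in
    ok/fail) echo "boundary OK: reachable via the mesh only" ;;
    ok/ok)   echo "target also reachable without NetBird"; return 1 ;;
    *)       echo "mesh path broken: routing peer cannot reach $url"; return 1 ;;
  esac
}
```

With the target set to the http://<remote-mesh-ip>:8080/healthz endpoint from above, a correctly drawn boundary yields the first branch; the other two branches name the failure instead of leaving it to be inferred from two separate curl runs.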



2026-05-25: Initial publication of the article