Skip to main content

Troubleshooting Declarative Networking

Given the following sample registration:

example MachineRegistration using Declarative Networking
apiVersion: ipam.cluster.x-k8s.io/v1alpha2
kind: InClusterIPPool
metadata:
name: elemental-inventory-pool
namespace: fleet-default
spec:
addresses:
- 192.168.122.150-192.168.122.200
prefix: 24
gateway: 192.168.122.1
---
apiVersion: elemental.cattle.io/v1beta1
kind: MachineRegistration
metadata:
name: fire-nodes
namespace: fleet-default
spec:
machineName: m-${Product/UUID}
config:
network:
configurator: nmc
ipAddresses:
inventory-ip:
apiGroup: ipam.cluster.x-k8s.io
kind: InClusterIPPool
name: elemental-inventory-pool
config:
dns-resolver:
config:
server:
- 192.168.122.1
search: []
routes:
config:
- destination: 0.0.0.0/0
next-hop-interface: enp1s0
next-hop-address: 192.168.122.1
metric: 150
table-id: 254
interfaces:
- name: enp1s0
type: ethernet
description: Main-NIC
state: up
ipv4:
enabled: true
dhcp: false
address:
- ip: "{inventory-ip}"
prefix-length: 24
ipv6:
enabled: false

We can expect each Elemental Machine to be configured using the defined nm-configurator _all.yaml template.

At the very first boot, the elemental-register will try to contact the Rancher API to register a new MachineInventory.
At this stage the machine's network is not configured and will default to DHCP. It is a requirement that the machine is able to contact the Rancher API in this setup, otherwise the registration can not take place.

Once the MachineInventory has first been registered, the elemental-operator is going to claim an IPAddress for each IPPool reference defined in the network configuration.
On the MachineInventory, this will look like the following:

apiVersion: elemental.cattle.io/v1beta1
kind: MachineInventory
metadata:
finalizers:
- machineinventory.elemental.cattle.io
name: m-e5331e3b-1e1b-4ce7-b080-235ed9a6d07c
namespace: fleet-default
spec:
ipAddressClaims:
inventory-ip:
apiVersion: ipam.cluster.x-k8s.io/v1beta1
kind: IPAddressClaim
name: m-e5331e3b-1e1b-4ce7-b080-235ed9a6d07c-inventory-ip
namespace: fleet-default
uid: 78f2d07a-7b6d-4b58-b615-c4108b7964b9
ipAddressPools:
inventory-ip:
apiGroup: ipam.cluster.x-k8s.io
kind: InClusterIPPool
name: elemental-inventory-pool
network:
config:
dns-resolver:
config:
search: []
server:
- 192.168.122.1
interfaces:
- description: Main-NIC
ipv4:
address:
- ip: "{inventory-ip}"
prefix-length: 24
dhcp: false
enabled: true
ipv6:
enabled: false
name: enp1s0
state: up
type: ethernet
routes:
config:
- destination: 0.0.0.0/0
metric: 150
next-hop-address: 192.168.122.1
next-hop-interface: enp1s0
table-id: 254
ipAddresses:
inventory-ip: 192.168.122.150
status:
conditions:
- lastTransitionTime: "2024-07-30T11:50:47Z"
message: NetworkConfig is ready
reason: ReconcilingNetworkConfig
status: "True"
type: NetworkConfigReady

You will notice that the MachineInventory carries the same network.config as the MachineRegistration, however instead of referencing IPAddressPools, we now have a map of real IPAddresses:

    ipAddresses:
inventory-ip: 192.168.122.150

This inventory-ip will then be substituted in the nm-configurator config whenever {inventory-ip} has been defined.

Also note that the MachineInventory references and owns each IPAddressClaim associated with it. Each claim follow the predictable $MachineIventoryName-$IPPoolRefKey naming convention: m-e5331e3b-1e1b-4ce7-b080-235ed9a6d07c-inventory-ip.
These claims will follow the lifecycle of the MachineInventory object and be deleted on cascade, for example during the reset workflow.

If the IPAddresses can not be claimed, the NetworkConfigReady condition will be False, preventing the machine from completing installation. This can be the case if the IPPool has no more IPAddresses available.

On the machine side​

During the installation phase, the elemental-register process running on the machine will receive the nm-configurator _all.yaml config template and the list of claimed IPAddresses with their keys. This information will be digested to an applicable nm-configurator configuration:

config:
dns-resolver:
config:
search: []
server:
- 192.168.122.1
interfaces:
- description: Main-NIC
ipv4:
address:
- ip: "192.168.122.150"
prefix-length: 24
dhcp: false
enabled: true
ipv6:
enabled: false
name: enp1s0
state: up
type: ethernet
routes:
config:
- destination: 0.0.0.0/0
metric: 150
next-hop-address: 192.168.122.1
next-hop-interface: enp1s0
table-id: 254

The elemental-register will then invoke nmc generate and nmc apply to apply this configuration into the running system.
From this moment until reset, the machine will always use the applied configuration.

Also note that outside of installation and reset, nm-configurator is no longer used, since the elemental-register will persist the /etc/NetworkManager/system-connection/*.nmconnection files generated by nmc rather than the nmc configuration itself.

For example on any running system, you will find a yip configuration file (/oem/94_custom.yaml) to apply the desired nmconnections, for example:

name: Apply network config
stages:
initramfs:
- files:
- path: /etc/NetworkManager/system-connections/Wired connection 1.nmconnection
permissions: 384
owner: 0
group: 0
content: |
[connection]
id=Wired connection 1
uuid=d26b4ae4-d525-3cbf-a557-33feb60343c0
type=ethernet
autoconnect-priority=-999
interface-name=enp1s0
timestamp=1722340245

[ethernet]

[ipv4]
address1=192.168.122.150/24
dhcp-timeout=2147483647
dns=192.168.122.1;
dns-options=
dns-priority=40
method=manual
route1=0.0.0.0/0,192.168.122.1,150
route1_options=table=254

[ipv6]
addr-gen-mode=eui64
dhcp-timeout=2147483647
method=disabled

[proxy]

[user]
nm-configurator.interface.description=Main-NIC
encoding: ""
ownerstring: ""

During reset​

Whenever reset is triggered, the elemental-register running on the machine will clear any /etc/NetworkManager/system-connection/*.nmconnection file and restart the network stack. The machine should then revert to DHCP and after that confirm to the elemental-operator on the management side, that reset has succeeded.

Following that, the machine should reboot into recovery mode, perform the actual reset and receive a fresh network configuration to be applied. Potentially this will be the same as before (if the MachineRegistration has not been updated), or it may have different IPs since the previous ones may have been claimed by other machines in the meanwhile.

If reverting to DHCP failed or the machine is anyhow unable to contact the Rancher API back for confirmation, you will notice that the MachineInventory will not be deleted, despite having a deletion timestamp.
Since the machine has now network issue, it won't be possible to remotely fix it.

You have the option to physically reach the machine or by any means fix the DHCP driven network configuration, or alternatively you can remove the machineinventory.elemental.cattle.io finalizer from the MachineInventory, to allow deletion, if you intend to decommission the machine.