Our company is in the process of upgrading to AlmaLinux 9, and I’ve recently stumbled upon a very odd issue with some of our workstations. This never happened with AlmaLinux 8 or any older distro.
We have several kinds of computer configurations, but this only occurs on Dell Precision 7820 computers with the built-in Intel I219-LM network adapter. Our installation goes through a PXE server and a kickstart file. Once the distro is installed, it boots with no network address, so some services end up in a failed state (dbus, NetworkManager, etc.) and even the GUI cannot start, leaving the machine unusable.
I switched to a CLI installation just so I could try debugging, because graphical.target wouldn’t even let me switch to a terminal.
The OS can see the network adapter, but won’t lease an address via DHCP in a timely manner:
nmcli device
Error: Could not create NMClient object: Could not connect: Connection refused.
After 20 to 40 minutes I can successfully restart the NetworkManager service and it leases an IP address; restarting network.target also works. Once that happens, everything seems normal.
I’ve also noticed that if I install the dhclient package and lease an IP address manually, it works.
Any advice will be appreciated. Thanks!
Lack of network address does hinder some services, but NetworkManager is not one of them. NetworkManager sets the network address. On my system:
$ systemd-analyze critical-chain NetworkManager.service
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.
NetworkManager.service +380ms
└─network-pre.target @1.960s
  └─nftables.service @1.706s +253ms
    └─basic.target @1.681s
      └─dbus-broker.service @1.655s +24ms
        └─dbus.socket @1.631s
          └─sysinit.target @1.622s
            └─systemd-update-utmp.service @1.615s +6ms
              └─auditd.service @1.574s +33ms
                └─systemd-tmpfiles-setup.service @1.516s +48ms
                  └─local-fs.target @1.512s
                    └─run-credentials-systemd\x2dtmpfiles\x2dsetup.service.mount @2.516s
                      └─local-fs-pre.target @641ms
                        └─systemd-tmpfiles-setup-dev.service @548ms +22ms
                          └─kmod-static-nodes.service @433ms +50ms
                            └─systemd-journald.socket
                              └─system.slice
                                └─-.slice
Which of these is the earliest unit that fails on boot? Why does it fail?
Alright. Thanks for your attention. As expected, the problem wasn’t actually in the networking, but in something else.
This morning I managed to isolate the issue, but I still can’t explain why it occurs only on that workstation model. We have around 80 different computers that have no such problem whatsoever.
During the %post section of our kickstart (KS) there’s a bash script that runs last to set up some custom things for our needs. One of those things is deleting /var/tmp, creating /tmp/var and symlinking it to /var/tmp.
Apparently the lack of a real /var/tmp directory causes all of this on those Dells. Right now I’m writing this on an older Dell Precision with Alma 9 that was installed exactly the same way with no trouble.
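For anyone comparing a working and a non-working machine, a minimal sketch of a check for what /var/tmp actually is after the kickstart post script has run. The function name describe_tmp_path is made up for this example, and the check only reports the state of the path; it does not explain why only one hardware model is affected.

```shell
#!/bin/sh
# Report whether a path is a real directory, a symlink, a dangling
# symlink (its target does not exist), or missing altogether.
# describe_tmp_path is a hypothetical helper name, not an existing tool.
describe_tmp_path() {
    path="${1:-/var/tmp}"
    if [ -L "$path" ] && [ ! -e "$path" ]; then
        # -e follows the link, so this branch means the target is gone
        echo "dangling-symlink"
    elif [ -L "$path" ]; then
        echo "symlink"
    elif [ -d "$path" ]; then
        echo "directory"
    else
        echo "missing"
    fi
}

# Check the path discussed in this thread.
describe_tmp_path /var/tmp
```

On a stock install this would be expected to print "directory"; any other answer on the affected Dells would be worth comparing against the journal of the first failing unit.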
I had problems symlinking /var/tmp to another location - it caused problems with systemd-tmpfiles, which has settings for /var/tmp.
I ‘fixed’ my problem by bind mounting the new location on /var/tmp. My new location wasn’t under /tmp, though, so I’m not sure whether this would work with the new location being under /tmp.
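A rough sketch of the bind-mount approach, for reference. The source directory /srv/scratch and the helper name fstab_bind_entry are assumptions made for this example, not the setup used in this thread. /var/tmp itself stays a real directory and serves only as the mount point.

```shell
#!/bin/sh
# Print an /etc/fstab entry that bind-mounts SRC onto DST.
# fstab_bind_entry is a hypothetical helper; /srv/scratch is a
# hypothetical source directory chosen only for illustration.
fstab_bind_entry() {
    src="$1"
    dst="${2:-/var/tmp}"
    printf '%s %s none bind 0 0\n' "$src" "$dst"
}

# Show the line that a kickstart post script could append to /etc/fstab.
fstab_bind_entry /srv/scratch /var/tmp

# To try the mount once, as root, without editing fstab:
#   mkdir -p /srv/scratch /var/tmp
#   chmod 1777 /srv/scratch      # same sticky, world-writable mode as a tmp dir
#   mount --bind /srv/scratch /var/tmp
```

Printing the entry instead of writing it keeps the sketch safe to run as a normal user; appending it to /etc/fstab is left to the install script.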