Kernel error HPE ProLiant ML30 Gen11 E 2414

Hi guys,

I have installed AlmaLinux 8.5 and the latest 8.10 (separate fresh builds for diagnostic purposes) on an HPE ProLiant ML30 Gen11 E 2414 server. I have also updated the server BIOS/firmware to the latest service pack. However, on both builds, when the server is left idle while logged in as the root user, the error message below appears. Any ideas what this would indicate? It looks like something the kernel does not like. How can I resolve this? Thanks in advance.

Link to log

What is the 'local-tunnel.service' that fails? It is not from the base repos.


The 'tg3' is a network device driver. Which exact NIC(s) do you have?
lspci -nn | grep -i net

Please find below as requested:

01:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
01:00.1 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
01:00.2 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
01:00.3 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
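
The four entries above are the ports of a single BCM5719. To see which driver and firmware version each port reports, something along these lines may help. It is only a sketch: the function name list_ifaces_by_driver and the SYSFS_NET override are made up for illustration, and it assumes ethtool is installed.

```shell
#!/bin/sh
# List network interfaces bound to a given kernel driver (default: tg3).
# SYSFS_NET is a hypothetical override so the function can be tried
# against a copy of the sysfs tree; normally it is left unset.
list_ifaces_by_driver() {
    driver="${1:-tg3}"
    sysfs_net="${SYSFS_NET:-/sys/class/net}"
    for dev in "$sysfs_net"/*; do
        # Virtual interfaces such as lo have no device/driver link.
        [ -L "$dev/device/driver" ] || continue
        bound=$(basename "$(readlink "$dev/device/driver")")
        [ "$bound" = "$driver" ] && basename "$dev"
    done
    return 0
}

# Print driver, version and firmware for every tg3 port found.
for ifc in $(list_ifaces_by_driver tg3); do
    echo "== $ifc"
    ethtool -i "$ifc" 2>/dev/null |
        grep -E '^(driver|version|firmware-version|bus-info):'
done
```

Running the same script on a second machine of the same model gives a quick side-by-side of the firmware-version lines.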

'local-tunnel.service' is a bespoke service which automatically brings up an SSH connection to a remote server using public/private key authentication, with the following unit file:

[Unit]
Description="Setup a local tunnel"
After=network.target

[Service]
EnvironmentFile=/u/tunnels/local-tunnel
ExecStart=/usr/bin/ssh -4 -i ${PATH_TO_KEY} -o ServerAliveInterval=60 -o ExitOnForwardFailure=yes -nNT -L ${LOCAL_PORT}:${REMOTE_ADDR}:${REMOTE_PORT} ${REMOTE_USER}@${REMOTE_HOST}
RestartSec=15
Restart=always
KillMode=mixed

[Install]
WantedBy=multi-user.target

For diagnostic purposes, I have disabled this service.

Hello again,

Having disabled the local-tunnel service for diagnostic purposes, I can report that the error still occurs. Please find the new log attached.

Log 2

@toracat might know something more about wonky drivers.

Looks like Broadcom makes the Linux driver for BCM5719 available for personal use. If it is distributable to the public, ELRepo can package it as a kmod.

Can you test-install the latest mainline kernel to see if the current version of the tg3 driver works better? You can find kernel-ml-6.10.0-0.rc3.el8.elrepo.x86_64 here:

https://elrepo.org/people/akemi/testing/el8/kernel/6_10/

Thank you for this. I have installed as per request. I will bench test for the next few days and let you know how it goes. Thanks for your help.

rpm -ivh kernel-ml-6.10.0-0.rc3.el8.elrepo.x86_64.rpm
error: Failed dependencies:
kernel-ml-core-uname-r = 6.10.0-0.rc3.el8.elrepo.x86_64 is needed by kernel-ml-6.10.0-0.rc3.el8.elrepo.x86_64
kernel-ml-modules-uname-r = 6.10.0-0.rc3.el8.elrepo.x86_64 is needed by kernel-ml-6.10.0-0.rc3.el8.elrepo.x86_6

rpm -ivh kernel-ml-core-6.10.0-0.rc3.el8.elrepo.x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Updating / installing...
1:kernel-ml-core-6.10.0-0.rc3.el8.e################################# [100%]

rpm -ivh kernel-ml-modules-6.10.0-0.rc3.el8.elrepo.x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Updating / installing...
1:kernel-ml-modules-6.10.0-0.rc3.el################################# [100%]

rpm -ivh kernel-ml-6.10.0-0.rc3.el8.elrepo.x86_64.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
Updating / installing...
1:kernel-ml-6.10.0-0.rc3.el8.elrepo################################# [100%]

reboot
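
The dependency error on the first attempt comes from installing kernel-ml on its own: rpm checks dependencies per transaction, so naming all three packages in one command lets them satisfy each other. A minimal sketch, assuming the three files from this thread sit in the current directory. The helper name kernel_ml_rpms is made up for illustration, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Version string taken from the package names used in this thread.
KVER="6.10.0-0.rc3.el8.elrepo.x86_64"

# Print the three package files that belong in the same transaction.
kernel_ml_rpms() {
    for pkg in kernel-ml-core kernel-ml-modules kernel-ml; do
        printf '%s-%s.rpm\n' "$pkg" "$1"
    done
}

# Dry run: show the single rpm command rather than executing it.
# The unquoted substitution is intentional so each file becomes one argument.
echo rpm -ivh $(kernel_ml_rpms "$KVER")
```

Dropping the leading echo would run the install as one transaction.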

I have conducted bench testing for over a 24-hour period and can confirm that the issue has not resurfaced. Typically, this issue would occur within an hour of the unit being powered on.

Will this be included in a new release? Thanks for your help, much appreciated.

96 hours later all still stable.
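
For a bench test like this, counting suspicious kernel messages for the current boot is a quick way to confirm a clean run. The sketch below assumes the error in the logs is a tg3 transmit timeout reported through the kernel's "NETDEV WATCHDOG" message; the function name count_tg3_events and the exact match pattern are my own guesses and may need adjusting to the wording in the actual logs.

```shell
#!/bin/sh
# Count log lines on stdin that look like a tg3 problem or a
# transmit watchdog timeout. The pattern is an assumption.
count_tg3_events() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -ciE 'tg3.*(timed out|timeout|reset)|NETDEV WATCHDOG' || true
}

# -k limits output to kernel messages, -b to the current boot.
journalctl -k -b --no-pager 2>/dev/null | count_tg3_events
```

A result of 0 after several days of uptime matches the stable behaviour described above.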

That's good news. At this point I suggest you file a request for a kmod package for the tg3 driver with ELRepo (My View - ELRepo Bugs). Providing a link to this forum thread will be helpful.

Here it is:

https://elrepo.org/bugs/view.php?id=1464

Thanks for this. I have performed a fresh install of the OS (AlmaLinux 8.5 fully patched up to 8.10) and installed 'kmod-tg3-3.139k-1.el8_10.elrepo.x86_64.rpm' from https://elrepo.org/linux/elrepo/el8/x86_64/RPMS/kmod-tg3-3.139k-1.el8_10.elrepo.x86_64.rpm, as referenced in 0001464: Broadcom tg3 transmit timeout - ELRepo Bugs, and I'm afraid to say the issue occurred again. Please see the attached log 3:

Log 3
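
When a kmod package is installed but the problem persists, it is worth confirming that the running kernel actually picked up the packaged module and not the in-tree one. A rough sketch, assuming the kmod places its module under an extra/ or weak-updates/ directory (the usual layout for add-on modules on EL systems, but treat it as an assumption); the function name module_origin is made up for illustration.

```shell
#!/bin/sh
# Classify a module path as coming from an add-on kmod package
# or from the stock kernel tree. Directory names are assumptions.
module_origin() {
    case "$1" in
        */extra/*|*/weak-updates/*) echo "kmod" ;;
        */kernel/drivers/*)         echo "in-tree" ;;
        *)                          echo "unknown" ;;
    esac
}

# Which tg3 file would modprobe load for the running kernel?
path=$(modinfo -F filename tg3 2>/dev/null || true)
echo "tg3 module file: ${path:-not found} ($(module_origin "$path"))"

# Version of the module that is loaded right now, if it exports one.
[ -r /sys/module/tg3/version ] && echo "loaded version: $(cat /sys/module/tg3/version)"
```

If the origin comes back as in-tree, the kmod was probably not in use during the test, which would be useful detail for the bug report.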

I would like to mention that this is the first HPE ProLiant ML30 Gen11 E 2414 on which we have installed AlmaLinux 8.x. However, it's important to note that we used the exact same installation process on another server of the same model (out of the box, with none of the patches/software mentioned in this thread), and that server does not experience the issue. I have contacted HPE and provided the relevant hardware-level logs. They confirmed that the hardware is functioning correctly, even though the issue persists with this particular server.

Do I need to update 0001464: Broadcom tg3 transmit timeout - ELRepo Bugs with these findings? Apologies, I am unsure of the process.

I believe so. They (ELRepo) do build the module, so they need to know how it behaves.