Discussion of kernel changes in HPC

Originally posted as a reply to this, and moved to its own thread by moderator.

So this is the sort of problem kernel changes cause in the HPC world, where proprietary parallel file systems are common (think GPFS, Panasas, Weka, etc.).

My experience is mainly with GPFS (currently called StorageScale by IBM) over the last two decades. I can also imagine problems with NVidia drivers for GPUs too, though those are somewhat more tolerant of kernel changes in my experience.

My current solution, as I have RHEL subscriptions for the hardware running the GPFS file system, is to download the genuine kernel source code, build my own kernel RPMs from that, and deploy those in place of the Alma kernels. As I am not distributing the kernel or using the Red Hat binary RPMs, I believe I am legally good to go. However, even with this, the lack of testing may well cause breakages in the future.

Our other problem is vendor support for software. So, for example, Ansys, who did support CentOS 7, now only support genuine RHEL (along with SLES and Ubuntu), and I don’t have the budget to change to genuine RHEL across the whole cluster: far too many nodes that need full server subscriptions rather than HPC node subscriptions.

We are probably going to be forced out of the RHEL ecosystem, but this is a big task and will likely be tied to moving on from RHEL8 and its rebuilds.

Hey @jabuzzard! I want to help, but I’m trying to understand. Are you saying the problem is that the change that fixed the CVE was a change in the kernel that ultimately caused the break in your filesystem?

Yes, the change that fixed the CVE in the kernel broke the compilation of the file system driver for GPFS. The link I provided details the issue with all the potential and often messy workarounds that are required to get yourself back up and running.
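As a toy sketch of the failure mode (this is not GPFS code; the header and function names are invented for illustration): an out-of-tree module written against one kernel's function signature stops compiling when a "minor" point-release update changes that signature.

```shell
# Two fabricated headers stand in for the kernel API before and after a
# point-release change; driver.c stands in for the out-of-tree module.
cat > api_old.h <<'EOF'
static inline int register_fs(const char *name) { (void)name; return 0; }
EOF
cat > api_new.h <<'EOF'
static inline int register_fs(const char *name, int flags) { (void)name; (void)flags; return 0; }
EOF
cat > driver.c <<'EOF'
#include KERNEL_API
int driver_init(void) { return register_fs("gpfs"); } /* written against the old API */
EOF

# Builds cleanly against the API it was written for:
cc -c -DKERNEL_API='"api_old.h"' driver.c -o /dev/null && echo "old kernel: builds"
# The signature change breaks the build, even though nothing else moved:
cc -c -DKERNEL_API='"api_new.h"' driver.c -o /dev/null 2>/dev/null || echo "new kernel: build breaks"
```

The real-world equivalent is the GPFS portability layer rebuild failing against the patched kernel headers.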

These sorts of issues with changes to the kernel are a constant headache in the HPC space even with genuine RHEL kernels, so diverging kernels are likely going to be a problem. As I said, my current solution is to do my own rebuilds of the kernel RPMs, as I have access to the kernel SRPMs anyway and won’t be distributing the results. It is, however, yet more pain.

Ah! Okay, so I think it’s mostly just frustration with the fact that this change happened in general, and nothing specific to AlmaLinux here.

If there’s something we can do to help, please do let us know, or join the new HPC/AI SIG to provide input.

Woah, hold your horses there cowboy. What I was illustrating with the link is how seemingly minor variations in the kernel source of what is supposed to be a “stable” distribution, even within a point release, can cause problems. The question is, now that Alma is no longer even attempting to follow the RHEL kernel but is deliberately deviating from it, whether these sorts of issues are likely to become much more common. What if your kernel changes break something like GPFS or the NVidia GPU drivers etc., which target a genuine RHEL kernel?

HAH! That’s quite an intro there. :smiley:

The changes that we may or may not choose to implement in the kernel are scrutinized thoroughly before we make them, but your point is deeply understood. IF and WHEN we deviate from RHEL, it’s without impacting API compatibility. This change (the bug fix you’re referencing), for example, does impact the API and so is a change that we wouldn’t even consider.

The “even attempting to follow” is a pretty grand statement. We do, in fact, follow the RHEL kernel as closely as one can without copying source code they have chosen not to share. The one change that I assume you’re using as your guide here (supporting older hardware compatibility), was deeply and widely considered before implementing. It’s a show of us meeting the needs of our users (of which HPC users are a huge portion and were some of the biggest supporters of this change), but not a deviation from our north star: a RHEL-compatible (API-compatible, specifically) operating system.

The “what if” questions are still plentiful and feel very large to risk-averse companies that are used to being able to count on CentOS as an exact duplicate of RHEL that they didn’t have to pay for, but the answer is simple to us: that would be considered a huge problem and would be fixed as such.

It boils down to this: if software works on RHEL, it should also work on AlmaLinux, and if it doesn’t (for a reason that is in our OS and not in the software), we consider that a bug.

That’s all reassuring but what happens down the line when supporting this additional hardware is mutually exclusive with being able to support GPFS, NVidia etc.? Who gets the kick? Do you boot the ancient hardware again or would those that need GPFS, NVidia etc. support get dropped by the roadside?

A very large part of me says that if you have hardware that is not supported by a vanilla RHEL8 kernel, then this change has come too late in the day to be of any use. That is, you should already have had a plan for what you were going to do when RHEL7/CentOS7 went end of life, and have implemented it or be in the process of doing so.

So, for example, say you had a Dell R610/R710 with an H200 card: you were either going to ditch the machine because it is ~12 years old, or replace the H200 with an H700 for under $50 and get it working in RHEL8 and derivatives out of the box, even if the H700 is unsupported. I would strongly argue that if you have not already done so, then, well, what the hell. Note that RHEL9 and derivative support would require you to fit a network card anyway, as the onboard Broadcom NICs don’t work in RHEL9 and haven’t been added back; at least, they are not in the list of hardware re-enabled. Not that we care about onboard 1Gbps cards in 2024 anymore; the only stuff slower than 10Gbps is the management network.

To illustrate the impact, these changes would mean I could continue to use three very ancient Sun X4170 machines, but these are now over 15 years old and their continued use in a production environment is a dubious endeavour. I also have an ancient Dell R300 I could re-enable, but again it is not terribly useful due to its extreme age.

Don’t get me wrong, we have had to replace adaptors to accommodate RHEL8: a bunch of HP QLogic 10Gbps SFP+ cards were swapped for Intel X520-DA2s, and all the TrueScale InfiniBand cards were replaced with ConnectX-3s in the undergraduate teaching HPC cluster. However, none of these cards have been added back, so the changes are not terribly helpful in that regard, and all of those replacements were made well before this announcement, because I am a competent system administrator.

Of course, if you wanted a RHEL-like experience but with wider hardware support, then you have always had the option of holding your nose and installing Oracle Linux with their UEK and getting a much wider list of supported hardware. It is what I have done at home for my Plex server, as Red Hat pulled all the video capture cards from RHEL9 and at the time didn’t even include this little nugget of information in their release notes.

In conclusion, my general view is these changes came far too late to be useful to competent system administrators, and affect hardware whose continued use in production environments is of dubious value. If you are a hobbyist, then you always have the option of installing Oracle Linux anyway if you want to continue to use ancient hardware.

The other option of course was just to go and buy CentOS 7 ELS from TuxCare, something we are going to have to do for at least one machine, because the SMB filesystem driver is busted beyond RHEL7 with DFS shares. Note it is, as far as I can determine, a Linux kernel problem, because it is also busted in a slew of Ubuntu versions.

This to me seems to be the bulk of the question in your mind, but I think it’s pretty clear from my side: our commitment is compatibility. If compatibility is broken by supporting older hardware, then we drop older hardware. It doesn’t do any good for our community if we are diverging to do nothing more than break our commitment to compatibility.

Agreed on the “let’s get people to use more modern stuff” but that’s exactly what we’re doing here. Getting folks on more modern OSes even if their hardware isn’t brand new. If they can still use it with this extremely minimal change from us, then I’d rather help them get to a supported OS now, and have them anticipate the move later.

Lifecycle management is very hard in all kinds of places, and unfortunately, the answer for LOTS of organizations is to let the old OS sit there until there’s a forcing factor (needed feature, root-level compromise, hard drive failure, etc.) that pushes it to the top of the list.

What I will contribute to this is: Nvidia did not release kmods for Alma’s kernel-5.14.0-362.24.2.el9_3.x86_64 at Index of /compute/cuda/repos/rhel9/x86_64. My suspicion is that 24.2 is sufficiently compatible with 24.1 that, were Nvidia’s kmod packages kABI-tracking via weak-modules, the modules would happily run on 24.2. However, they are not, and it does not work out of the box. (In fact, some combination of Nvidia’s packaging metadata and its DNF plugin responded by breaking the kernel installations so that weak-modules stopped working for kmod-lustre-client. But I digress.)

Similarly, DKMS would probably work, but simply speaking, using the kmods from Nvidia’s RHEL9 repository on AL9 does not work the way it does on RHEL9, and in that sense there’s an incompatibility. The kernels are likely compatible in the sense that I could probably rebuild kmod-nvidia for the 24.2 kernel, but that’s an API sense of compatibility rather than a usage one. I rebuild Lustre, ZFS, and MOFED all the time (and can confirm at least the first two do indeed link up fine to 24.2), but I am not presently familiar with doing that for Nvidia because Nvidia have been good enough about publishing builds that I haven’t needed to. Nvidia are apparently not applying that effort to Alma.
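For anyone not steeped in this: the weak-modules mechanism works by symlinking a module built for one kernel into a compatible kernel's weak-updates directory. A rough simulation of the layout (in a temp directory, with illustrative kernel version strings; nothing here touches a real system) looks like:

```shell
# Simulated /lib/modules layout under a temp dir.
tmp=$(mktemp -d)
old=5.14.0-362.24.1.el9_3.x86_64   # kernel the kmod was built for
new=5.14.0-362.24.2.el9_3.x86_64   # newer, kABI-compatible kernel

mkdir -p "$tmp/lib/modules/$old/extra/nvidia"
touch    "$tmp/lib/modules/$old/extra/nvidia/nvidia.ko"
mkdir -p "$tmp/lib/modules/$new/weak-updates/nvidia"

# weak-modules would create this symlink when every exported symbol the
# module uses has an unchanged CRC in the new kernel:
ln -s "../../../$old/extra/nvidia/nvidia.ko" \
      "$tmp/lib/modules/$new/weak-updates/nvidia/nvidia.ko"

# The new kernel's depmod then picks the 24.1 module up via the symlink:
readlink "$tmp/lib/modules/$new/weak-updates/nvidia/nvidia.ko"
```

This is the behaviour that kABI-tracking kmod packages get for free, and that Nvidia's non-tracking packages don't.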

It’s an incompatibility more likely in how Nvidia are packaging than in the code itself, and I don’t know how easy it would be to convince them either to make their kmod packages kABI-tracking or to make additional releases for any bonus Alma kernels (I imagine it’s connected to QA, and they might not want to publish packages that make combinations available that haven’t been subject to the same testing). But if in the future Alma publish a bonus kernel without some way of avoiding these kinds of mismatches (whether implemented within AlmaLinux or by third parties), then there are going to be users who follow instructions that work on RHEL and find they do not work on AlmaLinux.
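The kABI-tracking question boils down to symbol checksums: a module built for one kernel can be reused on another only if every exported symbol it uses has the same CRC in both kernels' Module.symvers. A toy check with fabricated data (real files have thousands of entries; CRCs and versions here are invented):

```shell
# Fabricated Module.symvers excerpts for two kernel builds.
cat > symvers-24.1 <<'EOF'
0x1a2b3c4d	register_filesystem	vmlinux	EXPORT_SYMBOL
0x5e6f7a8b	unregister_filesystem	vmlinux	EXPORT_SYMBOL
EOF
cat > symvers-24.2 <<'EOF'
0x1a2b3c4d	register_filesystem	vmlinux	EXPORT_SYMBOL
0x99999999	unregister_filesystem	vmlinux	EXPORT_SYMBOL
EOF

# Join on the symbol name and print any symbol whose CRC changed --
# each one is a kABI break for modules that use it:
join -j 2 -o 1.1,2.1,0 symvers-24.1 symvers-24.2 \
  | awk '$1 != $2 { print $3 }'
# prints: unregister_filesystem
```

A module using only register_filesystem would survive this update; one using unregister_filesystem would not.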

I had the “dkms” version of drivers from NVidia’s “CUDA repo”, and installing the 24.2 kernel did get symlinks into its weak-modules tree that pointed to “extras” in 24.1’s tree, where the kmod-nvidia--24.1- package had the kernel modules for the 24.1 kernel. The systems did appear to function with that. (If I did not have the 24.1 kernel and kmod-nvidia installed, then installation of the 24.2 kernel either had a dependency issue or just lacked the symlinks.)

Either I was lucky, or did not look closely enough.

In which case you need to make it clear, up front and publicly, who is going to get the short end of the stick. That is, these additional drivers will only be available so long as they don’t break compatibility. At that point they become useless, because they could be pulled at any time, with effectively zero notice, to maintain compatibility.

The issue is not with older hardware; this is ancient hardware, on machines that are well over a decade old. If the owners of such hardware have not come up with a plan for migrating off it by the end of the RHEL7 life cycle, then giving them extra time is not going to help, as these are people with their heads buried in the sand to begin with. They should have had a plan to get off unsupported hardware long before the Alma announcement of support for extra hardware; they should have anticipated that their hardware would be unsupported, five years ago at this point. How much more time do you need?

Also, like I pointed out, they can move to Oracle Linux with its UEK kernel if, for some frankly insane reason, they want to continue to use such ancient hardware in production. I would note that with Oracle you get the UEK with a lot more hardware enabled than Alma is enabling, or you can still run the RHEL-compatible kernel at your discretion. With Alma you get the worst of both worlds: a tiny amount of extra hardware which might go at a moment’s notice to maintain compatibility. Not worth it, and frankly not thought through properly.

I would use the TrueScale InfiniBand driver to highlight how this may well play out. The qib driver was supported in elrepo for a while, but they had to drop it after 8.2 because it was impossible to maintain. That is the reason our undergraduate teaching cluster had all the InfiniBand cards replaced with ConnectX-3 cards at the beginning of the year. I knew it was going to be an issue, and I had had a plan to deal with it for a considerable time; it didn’t just come unexpectedly out of left field in April of this year. Crutches for incompetent system administrators who didn’t plan are a dumb idea.

Finally, to address the issue of money: I work in a university; we don’t have unlimited budgets. I am using old Dell PowerConnect switches for my management network and replacing the fans myself, buying them from Digikey and crimping the connectors on. If I can manage to replace all my hardware that is not compatible with RHEL8 in good order, then so can other people, and for those that have zero budget, Oracle Linux is a much better solution.

I’m a bit at loss here.

Additional hardware support, drivers, are kernel modules, aren’t they?
Building one new kernel module should not affect other kernel modules, should it?
AlmaLinux building a module is no different from, say, NVidia building a module.
Except that drivers on Alma install media allow installation without DUDs (driver update disks). That is a convenience.

RHEL 8 is already in the “Maintenance Support Phase”. No more feature changes to the kernel should appear, only critical fixes. Hence, less breaking of third-party modules by kernel changes (like that qib?). The maintenance phase also implies that you install el8 now only if you cannot install something else. Presumably.

You are presuming that additional kernel modules won’t impact other parts of the kernel. As I have tried to demonstrate, yes, they can: seemingly innocent changes to the kernel can break compatibility. Also, it is not just about AlmaLinux 8; they have added the same hardware back into AlmaLinux 9.

Again, like I have said many times, if you are a lazy, incompetent system administrator who has spent the last five years with your eyes closed to the fact that RHEL8+ comes with less hardware support, and have not put in place a plan to deal with the end of life of RHEL7, then you can either cough up and pay TuxCare for ELS, or hold your nose and install Oracle Linux with its UEK and extensive additional hardware support.

A tiny amount of extra hardware has been added with the almost certain outcome of breaking compatibility at some point down the line where the choice is either kick out the extra hardware with no notice or break compatibility with RHEL.

Clearly, those making this decision are not in the habit of relying on source as well as binary compatibility with RHEL, and concluded that making changes to the kernel source would not be a problem, so they could add in some extra hardware.

At another level, it is all a bit dumb when you look at what the extra hardware support means in practical terms. So you are supporting a Dell PERC H200 SAS card in an over-decade-old PowerEdge R710 in AlmaLinux 9, but the onboard Broadcom Ethernet NICs are not supported, so it’s not terribly useful. Note that the PERC H700, which is included even if not supported, only costs $20 on eBay. It is similar with the Adaptec RAID cards: now I can install AlmaLinux 9 on my 14-year-old Sun X4170s, but again the onboard Broadcom Ethernet cards are not supported. That’s as useful as a chocolate teapot, and it could all be snatched away in an instant if supporting them breaks compatibility. All this is for hardware that is really questionable to use in a production environment.

I guess, as a competent system administrator who has done the work over the last five years to make sure my hardware is supported in RHEL8 and RHEL9, I am not best pleased that at the last minute Alma comes along and risks compatibility for a handful of incompetent system administrators who have not done their job over that same period. Why should I be taking a risk for their incompetence in not addressing the issue?


I do believe you, even though I keep thinking of a scenario where ELRepo builds a kernel module that I don’t install nor use, and that somehow affects my system …

Yes, modules that are in use (and cannot be blacklisted) obviously do something.
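For completeness, blacklisting a module that merely exists on the system is a small config fragment; the file name is illustrative, using qib as the example since it came up:

```
# /etc/modprobe.d/blacklist-qib.conf
# Stop automatic loading by device alias:
blacklist ib_qib
# Make explicit "modprobe ib_qib" attempts fail too:
install ib_qib /bin/false
```

So an unused, blacklistable module shipped by the distribution is indeed inert; the concern in this thread is the kernel-side changes that accompany enabling it.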

The potential for issues is illustrated by the TrueScale InfiniBand driver (qib). There was an elrepo driver that supported up to 8.2, but it was then dropped because it could not be built for 8.3 kernels. I can’t remember the exact details, but there was some incompatibility that made it impossible to support going forward.

However, the qib driver works fine with the Oracle Linux UEK kernel in Oracle Linux 8 and 9 and all the current Ubuntu LTS kernels. So it’s not that the driver doesn’t work in later kernels, just that the RHEL kernel deviates from the mainline kernel enough to prevent compilation. Now you might be able to change the qib driver to make it compile but that is a whole new can of worms.

So we get a scenario down the line where the RHEL kernel has deviated from the mainline kernel just enough that some of the extra hardware requires the AlmaLinux kernel to follow the mainline kernel rather than the RHEL kernel to continue to work.

I also demonstrated how minor changes, even within a point release for security fixes, can break the compilation of the GPFS/StorageScale kernel module.

If you are unable to envisage a scenario where these two things combine and cause problems then you have a limited imagination IMHO.

I would be willing to bet that these issues were not considered when the decision was made, because unless you are at the sharp end of needing 1:1 kernel compatibility for additional binary and source code drivers, it would likely not occur to you that deviating from the RHEL kernel could be problematic in the future. I would note that Oracle clearly appreciates this, which is why they have a completely separate kernel to support extra hardware.

Hi @jabuzzard! I appreciate your insight and concerns you have raised. I am the lead of the AlmaLinux HPC and AI SIG. One idea that we have discussed is having an HPC-focused kernel in AlmaLinux, e.g. kernel-hpc, with tuning for HPC environments, e.g. 64K page size, Deadline as a scheduler, and other kernel optimizations from the hpc-compute TuneD profile. Would that be of interest to you? Would you have any concerns about that? What if this kernel-hpc omitted the recently introduced legacy hardware support to ensure 1:1 kernel compatibility for additional binary and source code drivers, like NVIDIA and GPFS? I welcome your thoughts.

I know this is not a main point in the thread but since the ib_qib driver was mentioned … The kmod-ib_qib package is now available for EL 8.10. Please see 0001459: Request for kmod ib-qib for InfiniBand: QLogic Corp. IBA7220 InfiniBand HCA for RHEL 8.10 - ELRepo Bugs for more info.

I have been giving it some thought, and it occurs to me that anyone in the HPC community who cares about source code level compatibility will likely have access to a RHEL subscription. That is almost certainly true for anyone running GPFS. If they don’t, springing for a RHEL subscription is not exactly painful.

How about a mechanism to support such users deploying kernel RPM packages they have compiled from the RHEL SRPMs, which they legitimately have access to and which, given they would not be distributing them, are in compliance with Red Hat’s license terms?

Yes, but that is way too late to be useful if you are a competent system administrator; support got added back in elrepo a handful of weeks before CentOS 7 goes EOL. So, being a competent system administrator, I replaced all the TrueScale adaptors with ConnectX-3s months ago, costing on average $40 a node. As I mentioned previously, I work at a university; we have finite and limited budgets, and my management network uses old hand-me-down Dell PowerConnect switches, which I keep going by buying replacement fans from Digikey, cutting the leads to length for the relevant failed fan, and crimping the connectors on. I would love to replace them with something newer, but a single replacement management switch would cost more than replacing all the InfiniBand cards in the undergraduate teaching HPC cluster.