Discussion of kernel changes in HPC

Well, here we are 11 months later, and it looks like I was right. Or at the very least, building the GPFS 5.2.2-1 kernel module on RHEL 8.10 with kernel 4.18.0-553.50.1 works. However, on AlmaLinux 8.10 with kernel 4.18.0-553.50.1 it bombs out with:

  CC [M]  /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o
In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:61,
                 from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54:
/usr/lpp/mmfs/src/gpl-linux/kx.c:3041:27: error: field 'nbarg' has incomplete type
   struct noobaa_arg       nbarg;
                           ^~~~~
In file included from /usr/lpp/mmfs/src/gpl-linux/cfiles.c:61,
                 from /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.c:54:
/usr/lpp/mmfs/src/gpl-linux/kx.c: In function 'kxGanesha':
/usr/lpp/mmfs/src/gpl-linux/kx.c:8399:13: error: 'OPENHANDLE_REGISTER_NOOBAA' undeclared (first use in this function); did you mean 'OPENHANDLE_GET_NODEID'?
   if (op == OPENHANDLE_REGISTER_NOOBAA)
             ^~~~~~~~~~~~~~~~~~~~~~~~~~
             OPENHANDLE_GET_NODEID
/usr/lpp/mmfs/src/gpl-linux/kx.c:8399:13: note: each undeclared identifier is reported only once for each function it appears in
/usr/lpp/mmfs/src/gpl-linux/kx.c:8405:59: error: invalid application of 'sizeof' to incomplete type 'struct noobaa_arg'
       if (copy_from_user(&args.nbarg, (void *)arg, sizeof(struct noobaa_arg)))
                                                           ^~~~~~
make[3]: *** [scripts/Makefile.build:318: /usr/lpp/mmfs/src/gpl-linux/cfiles_cust.o] Error 1
make[2]: *** [Makefile:1619: _module_/usr/lpp/mmfs/src/gpl-linux] Error 2
make[2]: Leaving directory '/usr/src/kernels/4.18.0-553.50.1.el8_10.x86_64'
make[1]: *** [makefile:149: modules] Error 1
make[1]: Leaving directory '/usr/lpp/mmfs/src/gpl-linux'
make: *** [makefile:145: Modules] Error 1

I will try building with the equivalent Rocky kernel on Monday. Still, at a minimum, AlmaLinux no longer has source code compatibility with the kernel, which, as I tried and failed miserably to impress on people last year, is an important thing.

Thanks for the notification, @jabuzzard. As I said before, this absolutely should not be the case and yours is the first indication of any kind of problem. As with all bug reports of any nature, we need details so that we can reproduce it and have an opportunity to investigate the problem. Would you be willing to file a proper bug report for this? If you’re not okay filing a report on bugs.almalinux.org, feel free to email me (benny at AlmaLinux dot org) and I’ll get it to the right place.
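For anyone trying to reproduce this, one useful detail is where the two identifiers the compiler complains about (`noobaa_arg` and `OPENHANDLE_REGISTER_NOOBAA`) are declared on a working host versus a failing one. Below is a minimal sketch in Python, assuming the GPFS sources live under /usr/lpp/mmfs/src and the kernel headers under the directory shown in the make output. `find_symbol` is a hypothetical helper written for this illustration, not part of GPFS or AlmaLinux tooling, and nothing in this thread establishes which tree the declarations are supposed to come from.

```python
import os
import re


def find_symbol(roots, symbol, suffixes=(".h", ".c")):
    """Return (path, line_number, text) for each line mentioning `symbol` as a whole word.

    Roots that do not exist are silently skipped, so the same call can be
    run unchanged on hosts with different kernels installed.
    """
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    hits = []
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if not name.endswith(suffixes):
                    continue
                path = os.path.join(dirpath, name)
                try:
                    with open(path, errors="replace") as fh:
                        for lineno, line in enumerate(fh, 1):
                            if pattern.search(line):
                                hits.append((path, lineno, line.rstrip()))
                except OSError:
                    # Unreadable file: skip it rather than abort the scan.
                    continue
    return hits


if __name__ == "__main__":
    # Assumed locations, taken from the paths in the build log above.
    roots = [
        "/usr/lpp/mmfs/src",
        "/usr/src/kernels/4.18.0-553.50.1.el8_10.x86_64/include",
    ]
    for symbol in ("noobaa_arg", "OPENHANDLE_REGISTER_NOOBAA"):
        for path, lineno, text in find_symbol(roots, symbol):
            print(f"{symbol}: {path}:{lineno}: {text}")
```

Running the same script on the Rocky or RHEL host and on the AlmaLinux host, then diffing the two outputs, should show whether a declaration is missing from one tree or merely compiled out. Attaching both outputs to the bug report would give the maintainers something concrete to compare.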

I have just tested Rocky 8.10 using the 4.18.0-553.50.1 kernel, and it compiles fine. Clearly it’s screwed up in AlmaLinux. I would note that the problem exists at least as far back as 4.18.0-553.22.1, which works fine with GPFS 5.1.9-6. However, when I tried upgrading to GPFS 5.2.2-1 on that kernel, it failed as above. I then upgraded to 4.18.0-553.50.1, as that is a tested kernel for GPFS 5.2.2-1, and got the same error as with 4.18.0-553.22.1.

Now, the question is, how do you put out an updated kernel with exactly the same package names? Fixing it with a different package name is going to break more stuff, and fixing it with different kernel packages is a no-go, too. Note that I am already down a kernel version anyway for compatibility reasons.

The other question then becomes, as was raised last year, what if you have to ditch some of that additional hardware support you added in order to maintain source code compatibility with the RHEL kernel?

It really does feel like people just didn’t listen. My entirely legitimate concerns, based on 33 years of experience in Linux, were ignored, and it now looks very much like exactly what I warned about last year has come to pass.

Anyway, it looks like I need an account to file a bug report in your Mantis system, and I can’t figure out how to do that. At least there is no create account option I can spot.

Most of our systems now require an AlmaLinux Account.