As a longtime CentOS 6/7 user, I’m trying to install AlmaLinux 9.0 on a slightly older motherboard.
My goal is to have two drives in RAID-1 (mirroring), and to be able to
boot automatically from either drive alone into a running system with
degraded RAID arrays. Then, if a drive fails, I just install a
replacement, clone the partitioning, and rebuild the RAID-1 arrays.
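(For reference, the replacement procedure I have in mind is roughly the following sketch; the device names are just examples, with /dev/sdb standing in for the replaced drive, and the md-to-partition pairings taken from the listings further down:)

```shell
# Sketch of replacing a failed /dev/sdb (device names are examples).
# Copy the GPT layout from the surviving drive onto the new one,
# then give the new drive fresh GUIDs so the two tables don't collide:
sgdisk --replicate=/dev/sdb /dev/sda
sgdisk --randomize-guids /dev/sdb

# Re-add each partition to its mirror; mdadm resyncs in the background:
mdadm /dev/md1 --add /dev/sdb2
mdadm /dev/md2 --add /dev/sdb3
mdadm /dev/md0 --add /dev/sdb6

# Make the new drive bootable on its own:
grub2-install /dev/sdb
```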
The system boots and runs nicely, but I’m trying to test that this system will
boot properly with one failed drive by alternately disconnecting one
drive’s SATA cable, and then the other.
With either drive’s SATA cable disconnected, the system will get
through the GRUB menu and load the AlmaLinux kernel, but then the fun
begins. The initrd code waits through a long timeout, then asks for the root
password and enters dracut emergency mode.
Is it correct to expect an Alma 9 system with a failed or removed RAID
member to boot properly into a running system with all RAID volumes
degraded? I seem to recall this is how things worked in CentOS 6 and
7.
Lots more details:
Advantech AIM-584 industrial mATX motherboard; BIOS boot only (no EFI)
Intel i5-4590S CPU
two 6 TB SATA drives.
I re-used the GPT partitioning and mdadm RAID setup from a previous
install of a recent Fedora release, which looks like this:
$ sudo fdisk -l /dev/sda
Disk /dev/sda: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: WDC WD6003FFBX-6
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 4E1B36B5-BA01-3E4A-8403-9A56E7F1AEE7
Device Start End Sectors Size Type
/dev/sda1 2048 6143 4096 2M BIOS boot
/dev/sda2 4196352 8390655 4194304 2G Linux RAID
/dev/sda3 8390656 218105855 209715200 100G Linux RAID
/dev/sda4 218105856 427821055 209715200 100G Linux RAID
/dev/sda5 427821056 532678655 104857600 50G Linux RAID
/dev/sda6 532678656 11721043967 11188365312 5.2T Linux RAID
$ sudo fdisk -l /dev/sdb
Disk /dev/sdb: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: WDC WD6003FFBX-6
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 4E1B36B5-BA01-3E4A-8403-9A56E7F1AEE7
Device Start End Sectors Size Type
/dev/sdb1 2048 6143 4096 2M BIOS boot
/dev/sdb2 4196352 8390655 4194304 2G Linux RAID
/dev/sdb3 8390656 218105855 209715200 100G Linux RAID
/dev/sdb4 218105856 427821055 209715200 100G Linux RAID
/dev/sdb5 427821056 532678655 104857600 50G Linux RAID
/dev/sdb6 532678656 11721043967 11188365312 5.2T Linux RAID
$ sudo lsblk -l -o +UUID
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS UUID
sda 8:0 0 5.5T 0 disk
sda1 8:1 0 2M 0 part
sda2 8:2 0 2G 0 part 910b4fa4-c2a0-a6d6-4877-9d82bcc126ff
sda3 8:3 0 100G 0 part b65bd4f9-dd9c-eec3-c38a-fc80d0dde46a
sda4 8:4 0 100G 0 part 746bd41e-bce4-fb78-221a-a4018b91fe69
sda5 8:5 0 50G 0 part 79402428-ab74-b67c-e93e-57fa3095fa81
sda6 8:6 0 5.2T 0 part 44915d0b-d025-58c7-8295-cbc587bb917f
sdb 8:16 0 5.5T 0 disk
sdb1 8:17 0 2M 0 part
sdb2 8:18 0 2G 0 part 910b4fa4-c2a0-a6d6-4877-9d82bcc126ff
sdb3 8:19 0 100G 0 part b65bd4f9-dd9c-eec3-c38a-fc80d0dde46a
sdb4 8:20 0 100G 0 part 746bd41e-bce4-fb78-221a-a4018b91fe69
sdb5 8:21 0 50G 0 part 79402428-ab74-b67c-e93e-57fa3095fa81
sdb6 8:22 0 5.2T 0 part 44915d0b-d025-58c7-8295-cbc587bb917f
md0 9:0 0 5.2T 0 raid1 /home 5efdab1f-6bbf-4d76-bec6-bab2f2a02190
md1 9:1 0 2G 0 raid1 /boot e0d6518f-60dd-48dd-aba5-450923e2ce0f
md2 9:2 0 99.9G 0 raid1 / 8870ab9b-c86f-423c-9934-73bcb9822bca
md5 9:5 0 50G 0 raid1 4e19527a-fc5b-4d72-a15f-b639a1a1bf1d
md127 9:127 0 99.9G 0 raid1 813f4a49-a94e-474e-84e5-fddc0a509361
While installing Alma 9 using the usual Anaconda installer, I picked the
mount points above from the custom-partitioning menu, much as I’ve
done in the past when reinstalling, for example, CentOS 7 over CentOS 6 while preserving /home.
I’ve done the traditional redundant-boot thing of placing a BIOS boot partition on both drives
and installing GRUB onto both drives:
$ sudo /sbin/grub2-install /dev/sda
$ sudo /sbin/grub2-install /dev/sdb
Again, all is normal with both drives in the system.
With either drive’s SATA cable disconnected, the system will get
through the GRUB menu and load the Linux kernel, but initrd/dracut
waits through a long timeout, then asks for the root password and enters
emergency mode.
The error messages are about being unable to assemble the raid arrays.
A normal boot has these volumes mounted (the spare partitions are ignored) (from “df”):
/dev/md2 102559672 3148376 94155360 4% /
/dev/md1 2022248 262048 1639112 14% /boot
/dev/md0 5548807328 44030456 5225057960 1% /home
which is exactly what I expect, seeing as /etc/fstab contains:
UUID=8870ab9b-c86f-423c-9934-73bcb9822bca / ext4 defaults 1 1
UUID=e0d6518f-60dd-48dd-aba5-450923e2ce0f /boot ext4 defaults 1 2
UUID=5efdab1f-6bbf-4d76-bec6-bab2f2a02190 /home ext4 defaults 1 2
/etc/mdadm.conf on the normal root partition contains:
# cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md/0 level=raid1 num-devices=2 UUID=44915d0b:d02558c7:8295cbc5:87bb917f
ARRAY /dev/md/1 level=raid1 num-devices=2 UUID=910b4fa4:c2a0a6d6:48779d82:bcc126ff
ARRAY /dev/md/2 level=raid1 num-devices=2 UUID=b65bd4f9:dd9ceec3:c38afc80:d0dde46a
ARRAY /dev/md/5 level=raid1 num-devices=2 UUID=79402428:ab74b67c:e93e57fa:3095fa81
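One thing I still plan to double-check: as far as I understand, dracut copies /etc/mdadm.conf into the initramfs at build time, so a stale initrd could disagree with the file above. Something like this should show whether they match (the image path defaults to the running kernel’s):

```shell
# Show the copy of mdadm.conf inside the current initramfs, if any:
lsinitrd -f etc/mdadm.conf

# If it's missing or stale, rebuild the initramfs for the running kernel:
dracut --force /boot/initramfs-$(uname -r).img $(uname -r)
```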
From the dracut emergency shell, I can get the large /home raid1 array on md0 to assemble and become mountable with:
mdadm --assemble --scan --verbose
but I’m unable to get / and /boot assembled; those md devices stay “inactive” in /proc/mdstat:
Personalities : [raid1]
md0 : active raid1 sda6[1]
5594050560 blocks super 1.2 [2/1] [_U]
bitmap: 1/42 pages [4KB], 65536KB chunk
md2 : inactive sda3[1](S)
104791040 blocks super 1.2
md1 : inactive sda2[1](S)
2094080 blocks super 1.2
md5 : inactive sda5[1](S)
52395008 blocks super 1.2
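If I understand mdadm right, an array stuck in the inactive (S) state has to be stopped before it can be re-assembled degraded, so from the emergency shell the manual workaround would look something like this (device names per the listing above, with /dev/sda as the surviving drive):

```shell
# Sketch: stop the half-assembled arrays, then re-assemble each one
# with --run so it starts even with only one of two members present.
mdadm --stop /dev/md1
mdadm --stop /dev/md2
mdadm --assemble --run /dev/md1 /dev/sda2
mdadm --assemble --run /dev/md2 /dev/sda3
```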
Part of the output from running “mdadm --assemble --scan --verbose” a second time is:
mdadm: looking for devices for /dev/md/2
mdadm: no recogniseable superblock on /dev/md/0
mdadm: no recogniseable superblock on /dev/sdb2
mdadm: no recogniseable superblock on /dev/sdb1
mdadm: Cannot assemble mbr metadata on /dev/sdb
mdadm: /dev/sda6 has wrong uuid.
mdadm: /dev/sda5 has wrong uuid.
mdadm: /dev/sda4 has wrong uuid.
mdadm: /dev/sda3 is busy - skipping
mdadm: /dev/sda2 has wrong uuid.
mdadm: no recogniseable superblock on /dev/sda1
mdadm: Cannot assemble mbr metadata on /dev/sda
The “/dev/sda2 has wrong uuid” message is suspicious; that partition should be part of /dev/md1, which is /boot.
Other things possibly of interest… when booted successfully with both drives:
# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=4e19527a-fc5b-4d72-a15f-b639a1a1bf1d rd.md.uuid=b65bd4f9:dd9ceec3:c38afc80:d0dde46a rd.md.uuid=910b4fa4:c2a0a6d6:48779d82:bcc126ff rd.md.uuid=79402428:ab74b67c:e93e57fa:3095fa81 8250.nr_uarts=12"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
# cat /proc/cmdline
BOOT_IMAGE=(mduuid/910b4fa4c2a0a6d648779d82bcc126ff)/vmlinuz-5.14.0-70.26.1.el9_0.x86_64 root=UUID=8870ab9b-c86f-423c-9934-73bcb9822bca ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=4e19527a-fc5b-4d72-a15f-b639a1a1bf1d rd.md.uuid=b65bd4f9:dd9ceec3:c38afc80:d0dde46a rd.md.uuid=910b4fa4:c2a0a6d6:48779d82:bcc126ff rd.md.uuid=79402428:ab74b67c:e93e57fa:3095fa81 8250.nr_uarts=12
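Since dracut only auto-assembles arrays named by an rd.md.uuid= argument, I’ve been meaning to cross-check each array’s UUID against /proc/cmdline. A rough sketch (the check_cmdline helper is my own; note that GRUB’s mduuid/ spelling is the same UUID with the colons dropped):

```shell
#!/bin/sh
# Sketch: warn about any md array whose UUID is missing from the
# rd.md.uuid= arguments on the kernel command line.
check_cmdline() {
    for u in $(mdadm --detail --scan | sed -n 's/.*UUID=\([0-9a-f:]*\).*/\1/p'); do
        grep -q "rd\.md\.uuid=$u" /proc/cmdline || echo "not on cmdline: $u"
    done
}

# GRUB's (mduuid/...) device spelling is the same UUID without colons:
strip_colons() { tr -d ':'; }

printf '%s\n' "910b4fa4:c2a0a6d6:48779d82:bcc126ff" | strip_colons
# -> 910b4fa4c2a0a6d648779d82bcc126ff
```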
I’d like to attach the rdsosreport.txt from dracut emergency mode, but the upload doesn’t seem to accept .txt files, so I guess I can paste it into a follow-up.
Thanks for any suggestions!
My next step might be to wipe most of the partitions and reinstall,
painfully redoing similar partitioning by hand in the installation GUI.
But I’m hoping that instead there’s a chance to learn something by
fixing this almost-working install. It might even turn out to be a bug
worth reporting.