Set up software RAID on existing AlmaLinux 9 LVM

I am trying to set up a software RAID on an existing AlmaLinux 9 LVM OS. Following instructions I found here (for Debian), I run into trouble when I get to the mdadm --create command.

As per the instructions, I’ve copied the partition setup from the original to a blank drive, and made the new drive’s Linux LVM partitions “raid autodetect” using the “fdisk /dev/sdh” command.

The next step is to create new degraded RAID arrays, using mdadm --create /dev/md0 --level=1 --raid-devices=2 missing /dev/sdh1 for the two partitions (md0 and md1 for sdh1 and sdh2 respectively).

However, at this point, I get this:

mdadm: partition table exists on /dev/sdh1
mdadm: partition table exists on /dev/sdh1 but will be lost or meaningless after creating array
mdadm: Note: this array has metadata at the start and may not be suitable as a boot device.  If you plan to store '/boot' on this device please ensure that your boot-loader understands md/v1.x metadata, or use
Continue creating array?

… at which point I answer no.

Here’s the lsblk for the disk with the live OS (sdb):

sdb                  8:16   0 465.8G  0 disk 
├─sdb1               8:17   0     1G  0 part /boot
└─sdb2               8:18   0 464.8G  0 part 
  ├─almalinux-root 253:0    0    70G  0 lvm  /
  ├─almalinux-swap 253:1    0   4.4G  0 lvm  [SWAP]
  └─almalinux-home 253:2    0 390.3G  0 lvm  /home

I copied that to the sdh disk, so it had the same format when I was trying the above. I am leaving the sdh out of the lsblk response for now, because I had reformatted that disk as I was trying different things.

What should I do? Thanks for any help.

Let’s recap. You have:

  • Two drives, sdb and sdh
  • The sdb has two partitions
  • The sdh was empty
  • You created two partitions on sdh
  • mdadm sees a “partition table” within the sdh1 that is the first partition in sdh

The sdh was not “clean empty” if mdadm detects bogus “data” on it.
One could dd if=/dev/zero of=/dev/sdh to zero it out completely, or
dd if=/dev/zero of=/dev/sdh1 to zero just the first partition.
One can also add the options bs=... count=... to dd to write only the first X MB from the start.
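
A minimal sketch of the bs=/count= variant, run against a scratch file rather than a real disk so that it is safe to try. The file and the sizes are made up for the demonstration; on the real system the target would be something like of=/dev/sdh1, which destroys whatever is stored there.

```shell
# Scratch file standing in for a partition (assumption: 8 MiB is enough to show the idea).
img=$(mktemp)
head -c $((8 * 1024 * 1024)) /dev/urandom > "$img"

# Zero only the first 4 MiB. conv=notrunc stops dd from truncating a regular
# file; it makes no difference when the target is a block device.
dd if=/dev/zero of="$img" bs=1M count=4 conv=notrunc 2>/dev/null

# Count the non-zero bytes left in the wiped area.
nonzero=$(head -c $((4 * 1024 * 1024)) "$img" | tr -d '\000' | wc -c)
echo "non-zero bytes in wiped area: $nonzero"
```

If only old signatures need to go, wipefs -a /dev/sdh1 (from util-linux) is a narrower alternative that erases just the magic strings it recognizes.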

However, it does not matter what is on “empty” partitions, since we will overwrite (with mdadm, pvcreate, mkfs).

The option --metadata=0.90 is required for the sdh1 because otherwise the beginning of the partition will not look like “regular partition with a filesystem”, which the legacy BIOS boot mode does depend on.

The AL9 default is to have XFS for /boot, /, and /home. One cannot shrink an existing XFS filesystem.

Are the partitions on sdh the same size as on sdb? The RAID metadata takes a bit of space, so there is a bit less left for the filesystems. Luckily, you have a swap volume that can be removed and replaced with a slightly smaller one. That does not apply to /boot.
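
To put a number on “a bit of space”: the sector counts and array sizes below are taken from the fdisk -l and mdstat output posted later in this thread (arrays with metadata 1.2, created by the installer). The arithmetic is only a sketch of how to compare a member partition with the array built on it; the function name is made up, and other metadata versions or mdadm defaults will give different overheads.

```shell
# Difference between a member partition (512-byte sectors, from fdisk -l)
# and the array built on it (KiB, the "blocks" figure in mdstat), in MiB.
overhead_mib() {
    sectors=$1      # size of the smallest member partition, in sectors
    array_kib=$2    # usable array size, in KiB
    echo $(( (sectors / 2 - array_kib) / 1024 ))
}

big=$(overhead_mib 896620544 448178176)    # sda2 / sdb1 -> md125
mid=$(overhead_mib 73394176 36663296)      # sda3, the smaller member -> md126
small=$(overhead_mib 4657152 2325504)      # sda5, the smaller member -> md127
echo "md125: ${big} MiB, md126: ${mid} MiB, md127: ${small} MiB"
```

With those figures the arrays come out 129 MiB, 33 MiB and 3 MiB smaller than their members, which is why a same-size partition cannot hold an unshrinkable filesystem copied block for block.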

Great explanation, thank you. I think I’m starting to grasp this. I did wipe the disk, using cat /dev/zero > /dev/sdh as recommended by someone else. Seems to have worked.

A couple questions:

Where does the option --metadata=0.90 go for sdh1? In the mdadm command? If so, at the end?

How do I remove that partition, which is part of the LVM setup? It shows only on the sdb disk. Even after copying the partitions to sdh, it doesn’t show there in the lsblk response:

sdb                  8:16   0 465.8G  0 disk 
├─sdb1               8:17   0     1G  0 part /boot
└─sdb2               8:18   0 464.8G  0 part 
  ├─almalinux-root 253:0    0    70G  0 lvm  /
  ├─almalinux-swap 253:1    0   4.4G  0 lvm  [SWAP]
  └─almalinux-home 253:2    0 390.3G  0 lvm  /home
sdh                  8:112  0 465.8G  0 disk 
├─sdh1               8:113  0     1G  0 part 
└─sdh2               8:114  0 464.8G  0 part 

The fdisk -l command does show that the copied sdh2 partition is of the LVM type, however:

Device     Boot   Start       End   Sectors   Size Id Type
/dev/sdh1  *       2048   2099199   2097152     1G 83 Linux
/dev/sdh2       2099200 976773119 974673920 464.8G 8e Linux LVM

So, since an XFS filesystem can’t be shrunk, and yet we’re trying to create the same size partitions for the eventual copying of the original disk’s OS, would it be better instead of copying the partition format to just manually create partitions of the same size, except for making one of them a bit smaller? And, if so, would it be better to make the sdh1 partition smaller, since it’s not LVM? If so, how much smaller? And what to do with the extra space? Create another partition for the RAID metadata, or what?

You can read man mdadm for the syntax of the command. Perhaps:

mdadm --create /dev/md0 --metadata=0.90 --level=1 --raid-devices=2 missing /dev/sdh1

I don’t think that you did “copy” as in “copy contents”, but did “define similar partitions” on the second drive. Partitions are listed as “first sector, last sector, type” values in partition table.

If you do follow the instructions, then you will make the RAID array built on the sdh2 partition (md1) an LVM physical volume (PV) and after that add it to the LVM volume group (VG) “almalinux”. When the VG has more space, you can move a logical volume (LV), that is, change where on the drives the data of that LV is stored, with pvmove.
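
The LVM steps described above could look roughly like the following. This is a dry run: the run helper (a name invented here) only prints each command, so nothing changes until its body is replaced with "$@". The device names /dev/md1 and /dev/sdb2 and the VG name almalinux are taken from this thread; check them against pvs and lsblk before doing anything for real.

```shell
run() { printf '+ %s\n' "$*"; }   # dry run: print instead of execute

vg=almalinux
old_pv=/dev/sdb2    # current PV on the original drive
new_pv=/dev/md1     # degraded RAID1 array built on sdh2

migrate_plan() {
    run pvcreate "$new_pv"            # make the array an LVM physical volume
    run vgextend "$vg" "$new_pv"      # add it to the volume group
    run pvmove "$old_pv" "$new_pv"    # move all extents off the old PV, can take hours
    run vgreduce "$vg" "$old_pv"      # drop the now-empty PV from the VG
    run pvremove "$old_pv"            # wipe the LVM label so sdb2 can join the array later
}

plan=$(migrate_plan)
printf '%s\n' "$plan"
```

pvmove needs at least as many free extents on the new PV as are allocated on the old one, so if the array is slightly smaller than sdb2, something (such as the swap LV) has to shrink or go first.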

The LV “swap” of VG “almalinux” is in use. The /etc/fstab has entry for it. One way to stop the use of swap is to first comment the line (insert # at start of line) in /etc/fstab and then use command swapoff (see man swapoff) or reboot.

When a filesystem or swap partition is not in use, it can be removed (see man lvremove).

To create a new swap on LV, one probably needs commands lvcreate, mkswap, and blkid.
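
A sketch of the fstab edit, run here on a scratch copy with made-up contents in the style of an AlmaLinux install (the real file will differ). It comments out every active line whose filesystem type field is swap. The follow-up LVM commands are listed as comments only; the 4G size is an arbitrary example.

```shell
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
/dev/mapper/almalinux-root /      xfs   defaults  0 0
/dev/mapper/almalinux-home /home  xfs   defaults  0 0
/dev/mapper/almalinux-swap none   swap  defaults  0 0
EOF

# Prefix active swap lines with '#'. On the real system, keep a backup first,
# e.g. cp /etc/fstab /etc/fstab.bak (requires root).
sed -i -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
cat "$fstab"

# Afterwards, roughly (not executed here):
#   swapoff /dev/almalinux/swap
#   lvremove almalinux/swap
#   lvcreate -L 4G -n swap almalinux
#   mkswap /dev/almalinux/swap
```

Reusing the LV name swap keeps the /dev/mapper/almalinux-swap path valid, so the fstab line can simply be uncommented again. If fstab or the kernel command line refers to swap by UUID instead, blkid shows the new value.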

Thanks so much, I think I’m making progress, but still running into problems. The ‘mdadm’ create command seems to have worked, however, the instructions say to run the mdadm --examine --scan >> /etc/mdadm/mdadm.conf command and look at the result. They say it should be something like:

ARRAY /dev/md0 level=raid1 num-devices=2 UUID=6fe4b95e:4ce1dcef:01b5209e:be9ff10a
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=aa4ab7bb:df7ddb72:01b5209e:be9ff10a

However, mine shows simply:

ARRAY /dev/md0 UUID=4e8ffb7c:a57ba210:e368bf24:bd0fce41
ARRAY /dev/md/1  metadata=1.2 UUID=b5393191:87c4bda6:158611dc:888b146e name=ubuntu:1

(“Ubuntu” because I am using an Ubuntu USB stick as my OS while I work on the drives. Should I change that name somehow?)

Then the instructions say to edit /etc/fstab, and change the original drive designation to match the md0, ie, replace /dev/sdb1 with /dev/md0. It says the fstab file should look like:

proc            /proc           proc    defaults        0       0
/dev/mapper/debian-root /       ext3    defaults,errors=remount-ro 0       1
/dev/sdb1       /boot     ext3    defaults     0       2
/dev/mapper/debian-swap_1 none      swap    sw     0       0
/dev/hdc      /media/cdrom0   udf,iso9660 user,noauto     0    0
/dev/fd0       /media/floppy0  auto    rw,user,noauto  0   0

…however, mine only shows:

overlay / overlay rw 0 0
tmpfs /tmp tmpfs nosuid,nodev 0 0

No ‘mapper’ line or anything. Should I just replace the fstab file with the content they show (replacing dev/sdb1 with /dev/md0), or what? If I should paste it in, should I leave the lines that are there?

They also say to edit the /etc/mtab file, but it does not look anything like what they show either.

Also, before all that, it says to create a filesystem (they say ext3, but I used xfs) on the non-LVM RAID array /dev/md0 with mkfs.xfs /dev/md0, but for the md1, to just use pvcreate to make it LVM. It doesn’t say anything about creating a filesystem for it first, but should I, since it also needs to be xfs (I think)? Perhaps that is the cause of the discrepancies described above?

So, I’m confused as to how to proceed. As always, any help would be much appreciated. Thanks again!

Why? All the work should be possible from the OS that is in the drive(s).

You should generate the /etc/mdadm/mdadm.conf and modify the /etc/fstab that are in the filesystem within the almalinux-root LV. (That path is the Debian location from the guide; on AlmaLinux the file is /etc/mdadm.conf.) You have booted the Ubuntu stick, and the /etc that you therefore see is in (or from) the Ubuntu stick. That is not the /etc/ that you are looking for.

The AlmaLinux installer image (USB stick) has, in addition to the install options, a troubleshoot / rescue option. When booted that way, the stick scans the drives for existing AlmaLinux installs and, if it finds one, mounts it into a subdirectory. One can then chroot into that mount so that the visible /etc is the one on the drive, not the one on the stick that one booted from.

As said, such rescue should not be necessary here.

Plain filesystem in partition and LVM are very different.

With sdb1 and sdh1 the idea is apparently (I did not read the “guide”; the ‘ext3’ already shows that it is old) to create a filesystem in a RAID1 array and copy the files from the current /boot into it. Then use the new filesystem as (mount it to) /boot. Finally, make the array store its data on both partitions, sdb1 and sdh1.

The LVM part is something else. An LV is like a “partition”. It has a filesystem, and the filesystem has files. An LV uses some “extents” of a VG. It is possible to move an LV dynamically so that it uses different extents of the VG, including extents that are on a different physical drive. One does not need to create a filesystem, because one already exists and the files are not copied to a different filesystem.

Obligatory reminder: do have a backup of your data, because the operations that you do (well, everything that we do with computer) can fail and corrupt the data.

Thanks again for your patient help! The reason I tried using the Ubuntu stick was because in my bumbling, I messed up the bootloader or MBR and could not boot to the Almalinux disk.

So this time, using the Ubuntu stick OS again, I mounted /dev/almalinux/root, going to /etc there and doing the command like so:
mdadm --examine --scan >> /mnt/almalinux/etc/mdadm/mdadm.conf

That seems to have worked, and I made the edit to the fstab and mtab files as directed, however the mtab was symlinked, so I edited mtab~ instead, hopefully that is correct.

Next, the instructions get into editing the grub, but as you say, it’s old, and it seems they’re referring to the old grub, and not grub2. So the instructions don’t seem to apply.

How can I fix the grub2 so I can boot into the almalinux OS? Could the edit I did to fstab work by itself, since grub2 creates its grub.cfg automatically?

Also, once that’s done, the instructions get into the moving of data to the RAID array. Can that also be done while booted into the Almalinux OS, or should that be done from a boot stick? My instincts say the latter, but what do you think? The instructions say:

To move the contents of our LVM partition /dev/sda5 to our LVM RAID array /dev/md1, we use the pvmove command:
pvmove /dev/sda5 /dev/md1

Of course, I would use my own relevant “/dev/sdX” info.

Lastly, you mentioned earlier about resizing the LV to allow room for the RAID metadata. You said:

The LV “swap” of VG “almalinux” is in use. The /etc/fstab has entry for it. One way to stop the use of swap is to first comment the line (insert # at start of line) in /etc/fstab and then use command swapoff (see man swapoff) or reboot.

When a filesystem or swap partition is not in use, it can be removed (see man lvremove).

To create a new swap on LV, one probably needs commands lvcreate, mkswap, and blkid.

So, when to do this? I edited fstab already for the md0 line, should I also make the edit you suggest now? And then I do man swapoff and lvremove to remove the almalinux-swap LV. Once that’s done, when do I create the new swap? After fixing grub2 and rebooting? If I do it before, how much smaller should I make it? Right now it’s 4.4G.

Thanks so much for all your help!

Please ignore my previous post, as I’ve changed my mind. I’ve had so much trouble with this that I’ve decided to just start over and reinstall AlmaLinux. But I’d like to know the best way to install the OS in a RAID1 environment.

I had gotten most of the way with setting it up as a website server, so I didn’t want to have to start over, but it seems that I must at this point.

I have seen comments in this forum saying you can and should set up RAID at the same time you install. It said you could choose it from a pulldown menu during installation, when there were choices about partitioning presented, or something like that – it was pretty vague. I have not been able to find any detailed instructions on this forum or anywhere on this site about it. I don’t see that option as I am reinstalling AlmaLinux. It seems like a common need, installing with a software RAID setup, yet it is not well-addressed, that I can see.

So, can you tell me the best way to install AlmaLinux with a software RAID1 setup? Should I start a new thread about this? Thanks again.

See Chapter 4, “Customizing your installation”, in the Red Hat Enterprise Linux 9 documentation on the Red Hat Customer Portal, particularly subsection 4.6.6.

Thanks! More questions:

In Manual Partitioning, which partitions should be set as RAID1? The documentation doesn’t really give a hint.

The Boot partition should be kept as a Standard Partition, correct? That’s the default setting. So it would not be part of the RAID1 setup, correct?

The System, Data (Home) and Home LVM partitions would all be included in the RAID1, correct? I’ve read conflicting opinions on including the Swap partition, though… What do you think?

Also, I was wondering what you’d recommend for Software Selection/Base Environment. This will be for a website server, so I’m thinking probably just plain “Server” rather than “Server with GUI”. The GUI would be nice, but there is a warning in the Security Profile section saying it decreases security. I do not have a Security Profile chosen, however, as the documentation says it’s only necessary if “a specific policy is mandated by your organization rules or government regulations.”

The default (legacy) setup on one drive is two partitions: one for /boot and another for LVM volumes.
The default (UEFI) setup on one drive is three partitions: /boot, /boot/efi, and third for LVM volumes.

The Debian guide was aiming for two RAID1 arrays: one for /boot and another for LVM.
One partition from each drive is used for array, so each drive has two partitions.

The /boot filesystem is in a RAID1 array because then each drive has a copy of it. If the only drive with /boot breaks, the system cannot boot even if the rest of the data is still intact.
The RAID1 array that does host the /boot must use RAID metadata version 0.90.

One can have swap outside of the LVM, but that just makes the setup more complex.

These days I use “Minimal”, because I have a configuration management system set up that installs package selections, but “Server” sounds like a decent start. One can always add or remove packages later.

I don’t use security profiles.

Okay, I installed before seeing your answer, using the Custom partition, set to RAID. I didn’t see a choice for both LVM and RAID, so I assumed the RAID choice included LVM, since that was the default for AlmaLinux. But it doesn’t. So now I’m totally confused as to how to install with both.

I’ll have to reinstall anyway, since I didn’t RAID the /boot.

Also, here’s what fdisk -l showed. It is confusing, because the drives are different, one with 5 partitions, the other with 3. One extra is the /boot partition, since I didn’t raid it. But why the 5th? It’s shown as “W95 Ext’d (LBA)” – what the heck is that?

Disk /dev/sda: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDC WD5003ABYX-0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0xd62dfe00

Device     Boot     Start       End   Sectors   Size Id Type
/dev/sda1  *         2048   2099199   2097152     1G 83 Linux
/dev/sda2         2099200 898719743 896620544 427.5G fd Linux raid autodetect
/dev/sda3       898719744 972113919  73394176    35G fd Linux raid autodetect
/dev/sda4       972113920 976773119   4659200   2.2G  f W95 Ext’d (LBA)
/dev/sda5       972115968 976773119   4657152   2.2G fd Linux raid autodetect

Disk /dev/sdb: 465.76 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDC WD5003ABYX-0
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x6b0cb481

Device     Boot     Start       End   Sectors   Size Id Type
/dev/sdb1            2048 896622591 896620544 427.5G fd Linux raid autodetect
/dev/sdb2       896622592 970018815  73396224    35G fd Linux raid autodetect
/dev/sdb3       970018816 974678015   4659200   2.2G fd Linux raid autodetect

mdstat shows this:

Personalities : [raid1]
md125 : active raid1 sda2[1] sdb1[0]
      448178176 blocks super 1.2 [2/2] [UU]
      bitmap: 0/4 pages [0KB], 65536KB chunk

md126 : active raid1 sda3[1] sdb2[0]
      36663296 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid1 sda5[1] sdb3[0]
      2325504 blocks super 1.2 [2/2] [UU]

The partition numbers from each drive are different in the same array, which also seems odd to me. Eg, in md125, sda2[1] is paired with sdb1[0]. Shouldn’t it be sda1 and sdb1?

So my main question is how to install as LVM & RAID together?

And, you said “The RAID1 array that does host the /boot must use RAID metadata version 0.90” – how do you set that in the installation process?

Also, before I reinstall, do I need to wipe the disks again?

Thanks again so much for your patient help.

Okay, now I see the option after picking LVM to add RAID with the Modify button. I removed and zeroed previous mdadm from disks and dd wiped them. Reinstalled and everything looks good!

Except I forgot about specifying “--metadata=0.90” for the /boot partition. Can I change that after the fact? It’s booting okay, but after the initial “Welcome to GRUB” line (or whatever it says), there’s an error listed, but it goes by too fast to read.

Do you think it’s safe to try something like this? If I have to reinstall again, it’s not too big a deal, but would rather not:

$ mdadm --zero-superblock /dev/sd[a-z]2
$ mdadm --create /dev/md126 --assume-clean \
    --level=1 --raid-devices=2 /dev/sd[a-z]2 --metadata=0.90

(In this example, I’m using sd[a-z]2 because those are the partitions marked with boot * in fdisk -l, and md126 because that is the smaller RAID device, so I assume it’s for /boot.)

Also, it’s not connecting to the internet. When I used the GUI version before, just moving the device’s “.ucode” file into /lib/firmware allowed it to connect. I’ve done that via the command line, but it still doesn’t connect when rebooting. How can I fix that? I can’t update anything without a connection.

I have a question about backing up the system. I’ve googled around and it’s confusing.

Can I just dd copy one of the drives? Or do I need two more drives to properly backup the whole RAID system and data?

Can I clone one of the drives to have as a backup?

To keep a regular updated backup, should I use special software, or can scripts be set up to handle this?

If you can point me to a good guide for keeping a current backup for a RAID system, that would be great.

Thanks again for all your help!

If it boots, then you should be OK. You can examine the array with mdadm --detail to see its current metadata version. The mdstat output has that detail too: it is the “super 1.2” field.
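
One way to read the metadata version out of mdstat, sketched with awk. The sample input is the mdstat output posted earlier in this thread; on a live system the function would be called as mdstat_versions /proc/mdstat. The function name is made up. As far as I know the kernel prints no “super” field at all for 0.90 arrays, so the sketch reports 0.90 in that case.

```shell
mdstat_versions() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        / blocks / {
            ver = "0.90"    # assumed when no "super" field is present
            for (i = 1; i < NF; i++) if ($i == "super") ver = $(i + 1)
            print name, ver
        }
    ' "$@"
}

# Sample copied from this thread, stored in a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Personalities : [raid1]
md125 : active raid1 sda2[1] sdb1[0]
      448178176 blocks super 1.2 [2/2] [UU]
      bitmap: 0/4 pages [0KB], 65536KB chunk

md126 : active raid1 sda3[1] sdb2[0]
      36663296 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid1 sda5[1] sdb3[0]
      2325504 blocks super 1.2 [2/2] [UU]
EOF

versions=$(mdstat_versions "$sample")
printf '%s\n' "$versions"
```

For a single array, mdadm --detail /dev/md126 prints the same information in its Version line.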

That is a question about network configuration. You can get some views of it with:

nmcli d s
nmcli c s

The dd is a low level operation. It does not know anything about “files”.
One could make a clone of entire drive /dev/sda, of partition /dev/sda1, of content of RAID array /dev/md126, of content of LV /dev/mapper/almalinux-home or just some bytes.
While that is relatively simple, it is also a “blunt” approach.

A system has basically three things:

  • Software
  • Config
  • Data

Ideally, the software is simply the (RPM) packages that have been installed. If one has a list of packages, then one can quite trivially reinstall all those packages. The list can be considered a part of config.

The config is the changes that you made to the system. E.g. created user account, added Apache config (into /etc/httpd/conf.d/), certificate for HTTPS, and so on.

RHEL and AlmaLinux have the package ansible-core, which provides the Ansible configuration management system. Red Hat documents the use of Ansible System Roles for making the changes (including installation of packages) in an automated, repeatable fashion. The main point is that the input for Ansible is mainly small text (YAML) files. It is easy to keep a copy of them outside of the system.

That leaves data. The things you have in home directory, and probably somewhere like /var/www. It is easier to just copy the files from those locations.
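
A minimal sketch of such a file-level copy with tar, demonstrated on scratch directories so it can be tried without touching real data. On the server the source would be something like /home or /var/www and the archive would go to another machine or drive; those paths, and the choice of tar over rsync or a dedicated backup tool, are assumptions for illustration.

```shell
src=$(mktemp -d)        # stands in for e.g. /var/www
dest=$(mktemp -d)       # stands in for the backup location
restore=$(mktemp -d)    # where the archive is unpacked to verify it

mkdir -p "$src/site"
echo "hello" > "$src/site/index.html"

# Archive the contents of $src; -C makes the paths inside the archive relative.
tar -czf "$dest/www-backup.tar.gz" -C "$src" .

# Restore into an empty directory and compare with the original.
tar -xzf "$dest/www-backup.tar.gz" -C "$restore"
diff -r "$src" "$restore" && echo "backup verified"
```

Unlike a dd image, an archive like this can be restored onto any disk layout, RAID or not, which is the sense in which the array is just a slightly different “disk”.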

I can’t point to guides.

RAID system is nothing special. It is merely a bit different “disk”.

A quick note on RAID systems: ensure that you are using spindles designed for the purpose. Cheaper, shingled spindles (such as the Barracuda range) will give poor performance and IME start to fail after about 6 months.

The drives are new (unused) 500GB Western Digital RE4, model WD5003ABYX, 7200 RPM, 64MB cache, SATA 3.0Gb/s. Do you think they are okay for RAID?

Also, I’ve read conflicting info about using SSD drives instead. Are they a good choice for RAID? Some suggest they even negate the need for a RAID1 setup.

WD markets RE4 “for RAID”, so they are presumably ok.

RAID1 is a convenience.
If you don’t have RAID and the drive breaks, then you have to restore from backups.
If one “leg” of RAID1 does break, then you should be able to merely replace the broken drive without need to restore all data from backup.
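
Replacing a failed leg usually comes down to a handful of mdadm calls. The following is a dry run that only prints them; run is a helper invented here, and the array and partition names are examples that must be checked against /proc/mdstat on the real system.

```shell
run() { printf '+ %s\n' "$*"; }   # dry run: print instead of execute

array=/dev/md126
member=/dev/sdb2    # failed member, and the name assumed for its replacement

replace_plan() {
    run mdadm --manage "$array" --fail "$member"      # mark it faulty if the kernel has not done so
    run mdadm --manage "$array" --remove "$member"    # detach it from the array
    # Power down, swap the drive, and partition it like the surviving one.
    run mdadm --manage "$array" --add "$member"       # add the new partition; the resync starts
}

steps=$(replace_plan)
printf '%s\n' "$steps"
```

The resync progress shows up in /proc/mdstat. The same steps are repeated for every array that had a member on the failed drive.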

Every drive, both HDD and SSD, can break.

I’ve had a quick look at Disktech and the manufacturer’s data sheet. I’d agree with jlehtone: they appear to be sold for RAID, NAS and NSA, so one hopes (but can’t guarantee) that they’ll be OK. See Wikipedia for more details.

Just an update: got the wireless working. Main problem was the gateway wasn’t configured correctly.

Everything seems fine at this point. Thank you again so much for all your help!