Discussion:
problem booting GRUB to mdadm RAID10 root
Robert Riches
2018-01-28 04:45:17 UTC
(... 18-year Linux user, relative newbie to Slackware ...)

I have a new Slackware64 14.2 installation (a VM to be precise)
that won't boot from GRUB to an mdadm-style RAID10 root
filesystem. Essentially the same installation steps worked fine
for a VM without RAID. In both cases, during installation,
instead of installing LILO, I executed these commands (found on
the web) in the installer's console 2:

chroot /mnt
grub-install --force /dev/vda1
grub-mkconfig -o /boot/grub/grub.cfg

On both VMs, the GRUB menu comes up fine, and the kernel starts
booting. On the VM without RAID, it boots up and works
beautifully. However, on the VM with RAID, I get a kernel panic
with this last line (indentation mine, and typed from a
screenshot):

[ 1.403053] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

The stack trace right before that line shows these function
names:

dump_stack+...
panic+...
? printk+...
mount_block_root+...
mount_root+...
prepare_namespace+...
kernel_init_freeable+...
? rest_init+...
kernel_init+...
ret_from_fork+...
? rest_init+...

Thinking I might need an initrd, I booted the installer ISO,
manually mounted the installed system's / and /boot under /mnt
(and /mnt/boot, respectively), and did this:

mkinitrd -c -k 4.4.14 -f ext4 -m md_mod:raid10:ext4 -r /dev/md127 -R

There were no obvious error messages, and an initrd.gz file was
created. However, the file was empty.
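
A quick sanity check at this point: Slackware's mkinitrd builds a
gzipped cpio archive, so something like the following should list its
contents (adjust the path to wherever the image actually landed):

ls -lh /mnt/boot/initrd.gz
zcat /mnt/boot/initrd.gz | cpio -it | head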

Any suggestions?

Thanks,
--
Robert Riches
***@jacob21819.net
(Yes, that is one of my email addresses.)
Eef Hartman
2018-01-28 06:24:32 UTC
Post by Robert Riches
grub-mkconfig -o /boot/grub/grub.cfg
On both VMs, the GRUB menu comes up fine, and the kernel starts
booting. On the VM without RAID, it boots up and works
As far as I know, /boot must NOT be on a striped RAID volume, and both
the kernel and the initrd have to be ON that volume (or partition).
So RAID 1 (a pure mirror) or a non-RAID partition is OK for /boot,
but GRUB cannot handle the striping of RAID 0 or RAID 10 (by the way:
I don't think LILO can either).
Post by Robert Riches
There were no obvious error messages, and an initrd.gz file was
created. However, the file was empty.
Again, I'm no expert, but at least the md modules and the /etc config
file FOR the RAID10 (root) volume must be in the initrd, and GRUB must
be configured to load that initrd.
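
For example (just a sketch of one common way to do it, from a chroot
into the installed system; I don't know the exact Slackware details),
the config file can be generated from the running arrays so the initrd
can assemble them by UUID:

# append the current array definitions to the config file
mdadm --detail --scan >> /etc/mdadm.conf

If I understand the mkinitrd -R flag used above correctly, that is
what pulls mdadm itself into the image.
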
In my working days (but that was with grub-legacy) we had the whole
root volume on RAID 1 and the rest (this was a 14-disk server) on RAID
5: three separate volumes of 4 disks each, among them /home (which sat
on very fast 15krpm disks in a RAID 5 config).
But as I said, /boot was part of the root filesystem, which was a
2-disk RAID 1 volume.
Pascal Hambourg
2018-01-28 07:12:46 UTC
Post by Eef Hartman
As far as I know, /boot must NOT be on a striped RAID volume, and both
the kernel and the initrd have to be ON that volume (or partition).
GRUB 2 can read striped RAID, so /boot on RAID 0 is ok.
This is shown by the fact that the GRUB menu shows up and the kernel is
loaded.

It is just an initrd/initramfs issue. / on a modern (1.x metadata)
Linux RAID array requires an initrd or initramfs because the kernel
cannot auto-assemble such an array itself; it needs mdadm (and usually
the RAID drivers are built as modules).
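
(For anyone checking their own setup, the metadata version is easy to
confirm; using the root array from the original post:

mdadm --detail /dev/md127 | grep -i version

Anything 1.x there means the kernel's old in-kernel autodetection does
not apply, so the array has to be assembled by mdadm from an
initramfs, which is exactly the point above.)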
Robert Riches
2018-01-30 04:11:58 UTC
Post by Pascal Hambourg
Post by Eef Hartman
As far as I know, /boot must NOT be on a striped RAID volume, and both
the kernel and the initrd have to be ON that volume (or partition).
GRUB 2 can read striped RAID, so /boot on RAID 0 is ok.
This is shown by the fact that the GRUB menu shows up and the kernel is
loaded.
It is just an initrd/initramfs issue. / on a modern (1.x metadata)
Linux RAID array requires an initrd or initramfs because the kernel
cannot auto-assemble such an array itself; it needs mdadm (and usually
the RAID drivers are built as modules).
Eef and Pascal, thank you both for your replies!

By way of clarification, / is on RAID10, but /boot is on a plain,
non-RAID partition. That arrangement has worked fine with Mageia
1 and Debian 7/Wheezy. (Yes, I'm a two-time systemd refugee.)

By way of update, the empty initrd.gz file was a PEBCAK issue, my
fault. I had evidently shut down the VM without doing a sync,
which ran afoul of ext4 journaling the metadata but not the file
contents.

With a non-empty initrd.gz, the boot gets farther but complains
about modules exporting duplicate kernel symbols, which I figure
means I'm trying to boot a kernel that has some things built in
that are also in modules in the initrd. It looks like I simply
need to trim the set of modules in the initrd to exactly
complement the kernel with no duplication. It appears that
should get me either to a booted installation or to another layer
of the onion.
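
(The built-in-versus-module overlap should be visible in the stock
kernel config; assuming the config files are installed under /boot
with their usual Slackware names, something like this shows which of
the relevant drivers are =y (built in) and which are =m (modules):

grep -E 'CONFIG_BLK_DEV_MD|CONFIG_MD_RAID10|CONFIG_EXT4_FS|CONFIG_JBD2' /boot/config-huge-4.4.14

Anything reported as =y does not belong in the initrd's module list.)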

Thanks also for pointing out that mdadm must be in the initrd.
I'll need to remember to make sure it's in there.
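
A sketch of how to double-check that (assuming the image stays a
gzipped cpio archive):

zcat /boot/initrd.gz | cpio -it | grep -E 'mdadm|raid10'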
--
Robert Riches
***@jacob21819.net
(Yes, that is one of my email addresses.)
Robert Riches
2018-02-02 04:52:05 UTC
Post by Robert Riches
Post by Pascal Hambourg
Post by Eef Hartman
As far as I know, /boot must NOT be on a striped RAID volume, and both
the kernel and the initrd have to be ON that volume (or partition).
GRUB 2 can read striped RAID, so /boot on RAID 0 is ok.
This is shown by the fact that the GRUB menu shows up and the kernel is
loaded.
It is just an initrd/initramfs issue. / on a modern (1.x metadata)
Linux RAID array requires an initrd or initramfs because the kernel
cannot auto-assemble such an array itself; it needs mdadm (and usually
the RAID drivers are built as modules).
Eef and Pascal, thank you both for your replies!
By way of clarification, / is on RAID10, but /boot is on a plain,
non-RAID partition. That arrangement has worked fine with Mageia
1 and Debian 7/Wheezy. (Yes, I'm a two-time systemd refugee.)
By way of update, the empty initrd.gz file was a PEBCAK issue, my
fault. I had evidently shut down the VM without doing a sync,
which ran afoul of ext4 journaling the metadata but not the file
contents.
With a non-empty initrd.gz, the boot gets farther but complains
about modules exporting duplicate kernel symbols, which I figure
means I'm trying to boot a kernel that has some things built in
that are also in modules in the initrd. It looks like I simply
need to trim the set of modules in the initrd to exactly
complement the kernel with no duplication. It appears that
should get me either to a booted installation or to another layer
of the onion.
Thanks also for pointing out that mdadm must be in the initrd.
I'll need to remember to make sure it's in there.
(With apologies for following up to my own post...)

Again, thank you, Eef and Pascal, for your replies. In the end,
the onion had a couple more layers. For the benefit of anyone
else who might find this, here's the story:

This is a test VM modeled after a machine that has run Mageia 1
and Debian 7/Wheezy. It has 5 disk drives, with identical
partition schemes of several partitions each. Except for one
partition on each drive dedicated to /boot, each remaining set of
corresponding partitions forms a RAID10 array. Here's the scheme
(trimmed /proc/mdstat) as seen by the installer:

md122 : active raid10 vda8[0] vdd8[3] vdb8[2] vdc8[1]
md123 : active raid10 vda7[0] vdd7[3] vdb7[2] vdc7[1]
md124 : active raid10 vda6[0] vdd6[3] vdb6[2] vdc6[1]
md125 : active raid10 vda5[0] vdd5[3] vdb5[2] vdc5[1]
md126 : active raid10 vda3[0] vdd3[3] vdb3[2] vdc3[1]
md127 : active raid10 vda2[0] vdd2[3] vdb2[2] vdc2[1]

This is an 'ls -l /dev/md' from the installer:

lrwxrwxrwx 1 root root 8 Feb 2 2018 t1backs -> ../md122
lrwxrwxrwx 1 root root 8 Feb 2 2018 t1home -> ../md126
lrwxrwxrwx 1 root root 8 Feb 1 03:32 t1slash -> ../md127
lrwxrwxrwx 1 root root 8 Feb 1 03:31 t1swap -> ../md123
lrwxrwxrwx 1 root root 8 Feb 1 03:33 t1tmp -> ../md124
lrwxrwxrwx 1 root root 8 Feb 1 03:33 t1usrlocal -> ../md125

This is an 'ls -l /dev/md' from the (eventually booting)
installed system:

lrwxrwxrwx 1 root root 8 Feb 1 20:06 Microknoppix:t1backs -> ../md122
lrwxrwxrwx 1 root root 8 Feb 1 20:06 Microknoppix:t1home -> ../md127
lrwxrwxrwx 1 root root 8 Feb 1 20:06 Microknoppix:t1slash -> ../md125
lrwxrwxrwx 1 root root 8 Feb 1 20:06 Microknoppix:t1swap -> ../md126
lrwxrwxrwx 1 root root 8 Feb 1 20:06 Microknoppix:t1tmp -> ../md123
lrwxrwxrwx 1 root root 8 Feb 1 20:06 Microknoppix:t1usrlocal -> ../md124

The first problem was that during boot the RAID arrays were not
assembled, so there was no root fs to mount, which caused a kernel
panic. The solution to that was to create an initrd (in /boot/grub)
with a command similar to this:

mkinitrd -c -k 4.4.14 -f ext4 -m md_mod:raid10:jbd2:mbcache:ext4 -r /dev/md/Microknoppix:t1slash -R

Also, I needed to append this to each boot stanza:

initrd /initrd.gz
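
For anyone reconstructing this later, a stanza then ends up looking
roughly like the sketch below. The menu title, partition numbers, and
kernel file name are illustrative guesses, not a copy of my actual
grub.cfg; the root= device is the one used in the mkinitrd command
above:

menuentry 'Slackware64 14.2 (RAID10 root)' {
  insmod part_msdos
  insmod ext2
  # the plain (non-RAID) /boot partition, as GRUB sees it
  set root='hd0,msdos1'
  linux /vmlinuz-huge-4.4.14 root=/dev/md/Microknoppix:t1slash ro
  initrd /initrd.gz
}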

The second problem was that the mapping between
/dev/md12{2,3,4,5,6,7} and the {Microknoppix:}t1* names got scrambled
between the installer and the installed system. The solution was to
edit (using "sed -i -e 's,...,...,g' $file") both /etc/fstab and
/boot/grub/grub.cfg to change all the "/dev/md12*" references to
"/dev/md/Microknoppix:t1*". The edit to /etc/fstab had to wait
until just before rebooting from the installer into the installed
system.
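
As a concrete illustration (only for the root array, and only as an
example; the real old/new pairs have to be read from 'ls -l /dev/md'
on both sides), the substitution for / would look something like:

sed -i -e 's,/dev/md127,/dev/md/Microknoppix:t1slash,g' /boot/grub/grub.cfg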

The system appears to be functional. Thank you again for the
replies that helped lead to solutions.
--
Robert Riches
***@jacob21819.net
(Yes, that is one of my email addresses.)