Discussion:
kernel panic with RAID10 root filesystem
Robert Riches
2018-05-23 03:43:33 UTC
Attempted a Slackware 14.2 installation on a physical machine
with RAID10 filesystems, including the root filesystem, but with
/boot on a plain (non-RAID) partition. A similar test VM had
worked beautifully for months--with initrd and such. A second
installation on the same physical machine but with the root
filesystem on a plain partition works fine, at least to the point
of running X, launching an xterm, and from the xterm launching
xclock.

The problematic machine gets to about 5 seconds into booting,
past the point where the system tries (unsuccessfully) to find
RAID v0.90 superblocks/signatures. Then, there's a kernel panic
with a long backtrace taking up the whole screen. Everything in
between flies by on the screen so quickly there's no way for
human eyes to detect what messages happen right before the panic
and backtrace. The backtrace has function names that seem to
indicate attempts to mount the real root filesystem and functions
that have "APIC" in their names.

Sorry about so few details at this point, but it's getting close
to nightfall. I'll try to post more details tomorrow.

Meanwhile, if anyone has pointers to how-to documents for either
finding the root cause of a panic like this or solving (working
around) this sort of thing, I'd be extremely grateful.

(I'm _SOOOOO_ glad I structured the machine's partition scheme to
allow me to fall back to the previous OS installation with a few
spare partitions for experimenting. It saved my bacon this
time.)

Thanks.
--
Robert Riches
***@jacob21819.net
(Yes, that is one of my email addresses.)
Robert Riches
2018-05-27 03:18:40 UTC
For anyone who might be directed here by a search engine:

In this case, the problem boiled down to the feeeeeeechuuuuuuuur
in modern kernels where disks can be enumerated in a different
order on each boot, so the /dev/sdX names are not stable. So,
when I made fixes in /boot (on which /dev/sda2 was mounted), on
the next boot the fixes were likely not seen by Smart Boot
Manager or GRUB. For now, one workaround for my case (identical
partitions on all disks, originally intended to form a RAID from
each slice of partitions) is to do two things after each write
to /boot: 1) dd from the current /dev/sda2 to all other /dev/sd*2
partitions, 2) use e2label to fix the labels (dd copies the label
along with the data).
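
The two-step workaround could be scripted roughly as below. This is only a sketch: the function names, the DRY_RUN switch and the "boot_<partition>" label scheme are made up for illustration and are not from the original post. By default it only prints the commands, since dd to the wrong partition is destructive, and the source should be unmounted or mounted read-only while it is copied.

```shell
#!/bin/sh
# Sketch: copy one /boot partition onto its siblings, then give each
# copy its own label again (dd clones the label along with the data).
# DRY_RUN=1 (the default) only prints the commands it would run.
# The "boot_<partition>" label scheme is a hypothetical example.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

sync_boot() {
    src=$1
    shift
    for dst in "$@"; do
        # never copy the source onto itself
        if [ "$dst" = "$src" ]; then
            continue
        fi
        run dd if="$src" of="$dst" bs=1M
        # /dev/sdb2 -> boot_sdb2
        run e2label "$dst" "boot_${dst##*/}"
    done
}

# Example (prints the commands only):
#   sync_boot /dev/sda2 /dev/sda2 /dev/sdb2 /dev/sdc2
# To really run them, after checking the printed commands:
#   DRY_RUN=0 sync_boot /dev/sda2 /dev/sdb2 /dev/sdc2
```

Because the device names themselves can move around between boots, it is worth checking which disk is currently /dev/sda before running anything like this for real.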

A more elegant solution may be to use LABEL=... in /etc/fstab,
but that won't solve what SBM and GRUB see. Or, perhaps there
might be a kernel boot option to force disk ordering to be
deterministic at the expense of a few seconds of boot time.

Thanks,
--
Robert Riches
***@jacob21819.net
(Yes, that is one of my email addresses.)
Bit Twister
2018-05-27 06:59:08 UTC
Post by Robert Riches
A more elegant solution may be to use LABEL=... in /etc/fstab,
but that won't solve what SBM and GRUB see.
Ah, but you can drop a script in /etc/grub.d/ to make grub2 use labels
for booting. That is what I have.

I am running Mageia Linux Release 6 with several installs. Snippets from
different files follow:

$ grep -i line /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT=" noiswmd "
GRUB_CMDLINE_LINUX="ipv6.disable=1 audit=0 rd.driver.pre=ehci_hcd"

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz root=LABEL=mga6 noiswmd ipv6.disable=1 audit=0 rd.driver.pre=ehci_hcd


$ grep mga6 /etc/fstab
LABEL=mga6 / ext4 relatime,acl 1 1


$ grep mga6 /boot/grub2/grub.cfg
menuentry "mga6" {
search --no-floppy --label --set=root mga6
linux /boot/vmlinuz root=LABEL=mga6 noiswmd ipv6.disable=1 audit=0 rd.driver.pre=ehci_hcd
menuentry 'Mageia mga6' --class mageia --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-simple-ceb808f8-a28c-4cfd-8d77-b703ee7085ac' {
echo 'Loading Linux 4.14.40-desktop-1.mga6 ...'
linux /boot/vmlinuz-4.14.40-desktop-1.mga6 root=UUID=ceb808f8-a28c-4cfd-8d77-b703ee7085ac ro ipv6.disable=1 audit=0 rd.driver.pre=ehci_hcd noiswmd
initrd /boot/initrd-4.14.40-desktop-1.mga6.img


Script for generating grub menu selections using labels:
----8<----8<----8< cut below this line ---8<----8<----8<----8<
#! /bin/sh

#*****************************************************************************
#* 10a_label_xx__grub - grub2 script to generate menu entries using partition label
#*
#* Install Procedure
#* save as 10a_label_xx__grub
#* chmod +x 10a_label_xx__grub
#* cp 10a_label_xx__grub /etc/grub.d or create a link in /etc/grub.d
#* and then run update-grub2 to generate a new grub2 menu.
#*
#* Assumptions:
#* All partitions containing /boot/vmlinuz have a label
#* and are ext4
#*
#*
#*****************************************************************************
#
#
# grub-mkconfig helper script.
# Copyright (C) 2006,2007,2008,2009,2010 Free Software Foundation, Inc.
#
# GRUB is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# GRUB is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with GRUB. If not, see <http://www.gnu.org/licenses/>.

set -e

prefix="/usr"
exec_prefix="/usr"
datarootdir="/usr/share"

. "/usr/share/grub/grub-mkconfig_lib"

device=""
_extra_cmd=""
line=""
label=""
_label_fn="/tmp/10a_label.lst"

export TEXTDOMAIN=grub
export TEXTDOMAINDIR="${datarootdir}/locale"


#****************************************************
#* create grub2 menu stanza. arg 1 is partition label
#* GRUB_CMDLINE_* are found in /etc/default/grub
#****************************************************

menu_stanza_1 () {
cat << EOF
menuentry "$1" {

set gfxpayload=text
insmod regexp
insmod gzio
insmod part_gpt
insmod ext2
search --no-floppy --label --set=root $1
linux /boot/vmlinuz root=LABEL=$1 ${GRUB_CMDLINE_LINUX_DEFAULT} ${2} ${GRUB_CMDLINE_LINUX}
initrd /boot/initrd.img
}
EOF

}
#*********************************************
#* create current mount as first menu entry
#*********************************************

mkdir --parents /mnt/10a_label
set -- $(mount | grep ' / ')
device=$1
label=$(e2label "${device}" 2>/dev/null)
menu_stanza_1 "$label"

#***********************************************************
#* look through other ext4 partitions for /boot/vmlinuz
#* and create a menu entry using its partition label.
#***********************************************************

lsblk -lno NAME,LABEL,FSTYPE | grep ext4 > "$_label_fn"

while read -r line; do
    # split the line into $1=NAME $2=LABEL $3=FSTYPE
    set -- $line
    if [ "$2" != "$label" ] ; then
        mount -t auto "/dev/$1" /mnt/10a_label
        if [ -e /mnt/10a_label/boot/vmlinuz ] ; then
            # mga5 entries get an extra kernel argument
            if [ "$2" = "mga5" ] ; then
                _extra_cmd="nokmsboot"
            else
                _extra_cmd=""
            fi
            menu_stanza_1 "$2" "$_extra_cmd"
        fi
        umount --lazy /mnt/10a_label
    fi
done < "$_label_fn"

rmdir /mnt/10a_label
rm --force "$_label_fn"

#***************** end /etc/grub.d/10a_label_xx__grub *************************
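
The script above assumes every candidate partition has a label, and both "search --label" and root=LABEL= can only pick the intended filesystem if that label is unique. That matters after cloning partitions with dd, which copies the label too. A small check along these lines could flag duplicates; the function name dup_labels is made up for this sketch, and labels containing spaces are not handled.

```shell
#!/bin/sh
# Print every filesystem label that appears on more than one device.
# Input: lines of "NAME LABEL", as printed by: lsblk -lno NAME,LABEL
# Lines without a label (only one field) are ignored.
# Caveat: labels containing spaces are not handled by this simple parser.
dup_labels() {
    awk 'NF >= 2 { count[$2]++; devs[$2] = devs[$2] " " $1 }
         END { for (l in count) if (count[l] > 1) print l ":" devs[l] }' | sort
}

# Typical use on a live system:
#   lsblk -lno NAME,LABEL | dup_labels
```

No output means no label is shared between devices.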
Robert Riches
2018-05-27 16:18:15 UTC
Post by Bit Twister
Post by Robert Riches
A more elegant solution may be to use LABEL=... in /etc/fstab,
but that won't solve what SBM and GRUB see.
Ah but you can drop a script in /etc/grub.d/ to make grub2 use label
for booting. That is what I have.
----8<----8<----8< cut below this line ---8<----8<----8<----8<
#! /bin/sh
...
Yet again, Mr. Twister comes to the rescue. :-)

Thank you very much for the idea to put a script in /etc/grub.d
to facilitate using labels for booting. Based on the comments in
the script that I snipped for brevity, do I guess correctly
that's the script you referred to?

Thanks.
--
Robert Riches
***@jacob21819.net
(Yes, that is one of my email addresses.)
Bit Twister
2018-05-27 21:25:51 UTC
Post by Robert Riches
Post by Bit Twister
Post by Robert Riches
A more elegant solution may be to use LABEL=... in /etc/fstab,
but that won't solve what SBM and GRUB see.
Ah but you can drop a script in /etc/grub.d/ to make grub2 use label
for booting. That is what I have.
----8<----8<----8< cut below this line ---8<----8<----8<----8<
#! /bin/sh
...
Yet again, Mr. Twister comes to the rescue. :-)
Thank you very much for the idea to put a script in /etc/grub.d
to facilitate using labels for booting.
You can also put scripts there to boot ISOs from your disks/storage media.
I have a script to boot the latest sysrescuecd.iso in my /spare directory.
Another script to generate menu entries for all Mageia*.iso files
found in /spare.
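
A helper for such an ISO script might emit a stanza like the one below. This is a rough sketch, not the script referred to above: the function name iso_stanza is made up, and the kernel path, initrd path and kernel arguments inside the ISO differ per distribution, so they are left as parameters for the caller to fill in. It uses the GRUB "loopback" command together with the same label-based "search" as the script posted earlier.

```shell
#!/bin/sh
# Sketch: print a GRUB menu entry that boots a kernel from inside an
# ISO file via GRUB's "loopback" command.
#   $1 = menu title
#   $2 = label of the partition holding the ISO
#   $3 = path of the ISO on that partition
#   $4 = kernel path inside the ISO   (distribution specific)
#   $5 = initrd path inside the ISO   (distribution specific)
#   $6 = kernel arguments             (distribution specific)
iso_stanza() {
    title=$1
    label=$2
    iso=$3
    kernel=$4
    initrd=$5
    args=$6
    cat << EOF
menuentry "$title" {
    search --no-floppy --label --set=root $label
    loopback loop $iso
    linux (loop)$kernel $args
    initrd (loop)$initrd
}
EOF
}

# Example with placeholder paths (check the ISO for the real ones):
#   iso_stanza "Rescue ISO" spare /spare/rescue.iso /boot/kernel /boot/initrd ""
```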
Post by Robert Riches
Based on the comments in the script that I snipped for brevity, do I
guess correctly that's the script you referred to?
Yes, you guessed correctly; that was the actual script as posted. :)

The problem is that you may have to modify it to match how your
distribution names and provides its boot files. On Mageia Linux, /boot/vmlinuz
and /boot/initrd.img are always symlinked to the latest installed files.

A quick glance at a Slackware /boot partition shows me the script will
have to be modified to work with Slackware's naming/booting methodology.
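
One way to make the script less tied to Mageia's file names would be to probe for the kernel and initrd instead of hard-coding them. The sketch below is an assumption-laden starting point, not a tested Slackware port: the function names are made up, and the candidate file names (vmlinuz-generic, vmlinuz-huge, initrd.gz) are what a Slackware /boot typically contains, so check them against your own install.

```shell
#!/bin/sh
# Sketch: choose kernel and initrd file names under a mounted /boot
# instead of hard-coding /boot/vmlinuz and /boot/initrd.img.
# The candidate names are assumptions; adjust the lists as needed.

# Print the first of the given names that exists in directory $1.
first_existing() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -e "$dir/$name" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

# Print "<kernel> <initrd>" for boot directory $1.
# Fails if no kernel is found; the initrd field is empty if none exists.
pick_boot_files() {
    bootdir=$1
    kernel=$(first_existing "$bootdir" vmlinuz vmlinuz-generic vmlinuz-huge) || return 1
    initrd=$(first_existing "$bootdir" initrd.img initrd.gz) || initrd=""
    echo "$kernel $initrd"
}

# Example:
#   pick_boot_files /mnt/10a_label/boot
```

The menu stanza function would then need to take the two names as extra arguments and skip the initrd line when the second field is empty.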