I recently ran into a confusing problem while installing an updated kernel on a machine I could only reach remotely. This is a dedicated machine at a large web hosting company. You can do quite a bit remotely: you can boot from a rescue image, you can access the box through its ttyS0 port if the network is broken, you can even have the whole thing reinstalled. But you can't select the kernel that it is going to boot - it's only going to boot the default kernel.
Well, we wanted to update that kernel - it wasn't horribly old, but things change, and security issues come up, so the decision was made to update. Unfortunately, I really didn't look closely at the current configuration, and ended up with an unbootable machine.
I don't yet have a complete answer as to what did or didn't happen. Looking at the existing /boot directory, there were several kernels present, some of which had accompanying initrd files. However, the booted kernel had no initrd file, so I assume it was not needed. The purpose of initrd is to allow a two-stage boot: the boot loader loads a file system image into a RAM disk, and the kernel then loads the other modules that it needs from that image. It's often used for SCSI disks and RAID controllers where the driver has been provided as a loadable module. This system is some sort of SATA RAID, so I would think it would need that, but then again there is the oddity that there was no initrd file for the booted kernel, nor any mention of one in /etc/lilo.conf. You create the initrd image with mkinitrd.
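Checking which kernels in /etc/lilo.conf have an initrd line is easy to get wrong by eye when there are several stanzas. The following is only a rough sketch of that check: the function name and the sample file contents are made up for illustration, and it deliberately ignores the possibility of an initrd= line in the global section (before the first image= stanza), which I believe LILO also accepts as a default.

```python
def images_without_initrd(lilo_conf_text):
    """Return the label (or image path) of each image= stanza with no initrd= line."""
    stanzas = []      # one dict per image= stanza, in file order
    current = None    # the stanza currently being read, if any
    for raw in lilo_conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip().lower(), value.strip()
        if key == "image":
            current = {"image": value, "label": None, "initrd": None}
            stanzas.append(current)
        elif key == "other":
            current = None                    # non-Linux stanza: skip its options
        elif current is not None and key in ("label", "initrd"):
            current[key] = value
    return [s["label"] or s["image"] for s in stanzas if s["initrd"] is None]


# Hypothetical lilo.conf contents, for illustration only.
SAMPLE = """
boot=/dev/sda
default=old
image=/boot/vmlinuz-old
    label=old
    root=/dev/sda2
image=/boot/vmlinuz-new
    label=new
    initrd=/boot/initrd-new.img
    root=/dev/sda2
"""

print(images_without_initrd(SAMPLE))   # prints ['old']
```

On a real system you would pass in the contents of /etc/lilo.conf instead of SAMPLE. A kernel that turns up in this list either has its disk drivers compiled in or will not find its root file system.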
If there had been an initrd image, we could mount the image (on the loop device) and see what modules are in it. This is a script that helps do that: https://sial.org/howto/linux/initrd/initrd-util
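Whether a loop mount works at all depends on how the image was built. As far as I know, older mkinitrd versions produce a gzip-compressed ext2 file system image, which can be uncompressed and loop mounted, while newer ones produce a gzip-compressed cpio archive, which has to be unpacked with cpio instead. Here is a minimal sketch that guesses the format from the well-known magic numbers; the function name and the fake test images are invented for illustration.

```python
import gzip


def initrd_format(data):
    """Guess the container format of an initrd image from its magic numbers."""
    if data[:2] == b"\x1f\x8b":              # gzip magic: decompress first
        data = gzip.decompress(data)
    if data[:6] in (b"070701", b"070702"):   # cpio "newc" / "crc" header magic
        return "cpio"
    if data[1080:1082] == b"\x53\xef":       # ext2 superblock magic 0xEF53 (little-endian)
        return "ext2"
    return "unknown"


# Fake images containing only the magic numbers, not real initrd contents.
fake_cpio = gzip.compress(b"070701" + b"0" * 104)
fake_ext2 = gzip.compress(b"\0" * 1080 + b"\x53\xef" + b"\0" * 100)

print(initrd_format(fake_cpio), initrd_format(fake_ext2))   # prints: cpio ext2
```

For a real image you would read the file in binary mode and pass the bytes to initrd_format(). Only an "ext2" result means the loop mount approach applies.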
But, as I said, the booted kernel had no such image. This low level stuff confuses me a bit - I can grok it at some level, but when faced with a puzzle like this, I'm a bit lost. I can look in /proc/modules and see there's nothing there about a SCSI driver: there's md5, ipv6, iptables stuff, microcode and binfmt_misc, but that's all.
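The same check can be scripted rather than eyeballed. This is a small sketch, not a definitive tool: the list of candidate driver names is an assumption (the names that appear in this machine's dmesg output plus the generic SCSI disk modules), and the sample text only mirrors the module names mentioned above, with invented sizes and addresses.

```python
# Assumed candidate list: drivers named in this machine's dmesg output,
# plus the generic SCSI modules. Extend it for other hardware.
STORAGE_DRIVERS = {"ata_piix", "libata", "aacraid", "scsi_mod", "sd_mod"}


def loaded_modules(proc_modules_text):
    """Return the set of module names; the name is the first field on each line."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def storage_drivers_loaded(proc_modules_text, candidates=STORAGE_DRIVERS):
    """Return the candidate storage drivers that are loaded as modules, sorted."""
    return sorted(loaded_modules(proc_modules_text) & candidates)


# Hypothetical sample: names follow the article, sizes and addresses are invented.
SAMPLE = """\
md5 4033 1 - Live 0xf8a0b000
ipv6 232705 12 - Live 0xf8a5e000
ip_tables 17473 1 iptable_filter, Live 0xf89f5000
iptable_filter 2753 0 - Live 0xf89e9000
microcode 11489 0 - Live 0xf89ed000
binfmt_misc 8137 1 - Live 0xf89e5000
"""

print(storage_drivers_loaded(SAMPLE))   # prints []
```

On the live machine you would pass in the contents of /proc/modules. An empty result means no disk driver is loaded as a module, so whatever is driving the disk must be part of the kernel itself.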
I can see the drive stuff with "dmesg", but I don't yet know whether the new kernel should have been able to boot this hardware (it was a yum download, not compiled from scratch, which is probably what I should have done):
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
Loading Adaptec I2O RAID: Version 2.4 Build 5go
Detecting Adaptec I2O RAID controllers...
Red Hat/Adaptec aacraid driver (1.1.2-lk2 Dec 21 2004)
3ware Storage Controller device driver for Linux v1.26.00.039.
3w-xyzx: No cards found.
3ware 9000 Storage Controller device driver for Linux v2.26.02.001.
libata version 1.02 loaded.
ata_piix version 1.02
PCI: Setting latency timer of device 0000:00:1f.2 to 64
ata1: SATA max UDMA/133 cmd 0x1F0 ctl 0x3F6 bmdma 0xFFA0 irq 14
ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003 88:207f
ata1: dev 0 ATA, max UDMA/133, 156301488 sectors: lba48
ata1: dev 0 configured for UDMA/133
scsi0 : ata_piix
Using anticipatory io scheduler
Vendor: ATA Model: ST380817AS Rev: 3.42
Type: Direct-Access ANSI SCSI revision: 05
ata2: SATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xFFA8 irq 15
ATA: abnormal status 0x7F on port 0x177
scsi1 : ata_piix
SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
SCSI device sda: drive cache: write back
sda: sda1 sda2 sda4 < sda5 sda6 sda7 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
That doesn't mean a lot to me, frankly. And of course, because I can't see anything from the failed boot, I have little clue as to what it was missing or had trouble with.
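One thing that can be pulled out of dmesg output mechanically is which driver registered each SCSI host. The sketch below does that with a regular expression; the function name is made up, and the sample lines are copied from the output above. If a driver shows up here but not in /proc/modules, one plausible reading is that it is compiled into the running kernel, which would explain the missing initrd, and a distribution kernel may well ship the same driver as a module instead. That is a guess, not a diagnosis.

```python
import re


def scsi_host_drivers(dmesg_text):
    """Map each SCSI host (e.g. 'scsi0') to the driver named on its 'scsiN : driver' line."""
    return dict(re.findall(r"\b(scsi\d+) : (\S+)", dmesg_text))


# Lines copied from the dmesg output shown in this article.
SAMPLE = """\
ata1: dev 0 configured for UDMA/133
scsi0 : ata_piix
ata2: SATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xFFA8 irq 15
scsi1 : ata_piix
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
"""

print(scsi_host_drivers(SAMPLE))   # prints {'scsi0': 'ata_piix', 'scsi1': 'ata_piix'}
```

Comparing the driver names found this way against the module list of the new kernel's initrd (if it has one) would at least narrow down what the failed boot might have been missing.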
© 2009-11-07 Tony Lawrence
The less accurate your mental model of a given process is, the less accurate is any guess you make about its malfunction. (Tony Lawrence)