Another geeky technical post. Ignore if it’s not your thing…
I wrote recently about the GRUB bootloader and how it can sometimes cause a remote or headless server not to come back online after, e.g. a kernel update.
This can happen in other situations too. On recent versions of Ubuntu, if the system thinks that the last boot attempt failed, GRUB won't automatically boot into the same configuration again: it cancels the countdown timer by setting the timeout to -1, so the machine waits indefinitely at the boot menu until you decide on the best action to take. This is a sensible default, because a machine stuck in an endless loop of reboots is doing nobody any good and puts a fair bit of strain on its own hardware.
Unfortunately, though, other things can trigger this behaviour. If you have a power fluctuation, for example, such that the machine restarts, gets part-way and then power-cycles again, you may find yourself with a machine that doesn’t come back online of its own accord.
On the most recent Ubuntu versions (12.04 with updates, and later) you can add a setting to /etc/default/grub:
GRUB_RECORDFAIL_TIMEOUT=30
for a 30 second timeout after it has recorded a failure condition. You can use 0 if you don’t want it to pause and show the boot menu at all, but remember that it could then go into fairly fast repeated reboots if something really does go wrong.
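If you look after more than one box, you might prefer to script that edit. Here's a minimal sketch, assuming a POSIX shell; set_recordfail_timeout is a hypothetical helper name of my own, not something GRUB provides. It replaces any existing assignment or appends a new one, and deliberately writes no spaces around the "=", since the file is read as shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of GRUB): set GRUB_RECORDFAIL_TIMEOUT in a
# GRUB defaults file, replacing an existing assignment or appending one.
set_recordfail_timeout() {
    file=$1
    seconds=$2
    tmp=$(mktemp) || return 1
    # Copy everything except an existing (uncommented) assignment.
    grep -v '^GRUB_RECORDFAIL_TIMEOUT=' "$file" > "$tmp"
    # No spaces around "=": the file is sourced as shell.
    printf 'GRUB_RECORDFAIL_TIMEOUT=%s\n' "$seconds" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

You would call it as root with something like set_recordfail_timeout /etc/default/grub 30. It's worth trying it on a copy of the file first.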
On earlier versions, you’ll need to edit /etc/grub.d/00_header and find the line near the bottom, in the make_timeout() function, which sets the timeout to -1:
set timeout=-1
and change the -1 to your preferred value.
In either case you’ll then need to run:
sudo update-grub
to make your changes take effect.
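Once the configuration has been regenerated (on Ubuntu, that's what sudo update-grub does), it's worth checking that the new value actually landed in the generated file. A rough sketch: show_recordfail_timeout is a hypothetical helper of mine, and it assumes the generated grub.cfg tests the recordfail variable and sets the timeout on the very next line, which may not hold on every version.

```shell
#!/bin/sh
# Hypothetical helper: print the "set timeout=" line that directly follows
# the recordfail test in a generated grub.cfg.
# Assumes the timeout is set on the line right after the test.
show_recordfail_timeout() {
    cfg=${1:-/boot/grub/grub.cfg}
    grep -A1 'recordfail}" = 1' "$cfg" | grep 'set timeout='
}
```

Run it with no arguments to inspect /boot/grub/grub.cfg (you may need root to read it). If it prints nothing, the layout of your grub.cfg probably differs from the one assumed here, so have a look at the file by hand.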
There’s also the one-shot approach (http://utcc.utoronto.ca/~cks/space/blog/linux/OneShotGrub) which is useful to test out new kernels etc.
Thanks for this. I noticed that just adding or removing hardware causes my headless 12.04 server to get stuck at the GRUB bootloader. This will save me lots of time.