"... they were afraid to run fsck for several months". That little sentence fragment was part of a support email I received this week. The email went on to say "After we ran it the system will no longer boot. (perhaps their fears were justified)."
I've heard similar things in the past. The fear is that fsck will "make things worse". The system is running now, it's working (or at least mostly working), so why tempt fate? The mental analogy here seems to be one of going in for surgery and dying from the anesthesia.
That's the wrong idea. Anything that fsck sees as needing fixing has the strong potential to make things worse, often much worse, if left unfixed. Yes, there may be unusual circumstances where I would want to not run fsck in order to examine or copy something before letting it do its repair. There might be times where I'd want to first run it with a "-n" (meaning report what needs fixing but do nothing). But in none of these cases would I then turn the system over to users.
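If you do want that look-before-you-leap pass, here is a minimal Python sketch of a report-only run. The device name in the usage comment is purely hypothetical, and exactly what "-n" reports (and which exit codes come back) depends on the fsck your system ships, so treat this as an illustration rather than a recipe.

```python
import subprocess

def report_only_command(device):
    """Build an fsck command line that reports problems but changes nothing."""
    return ["fsck", "-n", device]

def run_report_only(device):
    """Run the report-only check; return (exit code, text that fsck printed)."""
    result = subprocess.run(
        report_only_command(device),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout

# Hypothetical usage, normally as root and on an unmounted filesystem:
#   code, report = run_report_only("/dev/sdb1")
#   print(code)
#   print(report)
```

The point of wrapping it like this is only to keep the report around so you can read it before deciding what to copy off first.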
There are things that fsck will fix that might be fine to leave untreated. For example, fsck looks for files that don't have an entry in any directory. It makes entries for those in the "lost+found" directory of the filesystem. If you have no need for those "lost" files, they really don't matter. They can sit out there in never-never land and never cause a problem. Fsck is not going to have any problem fixing these, either, so there's no reason not to let it do what it wants to do.
However, other filesystem screw-ups are more dangerous. If a disk block that belongs to one file also ends up among the blocks that are supposed to belong to another file, or is on the "free list" (meaning available for the next file that needs more blocks), that can really mess you up. That block will contain whatever was last written to it, by whichever file wrote at the position where the block lies. You have no way of knowing whether your application is writing accounts receivable data into the midst of your kernel or some other system file. File permissions don't matter; data will get written. You can lose application data or lose system files.
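To make the cross-linked block idea concrete, here is a toy model in Python. It is not how any real fsck is written; the file names, block numbers, and the find_cross_linked helper are all invented for illustration. Each file is just a list of block numbers, and the check looks for any block claimed more than once, whether by two files or by a file and the free list.

```python
from collections import defaultdict

def find_cross_linked(files, free_list):
    """Return {block: [claimants]} for every block claimed more than once."""
    claims = defaultdict(list)
    for name, blocks in files.items():
        for block in blocks:
            claims[block].append(name)
    for block in free_list:
        claims[block].append("<free list>")
    return {b: owners for b, owners in claims.items() if len(owners) > 1}

# Invented example data: block 12 is claimed by both the kernel and an
# application data file; block 30 is in use but still marked as free.
files = {
    "/unix": [10, 11, 12],
    "/data/ar.dat": [20, 12, 21],
    "/etc/passwd": [30],
}
free_list = [30, 40, 41]

dups = find_cross_linked(files, free_list)
for block, owners in sorted(dups.items()):
    print(block, owners)
```

In this model, any write to the second block of /data/ar.dat lands in the kernel, and the next file that grows may be handed block 30 and overwrite /etc/passwd.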
All fsck can do in that case is allocate a new block, copy the contents of the shared block into it, and then fix up the inode so there are no cross-linked blocks (dups). That doesn't change the data itself - if you've already messed up important data, it stays messed up, but fsck won't make anything worse!
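The repair just described can be sketched in the same toy style. Again, this is an assumption-laden illustration, not real fsck code: the repair_dups helper, the file names, and the block contents are made up, and the free list here is assumed to be trustworthy.

```python
def repair_dups(files, disk, free_list):
    """Give each later claimant of a shared block its own private copy."""
    owner = {}
    for name, blocks in files.items():
        for i, block in enumerate(blocks):
            if block not in owner:
                owner[block] = name       # first claimant keeps the block
                continue
            new_block = free_list.pop(0)  # allocate a fresh block
            disk[new_block] = disk[block] # copy whatever is there right now
            blocks[i] = new_block         # fix up the file's block list
    return files

# Invented example: the application last wrote to shared block 12,
# so the kernel's third block now holds accounts receivable data.
disk = {10: "kernel-a", 11: "kernel-b", 12: "AR record",
        20: "AR header", 21: "AR trailer"}
files = {
    "/unix": [10, 11, 12],
    "/data/ar.dat": [20, 12, 21],
}
free_list = [40, 41]

repair_dups(files, disk, free_list)
print(files["/data/ar.dat"], disk[40])
print(files["/unix"], disk[12])
```

After the repair the two files no longer share a block, so no further damage can happen, but the kernel's block still holds the application's data. The harm was done before the repair ran, not by it.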
If the block is on the free list and you never need any more space or never need enough to get to where that block is allocated, sure, no problem. Again, it could sit on the free list forever and never cause a problem, but the moment it is allocated, the original file is at risk.
This is why you are just rolling the dice if you ignore the system's desire to run fsck after a crash. You might have no problems, but if the issue really was that innocuous, fsck will have no problems fixing it quickly. Unless... well, unless you have hardware issues that will cause fsck itself to do something unpleasant. Bad RAM or a malfunctioning disk controller could cause fsck to inadvertently make things worse. However, under such conditions, running anything, including your application, is also likely to mess things up - perhaps not as spectacularly or as visibly as what might happen with fsck, but damaged files will almost certainly result.
Don't be afraid to run fsck. If it has problems, you already have things to be worried about.
© 2012-07-14 Anthony Lawrence