When backing up to tape (or really any media), you need to verify. Anything can go wrong: tape media is much more subject to physical damage than a hard drive, so data can be written incorrectly. Simple dirt or dust can affect a tape backup too. Verification is vital.
There are two basic ways to verify. One is to take a checksum of each file before writing it to tape, write that checksum, then write the file data. The file can then be verified by reading back the stored checksum and comparing it to a new checksum computed from the file data read from tape. Note that the drive itself may compute independent checksums for each block of data written; if that's the case, storing a file checksum may seem redundant. Most backup software stores some kind of checksum anyway (tar checksums each file header, and cpio's crc format adds a checksum of the file data).
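A minimal sketch of the stored-checksum method follows. It is only an illustration: the "tape" is an in-memory stream, SHA-256 stands in for whatever checksum a real backup program uses, and the one-line record layout and the function names are assumptions made up for this example, not the format of tar, cpio, or any other tool.

```python
import hashlib
import io
import shutil

def checksum(stream, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of everything left to read from stream."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def write_with_checksum(source, archive):
    """Checksum the file first, then write '<digest>\\n' followed by the data.

    Assumes source is seekable and positioned at its start (hypothetical layout).
    """
    stored = checksum(source)
    source.seek(0)  # second pass over the file: this is the copy that reaches "tape"
    archive.write(stored.encode("ascii") + b"\n")
    shutil.copyfileobj(source, archive)
    return stored

def verify_from_archive(archive):
    """Read back the stored checksum and compare it to one computed from the archived data."""
    stored = archive.readline().strip().decode("ascii")
    return stored == checksum(archive)

# Demo with in-memory streams standing in for a disk file and a tape.
source = io.BytesIO(b"customer record\n" * 1000)
tape = io.BytesIO()
write_with_checksum(source, tape)
tape.seek(0)
print("backup verifies:", verify_from_archive(tape))
```

Because the checksum is taken before the data is written, a file that changes between the two passes produces a mismatch on read-back, just as damaged tape data would.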
The other method is to re-read the original data from the hard drive and compare it to the data on tape. That's where an apparently redundant checksum could be of value: you could save time by computing a new checksum from the file on disk and comparing it to the stored checksum, rather than comparing against the actual tape data.
The "checksum on tape" method has the advantage of being able to verify backup integrity at any time without any need for comparing to on-disk data. But it is also valuable to compare back to the original on disk because it may have changed due to user activity or (more rarely, of course) hardware problems.
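Both variants of the re-read method can be sketched in a few lines. This is an assumption-laden illustration: the streams are in-memory stand-ins for the disk file and the tape copy, SHA-256 is an arbitrary choice of checksum, and the function names are invented for the example.

```python
import hashlib
import io

def streams_identical(disk, tape, chunk_size=64 * 1024):
    """Compare two binary streams chunk by chunk.

    Assumes read() returns full chunks except at end of data, which holds
    for BytesIO and for ordinary buffered files.
    """
    while True:
        disk_chunk = disk.read(chunk_size)
        tape_chunk = tape.read(chunk_size)
        if disk_chunk != tape_chunk:
            return False
        if not disk_chunk:  # both streams ended at the same point
            return True

def disk_matches_stored(disk, stored_digest, chunk_size=64 * 1024):
    """Shortcut: hash the disk copy and compare it to the digest stored at backup time."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: disk.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest() == stored_digest

# Demo: the disk copy was edited after the backup was written.
on_tape = b"invoice 1001\n"
on_disk = b"invoice 1001 (edited)\n"
print("full compare:", streams_identical(io.BytesIO(on_disk), io.BytesIO(on_tape)))
print("checksum shortcut:", disk_matches_stored(io.BytesIO(on_disk), hashlib.sha256(on_tape).hexdigest()))
```

The shortcut only reads the disk copy, so it says nothing about whether the tape data itself is still readable; the full comparison checks both at once.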
There are three basic conditions:
Data changed sometime during the backup of a particular file. Either method (stored checksum or re-read from disk) should identify that.
Data is corrupt on tape. Either method would detect that.
Data changed after this file was backed up but before getting back to verify the file. The checksum method doesn't care that the on-disk file has changed. The re-read method does.
In this last case, the value of that is often forensic: if the file wasn't supposed to change during the backup period, why did it? In one particular situation, that anomaly caused us to look at who was logged in during the backup window and ultimately led to a discovery of embezzlement.
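The three conditions listed above can be laid out as a small table of outcomes. The sketch below is hypothetical: it reduces each copy of a file to a SHA-256 digest (the one stored before writing, the one computed from tape, and the one computed from disk at verify time) and shows which method would complain in each case. The sample data and names are invented.

```python
import hashlib

def digest(data):
    """SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()

def verify_outcomes(stored, tape, disk_now):
    """Return (checksum_method_ok, reread_method_ok) for one file.

    stored   - digest taken from the file before it was written
    tape     - digest of the data read back from tape
    disk_now - digest of the file on disk at verify time
    """
    return (stored == tape, disk_now == tape)

original = b"ledger v1"
modified = b"ledger v2"

scenarios = {
    # File changed while it was being written: tape holds neither version cleanly.
    "changed during backup": verify_outcomes(
        digest(original), digest(b"ledger v1/v2 mix"), digest(modified)),
    # Tape data damaged; the disk file is untouched.
    "corrupt on tape": verify_outcomes(
        digest(original), digest(b"ledger v\x00"), digest(original)),
    # Tape is a faithful copy, but the disk file changed before verification.
    "changed after backup": verify_outcomes(
        digest(original), digest(original), digest(modified)),
}

for name, (checksum_ok, reread_ok) in scenarios.items():
    print(f"{name}: checksum method ok={checksum_ok}, re-read method ok={reread_ok}")
```

Only the third scenario separates the two methods: the stored checksum still matches the tape, while the comparison against disk flags the later change.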
Of course there are always files expected to change during backup. The software I use lets you specify files that don't need verification (log files, etc.). It lists the files that do change, including just time stamp changes - which don't threaten the integrity of the backup, but again may indicate unexpected use. That time stamp difference once triggered an investigation that turned up an employee stealing customer contact information after hours. His goal apparently was to take that data with him to another employer or to start his own business. He had no access to the actual data files, but could print reports, which is just what he was doing. It would have been impossible for him to print the entire database in one night, so he was doing it in sections, all the "A"'s tonight, the "B"'s tomorrow, and so on. The time stamp difference caused that activity to be noticed; the next night I set a trap to see exactly what he was printing, and he was caught before he got to the "D"'s.
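A report of that kind can be approximated as follows. This is not how the software mentioned above works internally (the article does not say); it is a sketch that assumes a snapshot of each file's modification time and content digest is taken at backup time and again at verify time, and that exclusions are given as shell-style patterns. All names and sample paths are hypothetical.

```python
import fnmatch

def report_changes(at_backup, at_verify, skip_patterns=()):
    """Classify files that differ between two snapshots.

    Each snapshot maps path -> (mtime, content digest). Paths matching any
    pattern in skip_patterns (log files and the like) are not verified.
    """
    report = {}
    for path, (old_mtime, old_digest) in at_backup.items():
        if any(fnmatch.fnmatchcase(path, pattern) for pattern in skip_patterns):
            continue
        current = at_verify.get(path)
        if current is None:
            report[path] = "missing"
        elif current[1] != old_digest:
            report[path] = "content changed"
        elif current[0] != old_mtime:
            report[path] = "time stamp only"
    return report

# Invented sample data: digests are shortened to two characters for readability.
before = {
    "/data/customers.db": (100, "aa"),
    "/data/orders.db": (100, "bb"),
    "/var/log/messages": (100, "cc"),
    "/data/notes.txt": (100, "dd"),
}
after = {
    "/data/customers.db": (250, "aa"),   # touched, contents identical
    "/data/orders.db": (260, "zz"),      # contents changed
    "/var/log/messages": (300, "yy"),    # changed, but excluded from verification
    "/data/notes.txt": (100, "dd"),      # unchanged
}

for path, status in report_changes(before, after, ["/var/log/*"]).items():
    print(path, "->", status)
```

A "time stamp only" entry does not threaten the backup, but it is the kind of line that prompts the question of who touched the file and why.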
Mon Aug 8 14:28:33 2005: Subject: BigDumbDinosaur
In this last case, the value of that is often forensic: if the file wasn't supposed to change during the backup period, why did it? In one particular situation, that anomaly caused us to look at who was logged in during the backup window and ultimately led to a discovery of embezzlement.
So that's how all those Enron crooks...er...executives got caught! <Grin>