I guess I'm not surprised. This web site started out with 2GB of file space, but add a few thousand pages and suddenly three fourths of it is gone. That's not so bad really; a quarter is still left to grow with. The problem is the darn web access logs. Amazingly enough, the access_log file here grows by around 6 MB a day, so toward the end of the month it's up around 180 MB, and that starts to be a significant amount of space.
I could just zero the darn thing out regularly, but for various reasons, it's easiest to have a whole month's log available. I thought about various schemes to consolidate the data, but darn it, who knows what I might want to extract at some time or another? So I gave that idea up. I also thought about buying more disk space, but I'm just a cheapskate at heart. I'll have to bite that bullet someday, but I want to put it off (shared hosting can also be a bit of a pain to add disk space to - another reason to delay that).
Well, web access logs compress easily: a 30 MB access_log gzips down to about a tenth of that. So a "trimlog" script running early Sunday morning can compress the week's log and start a fresh one:
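```sh
#!/bin/sh
# trimlog: archive the current access_log, then empty both logs
DATE=$(/bin/date +%m%d%y)
cd ~/www/logs || exit 1                            # bail out rather than truncate anything
/usr/bin/gzip -c access_log > "$DATE.gz" || exit 1 # only truncate once the archive is written
> ~/www/logs/access_log
> ~/www/logs/error_log                             # error_log is emptied, not archived
```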
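The "about a tenth" figure is easy to check against your own logs. Here's a minimal sketch, assuming a POSIX shell with gzip, wc and tr on the PATH; the function name logpct is my own invention, not part of any tool. It prints how much of the original size is left after gzip, as an integer percentage.

```sh
# logpct FILE (hypothetical helper): print the gzipped size of FILE
# as an integer percentage of its original size.
logpct() {
    [ -s "$1" ] || { echo "logpct: $1 is missing or empty" >&2; return 1; }
    orig=$(wc -c < "$1" | tr -d ' ')              # original size in bytes
    comp=$(gzip -c "$1" | wc -c | tr -d ' ')      # compressed size in bytes
    echo $((comp * 100 / orig))
}
```

Running `logpct ~/www/logs/access_log` and getting something near 10 would match the tenth mentioned above; the file itself is left untouched because gzip writes to stdout.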
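The trimlog script will lose a few lines: anything the web server writes between the moment gzip finishes reading access_log and the moment the file is emptied. I don't need 100% accuracy here, though. The zipping takes only a few seconds, and not much traffic comes in early on a Sunday morning anyway, so it might not lose anything at all.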
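If those lost lines ever start to matter, one way to narrow the window is the copy-then-truncate approach that logrotate calls "copytruncate": copy the log, empty it right away, and compress the copy afterwards, so lines can only be lost during the cp rather than during the whole gzip run. This is a minimal sketch of that idea, not a drop-in replacement. The function name trimlog_copyfirst and its two arguments (the log directory and the date stamp) are hypothetical, chosen so it can be tried on a scratch directory first; it keeps the same mmddyy.gz archive names as trimlog.

```sh
# trimlog_copyfirst DIR STAMP (hypothetical): copy, truncate at once,
# then compress the copy. The body runs in a subshell so the cd
# does not change the caller's working directory.
trimlog_copyfirst() (
    logdir=$1       # e.g. ~/www/logs
    stamp=$2        # e.g. $(date +%m%d%y)
    cd "$logdir" || exit 1
    cp access_log "$stamp" || exit 1
    : > access_log  # only lines written during the cp can be lost
    : > error_log   # emptied, not archived, as in trimlog
    gzip "$stamp"   # replaces the copy with $stamp.gz
)
```

A call would look like `trimlog_copyfirst ~/www/logs "$(date +%m%d%y)"`. The slow compression step now happens on a private copy, so it no longer matters how long gzip takes.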
That only left me with the problem of how to feed both the compressed files and the current log to the programs that need them. That's easy:
```sh
cd /tmp
zcat ~/www/logs/*.gz | cat - ~/www/logs/access_log | pct.pl > topten.html
cp topten.html /usr/home/pcunix/www/htdocs/topten.html
```
zcat decompresses the archived logs to stdout, and the "-" argument tells cat to read stdin before it reads access_log, so the archived entries come ahead of the current ones. The combined stream is then piped to pct.pl, the program that does all the processing I need. Problem solved, or at least put off for a while.
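One thing to watch: the shell expands *.gz in plain sorted-name order, and with mmddyy names that is chronological only within a single year. Around New Year, 122911.gz would be read after 010112.gz. If pct.pl cares about order, a sketch like the one below sorts the archives by year first. The function name all_logs is hypothetical, and it assumes every archive is named exactly mmddyy.gz, as trimlog creates them.

```sh
# all_logs DIR (hypothetical): print every archived log in DIR, oldest
# first, followed by the live access_log. Assumes archives are named
# mmddyy.gz, so characters 5-6 are the year and 1-4 are month and day.
all_logs() (
    cd "$1" || exit 1
    printf '%s\n' *.gz |
        sort -k1.5,1.6 -k1.1,1.4 |
        while read -r f; do
            [ -e "$f" ] || continue    # no archives: the glob stays literal
            gzip -dc "$f"
        done
    cat access_log
)
```

The report pipeline would then become `all_logs ~/www/logs | pct.pl > topten.html`, with the rest unchanged.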
© 2012-06-27 Tony Lawrence