I discovered recently that one of my automated nightly backup processes had failed. I didn't notice until about a week after it happened, and though I was able to fix it easily enough, I found another problem along the way: all of my backups for those systems had been wiped out. The cause turned out to be a nightly cron job that deletes old backups:
find /home/backup -type f -mtime +2 -exec rm -f {} +
This is pretty basic: find all files under /home/backup/ that are more than two days old and remove them. When new backups are added each night, this is no problem; even though all old backups get removed, newer backups are uploaded to replace them. However, when the backup process failed, the cron job kept happily deleting the older backups until, three days later, I had none left. Oops.
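For reference, the job lives in a crontab entry along these lines; the 3 a.m. schedule shown here is just for illustration, not necessarily the real one:

# m h dom mon dow  command
0 3 * * * find /home/backup -type f -mtime +2 -exec rm -f {} +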
Fortunately, this didn't end up being an issue as I didn't need those specific backups, but nevertheless I wanted to fix the process so that the cleanup cron job would only delete old backups if newer backups exist. After a bit of testing, I came up with this one-liner:
for i in /home/backup/*; do [[ -n $(find "$i" -type f -mtime -3) ]] && find "$i" -type f -mtime +2 -exec rm -f {} +; done
That line will work great as a cron job, but for the purpose of discussion let's break it down a little more:
1. for i in /home/backup/*; do
2. if [[ -n $(find "$i" -type f -mtime -3) ]]; then
3. find "$i" -type f -mtime +2 -exec rm -f {} +
4. fi
5. done
So, there are three key parts involved. Beginning with step 2 (ignore the for loop for now), I want to make sure "new" backups exist before deleting the older ones. I do this by checking for any files that are younger than the cutoff date; if at least one file is found, then we can proceed with step 3. The -n test verifies that the output of the find command is non-empty, which means files were found.
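If testing a command substitution with -n seems opaque, here's a throwaway check you can run by hand; it isn't part of the cron job itself:

find /home/backup -type f -mtime -3 | head   # prints any files newer than three days
[[ -n $(find /home/backup -type f -mtime -3) ]] && echo "recent backups exist" || echo "nothing recent"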
Step 3 is pretty much exactly what I was doing previously, i.e., deleting all files older than two days. However, this time it only gets executed if the previous test was true, and it only operates on each subdirectory of /home/backup instead of the whole thing.
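As an aside, whenever I'm wiring up a find ... -exec rm like this, I like to preview what would be deleted first by swapping the rm for -print. This is a generic precaution, not something the cron job itself does:

# dry run: list what step 3 would remove for a given "$i", without deleting anything
find "$i" -type f -mtime +2 -print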
This brings us neatly back to step 1. In order for this part to make sense, you must first understand that I back up multiple systems to this directory, each under its own subdirectory. So, I have:
/home/backup/server1
/home/backup/server2
/home/backup/server3
etc.
If I just let steps 2 and 3 operate on /home/backup directly, I could still end up losing backups. E.g., let's say backups for everything except server1 began failing. New backups for server1 would continue to get added to /home/backup/server1, which means a find command on /home/backup (such as my test in step 2) would see those new files and assume everything is just dandy. Meanwhile, server2, server3, etc. would not be getting any new backups, and once we cross the three-day threshold all of their backups would be removed.
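To make that failure mode concrete, this is purely an illustration and not something I actually run:

# server1 is still producing fresh files, so a top-level version of the step 2 test passes...
[[ -n $(find /home/backup -type f -mtime -3) ]] && echo "looks fine"   # misleadingly true
# ...and the deletion pass still wipes out server2's and server3's last remaining backups:
find /home/backup -type f -mtime +2 -exec rm -f {} +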
So, in step 1 I loop through each subdirectory under /home/backup and run the find operations independently on each server's backups. This way, if everything but server1 stops backing up, the test in step 2 will succeed on server1/ but fail on server2/, server3/, etc., thus retaining the old backups until new backups are generated.
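If you'd rather keep this in a script than a crontab one-liner, here's roughly how I'd write it out with comments. The directory check is an extra precaution I've added here in case stray files ever land directly under /home/backup; it isn't part of the original one-liner:

#!/usr/bin/env bash
# Clean up old backups, but only for servers that are still producing new ones.

backup_root=/home/backup

for dir in "$backup_root"/*; do
    # Skip anything that isn't a per-server directory (e.g. a stray file).
    [[ -d $dir ]] || continue

    # Step 2: only proceed if this server has at least one backup newer than three days.
    if [[ -n $(find "$dir" -type f -mtime -3) ]]; then
        # Step 3: remove this server's backups older than two days.
        find "$dir" -type f -mtime +2 -exec rm -f {} +
    fi
done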
And there you go: a safer way to clean up old files and backups.