
Backing up to StrongSpace

From SoftwarePractice.org

It's nice to have a remote backup facility. At the least, this means that your databases are copied to a remote server. Even better is for the backups to be in a different data center on the other side of the country.

Here's how I make backups to StrongSpace. It's not tricky, but I like to write it down anyway. One point here: I think it's important to distinguish between your code and your data. You do want to back up your data to StrongSpace, but not your code. The code doesn't need to be backed up because either:

  • You can get it again easily enough (an unmodified web application, for instance), or
  • It's in a Subversion (or other version-control) repository

(If you feel that your Subversion repositories are important enough to back up, then by all means do so. Just don't back up the working web directories with the code in them as well.)

So, here is what I generally back up:

  • Database dumps
  • Configuration and management scripts
  • Generated website statistics
  • User uploads. This includes things like avatars, attachments, file uploads, photos for photo galleries, and so on.



Set up remote authentication

Since you want your backups to run automatically, a script needs to log in to your StrongSpace account without you being there to type a password. I do all of this over ssh with key-based authentication, so here's how to set up an automatic login.

  1. On the machine that you will be sending backups from:
      ssh-keygen -t rsa
    

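    Accept the default location for the key files, and when asked for a passphrase, just press Return (i.e. no passphrase). Without a passphrase, anyone who obtains the private key can log in to your StrongSpace account, so keep it readable only by you (see step 6).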

  2. In the StrongSpace file browser, create a directory called .ssh.
  3. Copy your public key, ~/.ssh/id_rsa.pub, into that .ssh directory on StrongSpace under the name authorized_keys:
      scp ~/.ssh/id_rsa.pub username@username.strongspace.com:/.ssh/authorized_keys
    

    The 'username' is of course your account name on StrongSpace.

  4. Check that it works:
      cd ~
      scp somejunkfile username@username.strongspace.com:
    

    The file should upload without you having to type in a password. In the StrongSpace browser, check that the file is there.

  5. Verify that rsync works as you expect. For example, do this:
      rsync -rtzv somedirectory username@username.strongspace.com:
    

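    This copies the directory recursively (-r), preserves modification times (-t), and compresses data in transit (-z); -v makes rsync "verbose" about what it's doing. Because the destination has the form host:path, rsync connects through a remote shell, which is ssh by default in modern rsync versions. For automated backups, you would leave out the -v option.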

  6. Restrict access permissions to your .ssh directory:
      cd ~/.ssh
      chmod 700 .
      chmod 600 *
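Steps 1 and 6 can also be done non-interactively, which is handy when setting up a new server from a script. The following is a minimal sketch, assuming OpenSSH's ssh-keygen (where -N '' gives an empty passphrase, -f names the key file and -q suppresses the progress output). make_backup_key is a hypothetical helper name; it leaves an existing key alone rather than overwriting it. You still need to copy id_rsa.pub to StrongSpace as in step 3.

```shell
#!/bin/sh
# make_backup_key [DIR]: create a passphrase-less RSA key pair in DIR
# (default ~/.ssh) unless DIR/id_rsa already exists, then restrict
# permissions as in step 6. Hypothetical helper, not part of the original steps.
make_backup_key() {
    dir=${1:-$HOME/.ssh}
    mkdir -p "$dir" || return 1
    if [ ! -f "$dir/id_rsa" ]; then
        # -q: quiet, -t rsa: key type, -N '': empty passphrase, -f: output file
        ssh-keygen -q -t rsa -N '' -f "$dir/id_rsa" || return 1
    fi
    chmod 700 "$dir"       # only you can enter the directory
    chmod 600 "$dir"/*     # only you can read or write the key files
}
```

Passing a scratch directory (for example make_backup_key /tmp/keytest) is a safe way to try it before running it against your real ~/.ssh.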
    

Dumping databases

Because the database is the really valuable part of any web application, it's worth making sure that it's preserved properly.

  1. First off, you need a script to generate the database backup files. Here's what mine looks like for my TextDrive account, originally based on a TextDrive knowledge-base article:
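    #!/bin/sh
    # Dump each database to ~/backups/NAME-YYYY-MM-DD.sql.gz.
    # Each list entry is "database-name database-user".
    for db in \
            "dbname1 dbuser1" \
            "dbname2 dbuser2" \
            "dbname3 dbuser3"
    do
      set -- $db    # split the entry: $1 = database name, $2 = database user
      /usr/local/bin/mysqldump \
         --opt --skip-add-locks \
         --user="$2" --password=THEPASSWORD "$1" \
         | gzip \
         > "${HOME}/backups/${1}-`date +%Y-%m-%d`.sql.gz"
    done
    # Delete local dumps older than about a week
    cd "${HOME}/backups" || exit 1
    /usr/bin/find . -name '*.gz' -mtime +7 -delete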
    

    Note that in this example, I am using a common password for all database users. Also note that mysqldump is not the best way to back up a large database; more on that later.

  2. Put the script somewhere convenient and, since it contains your database password, restrict its permissions:
      chmod 700 daily_backup.sh
    
  3. Schedule a cron job to run it daily. I run it in the "dead of night", at about 10am GST.
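Because the script pipes mysqldump into gzip, a failed dump (a wrong password, say) can still leave behind a perfectly valid but empty .gz file. A cheap check before the files are shipped off is sketched below; verify_dumps is a hypothetical helper, not part of the original script. It uses gzip -t to test each archive's integrity and flags any dump that decompresses to nothing.

```shell
#!/bin/sh
# verify_dumps DIR: check every *.sql.gz file in DIR. Prints one line per
# problem file and returns 1 if any file is corrupt or empty, 0 otherwise.
verify_dumps() {
    dir=$1
    status=0
    for f in "$dir"/*.sql.gz; do
        [ -e "$f" ] || continue                 # glob matched no files
        if ! gzip -t "$f" 2>/dev/null; then     # not a valid gzip archive
            echo "corrupt: $f"
            status=1
        elif ! gzip -cd "$f" | grep -q .; then  # decompresses to no text
            echo "empty: $f"
            status=1
        fi
    done
    return $status
}
```

One option is to call it at the end of daily_backup.sh, e.g. verify_dumps "${HOME}/backups" || echo "check the database dumps"; cron typically mails any output to the account owner, so a problem will show up in your inbox.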

Automate the backups

Now all you need to do is write a script that rsyncs everything that you want backed up.

  1. On your StrongSpace account, create directories to hold the data you want to preserve. I created two, called textdrive and domains.
  2. Write a script to back up all the needed directories. I put all my TextDrive stuff into the textdrive directory, so make sure that you create that directory in the StrongSpace browser before running this script. The script looks like this:
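    #!/bin/sh
    # Directories under $HOME, mirrored with --delete so that StrongSpace
    # follows the local copy (e.g. dumps pruned by the database script).
    for dirname in \
            backups \
            etc \
            scripts \
            logs/awstats_data
    do
            origin=${HOME}/${dirname}
            # Flatten the path: logs/awstats_data -> textdrive/logs-awstats_data
            target=textdrive/`echo $dirname | sed -e 's/\//-/g'`
            # "$origin/" rather than "$origin/*": --delete only prunes inside
            # directories rsync is asked to send, so a shell-expanded wildcard
            # would leave stale top-level files (old dumps) on StrongSpace.
            /usr/local/bin/rsync -rtz --delete "$origin/" username@username.strongspace.com:"${target}"
    done
    # User uploads under each domain's public web directory.
    # No --delete here, so files removed locally are kept on StrongSpace.
    for domaindir in \
            "domain1.com uploads" \
            "domain2.com images/avatars"
    do
            set -- $domaindir    # $1 = domain, $2 = path under web/public
            origin=${HOME}/domains/${1}/web/public/${2}
            target=textdrive/${1}-`echo $2 | sed -e 's/\//-/g'`/
            /usr/local/bin/rsync -rtz "$origin/" username@username.strongspace.com:"${target}"
    done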
    

    A couple of notes here:

    • I am backing up my AWStats data, but not the raw log files! I think you do need to be selective about what you back up -- if I lose the raw logs, it's no big deal, but I'd like to keep the stats.
    • rsync won't create missing intermediate directories on the remote side, so instead of recreating nested paths, the script flattens them: e.g. logs/awstats_data on the TextDrive account is backed up to the directory logs-awstats_data on StrongSpace.
    • Similarly for the domains: you end up with StrongSpace directories named, for example, domain2.com-images-avatars.
    • For the first set of directories, I am using the --delete option. This deletes files on StrongSpace if they no longer exist on TextDrive. For the database backups in particular, this is important -- otherwise you keep accumulating backup files forever! However, it also means that you can't get back files that you accidentally deleted, so I don't use that option for the web directories (which may contain files uploaded by users).
  3. Set permissions and check that the script works:
      chmod 700 remote_backup.sh
      ./remote_backup.sh
    

    (Go look at your StrongSpace file system to see that your data got backed up as expected.)

  4. Set up a cron job to run this script automatically once per day. I set it to run about 20 minutes after the database backup script runs.
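If the list of directories grows, the path flattening and the rsync call can be factored into small helpers. The following is a minimal sketch along those lines; remote_name, sync_dir, REMOTE and DRY_RUN are hypothetical names, not part of the original script, and it runs whichever rsync is first on your PATH rather than /usr/local/bin/rsync. With DRY_RUN=1 it only prints the rsync commands, which is a convenient way to check the directory mapping before the first real run; rsync's own --dry-run (-n) option goes a step further and lists what would be transferred without changing anything.

```shell
#!/bin/sh
# Hypothetical account; replace with your own StrongSpace login.
REMOTE=username@username.strongspace.com

# remote_name PATH: flatten a relative path into one directory name,
# e.g. logs/awstats_data -> logs-awstats_data (same sed as the script above).
remote_name() {
    echo "$1" | sed -e 's/\//-/g'
}

# sync_dir ORIGIN TARGET [RSYNC_OPTIONS...]: copy the contents of ORIGIN
# into TARGET on StrongSpace. The trailing "/" on ORIGIN makes rsync sync
# the directory itself, so --delete (if passed) can prune stale files.
# With DRY_RUN=1 the command is printed instead of run.
sync_dir() {
    origin=$1
    target=$2
    shift 2
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo rsync -rtz "$@" "$origin/" "$REMOTE:$target/"
    else
        rsync -rtz "$@" "$origin/" "$REMOTE:$target/"
    fi
}

# Example calls mirroring the two loops in the script:
#   sync_dir "$HOME/logs/awstats_data" "textdrive/$(remote_name logs/awstats_data)" --delete
#   sync_dir "$HOME/domains/domain2.com/web/public/images/avatars" \
#            "textdrive/domain2.com-$(remote_name images/avatars)"
```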

Backing up large databases

mysqldump is not the most efficient way to back up a large database. For a start, you are making a copy of the whole database every time you do a backup, when you would rather copy, say, just the things that changed in the last 24 hours. And you are transferring the complete dump file every time, when again only the changed parts need to be copied. So, here's how to do a daily incremental backup and a weekly full backup.
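Both kinds of backup can be driven from a single daily cron job that decides which one to run. As a minimal sketch of that scheduling decision only, assuming Sunday is the full-backup day: backup_kind and FULL_BACKUP_DAY are hypothetical names, and the two branches are placeholders, because the incremental mechanism itself is described on a separate page.

```shell
#!/bin/sh
# Day of the weekly full backup, as an ISO weekday number
# (1 = Monday ... 7 = Sunday). Sunday is an arbitrary choice.
FULL_BACKUP_DAY=7

# backup_kind [DAY]: print "full" on the full-backup day and
# "incremental" otherwise. DAY defaults to today (date +%u).
backup_kind() {
    day=${1:-$(date +%u)}
    if [ "$day" -eq "$FULL_BACKUP_DAY" ]; then
        echo full
    else
        echo incremental
    fi
}

case "$(backup_kind)" in
    full)        echo "would run the full backup here" ;;         # placeholder
    incremental) echo "would run the incremental backup here" ;;  # placeholder
esac
```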

Note: this section has been split off to Backing up a web server.
