Using Amazon’s S3 for Backups

I don’t have a backup system at home (which is where this site and others are hosted), and I have generally relied on duplicating enough of my important stuff between friends and other computers. That, and I have a RAID5 setup for my large storage, while home directories and website data live on a RAID1 (mirrored) ZFS volume. This doesn’t prevent accidental “oh-no”s, but it does protect me from some hardware failures.

Last year, when I upgraded to the new server, I lost a lot of data because I forgot to back up all of my MySQL databases. I like to think I can learn from my mistakes, so a full year later I finally did something about it and signed up for Amazon’s S3 service.

The pricing is pretty nice, and I don’t have all that much data to back up. I figure I’ll use a few GB in total and keep the monthly cost around $1 to $2. That seems worth it for off-site backups.

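Now, I have three main websites that I need to back up, plus one test site that I like to play around with:

  • www.mywushublog.com
  • www.willowoakboarding.com
  • www.m87-blackhole.org
  • evil-genius-network.com (the test site)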

After a quick “FreeBSD s3 backup” Google search, I found Gary Dalton’s blog post: http://dvector.com/oracle/2008/10/18/backing-up-to-amazon-s3/. After reading this post, I formulated my plan of attack:

  • Sign up for S3, create a “bucket” for each site
  • Use something to interface with S3 (duplicity)
  • Automate MySQL and PostgreSQL backups
  • Create a service account to run both the S3 and database backup scripts
  • Set up a cron job for backups

So, after I signed up for S3, I had to create the buckets. I couldn’t find a way to do this through my Amazon account settings, so I wrote a little Ruby script.

$ sudo gem install aws-s3
$ vim make-bucket.rb

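#!/usr/local/bin/ruby

# Load rubygems first (required on Ruby 1.8) so the aws-s3 gem can be found
require 'rubygems'
require 'aws/s3'

# Connect with the access key pair from the AWS account
AWS::S3::Base.establish_connection!(
  :access_key_id     => 'my-s3-key-id',
  :secret_access_key => 'my-s3-secret-access-key'
)

# One bucket per site (bucket names are global across all of S3)
%w[mywushublog willowoak m87-blackhole evil-genius-network].each do |name|
  AWS::S3::Bucket.create(name)
end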

$ chmod +x make-bucket.rb
$ ./make-bucket.rb

Next, I had to install duplicity and py-boto:
[root@server ~] cd /usr/ports/sysutils/duplicity
[root@server duplicity] make install
...
[root@server duplicity] cd ../../devel/py-boto
[root@server py-boto] make install clean
...
[root@server py-boto]

Next, I created a user (with access to the shared data and the website data) to run the backups, using the adduser command:
[root@server py-boto] adduser -g shared-data -G www -s /bin/tcsh -w random s3backupuser
...
[root@server py-boto] su - s3backupuser
In tcsh, you can `set autolist' to have the shell automatically show
all the possible matches when doing filename/directory expansion.
%

I had to set my Access Key ID and Secret Access Key in the s3backupuser’s environment, as well as a GnuPG passphrase so the backups are encrypted (and compressed). I mean, I trust Amazon, but not THAT much :)
% vim .cshrc
setenv AWS_ACCESS_KEY_ID my-s3-key-id
setenv AWS_SECRET_ACCESS_KEY my-s3-secret-access-key
setenv PASSPHRASE AVeryRandomPassphraseForGnuPG

Next, I copied the very useful automysqlbackup.sh script into a separate script for each website. I could have just dumped every running database, but I wanted to segregate each site’s databases into its own directory. That complicates my cron job, since it now runs multiple backup scripts, but I really want the end result to be easily readable and identifiable. So for each site, I created a directory under /u01/backups:
%ll /u01/backups/
total 8
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:46 evil-genius-network
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:47 m87-blackhole
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:46 mywushublog
drwxr-x---  5 s3-backupuser  mysql  5 Apr 25 15:47 willowoak
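The four directories can also be created in one pass. This is a minimal sketch, assuming the `mysql` group and mode 750 (to match the `drwxr-x---` listing) and the account name from the adduser step, `s3backupuser` (the listing shows `s3-backupuser`, so use whichever actually exists). `make_site_dirs`, `BACKUP_ROOT`, `BACKUP_OWNER`, and `BACKUP_GROUP` are hypothetical names, and the `chown` is only attempted when running as root.

```sh
#!/bin/sh
# Hypothetical helper: one backup directory per site under BACKUP_ROOT.
# Mode 750 matches the drwxr-x--- listing; ownership is only changed
# when running as root, because chown needs root privileges.
BACKUP_ROOT=${BACKUP_ROOT:-/u01/backups}
BACKUP_OWNER=${BACKUP_OWNER:-s3backupuser}   # assumption: adjust to the real account
BACKUP_GROUP=${BACKUP_GROUP:-mysql}
SITES="evil-genius-network m87-blackhole mywushublog willowoak"

make_site_dirs() {
  for site in $SITES; do
    mkdir -p "$BACKUP_ROOT/$site" || return 1
    chmod 750 "$BACKUP_ROOT/$site" || return 1
    if [ "$(id -u)" -eq 0 ]; then
      chown "$BACKUP_OWNER:$BACKUP_GROUP" "$BACKUP_ROOT/$site" || return 1
    fi
  done
}
```

Sourcing the file as root and calling `make_site_dirs` would produce a layout like the one above.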

Next was the s3-backups.sh script, which is very crude and simple. If I get motivated, I’ll make it nicer, but I’m lazy, and if I don’t need any more functionality I’ll just leave it. One thing I initially forgot: I had set my Amazon S3 variables in the user’s .cshrc profile. That is not a good place for them; it was just handy while I was running the duplicity commands manually. Cron runs the script with /bin/sh and never reads .cshrc, so I had to set the variables in the script itself, otherwise the cron job would fail.

~/bin/s3-backups.sh:

#!/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/home/s3/bin
 
# Amazon S3 keys, and GnuPG keys
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
PASSPHRASE=
export AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY
export PASSPHRASE
 
echo "*************************************************"
echo "*   Backing up Website content....              *"
echo "*                                               *"
echo "*     www.willowoakboarding.com...              *"
duplicity /www/www.willowoakboarding.com s3+http://s3.amazon.com/willowoak/www
echo "*     www.mywushublog.com...                    *"
duplicity /www/www.mywushublog.com s3+http://s3.amazon.com/mywushublog/www
echo "*     www.m87-blackhole.org...                  *"
duplicity /www/www.m87-blackhole.org s3+http://s3.amazon.com/m87-blackhole/www
echo "*************************************************"
echo "*   Backing up databases....                    *"
echo "*                                               *"
echo "*     www.willowoakboarding.com...              *"
duplicity /u01/backups/willowoak s3+http://s3.amazon.com/willowoak/db
echo "*     www.mywushublog.com...                    *"
duplicity /u01/backups/mywushublog s3+http://s3.amazon.com/mywushublog/db
echo "*     www.m87-blackhole.org...                  *"
duplicity /u01/backups/m87-blackhole s3+http://s3.amazon.com/m87-blackhole/db
echo "*************************************************"
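Because all three sites follow the same pattern, the uploads could be driven by a list instead of copy-pasted lines, with a check that the credentials are really set (the cron failure described above). This is a sketch, assuming the same bucket/www and bucket/db layout and the same `s3+http://s3.amazon.com` base URL as the script. `check_env`, `backup_all`, `SITES`, `S3_BASE`, and the `DUPLICITY` override are hypothetical additions, not part of the original script.

```sh
#!/bin/sh
# Sketch: the same uploads as s3-backups.sh, driven by a list.
# Each entry is "bucket:web-root"; the database dumps are assumed to
# live in /u01/backups/<bucket>, as in the original script.
DUPLICITY=${DUPLICITY:-duplicity}         # hypothetical override, e.g. DUPLICITY=echo
S3_BASE=${S3_BASE:-s3+http://s3.amazon.com}
SITES="willowoak:/www/www.willowoakboarding.com
mywushublog:/www/www.mywushublog.com
m87-blackhole:/www/www.m87-blackhole.org"

# Refuse to run if any credential is empty or unset.
check_env() {
  for var in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing $var" >&2
      return 1
    fi
  done
}

# Upload web content and database dumps for every site; keep going
# after a failure, but report it through the return status.
backup_all() {
  check_env || return 1
  rc=0
  for entry in $SITES; do
    bucket=${entry%%:*}
    webroot=${entry#*:}
    $DUPLICITY "$webroot" "$S3_BASE/$bucket/www" || rc=1
    $DUPLICITY "/u01/backups/$bucket" "$S3_BASE/$bucket/db" || rc=1
  done
  return $rc
}
```

Keeping the original `export` lines and ending the script with a `backup_all` call would make this a drop-in body for s3-backups.sh. Setting `DUPLICITY=echo` first prints the commands instead of running them, which is a cheap way to double-check the bucket paths.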

And last but not least, a cronjob to tie it all together:
% crontab -e
@weekly ~/bin/s3-backups.sh
@weekly ~/bin/mywushublog-mysql-backup.sh
@weekly ~/bin/willowoak-mysql-backup.sh
@weekly ~/bin/m87-blackhole-mysql-backup.sh
@weekly ~/bin/evil-genius-network-mysql-backup.sh
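One catch with this crontab: `@weekly` is shorthand for `0 0 * * 0`, so all five jobs start at the same minute, and s3-backups.sh can upload the dump directories before this week’s dumps are written. A minimal sketch of a single wrapper that runs the dumps first and the upload last, assuming the per-site script names from the crontab above; `weekly-backup.sh`, `run_weekly_backup`, `BIN_DIR`, and `DUMP_SITES` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical ~/bin/weekly-backup.sh: run every database dump first,
# and only then upload, so S3 receives this week's dumps.
# Crontab entry replacing the five @weekly lines:
#   @weekly ~/bin/weekly-backup.sh
BIN_DIR=${BIN_DIR:-$HOME/bin}
DUMP_SITES="mywushublog willowoak m87-blackhole evil-genius-network"

run_weekly_backup() {
  for site in $DUMP_SITES; do
    if ! "$BIN_DIR/$site-mysql-backup.sh"; then
      echo "dump failed for $site, skipping upload" >&2
      return 1
    fi
  done
  "$BIN_DIR/s3-backups.sh"
}
```

The script itself would end with a `run_weekly_backup` line. Stopping before the upload when a dump fails is a judgment call: it avoids pushing a half-written dump, at the cost of skipping that week’s web-content upload too.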

I can check the status of a backup by running duplicity with the collection-status command:
%duplicity collection-status s3+http://s3.amazon.com/mywushublog/db
Last full backup date: Sat Apr 25 15:08:02 2009
Collection Status
-----------------
Connecting with backend: BotoBackend
Archive dir: None
Found 0 backup chains without signatures.
Found a complete backup chain with matching signature chain:
-------------------------
Chain start time: Sat Apr 25 15:08:02 2009
Chain end time: Sat Apr 25 15:08:02 2009
Number of contained backup sets: 1
Total number of contained volumes: 1
Type of backup set:                            Time:      Num volumes:
Full         Sat Apr 25 15:08:02 2009                 1
-------------------------
No orphaned or incomplete backup sets found.

I can also list the files:
%duplicity list-current-files s3+http://s3.amazon.com/mywushublog/db
Last full backup date: Sat Apr 25 15:08:02 2009
Sat Apr 25 15:05:11 2009 .
Sat Apr 25 15:05:10 2009 daily
Sat Apr 25 15:05:10 2009 daily/mywushublog
Sat Apr 25 15:05:10 2009 monthly
Sat Apr 25 15:05:10 2009 weekly
Sat Apr 25 15:05:11 2009 weekly/mywushublog
Sat Apr 25 15:05:11 2009 weekly/mywushublog/mywushublog_week.17.2009-04-25_15h05m.sql.gz
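To check every target at once, collection-status can be run in a loop and its output searched for the “No orphaned or incomplete backup sets found.” line shown earlier. This is a sketch, assuming the six bucket/prefix targets from s3-backups.sh and the S3 keys already exported; `check_all`, `TARGETS`, `S3_BASE`, and `DUPLICITY` are hypothetical names. Matching on human-readable output is fragile, since the wording may differ between duplicity versions.

```sh
#!/bin/sh
# Sketch: flag any target whose collection-status output lacks the
# "No orphaned or incomplete backup sets found." line.
DUPLICITY=${DUPLICITY:-duplicity}
S3_BASE=${S3_BASE:-s3+http://s3.amazon.com}
TARGETS="willowoak/www willowoak/db mywushublog/www mywushublog/db m87-blackhole/www m87-blackhole/db"

check_all() {
  bad=0
  for target in $TARGETS; do
    if $DUPLICITY collection-status "$S3_BASE/$target" |
        grep -qF 'No orphaned or incomplete backup sets found'; then
      echo "ok $target"
    else
      echo "PROBLEM $target"
      bad=1
    fi
  done
  return $bad
}
```

Calling `check_all` prints one line per target and returns a nonzero status if any target looks wrong.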

Pretty sweet automated backup process. It is a lot cheaper than tapes or additional disk storage. With S3, I also don’t have to worry about buying more hardware or maintaining a tape library or drive (which is what I had a few years ago; what a headache).

Setting up my own OpenID server

I’ve configured this blog to use my OpenID accounts. I have two (which totally goes against the single identity mindset of OpenID :) )

The second one I just stood up today. I’m always concerned with who has my information, and if I can, I try to keep it all within the realm of my control. Also, the evil genius domain has absolutely no purpose besides a testing ground that I have no problems destroying :)

Running my own OpenID service is attractive, but most of all it’s a fun exercise. Let’s go through what I did (so that one day I can remember).

The easiest part was finding an OpenID server. A quick Google search brought me here:

http://wiki.openid.net/Run_your_own_identity_server

The hard part was deciding which one I should use. I actually tried out four of them: phpMyID, Masquerade, DjangoID, and finally, Java OpenID Server (JOS). I got three of them running, and in the end I simply settled on JOS. For now. I had a lot of fun building an MVC app in both Ruby on Rails and Django; I’ve been on an MVC kick since I got pretty excited about Ruby on Rails a month ago. Where I shy away from Django and RoR is integrating them with Apache. With Java, I have Tomcat, and I’ve used it before, so I have an immediate comfort level with it. I did have to ask Chris for a little help with the mod_jk stuff.

The first thing was to go over the JOS documentation. I knew I would need the following:

  • Java App Server – I decided to use Apache’s Tomcat 6.0
  • A database – PostgreSQL 8.3
  • JCalendar: this was simple, as the README pointed me to one

Over the years, I’ve always used MySQL. It’s simple and light, and all the fancy new “Web 2.0” sites use it. I’m considering switching to PostgreSQL for two reasons: 1) Sun seems to be mishandling the QA and release engineering of MySQL, and 2) recent benchmarks of PostgreSQL on FreeBSD 7.1 have been phenomenally good. Even though I’m not running a big site with millions of visitors, I do like to keep up with what’s current and performs well.

Second, I built the required applications and enabled both Tomcat and PostgreSQL in /etc/rc.conf. Think of rc.conf as a simple text-based chkconfig, except that in rc.conf you can also specify additional command arguments, environment settings, and anything else the application’s startup script supports. I like how easy the chkconfig/service system in Linux is, but FreeBSD’s run command (rc) system is very flexible and easier to tune.

> sudo su -
$ cd /usr/ports/databases/postgresql83-server
$ make install
$ echo 'postgresql_enable="YES"' >> /etc/rc.conf
$ cd /usr/ports/www/tomcat6
$ make install
$ echo 'tomcat60_enable="YES"' >> /etc/rc.conf
$ cd /usr/ports/databases/postgresql-jdbc
$ make install
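Appending with `echo ... >> /etc/rc.conf` works once, but re-running the same steps adds a duplicate line. A minimal sketch of an idempotent alternative, assuming the usual `name_enable="YES"` convention used above; `enable_service` and `RC_CONF` are hypothetical names. Newer FreeBSD releases also ship sysrc(8), which does the same job.

```sh
#!/bin/sh
# Hypothetical helper: set NAME_enable="YES" in rc.conf exactly once.
RC_CONF=${RC_CONF:-/etc/rc.conf}

enable_service() {
  _var="${1}_enable"
  if grep -q "^${_var}=" "$RC_CONF" 2>/dev/null; then
    # Rewrite the existing assignment in place. A temp file sidesteps the
    # BSD/GNU differences in `sed -i`, and `cat >` keeps rc.conf's mode.
    sed "s/^${_var}=.*/${_var}=\"YES\"/" "$RC_CONF" > "$RC_CONF.tmp" &&
      cat "$RC_CONF.tmp" > "$RC_CONF" && rm -f "$RC_CONF.tmp"
  else
    echo "${_var}=\"YES\"" >> "$RC_CONF"
  fi
}
```

With it, the two rc.conf lines become `enable_service postgresql` and `enable_service tomcat60`, and running them twice leaves a single entry for each.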

I could have simply added pre-built packages with “pkg_add -r tomcat6 postgresql83-server postgresql-jdbc” but I like seeing what compile time options are available, and then setting those. Hurray for the flexibility of FreeBSD!

One thing that you have to do with PostgreSQL (that you don’t have to do with MySQL) is initialize the database/config:

$ su - pgsql
> initdb /usr/local/pgsql/data
> exit
$ /usr/local/etc/rc.d/postgresql start
$ su - pgsql
> createdb jos-openid
> makepasswd --chars=13
 a nice 13 character random string
> createuser -P josuser
> psql jos-openid

Welcome to psql 8.3.6, the PostgreSQL interactive terminal. Type:
        \copyright for distribution terms
        \h for help with SQL commands
        \? for help with psql commands
        \g or terminate with semicolon to execute query
        \q to quit

jos-openid=# select * from pg_user;
  usename | usesysid | usecreatedb | usesuper | usecatupd |  passwd  | valuntil | useconfig
 ---------+----------+-------------+----------+-----------+----------+----------+-----------
  pgsql   |       10 | t           | t        | t         | ******** |          |
  josuser |    16386 | t           | f        | f         | ******** |          |
(2 rows)
jos-openid=#
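If makepasswd isn’t installed, a similar random string can be produced from /dev/urandom with standard tools. This is a sketch; `random_password` is a hypothetical name, the output is alphanumeric only, and the result is what gets typed at the `createuser -P` prompt.

```sh
#!/bin/sh
# Sketch: a random alphanumeric password without makepasswd.
# Reads /dev/urandom, keeps only [A-Za-z0-9], stops after N characters
# (default 13), then prints a trailing newline.
random_password() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom 2>/dev/null | head -c "${1:-13}"
  echo
}
```

For example, `random_password` prints a 13-character string and `random_password 20` a 20-character one.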

Next, I unpacked the WAR file and modified jdbc.properties to use PostgreSQL:

mkdir jos && cd jos
jar -xvf ../jos-webapp-1.2.0.war
...
jar -cvf /usr/local/tomcat6/webapps/ROOT.war .

Yeah, after configuring the app and zipping it back up, I named it ROOT.war so Tomcat serves it at the root context; it was a lot easier this way. I didn’t want to manage multiple Java apps at this point. I can be a very lazy admin :)

With both Tomcat and PostgreSQL running, I now had a working web app on my server at port 8180. The last part was to mount the Java application inside Apache. For that, I needed to install mod_jk:

$ cd /usr/ports/www/mod_jk
$ make install

That’s the easy part of installing mod_jk. The next parts are the workers.properties file, modifying httpd.conf, and then modifying my virtual host configuration for the domain evil-genius-network.com. I also added a DNS record for openid.evil-genius-network.com. So, in that order, this is what I did:

/usr/local/etc/apache22/workers.properties:

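# Where Tomcat and the JDK are installed
workers.tomcat_home=/usr/local/apache-tomcat6.0
workers.java_home=/usr/local/jdk1.6.0
# Path separator
ps=/
# Workers that JkMount may refer to
worker.list=localhost
# Load-balancer worker; unused, since only "localhost" is in worker.list
worker.tomcat.type=lb
#worker.tomcat.balanced_workers=localhost
#worker.loadbalancer.local_worker_only=0
# AJP 1.3 connection to Tomcat's AJP connector (port 8009 by default)
worker.localhost.port=8009
worker.localhost.host=localhost
worker.localhost.type=ajp13
# Only matters when this worker sits behind a load balancer
worker.localhost.lbfactor=1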

/usr/local/etc/apache22/httpd.conf:

LoadModule jk_module libexec/apache22/mod_jk.so
# mod_jk
JkWorkersFile /usr/local/etc/apache22/workers.properties
JkLogFile  /var/log/jk.log
JkShmFile  /var/log/jk-runtime-status
JkLogLevel error

/usr/local/etc/apache22/virtualhosts/evil-genius-network.com (in the openid.evil-genius-network.com VirtualHost section):

JkMount /* localhost

Then, I restarted apache:

$ /usr/local/etc/rc.d/apache22 restart

Now I have my own little OpenID server running at http://openid.evil-genius-network.com/

BTW, I had to re-edit EVERY pre section of this page about 6 times, that was the least-fun part of all of this.