Software

As mentioned many times, this is a FreeBSD-based environment. Here is some relevant sysinfo output:

Operating system release: FreeBSD 8.2-RELEASE
OS architecture: amd64
Kernel build dir location: /usr/obj/usr/src/sys/GENERIC
Currently booted kernel: /boot/kernel/kernel

Currently loaded kernel modules (kldstat(8)):
zfs.ko
opensolaris.ko

Bootloader settings for the Director/Database node:

The /boot/loader.conf has the following contents:
kern.ipc.semmni=1024
kern.ipc.semmns=2048
kern.ipc.semmnu=1024

All of the storage nodes and the director are running a GENERIC kernel with very little system tuning. One of the storage nodes has a Chelsio 10Gb controller, but it hasn’t seen a high enough load to crack the 1Gb/sec barrier.

I’m using Bacula from the ports tree, and the directory has a special make flag to build with gcc’s debugging symbols. Jenny worked on getting that set up when we were having some stability issues.

The Bacula configuration on the director node is backed by a git repository. It adds a little bit of complexity for a systems administrator when they want to add a client, but the benefit is clear: this backup project actually enforces change control and tracks who made every commit.

I’ve also set up Redmine as a project front-end, and I’ve begun to file tickets and reference which commit fixed what. This not only tracks my progress, but it is the first time I’ve had a backup server that was clearly documented and had some type of accountability.

A snippet of the Redmine site

The Structure

I’ve compared projects like Bacula to a large box of Legos™. It doesn’t enforce a structure by any means, and I’ve taken it upon myself to add meaning to the otherwise flat and incomprehensible bacula-dir.conf.

The Bacula Port on FreeBSD installs all configuration files in /usr/local/etc.

Write, the Director, only contains the following in /usr/local/etc/bacula-dir.conf:

@/usr/local/etc/bacula/bacula-dir.conf
@/usr/local/etc/bacula/storage.conf
@/usr/local/etc/bacula/clients.conf
@/usr/local/etc/bacula/messages.conf
@/usr/local/etc/bacula/schedules.conf
@/usr/local/etc/bacula/pools.conf

As you can see, I place everything in etc/bacula/.
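With the configuration split across this many files, a missing include is an easy mistake to make. The following is a minimal sketch of a check for that, not part of the original setup: check_includes is a hypothetical helper name, and it assumes each include line starts with @ in the first column, as in the listing above.

```shell
# check_includes FILE
# Print every @-included path in FILE that does not exist on disk.
# Returns non-zero if at least one include is missing.
check_includes() {
    local conf="$1"
    local missing=0
    local line path
    while IFS= read -r line; do
        case $line in
            @*)
                path=${line#@}          # strip the leading @
                if [ ! -f "$path" ]; then
                    printf 'missing include: %s\n' "$path"
                    missing=1
                fi
                ;;
        esac
    done < "$conf"
    return "$missing"
}
```

Running check_includes /usr/local/etc/bacula-dir.conf before reloading the Director would flag a file that was referenced but never created.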

Here is a beautiful output of tree(1):

bacula
|-- bacula-dir.conf
|-- bin
|   |-- create_client.sh
|   `-- package_list.sh
|-- clients.conf
|-- clients.d
|   |-- 10am
|   |-- 10pm
|   |-- 11pm
|   |-- 12am
|   |-- 1am
|   |-- 2am
|   |-- 3am
|   |-- 4am
|   |-- 4pm
|   |-- 5am
|   |-- 5pm
|   |-- 6am
|   |-- 6pm
|   |-- 7am
|   |-- 7pm
|   |-- 8am
|   |-- 8pm
|   |-- 9am
|   |-- 9pm
|   |-- TEMPLATE-mac
|   |-- TEMPLATE-unix
|   `-- TEMPLATE-win32
|-- excludes.d
|   |-- common.conf
|   |-- mac.conf
|   |-- unix.conf
|   `-- win32.conf
|-- messages.conf
|-- pools.conf
|-- schedules.conf
|-- storage.conf
`-- storage.d
    |-- write-01.conf
    |-- write-02.conf
    |-- write-03.conf
    |-- write-04.conf
    |-- write-05.conf
    `-- write-06.conf

Storage Nodes

All of the storage nodes are using ZFS as the filesystem/volume manager.

write-06# zpool list
NAME         SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
filevol001  90.6T  33.3T  57.3T    36%  ONLINE  -

They all have one volume, /filevol001, and I created 512 “drives” within that volume. Effectively, each storage node has 512 drives, and clients are randomly assigned a drive.

Since I have 6 storage nodes, I wrote a little shell script to handle the directory creation:

#!/usr/bin/env bash
i=1
 
while [ $i -le 512 ]
do
        install -d -o bacula -g bacula -m 770 /filevol001/drive$i
        ((i++))
done
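Because the same loop has to be run on every storage node, it can be handy to confirm afterwards that nothing was skipped. This is a rough sketch under that assumption; missing_drives is a hypothetical helper, and it only checks that the directories exist, not their ownership or mode.

```shell
# missing_drives BASE [COUNT]
# List the drive directories drive1..driveCOUNT under BASE that do not exist.
# COUNT defaults to 512, the number of drives used in this setup.
missing_drives() {
    local base="$1"
    local count="${2:-512}"
    local i=1
    while [ "$i" -le "$count" ]; do
        [ -d "$base/drive$i" ] || printf '%s\n' "$base/drive$i"
        i=$((i + 1))
    done
}
```

On a correctly prepared node, missing_drives /filevol001 should print nothing.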

Simple, right? I also wrote a script to generate the bacula-sd.conf file on a storage node as well:

#!/usr/bin/env bash
 
usage()
{
cat << EOF
    Usage: $0 NUMBER > /usr/local/etc/bacula-sd.conf
 
    Where "NUMBER" is just a single digit indicating which storage node this is.
 
    Example, for write-07:
    $ make_sd.sh 7 > /usr/local/etc/bacula-sd.conf
EOF
}
 
i=1
 
if [[ -z $1 ]]
then
    usage
    exit
fi
 
printf "Storage {\n"
printf "\tName = write-0$1.llnl.gov-sd\n"
printf "\tSDAddress = write-0$1.llnl.gov\n"
printf "\tSDPort = 9103\n"
printf "\tWorkingDirectory = \"/var/db/bacula\"\n"
printf "\tPid Directory = \"/var/run\"\n"
printf "\tMaximum Concurrent Jobs = 516\n"
printf "}\n"
 
printf "#\n"
printf "# List Directors who are permitted to contact Storage daemon\n"
printf "#\n"
printf "Director {\n"
printf "\tName = write.llnl.gov-dir\n"
printf "\tPassword = \"ItsASecret\"\n"
printf "}\n"
 
printf "#\n"
printf "# Restricted Director, used by tray-monitor to get the\n"
printf "#   status of the storage daemon\n"
printf "#\n"
printf "Director {\n"
printf "\tName = write.llnl.gov-mon\n"
printf "\tPassword = \"ItsANotherSecret\"\n"
printf "\tMonitor = yes\n"
printf "}\n"
 
printf "Messages {\n"
printf "\tName = Standard\n"
printf "\tdirector = write.llnl.gov-dir = all\n"
printf "}\n"
 
 
printf "Device {\n"
printf "\tName = W0$1FileStorage\n"
printf "\tMedia Type = File\n"
printf "\tArchive Device = /filevol001\n"
printf "\tLabelMedia = yes;\n"
printf "\tRandom Access = Yes;\n"
printf "\tAutomaticMount = yes;\n"
printf "\tRemovableMedia = no;\n"
printf "\tAlwaysOpen = no;\n"
printf "\tMaximum Concurrent Jobs = 2\n"
printf "}\n"
 
while [ $i -le 512 ]
do
        printf "\n"
        printf "Device {\n"
        printf "\tName = W0$1FileStorageD$i\n"
        printf "\tMedia Type = File\n"
        printf "\tArchive Device = /filevol001/drive$i\n"
        printf "\tLabelMedia = yes;\n"
        printf "\tRandom Access = Yes;\n"
        printf "\tAutomaticMount = yes;\n"
        printf "\tRemovableMedia = no;\n"
        printf "\tAlwaysOpen = no;\n"
        printf "\tMaximum Concurrent Jobs = 2\n"
        printf "}\n"
        ((i++))
done
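The generator above emits one base Device plus 512 per-drive Devices, so a generated file should contain 513 Device resources. A small sanity check along those lines might look like the sketch below; check_sd_conf is a hypothetical helper, and it only counts resources, it does not validate the syntax.

```shell
# check_sd_conf FILE [EXPECTED]
# Count the Device resources in a generated bacula-sd.conf and compare the
# result with EXPECTED (default 513: one base device plus 512 drives).
check_sd_conf() {
    local file="$1"
    local expected="${2:-513}"
    local found
    found=$(grep -c '^Device {' "$file" || true)
    found=${found:-0}                   # empty if grep could not read the file
    if [ "$found" -ne "$expected" ]; then
        printf 'expected %s Device resources, found %s\n' "$expected" "$found" >&2
        return 1
    fi
    printf '%s Device resources OK\n' "$found"
}
```

For a real syntax check, the Bacula daemons can also be started in test mode (for example bacula-sd -t -c FILE), which reads the configuration and exits.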

On the Director, a storage node definition is saved in /usr/local/etc/bacula/storage.d/write-0{N}.conf, which is included in /usr/local/etc/bacula/storage.conf:

@/usr/local/etc/bacula/storage.d/write-01.conf
@/usr/local/etc/bacula/storage.d/write-02.conf
@/usr/local/etc/bacula/storage.d/write-03.conf
@/usr/local/etc/bacula/storage.d/write-04.conf
@/usr/local/etc/bacula/storage.d/write-05.conf
@/usr/local/etc/bacula/storage.d/write-06.conf
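The contents of the storage.d files are not shown here. As an illustration only, the sketch below generates one Director-side Storage resource per drive, in the same spirit as the bacula-sd.conf generator. The resource layout is an assumption: the names follow the W0{N}FileStorageD{i} pattern used by the Device resources on the storage node, the password is the same placeholder, and make_dir_storage is a hypothetical helper name.

```shell
# make_dir_storage NODE [COUNT]
# Print one Director Storage resource per drive for storage node write-0NODE.
# Each resource points at the Device of the same name on that node.
make_dir_storage() {
    local node="$1"                 # single digit, e.g. 6 for write-06
    local count="${2:-512}"
    local i=1
    while [ "$i" -le "$count" ]; do
        printf 'Storage {\n'
        printf '\tName = W0%sFileStorageD%s\n' "$node" "$i"
        printf '\tAddress = write-0%s.llnl.gov\n' "$node"
        printf '\tSDPort = 9103\n'
        printf '\tPassword = "ItsASecret"\n'    # placeholder, must match the SD
        printf '\tDevice = W0%sFileStorageD%s\n' "$node" "$i"
        printf '\tMedia Type = File\n'
        printf '}\n'
        i=$((i + 1))
    done
}
```

Under these assumptions, make_dir_storage 6 > /usr/local/etc/bacula/storage.d/write-06.conf would produce the file for the sixth node.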

Client Generation

There are two components: the TEMPLATE file (there are three: TEMPLATE-unix, TEMPLATE-win32 and TEMPLATE-mac) and the shell script.

The Client TEMPLATE File

Here is what one of the TEMPLATE files looks like:

#
# Client Definition, the Password here must match
#  the client's bacula-fd.conf Client definition.
#
# Using Vi/m, you can easily replace HOSTNAME with
#  the short hostname of the client with:
#  %s/HOSTNAME/yourhostname/g
#
#

Client {
    Name = HOSTNAME.llnl.gov
    Address = HOSTNAME.llnl.gov
    FDPort = 9102
    Catalog = Catalog001
    Password = "ItsASecret"
    File Retention = 40 days
    Job Retention = 1 months
    AutoPrune = yes
    Maximum Concurrent Jobs = 10
    Heartbeat Interval = 300
}

Console {
    Name = HOSTNAME.llnl.gov-acl
    Password = ItsASecret
    JobACL = "HOSTNAME.llnl.gov RestoreFiles", "HOSTNAME.llnl.gov"
    ScheduleACL = *all*
    ClientACL = HOSTNAME.llnl.gov
    FileSetACL = "HOSTNAME.llnl.gov FileSet"
    CatalogACL = Catalog001
    CommandACL = *all*
    StorageACL = *all*
    PoolACL = HOSTNAME.llnl.gov-File
}

Job {
    Name = "HOSTNAME.llnl.gov"
    Type = Backup
    Level = Incremental
    FileSet = "HOSTNAME.llnl.gov FileSet"
    Client = "HOSTNAME.llnl.gov"
    Storage = FileStorageD##
    Pool = HOSTNAME.llnl.gov-File
    Schedule = "@@"
    Messages = Standard
    Priority = 10
    Write Bootstrap = "/var/db/bacula/%c.bsr"
    Maximum Concurrent Jobs = 10
    Reschedule On Error = yes
    Reschedule Interval = 1 hour
    Reschedule Times = 1
    Max Wait Time = 30 minutes
    Cancel Lower Level Duplicates = yes
    Allow Duplicate Jobs = no
    RunScript {
        RunsWhen = Before
        FailJobOnError = no
        Command = "/etc/scripts/package_list.sh"
        RunsOnClient = yes
    }
}

Pool {
    Name = HOSTNAME.llnl.gov-File
    Pool Type = Backup
    Recycle = yes
    AutoPrune = yes
    Volume Retention = 1 months
    Maximum Volume Bytes = 10G
    Maximum Volumes = 100
    LabelFormat = "HOSTNAME.llnl.govFileVol"
    Maximum Volume Jobs = 5
}

Job {
    Name = "HOSTNAME.llnl.gov RestoreFiles"
    Type = Restore
    Client= HOSTNAME.llnl.gov
    FileSet="HOSTNAME.llnl.gov FileSet"
    Storage = FileStorageD##
    Pool = HOSTNAME.llnl.gov-File
    Messages = Standard
    #Where = /tmp/bacula-restores
}

FileSet {
    Name = "HOSTNAME.llnl.gov FileSet"
    Include {
        Options {
            signature = MD5
            compression = GZIP6
                        fstype = ext2
                        fstype = xfs
                        fstype = jfs
                        fstype = ufs
                        fstype = zfs
                        onefs = no
                        Exclude = yes
                        @/usr/local/etc/bacula/excludes.d/common.conf
        }
                File = /
                File = /usr/local
                Exclude Dir Containing = .excludeme
    }
    Exclude {
        @/usr/local/etc/bacula/excludes.d/unix.conf
    }
}
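The template carries three placeholders: HOSTNAME, the ## suffix on FileStorageD, and the "@@" schedule. If a client file is filled in by hand, it is easy to miss one. As a hedged sketch, a helper like the following (unrendered_placeholders is a hypothetical name) lists any lines that still contain one of them.

```shell
# unrendered_placeholders FILE
# Print (with line numbers) every line of FILE that still contains one of
# the template placeholders. Exits non-zero when none are left.
unrendered_placeholders() {
    grep -n -e 'HOSTNAME' -e 'FileStorageD##' -e '"@@"' "$1"
}
```

An empty result for a file under clients.d/ means all three substitutions were applied.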

The Create Client Script

So here is what really makes creating clients easy for us, the create_client script.

I didn’t want to do it this way, really, so part of me is very ashamed of this tool. I would have preferred to rewrite this in Python, or make a web page out of it, and let admins create clients from their desktop. Or, I would have loved to create a Puppet module to handle this automagically (but that would exclude everything that *isn’t* running Puppet, which is huge).

With that disclaimer, here is my create_client shell script:

#!/usr/bin/env bash
# usage: cclient -t unix -s 12am -h hostname
#
 
umask 022
 
# Variables
## Randomize Schedule
SCHEDULES="4pm 5pm 6pm 7pm 8pm 9pm 10pm 11pm 12am 1am 2am 3am 4am 5am 6am 7am 8am 9am 10am"
s=($SCHEDULES)
num_s=${#s[*]}
RAND_SCHED=${s[$((RANDOM%num_s))]}
# Randomize which storage node we use
NODES="write-06 write-01 write-06 write-01 write-02 write-03 write-04 write-05"
n=($NODES)
num_n=${#n[*]}
RAND_NODE=${n[$((RANDOM%num_n))]}
 
export DRIVE=`jot -r 1 1 512`
export BDIR="/usr/local/etc/bacula"
export TYPE="unix"
export SCHEDULE=$RAND_SCHED
export HOSTNAME=""
export STORAGE_NODE=$RAND_NODE
export GIT_DIR="/usr/local/etc/bacula/.git"
export CLASS="desktop"
 
if [ $(whoami) == "root" ]
then
cat << EOF
                Please do not run this as root. This script runs a
                git add/commit, which is how changes are managed and
                tracked. If you run this as root, then it shows up
                as carlson39 or root.
 
                If you encounter a problem with your normal OUN account,
                please contact Mike Carlson, or submit a bug here:
                https://st-scm.llnl.gov/redmine/snt/projects/bacula/issues/new
EOF
 
exit 1
fi
 
usage()
{
cat << EOF
 
        Usage: $0 [OPTION]... -h HOSTNAME
 
        This script will generate a bacula client definition.
 
        OPTIONS:
        -s      schedule, (4pm|5pm|6pm|7pm|8pm|9pm|10pm|11pm|12am|1am|2am|3am|4am|5am|6am|7am|8am|9am|10am). The default schedule is random.
        -t      type, (unix|win32|mac), unix is the default
        -n      storage node (write-01|write-02|...), the default is random.
        -h      hostname (use the short hostname)
EOF
}
 
cd $BDIR
 
while getopts 'c:t:s:n:h:' OPTION
do
        case $OPTION in
                c)
                        CLASS=$OPTARG
                        ;;
                t)
                        TYPE=$OPTARG
                        ;;
                s)
                        SCHEDULE=$OPTARG
                        ;;
                h)
                        HOSTNAME=$OPTARG
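                        # Strip a trailing domain so only the short hostname is left.
                        # The dots are escaped so they match literally, not any character.
                        echo $HOSTNAME | egrep -q '\.(llnl\.gov|ucllnl\.org)$'
                        if [ $? -eq 0 ]
                        then
                        HOSTNAME=`echo $HOSTNAME|sed -e 's/\.llnl\.gov$//' -e 's/\.ucllnl\.org$//'`
                        fi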
 
                        ;;
                n)
                        STORAGE_NODE=$OPTARG
                        ;;
                ?)
                        usage
                        exit
                        ;;
        esac
done
 
if [[ -z $CLASS ]] || [[ -z $TYPE ]] || [[ -z $SCHEDULE ]] || [[ -z $HOSTNAME ]] || [[ -z $STORAGE_NODE ]]
then
        usage
        exit 1
fi
 
grep -w $HOSTNAME $BDIR/clients.conf
if [ $? -eq 0 ]
then
        echo 'client '$HOSTNAME 'already exists...'
else
        export RETRY_COUNT="2"
 
        if [ $STORAGE_NODE == "write-01" ]
        then
                DRIVE=`jot -r 1 33 512`
                sed -e 's/HOSTNAME/'$HOSTNAME'/g' -e 's/FileStorageD##/FileStorageD'$DRIVE'/' -e 's/\@\@/'$SCHEDULE'/' -e 's/RETRY_COUNT/'$RETRY_COUNT'/g' $BDIR/clients.d/TEMPLATE-$TYPE > $BDIR/clients.d/$SCHEDULE/$HOSTNAME.conf
                echo \@$BDIR/clients.d/$SCHEDULE/$HOSTNAME.conf >> $BDIR/clients.conf
        else
                export SN=`echo $STORAGE_NODE | cut -c 7-8`
                sed -e 's/HOSTNAME/'$HOSTNAME'/g' -e 's/FileStorageD##/W'$SN'FileStorageD'$DRIVE'/' -e 's/\@\@/'$SCHEDULE'/' -e 's/RETRY_COUNT/'$RETRY_COUNT'/g' $BDIR/clients.d/TEMPLATE-$TYPE > $BDIR/clients.d/$SCHEDULE/$HOSTNAME.conf
                echo \@$BDIR/clients.d/$SCHEDULE/$HOSTNAME.conf >> $BDIR/clients.conf
        fi
 
        chgrp st-bacula-admins $BDIR/clients.d/$SCHEDULE/$HOSTNAME.conf
        git add $BDIR/clients.d/$SCHEDULE/$HOSTNAME.conf $BDIR/clients.conf
        git commit
        echo 'created client definition: '$BDIR/clients.d/$SCHEDULE/$HOSTNAME.conf
        echo 'for '$HOSTNAME'.llnl.gov'
fi

This is always a work in progress, but at the core, it is a simple sed wrapper with a lot of randomization and a git commit.
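One detail of the randomization is worth spelling out. The NODES list in the script names write-06 and write-01 twice, so a uniform pick over its eight entries lands on each of those two nodes with probability 2/8 and on each of the others with 1/8. Whether that weighting is deliberate is not stated. The sketch below just makes it visible; node_weights is a hypothetical helper, not part of the script.

```shell
# node_weights NODE...
# Print each distinct node with its share of the list, one per line,
# in the form "name count/total".
node_weights() {
    local total=$#
    printf '%s\n' "$@" | sort | uniq -c |
        awk -v total="$total" '{ printf "%s %d/%d\n", $2, $1, total }'
}
```

Called with the eight entries from the script, it prints write-01 2/8 and write-06 2/8, and 1/8 for write-02 through write-05.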

Why all the randomization?

Because I had to add around 1000 clients in a VERY short amount of time. We didn’t have a problem pushing the Bacula client to all of the platforms, nor the bacula-fd.conf file. What I could not do was spend the time to create and manage all of the resources for each client. That is why I have so many devices/drives: so I can aim for a 1:1 mapping of clients to drives without having to actually think about it.
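Random assignment does not guarantee that every client ends up alone on its drive. A rough way to see how often two clients share one, assuming the clients.d/SCHEDULE/HOST.conf layout shown earlier, is sketched below; shared_drives is a hypothetical helper name.

```shell
# shared_drives DIR
# Print every storage device that is referenced by more than one client
# file under DIR/<schedule>/<host>.conf.
shared_drives() {
    local f
    for f in "$1"/*/*.conf; do
        [ -f "$f" ] || continue
        # A client file names its device twice (backup and restore job),
        # so reduce it to one entry per file before counting.
        sed -n 's/^[[:space:]]*Storage[[:space:]]*=[[:space:]]*//p' "$f" | sort -u
    done | sort | uniq -d
}
```

Running shared_drives /usr/local/etc/bacula/clients.d lists the devices with more than one client; an empty result means the mapping really is 1:1.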

So, I wrote ANOTHER script to wrap around this one when I need to do bulk client creations. I’m not going to post that, it just loops through the above command.
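For readers who want the general shape of such a wrapper, here is a minimal sketch. It is not the author's script: bulk_create and the CREATE_CLIENT override are hypothetical names, the host list is assumed to be a plain file with one short hostname per line, and the path to create_client.sh is taken from the tree listing above.

```shell
# bulk_create FILE [TYPE]
# Run the client creation command once for every hostname listed in FILE.
# Set CREATE_CLIENT to another command (e.g. echo) for a dry run.
bulk_create() {
    local list="$1"
    local type="${2:-unix}"
    local cmd="${CREATE_CLIENT:-/usr/local/etc/bacula/bin/create_client.sh}"
    local host
    # Read the list on descriptor 3 so stdin stays free for the
    # interactive git commit that the creation script runs.
    while IFS= read -r host <&3; do
        case $host in
            ''|'#'*) continue ;;    # skip blank lines and comments
        esac
        "$cmd" -t "$type" -h "$host"
    done 3< "$list"
}
```

A dry run such as CREATE_CLIENT=echo bulk_create hosts.txt prints the arguments that would be passed for each host without touching the configuration.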

Pre-Job command – Package List

I only do this on the Unix/Linux clients, and I thought it was a cool idea.

Yeah, I will pat myself on the back a little bit for that :)

I exclude the operating system from backups for two reasons: 1) to avoid backing up duplicate and reproducible data, and 2) our build/imaging process is so quick and clean that it is just faster to rebuild than to restore everything.

Still, I needed a way to keep the state of installed packages/software.

This is where the pre-job command comes in handy. This part right here:

    RunScript {
        RunsWhen = Before
        FailJobOnError = no
        Command = "/etc/scripts/package_list.sh"
        RunsOnClient = yes
    }

That package_list.sh file looks like this:

 
#!/usr/bin/env bash
 
export PLIST="/root/plist.txt"
 
case "`uname -s`" in
Linux)
                if [ -x /usr/bin/lsb_release ]; then
                        DIST=`lsb_release -d`
                fi
 
                # RHEL
                if [ -x /usr/bin/up2date ]; then
                        rpm -qa > $PLIST
                fi
 
                # RHEL 5
                if [ -x /usr/bin/yum ]; then
                        if [ -f /var/run/yum.pid ]; then
                                echo "Yum currently in use, exiting gracefully..."
                                exit 0
                        else
                        /usr/bin/yum list installed | awk '{print $1}' > $PLIST
                        fi
                fi
 
                # Ubuntu
                if [ -x /usr/bin/dpkg ]; then
                        /usr/bin/dpkg --get-selections | awk '{print $1}' > $PLIST
                fi
                ;;
 
FreeBSD)
                pkg_info|awk '{print $1}' > $PLIST
                ;;
SunOS)
                pkginfo |awk '{print $1}' > $PLIST
                ;;
esac

That file, /root/plist.txt, gets backed up.

Now we have a record of what was installed on our Unix platforms :)
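One way to put that record to use after a rebuild is to compare a restored plist.txt with the package list of the fresh system. The sketch below shows the idea; plist_diff is a hypothetical helper, and it assumes both files hold one package name per line, as the script above writes them.

```shell
# plist_diff OLD NEW
# Print "+ name" for packages only in NEW and "- name" for packages
# only in OLD.
plist_diff() {
    local old_sorted new_sorted
    old_sorted=$(mktemp)
    new_sorted=$(mktemp)
    # comm needs sorted input; -u also drops duplicate entries.
    sort -u "$1" > "$old_sorted"
    sort -u "$2" > "$new_sorted"
    comm -13 "$old_sorted" "$new_sorted" | sed 's/^/+ /'
    comm -23 "$old_sorted" "$new_sorted" | sed 's/^/- /'
    rm -f "$old_sorted" "$new_sorted"
}
```

With the restored list as OLD and the current one as NEW, the lines starting with "- " are the packages that still need to be reinstalled.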

That is it for now; see you in Part 3.