The solution described in this article can be used in two ways: to back up data files from a workstation to a central Linux server, or to back up one server to another. Applied to all servers in your network, it even gives you a cheap and efficient corporate back-up strategy that works well in small and medium-sized companies.
Linux system requirements
To apply this back-up procedure, you will need an external medium. You could write to an external USB disk, but that is not very efficient. Therefore, I'll assume that you write your backup to a server somewhere on the network. This needs to be a Linux server (any Linux distribution will do) with SSH and rsync installed.
Apart from a back-up medium, you need rsync. This versatile synchronization utility is a default component on all Linux distributions. As its name suggests, it helps you synchronize files remotely. It does not, however, perform incremental or differential backups; it simply keeps a copy of your data elsewhere in case your local hard drive crashes. You could tune the procedure to make incremental and differential back-ups as well, but that is beyond the scope of this article.
Now the basics of our back-up solution are easy to understand. Rsync needs to contact the remote server and write data to it. The default transport rsync uses for this is SSH. You could change that, but SSH is available on nearly all Linux boxes, so you should just use it. The following command gives an example of the rsync command:
rsync -rau --progress /data 192.168.1.98:/data
With this command, rsync synchronizes the local directory /data with the directory /data on the server 192.168.1.98. (Be aware of rsync's trailing-slash rule: without a trailing slash, the source directory itself is copied into the target, so this command creates /data/data on the server. Use /data/ as the source if you want only its contents placed directly in the remote /data.) The connection to the server is established with the current user account: if you're linda on the local workstation, you'll connect as linda on the remote machine, and if you're root, you'll connect as root. Just make sure you have permission to read the source directory and to write to the target directory on the server. Next, a few options are used with the rsync command. The
--progress option shows progress of the rsync command and the
-rau options make sure that everything is synchronized: -r recurses into subdirectories, -a (archive mode) preserves file metadata such as permissions, ownership and timestamps, and -u skips files that are already newer on the target.
After issuing this command, the content of your local directory /data is synchronized with the /data directory on your server. The problem with this command is that you have to enter it manually, which means you're likely to forget. So we need to automate it by creating a cron job. The problem with cron is that when the connection to the server is made, the SSH daemon on the server asks for a password. The solution is to configure SSH with public/private keys so that the login happens automatically.
Configuring automatic SSH login
The idea of working with public/private keys is that you create a key pair on your workstation. This gives you a public and a private key. You then copy the public key to the ~/.ssh/authorized_keys file on your server (~ refers to the home directory of the current user). The next time you start an SSH command from your workstation, it will try key-based authentication first: the workstation signs a challenge with the private key, and if the server can verify that signature with the stored public key, it knows it really is you, and you'll be authenticated without entering a password. To create this configuration, execute the following procedure.
- On the workstation, use ssh-keygen -t dsa and just press Enter to accept all the default answers. This gives you a file with the name ~/.ssh/id_dsa, which is your private key, and ~/.ssh/id_dsa.pub, which is your public key.
- Now use ssh-copy-id to copy the public key to your home directory on the server. The following command helps you do that:
ssh-copy-id -i ~/.ssh/id_dsa.pub 192.168.1.98
By using this command, the file ~/.ssh/authorized_keys is created on the server, allowing you to log in with your public/private key pair.
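For reference, the key-generation step can also be run non-interactively. The sketch below writes a demo key pair to /tmp rather than ~/.ssh, so it won't touch your real keys; note that recent OpenSSH versions deprecate DSA keys, so ed25519 is shown here as an alternative key type.

```shell
# Remove any leftover demo keys so ssh-keygen does not prompt to overwrite.
rm -f /tmp/demo_id_ed25519 /tmp/demo_id_ed25519.pub

# Generate a key pair without prompts (-N "" gives an empty passphrase,
# -f sets the output file, -q suppresses output).
ssh-keygen -t ed25519 -N "" -f /tmp/demo_id_ed25519 -q

# Copying the public key needs a reachable server, so it is shown
# commented out; 192.168.1.98 is the example server from this article.
# ssh-copy-id -i /tmp/demo_id_ed25519.pub 192.168.1.98

ls /tmp/demo_id_ed25519 /tmp/demo_id_ed25519.pub
```

An empty passphrase is what makes the unattended cron job possible; the trade-off is that anyone who obtains the private key file can log in as you, so protect it with strict file permissions.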
After following these steps, you can log in to the remote server with the following command:
ssh 192.168.1.98
You'll see that at this point you can log in without entering a password.
Scheduling the back-up job using cron
Now that you know which command to use, and SSH is set up to log you in automatically, you need to tell your computer to synchronize your data automatically every day. To do that, you can use cron on your workstation, another default component of all Linux distributions. To create a cron job for your current user account, use
crontab -e. This opens the crontab editor, which is typically vi or whatever your default editor is set to. In the editor, enter the following line:
0 10 * * * rsync -rau /data 192.168.1.98:/data
As you can see, the crontab file contains the same rsync command we used before, with one exception: the
--progress option is omitted. This is because cron runs the job in the background, not attached to any terminal on your computer, so there is nowhere to show progress.
Before the command itself, you need to tell cron when it should execute. In the example line, that's done with
0 10 * * *. In cron, five positions indicate the minute, hour, day of month, month, and day of week a job should run. With these values, the job runs every day at 10 AM. Don't forget the 0 in the first position to specify the exact minute the job should run; if you do forget it, the job will run every minute from 10:00 through 10:59!
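The five scheduling fields line up as follows (a layout sketch; the command is the one from the example above):

```
# minute(0-59) hour(0-23) day-of-month(1-31) month(1-12) day-of-week(0-7)  command
  0            10         *                  *           *                rsync -rau /data 192.168.1.98:/data
```

An asterisk means "every value", so 0 10 * * * reads as: minute 0 of hour 10, every day of the month, every month, every day of the week.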
Here, we've walked through how to set up a basic but efficient back-up procedure. While there are many other solutions available, you won't find many that are this simple and efficient. And it's certainly more effective than the most common back-up procedure employed by one-man companies and home users: none at all.
ABOUT THE AUTHOR: Sander van Vugt is an author and independent technical trainer, specializing in Linux since 1994. Vugt is also a technical consultant for high-availability (HA) clustering and performance optimization, as well as an expert on SLED 10 administration.
This was first published in March 2010