Validating Eclipse Backups on Linux

Overview

Performing routine backups of the data ensures the data can be recovered in the event of a failure. This document provides details on backup check procedures so you can confirm you have a valid data set to recover from if required. This guide covers the snapshot, rsync, and CrashPlan backup options used by ABS Online Backup.

For customers concerned about their backup status, Epicor provides an optional backup-monitoring service: backup engineers review the system status and backup logs in detail every business day and notify you of any issues that occur. If you’re interested in this service, please contact your Epicor account manager for more information.

How Eclipse Data Backups Work

The Eclipse database consists of several thousand constantly changing files. To get a successful full backup, all files must be backed up while no one is on the system and no data is changing. In many cases it is very difficult to get everyone off the system and back it up while there is no activity, so instead we create a snapshot of the data (essentially freezing the data at a specific point in time), which allows the live data to continue changing while the snapshot data remains the same.

The snapshot feature is part of the Linux operating system. Once we have this “frozen” data, we can use backup software (rsync, CrashPlan, or some other method) to back up the data.
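The snapshots are created for you by the snapsave_linux.sh script described later in this document, so you do not normally create them by hand. As a rough sketch only of what an LVM snapshot looks like (the volume group name vg00, logical volume name u2, snapshot size, and mount point below are hypothetical examples, not the values your script uses):

# Create a 2 GB copy-on-write snapshot of the /u2 logical volume
lvcreate --size 2G --snapshot --name u2snap /dev/vg00/u2

# Mount the snapshot read-only so backup software can copy a frozen view of the data
mkdir -p /snap/u2
mount -o ro /dev/vg00/u2snap /snap/u2

While the snapshot is mounted, the live /u2 file system continues to accept changes; the snapshot consumes space only for blocks that change after it was taken.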

Validating a Successful Backup

A good backup on a Linux server requires a successful snapshot of the data as described above, followed by the backup software (e.g. rsync or CrashPlan) backing up all files successfully before the next snapshot occurs. The snapshot and the backup are two independent processes that do not know about each other, but they must be in sync with each other in order to produce a successful backup.

A snapshot is similar to a photograph of the data and directory structure. Snapshots are taken of the Eclipse database and files. The snapshot process only takes a few seconds to run.

backup_1.png


Figure 1: Snapshot Process Flow

Figure 1 shows the Snapshot process flow.

The four main aspects of the snapshot to verify are:

  1. Date/Time the snapshot ran
  2. Was the snapshot successful?
  3. Did the snapshot run out of room?
  4. When will the next snapshot occur?

Verify the snapshot date and time and whether the snapshot was successful

  • Open a terminal session (e.g. PuTTY, or shell out of Eterm TCL)
  • Type: less /tmp/snapsave.log
  • While viewing the file
    • Ctrl-D will scroll down half a page at a time
    • Ctrl-B will scroll up a full page at a time
    • Shift-G will jump to the bottom of the file
    • Arrow keys will scroll up/down.
    • q will quit
  • This file contains a history of the last few snapshots that have been run on the system.
    • Press Shift-G to jump to the end of the file to display information on the last snapshot that ran
    • You are now looking at the end of the last snapshot.
    • Figure 2 shows a successful snapshot.
  • Verify there is a currently mounted /snap file system shown for each /u2 file system (you may need to scroll up one or two pages to see this information); a quick command-line check is sketched after Figure 2.
    • Note the date/time the snapshot was taken (e.g. if the snapshot ran at 6 PM on 5-14-2011, you will see the snapshot for the /u2 file system listed as /snap/20110514.1800/u2; see Figure 2)

backup_2.png
Figure 2: Verification of Snapshot files and date
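If you prefer a direct command-line check that the snapshot file systems are mounted, the following general Linux commands (not specific to the snapsave log) list them:

# List any snapshot file systems currently mounted under /snap
mount | grep /snap

# Show their sizes alongside the live file systems
df -h | grep -E "/snap|/u2"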

  • If you would like to see more details on the snapshot being created, scroll up (Ctrl-B) to a section in the log giving complete details on each snapshot created. For example:

backup_3.png
Figure 3: Snapshot Log

  • Press q to quit out of viewing the snapshot log

Did the snapshot run out of room?

  • If the snapshot runs out of room before the backup completes, the backup data will be incomplete.
  • At the command prompt, enter lvs to display the current snapshot status (a sample of the output is shown after Figure 4).
    • Look at the Snap% column to verify none of the snapshots have filled.
    • If any snapshot is at 100%, the snapshot data is no longer valid, which means the backup copy is invalid.
  • NOTE: this lvs information is also recorded in the snapshot log file and is displayed at the beginning of the file, before a new snapshot is taken.

backup_4.png
Figure 4: Yesterday’s Snapshot Space
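For reference, the output of lvs on older LVM versions looks like the hypothetical sample below; the volume group and logical volume names are examples only, and newer LVM versions label the equivalent column Data% instead of Snap%:

lvs
  LV      VG   Attr   LSize  Origin Snap%
  u2      vg00 owi-ao 80.00g
  u2snap  vg00 swi-ao  2.00g u2      42.10

A Snap% (or Data%) value of 100.00 means the snapshot has filled and the backup taken from it is invalid.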

When does the next snapshot occur?

Snapshots are created when the snapsave_linux.sh script is run. The script is scheduled to run at a predetermined time through an entry in the root crontab. By default, the next snapshot will occur 24 hours after the previous snapshot. Do not continually run the script thinking that more snapshots means more data preserved for backup.
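To confirm when the next snapshot is scheduled, you can inspect the root crontab. The schedule and script path in the example entry below are placeholders; your entry will reflect your own schedule and installation path:

# As root, list the crontab and look for the snapsave entry
crontab -l | grep snapsave

# Example of what such an entry might look like: run snapsave_linux.sh at 21:00 every day
# 0 21 * * * /root/bin/snapsave_linux.sh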

Verifying Rsync Backups

The /tmp/snapsave.log contains the output from the last successful rsync backup.

  • Open a command prompt
  • View the log: less /tmp/snapsave.log

The log for a successful rsync backup will show output similar to the following:

Copy done - status=0

Number of files: 2248894
Number of files transferred: 6519
Total file size: 181.30G bytes
Total transferred file size: 135.72G bytes
Literal data: 135.72G bytes
Matched data: 0 bytes
File list size: 82.91M
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 135.82G
Total bytes received: 152.13K

sent 135.82G bytes  received 152.13K bytes  14.44M bytes/sec
total size is 181.30G  speedup is 1.33

The /tmp/snapsave.rsync-local.log contains the files that were backed up or updated by the local rsync backup operation (e.g. RD1000, NAS).
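A quick way to check the most recent rsync result without paging through the whole log is to search for the status line shown above. This assumes your log contains the same "Copy done - status=" wording; if your log format differs, view the file directly instead:

# status=0 on the most recent line indicates the last rsync copy completed successfully
grep "Copy done" /tmp/snapsave.log | tail -1

# Scan for rsync errors or warnings
grep -iE "error|warning|failed" /tmp/snapsave.log | tail -20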

The following log file shows a successful backup and details the database and snapshot operations taking place. This is very useful for troubleshooting backup issues.

--------------------------------------------------------------------------------
Tue Sep  7 21:00:05 CDT 2010: Current snapshot status
Snapshots for /u2
Current  Location          512-blocks        Free Time
*        /dev/fslv00           163840       50432 Mon Sep  6 21:01:32 CDT 2010
Snapshots for /u2/eclipse
Current  Location          512-blocks        Free Time
*        /dev/fslv01          2818048     1512448 Mon Sep  6 21:01:39 CDT 2010
Snapshots for /u2/eclipse/ereports
Current  Location          512-blocks        Free Time
*        /dev/fslv02            32768       32000 Mon Sep  6 21:01:44 CDT 2010
Snapshots for /u2/pdw
Current  Location          512-blocks        Free Time
*        /dev/fslv03          1835008     1833984 Mon Sep  6 21:01:48 CDT 2010
--------------------------------------------------------------------------------
Tue Sep  7 21:00:16 CDT 2010: Releasing and unmounting previous snapshots
Tue Sep  7 21:00:18 CDT 2010: Unmounting /snap/u2/pdw
Tue Sep  7 21:00:23 CDT 2010: Removing snapshot(s) of /u2/pdw
rmlv: Logical volume fslv03 is removed.
Tue Sep  7 21:00:31 CDT 2010: Unmounting /snap/u2/eclipse/ereports
Tue Sep  7 21:00:32 CDT 2010: Removing snapshot(s) of /u2/eclipse/ereports
rmlv: Logical volume fslv02 is removed.
Tue Sep  7 21:00:39 CDT 2010: Unmounting /snap/u2/eclipse
Tue Sep  7 21:00:40 CDT 2010: Removing snapshot(s) of /u2/eclipse
rmlv: Logical volume fslv01 is removed.
Tue Sep  7 21:00:47 CDT 2010: Unmounting /snap/u2
Tue Sep  7 21:00:47 CDT 2010: Removing snapshot(s) of /u2
rmlv: Logical volume fslv00 is removed.
--------------------------------------------------------------------------------
Tue Sep  7 21:00:53 CDT 2010: Suspending database
--------------------------------------------------------------------------------
Tue Sep  7 21:00:59 CDT 2010: Performing snapshots:
Tue Sep  7 21:00:59 CDT 2010: Taking snapshot of /u2
Snapshot for file system /u2 created on /dev/fslv00
Tue Sep  7 21:01:04 CDT 2010: Taking snapshot of /u2/eclipse
Snapshot for file system /u2/eclipse created on /dev/fslv01
Tue Sep  7 21:01:11 CDT 2010: Taking snapshot of /u2/eclipse/ereports
Snapshot for file system /u2/eclipse/ereports created on /dev/fslv02
Tue Sep  7 21:01:16 CDT 2010: Taking snapshot of /u2/pdw
Snapshot for file system /u2/pdw created on /dev/fslv03
--------------------------------------------------------------------------------
Tue Sep  7 21:01:21 CDT 2010: Database suspend released.
--------------------------------------------------------------------------------
Tue Sep  7 21:01:21 CDT 2010: Mounting snapshot filesystems
Tue Sep  7 21:01:23 CDT 2010: Mounting snapshot: /snap/u2
Tue Sep  7 21:01:25 CDT 2010: Mounting snapshot: /snap/u2/eclipse
Tue Sep  7 21:01:27 CDT 2010: Mounting snapshot: /snap/u2/eclipse/ereports
Tue Sep  7 21:01:30 CDT 2010: Mounting snapshot: /snap/u2/pdw
rmt0 changed
Tue Sep  7 21:02:05 CDT 2010: Starting backup from /snap
--------------------------------------------------------------------------------
Tue Sep  7 22:40:15 CDT 2010: Mailing backup report
--------------------------------------------------------------------------------

Verifying CrashPlan Backups

(For ABS Online Backup customers only.)

CrashPlan Pro is the software used to back up the snapshot and other files on the server. The software provides enough granularity to enable backups of files during a defined scheduled period. Typically, CrashPlan is scheduled to run during a window of roughly 5 to 12 hours, depending on the system’s availability as indicated by the customer.

CrashPlan uses a checksum method to determine whether a file has changed: the checksum compares the files captured in the last backup with the current files on the system. CrashPlan can be pre-set to use a specific percentage of system resources during the backup phase; however, the fewer resources committed to the backup, the longer the backup takes.

If the amount of data is too large to back up within the scheduled window, some files will not be backed up (not good).

backup_5.png
Figure 5: Basic CrashPlan Pro Backup

Figure 5 shows the basics behind CrashPlan performing a backup. Remember, the snapshot process will stop and restart CrashPlan so the current snapshot can be backed up. CrashPlan only backs up new versions of a file and will ignore file duplicates.

To determine whether CrashPlan has performed the backup, there is a log file provided under /usr/local/crashplan/log/history.log.0. Figure 6 shows what a good backup looks like. To view the file, enter the command: vim -R /usr/local/crashplan/log/history.log.0
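If you prefer not to use vim, the same file can be paged with less or sampled with tail; these are standard Linux commands and make no assumptions about the log format:

# Page through the history log (press q to quit)
less /usr/local/crashplan/log/history.log.0

# Show the last 40 lines, which usually cover the most recent backup session
tail -40 /usr/local/crashplan/log/history.log.0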

backup_6.png
Figure 6: Good CrashPlan Pro Backup

The timeframe or window for the CrashPlan backup is 18:10 (6:10 PM) to 06:00 (6 AM). The first line shows that the CrashPlan service started at 6:00 PM, which would have resulted from the snapshot process starting up CrashPlan after it took a snapshot of the database. You are interested in the backup performed within the backup schedule.

Figure 7 shows that CrashPlan found 72 GB to back up and that at 12:52 AM it completed transmission of the backup to the Online Backup Server in two scans.

backup_7.png
Figure 7

Figure 8 shows CrashPlan performing one more scan of the file system at 5 AM. This is known as the “verify backup file selection” time. Notice that CrashPlan found 72.01 GB at 5:08 AM. The files were not transmitted because they contained the same data as the 12:52 AM scan that was already transmitted. You always want to see results similar to Figure 6 with your backups.

backup_8.png
Figure 8: CrashPlan Final Scan

Figure 9 shows a typical error when CrashPlan performs a scan and wants to transmit a backup to the online or local media while it is not accessible. If the media is local, check to ensure the device is online with a df -h command (in TCL). If the device is not showing, mount the device to the server.
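The following is a hedged example of checking and remounting a local backup device; the mount point /mnt/backup and device /dev/sdb1 are placeholders, so substitute the values from your own /etc/fstab:

# Check whether the backup device is mounted and has free space
df -h /mnt/backup

# If it is not mounted and an /etc/fstab entry exists, remount it
mount /mnt/backup

# Or mount the device explicitly
mount /dev/sdb1 /mnt/backup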

backup_9.png
Figure 9: CrashPlan Pro unable to connect to device
 

Summary

Verifying your snapshot and backup logs is important. Keep these points in mind while reviewing your data backups:

  • Make sure you have both an on- and off-site backup. Multiple backups are important.
  • Make sure the backup window allows enough time for all of the backups to complete.
  • Make sure your backup devices have enough room to complete the backups, and watch the logs for warnings about space.
  • Make sure the snapshots are mounted and the size matches the active filesystems.
  • Make sure there are no warnings or errors in the snapshot log.
  • Make sure there are no warnings or errors in the rsync log.
  • (ABS Online Backup) Make sure there are no warnings or errors in the CrashPlan log.
  • Make sure to check the backup log after each backup (i.e. if the backup runs every night, check the logs every morning)

If you are having trouble determining whether there is a problem with the backups, seek knowledgeable help. Also, Epicor provides an optional backup-monitoring service: backup engineers review the system status and backup logs in detail every business day and notify you of any issues that occur. If you’re interested in this service, please contact your Epicor account manager for more information.

Backup Checklist

Date:                                    

Backup Window:
Start Time:
Finish Time:                                     

Did snapshot run?
Time Completed:                                                                       

Any errors in the /tmp/snapsave.log?                                  

Did all of the /snap directories remount?                                                                      

Did the Snapshot run out of room?                                                                           

When will the next Snapshot occur?                                                                         

Did CrashPlan Pro run during the scheduled times?                                                  

Were the number of files and size of the scan backed up?                                      

Did the final verification of the backup window complete?                                       

Did the backup complete at the scheduled finish time?                                            

Did the backup complete before the next snapshot occurred?                                

(ABS Online Backup) Did you notice any errors in the /usr/local/crashplan/log/history.log.0 file?              

 

How do I gather CrashPlan logs?

View a step-by-step screencast of this process:

Before contacting CrashPlan for support, please gather the CrashPlan PRO client backup logs:

  • (Optional) Launch the CrashPlanDesktop client
  • (Optional) Double-click the CrashPlan logo in the upper right corner
  • (Optional) At the prompt that appears along the bottom of the screen, type “dump all” and hit Enter
  • Open a terminal session to the server and run the following commands to create an archive of the CrashPlan client logs. In this example, we’ll name the archive “crashplan_logs_20110124.tar.gz” and send it to the root user’s Desktop folder:
  • cd /usr/local/crashplan
    tar czvf /root/Desktop/crashplan_logs_20110124.tar.gz ./log
  • To attach the logs directly from the server, launch a web browser and browse to the CrashPlan Support website
  • To transfer the logs to your workstation for use as an email attachment or otherwise, log into the server via FTP and transfer the archive you just created (an scp alternative is sketched below)
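If you prefer scp over FTP for pulling the archive back to your workstation, a minimal sketch follows; the server name is a placeholder:

# Run from your workstation; copies the archive created above into the current directory
scp root@your-eclipse-server:/root/Desktop/crashplan_logs_20110124.tar.gz .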

Resources:

What happens when my CrashPlan destination runs out of space?

If your backup destination runs out of space, backups to that destination will simply stop. CrashPlan will not automatically delete backed-up data in order to make room at the destination.

In order to resolve the situation you’ll need to perform one of the following actions:

  • Option 1: Add more disk space to the destination
  • Option 2: Select fewer files to back up or change the retention settings
  • Option 3: Run Compact on the destination to prune any data that shouldn’t be backed up anymore

Option 1: Add Storage

If you are backing up to a local folder, consider increasing the amount of available space in the destination filesystem.

If you are backing up to an external drive, consider replacing the drive with a larger device. You could also consider rotating drives.

If you are backing up to an external network device, consider allocating additional storage to the NFS or iSCSI share.

If you are backing up to another server on your network, consider increasing the amount of available space in the destination filesystem or migrating the filesystem to a new location (Settings > Backup > Inbound Backup > Default Backup Archive Location).

Option 2: Adjust Retention Settings

See also:

Option 3: Compact or Maintain Archive

See also: CrashPlan PRO Wiki: FAQ: What happens if my backup destination runs out of space?

Archive maintenance involves cleaning up the backup archive to remove backed up data according to the retention settings you’ve specified. Archive maintenance removes:

  • File versions that are too old
  • Deleted files that no longer need to be kept
  • Files that are no longer selected for backup
  • Archive blocks that have become corrupted (maintenance self-heals by re-requesting those blocks from the client).

A user can perform manual archive maintenance by clicking the Compact button in the PRO Client for a specific destination.

  • Within your client, de-select the file/folder you want to remove
  • Click Destinations and then select the appropriate destination type (Computers or Folders.)
  • Click the specific destination
  • Click the Compact button

NOTE: Removing files from your backup selection permanently removes the files from the backup archive. After compacting, de-selected and old data will be removed from the remote destination. You will not be able to restore any de-selected or old data once the maintenance/compact process is complete.

How do I contact CrashPlan for support?

If you have an active subscription to Epicor’s ABS Online Backup or ABS Local Backup services, you may contact Epicor for support.

If you have purchased CrashPlan PRO licenses for your organization, and are running your own CrashPlan PRO server, you may contact CrashPlan directly for support. Here are some helpful tips that will speed up problem resolution:

Before contacting CrashPlan for support, please gather the CrashPlan PRO client backup logs (see “How do I gather CrashPlan logs?” above).

Also, you may wish to consult CrashPlan’s list of Frequently Asked Questions and search their Support Forum.

Once you have the logs, use one of the following methods to contact CrashPlan support: