Snapshot Processing


Overview

The purpose of this document is to explain the processing that occurs during snapshots by the Nasuni Edge Appliance. This can also be useful in understanding data propagation.

Note: With each Nasuni snapshot, configuration information is included, in case it is necessary to recover the Edge Appliance. The configuration bundle is encrypted in the same way that all the customer data is encrypted. If you receive an alert that such backup configurations have failed, this might be due to intermittent network issues, or possibly due to DNS issues. If you see notifications that the Edge Appliance has successfully completed a snapshot after the backup alert, then you can safely ignore the alert.

Snapshots

The snapshot process is the act of creating a new version in the UniFS file system. These new versions can become available to remote Edge Appliances for data propagation.

Snapshots can be triggered under the following circumstances:

  • Scheduled snapshot interval: Configurable interval.

  • Cache usage reaches 85 percent. A snapshot is forced to run once the cache reaches 85 percent usage, even if the snapshot schedule is disabled.

Note: Should you not want data to be pushed to the cloud when the cache reaches 85 percent or higher usage, contact Nasuni Support to disable Snapshots internally.

  • Customer-requested snapshot via Edge Appliance User Interface and NMC’s Take Snapshot Now button or Prioritized Snapshot button.

  • As part of the processing when Snapshot Retention is enabled.

  • If the volume has Global File Acceleration (GFA) configured as Active, the GFA Manager service instructs the Edge Appliance to perform a snapshot.

  • Customer-requested “Perform snapshot before shutting down” when shutting down a Nasuni Edge Appliance.

Because multiple Edge Appliances can share multiple volumes, snapshot handling simplifies processing in these ways:

  • On a given Edge Appliance, only one volume can perform a snapshot at a time.

  • A volume that is shared on multiple Edge Appliances can only perform the metadata push phase of a snapshot on one of the Edge Appliances at a time.

Snapshot scheduling

Snapshots are scheduled in the Nasuni Edge Appliance UI, in the Nasuni Management Console (NMC) UI, or by Global File Acceleration (GFA) if it is enabled.

Snapshots can be scheduled for any or all days of the week. Snapshots can be scheduled to occur either throughout the day (“Allow snapshots all day”) or between specified start and stop times on each selected day.

During each specified snapshot window, you can set the time between snapshots (Frequency). The available Frequency values are different for local volumes (minimum: 1 hour between snapshots; default: 1 hour between snapshots) and for shared volumes (minimum: 1 minute between snapshots; default: 5 minutes between snapshots).
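The window and Frequency rules above can be sketched in a few lines of Python. This is only an illustration, not Nasuni code: the function names, the use of Python weekday numbers (Monday is 0), and the treatment of the stop time as exclusive are all assumptions. Only the minimum Frequency values come from this document.

```python
from datetime import datetime, time

# Minimum Frequency values stated in this document, in minutes.
MIN_FREQUENCY_MINUTES = {"local": 60, "shared": 1}


def in_snapshot_window(now, days, start=None, stop=None):
    """Return True if `now` falls inside the snapshot window.

    days: set of weekday numbers (0 = Monday) selected in the schedule.
    start/stop: datetime.time values, or None for "Allow snapshots all day".
    Assumption: start < stop and the stop time itself is excluded.
    """
    if now.weekday() not in days:
        return False
    if start is None or stop is None:
        return True
    return start <= now.time() < stop


def frequency_allowed(volume_type, minutes):
    """Return True if `minutes` is at or above the documented minimum."""
    return minutes >= MIN_FREQUENCY_MINUTES[volume_type]
```

For example, with a weekday window of 09:00 to 17:00, a Monday at 09:30 is inside the window and a Saturday is not. A 5-minute Frequency passes the check for a shared volume but not for a local volume.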

During the allowed snapshot window, a snapshot is scheduled based on the selected Frequency.

  • For local volumes, if a snapshot is still running when the next snapshot interval occurs, the next snapshot is queued to run immediately after the running one completes. Only one additional snapshot is ever queued to run.

  • For shared volumes, if a snapshot is still running when the next snapshot interval occurs, then the next snapshot is not scheduled until after the running snapshot completes.

For example, suppose that a snapshot schedule has a Frequency of every 15 minutes, but that each snapshot takes more than 30 minutes to complete:

  • At 1:05, a snapshot (#1) is scheduled. Since no other snapshot is running, snapshot #1 begins.

  • At ~1:20, snapshot #1 is still running, but another snapshot (#2) is scheduled. Snapshot #2 begins running after snapshot #1 completes.

  • At ~1:35, if snapshot #1 is still running, snapshot #2 is already queued, and so no additional snapshots are scheduled.
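The local-volume queueing rule in this example (one running snapshot, at most one queued) can be modeled with a small simulation. This is a simplified sketch, not the Edge Appliance's scheduler: the function name is hypothetical, times are plain minute counts, and every snapshot is assumed to take the same amount of time.

```python
def simulate_local_queue(ticks, run_minutes):
    """Simulate local-volume snapshot scheduling.

    ticks: minutes at which the schedule fires, in ascending order.
    run_minutes: how long each snapshot takes (assumed constant).
    Returns (runs, dropped): runs is a list of (start, end) pairs,
    dropped is the list of ticks that scheduled nothing.
    """
    runs = []
    dropped = []
    running_until = None  # end time of the running snapshot, if any
    queued = False        # at most one additional snapshot is queued

    for tick in ticks:
        # Retire snapshots that finished before this tick; a queued
        # snapshot starts immediately after the running one completes.
        while running_until is not None and running_until <= tick:
            if queued:
                runs.append((running_until, running_until + run_minutes))
                running_until += run_minutes
                queued = False
            else:
                running_until = None

        if running_until is None:
            runs.append((tick, tick + run_minutes))
            running_until = tick + run_minutes
        elif not queued:
            queued = True
        else:
            dropped.append(tick)  # one snapshot is already queued

    if queued:
        runs.append((running_until, running_until + run_minutes))
    return runs, dropped


# Minutes after 1:00; each snapshot takes 35 minutes.
print(simulate_local_queue([5, 20, 35], 35))
# ([(5, 40), (40, 75)], [35])
```

In the output, snapshot #1 runs from 1:05 to 1:40, snapshot #2 (queued at 1:20) runs from 1:40 to 2:15, and the 1:35 tick schedules nothing.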

Tip: On volumes with Global File Lock enabled, we recommend increasing the snapshot frequency and the synchronization frequency of the volume. If the normal snapshot and synchronization frequency of the volume are decreased, new files take longer to propagate, because new files depend on snapshot and synchronization to propagate.

If a snapshot has been queued to run, or has already started running, it continues to be queued to run (if queued) or to run (if running), even if the snapshot window has ended. However, no further snapshots are scheduled outside of the snapshot window.

For shared volumes that are under the control of Global File Acceleration (GFA), all snapshot scheduling is performed by the GFA Manager service. The Snapshot Schedule UI is replaced with a “Global File Acceleration Enablement Window” UI, where you set the days and times when GFA-directed snapshots are allowed to occur.

Local volume (remote access disabled)

For a purely local volume, which does not have Remote Access enabled, the snapshot processing proceeds as follows:

  1. At the snapshot frequency interval, a snapshot is scheduled.

  2. If Antivirus is enabled, an antivirus scan is performed.

  3. Data push phase includes the following:

    1. The Nasuni Edge Appliance then proceeds to queue up unprotected data and send it to the cloud.

    2. Each data push is limited to 10 minutes (the default, which is configurable by Nasuni personnel). When that time has elapsed and the current push is complete, the volume is queued and scheduled for a subsequent push. Because of this time limit, some unprotected data might not be protected in a single snapshot.

  4. Metadata push phase processing includes the following:

    1. The Nasuni Edge Appliance sends to the cloud all of the metadata that corresponds to the data processed during the data push phase.
      This generally takes under a minute but, depending on the depth (lots of directories) or width (large number of objects per directory) of the directory structure protected, it can take significantly longer. (Metadata push phase can take hours, depending on how much data has been ingested.)
      Also, when metadata is being transferred, it might not realize the full bandwidth available compared with when data is pushing.

  5. If the next snapshot interval comes up while the current snapshot is in progress, another snapshot is scheduled, but only one. This newly scheduled snapshot does not start until the current snapshot completes and any snapshots already queued for other volumes also complete.

Tip: To verify that a snapshot has been completed (both data phase and metadata phase), see Appendix: Verifying Snapshots.

If a Nasuni Edge Appliance cannot complete either phase, the number of retries is based on the specific snapshot interval set:

  • If the snapshot interval is less than 5 minutes: skip the snapshot and wait until the next scheduled snapshot.

  • 5-minute interval: 9 retries

  • 10-minute interval: 18 retries

  • 15-minute interval: 25 retries

  • 25-minute interval or longer: 40 retries
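The retry list above can be written as a lookup. This sketch covers only the intervals the list names. The list does not say what happens for other intervals (for example, 20 minutes), so the hypothetical function below raises an error rather than guess.

```python
def retries_for_interval(interval_minutes):
    """Return the retry count for a snapshot interval, per the list above.

    Returns 0 when the snapshot is skipped (interval under 5 minutes).
    Raises ValueError for intervals the list does not cover.
    """
    if interval_minutes < 5:
        return 0  # skip and wait for the next scheduled snapshot
    if interval_minutes >= 25:
        return 40
    documented = {5: 9, 10: 18, 15: 25}
    if interval_minutes in documented:
        return documented[interval_minutes]
    raise ValueError(
        f"retry count for a {interval_minutes}-minute interval is not documented"
    )
```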

Shared or Remote volume (Remote Access enabled – even if volume is not connected to any other Nasuni Edge Appliances)

For volumes that are under the control of Global File Acceleration (GFA), all snapshot scheduling is performed by the GFA Manager service. The following material describes the processing for volumes that are not under the control of GFA.

The Nasuni Orchestration Center (NOC) keeps track of one snapshot “lock” for each volume. This prevents multiple Nasuni Edge Appliances from trying to update a given volume at the same time. When a snapshot completes for a volume, its snapshot lock becomes available for another Edge Appliance to perform a snapshot.

Tip: To verify that a snapshot has been completed (both data phase and metadata phase), see Appendix: Verifying Snapshots.

For a volume that has Remote Access enabled, even if that volume is not connected to any other Nasuni Edge Appliances, the snapshot processing proceeds as follows:

  1. At the snapshot frequency interval, a snapshot is scheduled. Note that the snapshot frequency has some randomness built in, so that not all Nasuni Edge Appliances are asking for the snapshot lock at exactly the same time for the same volume.

  2. If Antivirus is enabled, an antivirus scan is performed.

  3. Data push phase includes the following:

    1. Check with the NOC to see if the latest version of the volume matches the version on this Edge Appliance. Sync up to the latest version, if necessary.

    2. The Nasuni Edge Appliance then proceeds to queue up unprotected data and send it to the cloud.

    3. Each data push is limited to 10 minutes (the default, which is configurable). When that time has elapsed and the current push is complete, the volume is queued and scheduled for a subsequent push. Because of this time limit, some unprotected data might not be protected in a single snapshot.

  4. Metadata push phase processing includes the following:

    1. Check with the NOC to see if the latest version of the volume matches the version on this Edge Appliance. Sync up to the latest version, if necessary.

    2. Now, the Nasuni Edge Appliance requests the snapshot lock for this volume from the NOC for the metadata push phase.

    3. If the Nasuni Edge Appliance succeeds in getting the snapshot lock, the Nasuni Edge Appliance sends all of the metadata that corresponds to the data processed during the data push phase (or sent via Global File Lock since the last metadata snapshot) to the cloud.
      This generally takes under a minute but, depending on the depth (lots of directories) or width (large number of objects per directory) of the directory structure protected, it can take significantly longer (up to hours).
      Also, when metadata is being transferred, it might not realize the full bandwidth available compared with when data is pushing.

    4. After all of the metadata is protected, the Nasuni Edge Appliance releases the snapshot lock for this volume.
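The lock handling in the metadata push phase follows a familiar acquire, work, release pattern. The sketch below is a toy in-memory model of that pattern, not the NOC's actual interface: the class, method, and parameter names are all hypothetical, and a real appliance would wait between retries rather than retry immediately.

```python
class SnapshotLockRegistry:
    """Toy stand-in for the NOC's one-lock-per-volume bookkeeping."""

    def __init__(self):
        self._holders = {}  # volume name -> appliance holding the lock

    def try_acquire(self, volume, appliance):
        """Grant the lock only if no appliance currently holds it."""
        if volume in self._holders:
            return False
        self._holders[volume] = appliance
        return True

    def release(self, volume, appliance):
        """Release the lock; only the current holder may do so."""
        if self._holders.get(volume) != appliance:
            raise RuntimeError("lock is not held by this appliance")
        del self._holders[volume]


def metadata_push(registry, volume, appliance, push, max_retries):
    """Run push() while holding the volume's snapshot lock.

    Makes one initial attempt plus up to max_retries retries.
    Returns True if the push ran, False if the lock was never obtained.
    """
    for _attempt in range(1 + max_retries):
        if registry.try_acquire(volume, appliance):
            try:
                push()
            finally:
                # Always release, so other appliances can take their turn.
                registry.release(volume, appliance)
            return True
    return False
```

Releasing in a `finally` block mirrors step 4: the lock becomes available to other Edge Appliances whether or not the push itself succeeded.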

If a Nasuni Edge Appliance cannot get the snapshot lock from the NOC for either phase, the number of retries is based on the specific snapshot interval set:

  • If the snapshot interval is less than 5 minutes: skip the snapshot and wait until the next scheduled snapshot.

  • 5-minute interval: 9 retries

  • 10-minute interval: 18 retries

  • 15-minute interval: 25 retries

  • 25-minute interval or longer: 40 retries

Because there is only one snapshot lock per volume in the NOC, the more Nasuni Edge Appliances that are connected to the volume in Read/Write mode, the more contention there might be for this snapshot lock. Nasuni Edge Appliances in Read-Only mode make no changes, so they do not add to any possible contention.

For example, suppose that 10 Nasuni Edge Appliances share a volume, and all of them are attempting snapshots every 15 minutes. If the metadata push phase takes 1 minute for each snapshot, then most or all of the Nasuni Edge Appliances should complete a snapshot during that 15-minute timeframe. However, this is not guaranteed. If one Nasuni Edge Appliance has a lot of changes, and holds the snapshot lock for 5 minutes, some of the other Nasuni Edge Appliances might not complete their snapshots during that interval.

As a second example, suppose that 40 Nasuni Edge Appliances are each attempting snapshots every 15 minutes. If the metadata push phase again takes 1 minute each, the 40 pushes need at least 40 minutes of lock time, so the Nasuni Edge Appliances cannot all complete their snapshots within that 15-minute interval. In fact, with that many Nasuni Edge Appliances seeking the snapshot lock, between initial tries and retries, there might be a lot of contention for the snapshot lock. In this case, many of the Nasuni Edge Appliances take a lot longer to complete a snapshot, in both phases.
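The arithmetic behind both examples is a best-case bound: because only one Edge Appliance holds the lock at a time, the lock hold times add up. The sketch below (hypothetical function names) ignores retries, the randomness in scheduling, and sync time, so passing the check is necessary but not sufficient for every appliance to finish.

```python
def could_all_complete(hold_minutes, interval_minutes):
    """Best case: can every metadata push fit back to back in one interval?

    hold_minutes: lock hold time for each appliance's metadata push.
    A False result means at least one appliance cannot finish in time.
    A True result does not guarantee that all of them do.
    """
    return sum(hold_minutes) <= interval_minutes


def max_completions(hold_minutes_each, interval_minutes):
    """Upper bound on pushes per interval when each takes the same time."""
    return interval_minutes // hold_minutes_each


print(could_all_complete([1] * 10, 15))  # True: 10 minutes of lock time
print(could_all_complete([1] * 40, 15))  # False: 40 minutes of lock time
print(max_completions(1, 15))            # 15
```

In the second example, at most 15 of the 40 appliances can hold the lock during a 15-minute interval, even before contention is taken into account.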

Folders on volumes with Global File Locking enabled

For a volume that has Global File Locking enabled, the snapshot processing proceeds as follows:

  1. When a file that has Global File Locking enabled is opened, the Nasuni Edge Appliance goes to the global file lockserver to obtain a global file lock on the file.

  2. When the file is saved or closed, the file is protected in the cloud outside of the regular snapshots. This allows a user on another Nasuni Edge Appliance to open the file, go to the global file lockserver to get the lock for the file, and obtain (sync) the latest version of the file from the cloud. This guarantees that the user is always working on the latest version of the file.

  3. However, metadata for the file is not connected with the cloud version of the file until the metadata push phase occurs for the volume. Because of this arrangement:

    1. New files are not seen on remote Nasuni Edge Appliances until the metadata push phase of the regular snapshot completes, and the remote Nasuni Edge Appliance runs a sync.

    2. Timestamps of the files under Global Lock are not updated until the metadata push phase of the regular snapshot completes, and the remote Nasuni Edge Appliance runs a sync. Because a Global Lock-enabled file always syncs on lock and open of the file, it opens the latest version, even if the timestamp does not reflect this.
      Tip: To verify that a snapshot has been completed (both data phase and metadata phase), see Appendix: Verifying Snapshots.

    3. The notes above, related to retries for the snapshot lock in the NOC and possible contention, still apply for the metadata push phase snapshots.

  4. Any other files or folders on the volume, in folders that do not have Global Lock enabled, snap under the regular snapshot schedule (see above).

Tip: On volumes with Global File Lock enabled, we recommend increasing the snapshot frequency and the synchronization frequency of the volume. If the normal snapshot and synchronization frequency of the volume are decreased, new files take longer to propagate, because new files depend on snapshot and synchronization to propagate.

Global File Lock and the Antivirus Service

If a file that has Global File Lock enabled is saved, the file is protected in the cloud outside of the regular snapshot, even if the file is still open. However, if the Antivirus Service is enabled for that file, the open file is not immediately protected in the cloud, because the Antivirus Service must check the file before it can be moved to cloud storage. After the Antivirus Service checks the file and finds no infections, the file is protected in the cloud.
If a file does have antivirus infections, and those infections are marked “Ignore”, then the file experiences the usual Global File Lock processing.
For details of Antivirus processing, see Nasuni Antivirus Service.
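The interaction described above can be summarized as a small decision function. This is an illustrative sketch only: the function and its return values are hypothetical. This section does not describe what happens to an infected file whose infections are not marked “Ignore”, so that case is reported as unspecified instead of being guessed.

```python
def gfl_save_action(antivirus_enabled, scanned=False, infected=False, ignored=False):
    """Decide what happens when a Global File Lock file is saved.

    Returns one of:
      "protect"        - protect in the cloud outside the regular snapshot
      "wait_for_scan"  - the Antivirus Service must check the file first
      "unspecified"    - infected and not ignored; not covered in this section
    """
    if not antivirus_enabled:
        return "protect"
    if not scanned:
        return "wait_for_scan"
    if not infected or ignored:
        return "protect"
    return "unspecified"
```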

Monitoring snapshot processing

Each snapshot includes processing for the data and for the metadata.

For both data and metadata, the Notifications in the NMC or the Edge Appliance UI show when each snapshot starts and when each snapshot completes. Each notification of a snapshot starting includes the volume name and the version number. Each notification of a snapshot completing includes the volume name, the version number, the number of objects succeeded, the number of objects failed, and the number of objects skipped.

For the data, during a snapshot, the Volumes page of the NMC shows the current percent completion of the snapshot.

Tip: To verify that a snapshot has been completed (both data phase and metadata phase), see Appendix: Verifying Snapshots.

Snapshots triggered by GFA appear in the NMC or the Edge Appliance UI.

Appendix: Verifying Snapshots

A snapshot is a complete picture of the files and directories in your file system at a specific point in time. Snapshots are either manually initiated, or automatically performed as part of a Snapshot Schedule that you specify.

The snapshot process includes saving both the data and the associated metadata to cloud object storage. For this reason, a snapshot consists of both a data phase (sometimes called “phase 1”) and a metadata phase (sometimes called “phase 2”). To be sure that data is protected in the cloud, both phases of each snapshot must complete successfully. Only then can you be certain that no unprotected data remains in the cache.

Various procedures, including the recovery of an Edge Appliance, require you to perform a snapshot, and to then verify that the snapshot has completed successfully. This ensures that no unprotected data remains in the cache.

This section describes how to verify that a snapshot has completed successfully.

Verifying that a snapshot completed successfully

To verify that a snapshot has completed successfully, follow these steps:

  1. Log in to the NMC.

  2. Click the bell-shaped Notifications icon at the top right.
     

  3. Click View all Notifications. The Notifications page appears.

  4. In the Filter text box, type “snapshot”, then click Apply Filter.
    The list is limited to notifications that include the word “snapshot”.
     

  5. For the most recent snapshot, find the “Snapshot started” notification for your Edge Appliance and for your volume that contains the label “Data”.
    For that notification, find the corresponding “Snapshot completed” notification for the same Edge Appliance, volume, and version number.
    This verifies that the data phase of this snapshot completed.

  6. Similarly, for the most recent snapshot, find the “Snapshot started” notification for your Edge Appliance and for your volume that contains the label “Metadata”.
    For that notification, find the corresponding “Snapshot completed” notification for the same Edge Appliance, volume, and version number.
    This verifies that the metadata phase of this snapshot completed.
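The matching done in steps 5 and 6 can be expressed as a short check over a list of notifications. The record layout below (dictionaries with `event`, `phase`, `appliance`, `volume`, and `version` keys, ordered oldest to newest) is a hypothetical format chosen for illustration. It is not an NMC export format or API.

```python
def phase_completed(notifications, appliance, volume, phase):
    """Check that the most recent "started" notification for this
    appliance, volume, and phase has a matching "completed" one.

    notifications: list of dicts, ordered oldest to newest (assumed).
    phase: "Data" or "Metadata".
    """
    relevant = [
        n for n in notifications
        if (n["appliance"], n["volume"], n["phase"]) == (appliance, volume, phase)
    ]
    started = [n["version"] for n in relevant if n["event"] == "started"]
    if not started:
        return False
    latest_version = started[-1]
    return any(
        n["event"] == "completed" and n["version"] == latest_version
        for n in relevant
    )


def snapshot_verified(notifications, appliance, volume):
    """A snapshot is verified only when both phases completed."""
    return all(
        phase_completed(notifications, appliance, volume, phase)
        for phase in ("Data", "Metadata")
    )
```

A snapshot whose data phase has a “Snapshot completed” notification but whose metadata phase does not is reported as not verified, which matches the requirement that both phases complete.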

Unprotected Files list

The Unprotected Files list on the Edge Appliance UI or the NMC is not sufficient verification that a snapshot has completed. The files in the Unprotected Files list are not yet protected, so any snapshots containing any of those files have not completed.

However, even if the Unprotected Files list has no files in it, that does not mean that all snapshots have completed. It could be, for example, that the data phase of a snapshot has completed, but that the metadata phase has not completed.

“New Data in Cache (not yet protected)” chart

The “New Data in Cache (not yet protected)” chart on the Edge Appliance UI is not sufficient verification that a snapshot has completed. The data shown in the chart is not yet protected, so any snapshots containing that data have not completed.

However, even if the “New Data in Cache (not yet protected)” chart shows no unprotected data, that does not mean that all snapshots have completed. It could be, for example, that the data phase of a snapshot has completed, but that the metadata phase has not completed.