Volume Architecture Data Propagation Best Practices


Introduction

The purpose of this document is to help you configure your cloud storage volumes, snapshot frequencies, and synchronization frequencies, so that new data and changed data propagate through the Nasuni platform as rapidly as possible.

Overview of Data Architecture with Nasuni

When architecting a Nasuni environment, there are five components to consider. It is important to know how they all work together. These components are:

  • Cloud provider storage subscription

  • Nasuni volumes

  • Nasuni shares

  • Nasuni Edge Appliances

  • Nasuni Orchestration Center (NOC)

Cloud Provider Storage Subscription

Nasuni can only store data in object stores. Object storage is the underlying technology of any public or on-premises cloud. Therefore, to start, the customer must obtain a storage account with their provider of choice, such as Amazon S3 or Microsoft Azure Blob Storage. If the customer doesn't already have a storage account, it only takes a few minutes to sign up.

This account resides within the cloud provider. Through this cloud account, the customer determines in which data center their data is stored, the level of redundancy, and the tier of storage. Once setup is complete, the customer generates a set of credentials that are input into the Nasuni Management Console (NMC). This allows Nasuni to create volumes in the storage subscription.

Nasuni Volumes

Nasuni volumes are synonymous with "buckets" in AWS S3 or "containers" in Microsoft Azure Blob Storage. When a customer creates a volume in Nasuni through the NMC, Nasuni generates a bucket or container in the cloud.

Volumes are isolated file systems with their own version history. They are logical groupings of data that allow administrators to apply rules or settings to the data. They have no limits on file size, capacity, inodes, or the number of files that can be stored in them. Nasuni volumes dynamically expand as data is written, so you never have to remember to return and manually "grow" the volume, as with legacy storage. Volumes can be made accessible to multiple Nasuni Edge Appliances. Volumes reside in storage subscriptions and are accessed through shares, so end users are not aware of the volumes themselves.

Nasuni Shares

Shares can be created using the NMC. Shares present themselves to users or applications as drive mappings or as a UNC path. Shares connect users to volumes that reside in the cloud. Shares operate as a network mount point on an Edge Appliance.

Nasuni Edge Appliances

Edge Appliances are the local caches that provide fast access to data that resides on volumes in the cloud. They house shares that connect to Nasuni volumes. Edge Appliances are installed either on-premises or in the cloud, depending on the use case.

Nasuni Orchestration Center (NOC)

The NOC is a series of instances built in AWS that help Nasuni offer services, including licensing, global file locking, software updates, antivirus patching, and proactive health monitoring of customer Edge Appliances. All Edge Appliances connect to the NOC. The NOC is Nasuni-owned and controlled. It is completely out of the data path and maintains only management connections to customer Edge Appliances.

Overview of Data Propagation

Nasuni Cloud File Services™ is based on a “cloud-first” architecture that stores and protects unstructured file data on cloud storage volumes, while caching just the actively used file data from these volumes on Nasuni Edge Appliances.

The process of propagating data from one Edge Appliance to another Edge Appliance consists of several parts:

  • Snapshots, which move data from the cache of an Edge Appliance to cloud storage. The snapshot process is also called a “push”, as in pushing data to cloud storage.

  • Syncs (or synchronizations), which move data from cloud storage to other Edge Appliances. The sync process is also called a “pull”, as in pulling data from cloud storage.

  • When Global File Lock™ is enabled, that processing also propagates data when changes are made to existing files.

Data propagation requires one Edge Appliance to push data to cloud storage and another Edge Appliance to pull that data from cloud storage.

Nasuni’s patented UniFS® global file system, which spans both your cloud storage and your local Nasuni Edge Appliance storage, uses snapshots to store new files and file changes in cloud storage. You can configure when and how frequently snapshots occur.

Syncs (or synchronizations) make the metadata from snapshots visible to connected Edge Appliances, which enables each Edge Appliance to merge its local data with any new or changed data from other Nasuni Edge Appliances connected to the same volume. This helps ensure that everyone in your organization uses the most current data. You can configure when and how frequently syncs occur.
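
The push-then-pull sequence described above suggests a rough upper bound on how long a new file can take to become visible on another Edge Appliance. The following Python sketch is a simplified model, not a Nasuni formula. It assumes the file is written just after a snapshot begins (so it waits a full snapshot interval), that the snapshot then takes some time to upload, and that the receiving Edge Appliance has just finished a sync (so it waits a full sync interval). The function name and its parameters are hypothetical, and the model ignores Global File Lock.

```python
def worst_case_propagation_minutes(snapshot_interval_min,
                                   snapshot_duration_min,
                                   sync_interval_min,
                                   sync_duration_min=0):
    """Rough upper bound, in minutes, for a new file to reach another appliance.

    Simplifying assumptions (not taken from Nasuni documentation):
      * the file just misses a snapshot, so it waits one full snapshot interval
      * the snapshot (push) then takes snapshot_duration_min to complete
      * the receiving appliance just missed a sync, so it waits one full interval
      * the sync (pull) itself takes sync_duration_min
    """
    parts = (snapshot_interval_min, snapshot_duration_min,
             sync_interval_min, sync_duration_min)
    if any(value < 0 for value in parts):
        raise ValueError("intervals and durations must be non-negative")
    return sum(parts)


# Example: 30-minute snapshots that take about 4 minutes, 5-minute syncs.
estimate = worst_case_propagation_minutes(30, 4, 5)
print(f"Worst-case propagation: about {estimate} minutes")
```

Under these assumptions, the snapshot interval usually dominates the estimate, which is why the snapshot schedule is the first setting to review when new files appear slowly at other sites.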

Details about Snapshots

A snapshot is a complete picture of the files and directories in your file system at a specific time. Snapshots offer data protection by enabling you to recover a file deleted in error or to restore an entire file system. Snapshots serve both as the transfer mechanism for data from the cache to the cloud and as the basis of your data protection Recovery Point Objective (RPO). The snapshot interval you set determines how often data gets sent to the cloud and how granular your restore points are. Both requirements must be considered when setting the snapshot schedule.

After a snapshot has been taken and sent to cloud storage, it is not possible to modify that snapshot.

With snapshots, you can find, view, and restore past versions of your files quickly. You can restore a single file, a directory, or a volume.

The Nasuni Edge Appliance captures complete snapshots of files regularly and stores all snapshots in cloud storage to protect your files. Only one volume on an Edge Appliance can snapshot at a time. You can select which days of the week to perform snapshots, what time of day to start and stop creating snapshots, and the frequency for creating snapshots.

For example, you can configure snapshots so that they do not occur during working hours and only push new and changed data during off-hours, when network usage is low.
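
To illustrate how a schedule made of days of the week plus a start and stop time behaves, the following Python sketch checks whether a snapshot would be permitted at a given moment. It is an illustration only: the `ScheduleWindow` class is hypothetical and is not part of any Nasuni API. It also assumes that a window whose start time is later than its stop time wraps past midnight, and that the day-of-week check applies to the calendar day of the moment being tested. The actual product behavior for such windows is not described here.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class ScheduleWindow:
    """Hypothetical model of a snapshot schedule window."""
    days: frozenset   # weekday numbers, Monday=0 ... Sunday=6
    start: time       # time of day at which snapshots may begin
    stop: time        # time of day at which snapshots must stop

    def allows(self, moment: datetime) -> bool:
        # Assumption: the weekday check uses the calendar day of `moment`.
        if moment.weekday() not in self.days:
            return False
        now = moment.time()
        if self.start <= self.stop:
            # Ordinary same-day window, for example 08:00 to 18:00.
            return self.start <= now < self.stop
        # Assumption: start > stop means the window wraps past midnight.
        return now >= self.start or now < self.stop


# Off-hours window: weekdays, 20:00 until 06:00.
off_hours = ScheduleWindow(frozenset({0, 1, 2, 3, 4}), time(20, 0), time(6, 0))
print(off_hours.allows(datetime(2024, 1, 1, 22, 30)))  # a Monday evening
print(off_hours.allows(datetime(2024, 1, 1, 12, 0)))   # a Monday at noon
```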

If the volume does not have Remote Access enabled, your frequency choices are every 1, 2, 4, 8, 12, or 24 hours. If the volume does have Remote Access enabled, your frequency choices are every 1, 5, 10, 15, 25, or 30 minutes or every 1, 2, 4, 8, 12, or 24 hours. For a discussion of configuration considerations, see Configuration Suggestions below.
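
The frequency choices in the preceding paragraph can be captured in a small lookup, which is useful when checking a planned schedule before entering it. The Python sketch below copies the values exactly as listed above and expresses them in minutes. The function names are hypothetical and do not correspond to any Nasuni API.

```python
# Frequency choices as listed in the paragraph above, in minutes.
HOURLY_CHOICES_MIN = tuple(hours * 60 for hours in (1, 2, 4, 8, 12, 24))
MINUTE_CHOICES_MIN = (1, 5, 10, 15, 25, 30)


def allowed_snapshot_frequencies(remote_access_enabled):
    """Return the selectable snapshot frequencies, in minutes, for a volume."""
    if remote_access_enabled:
        return MINUTE_CHOICES_MIN + HOURLY_CHOICES_MIN
    return HOURLY_CHOICES_MIN


def is_valid_snapshot_frequency(minutes, remote_access_enabled):
    """Check a planned frequency against the choices for this kind of volume."""
    return minutes in allowed_snapshot_frequencies(remote_access_enabled)


# A 15-minute schedule is only selectable when Remote Access is enabled.
print(is_valid_snapshot_frequency(15, remote_access_enabled=True))
print(is_valid_snapshot_frequency(15, remote_access_enabled=False))
```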

If there is no new or changed data at the scheduled time of the snapshot, the snapshot does not occur.

During initial data ingestion, connecting more than one Nasuni Edge Appliance to a cloud storage volume can slow down data ingestion. Use one Nasuni Edge Appliance to push all data to cloud storage first. After ingestion is complete, then connect other Edge Appliances to the same storage volume.

You cannot have an unlimited number of Edge Appliances connected to a shared volume because this would cause snapshot contention.

For further details about snapshots, see Snapshot Processing.

Details about Syncs (Synchronizations)

You can schedule when and with what frequency each Edge Appliance synchronizes its local data with cloud storage and merges it with any new or changed data from other Nasuni Edge Appliances connected to the same volume. This helps ensure that everyone in your organization uses the most current data. This applies only to volumes that have Remote Access enabled. Local volumes do not sync.

If you enable the “Auto Cache” option, this attempts to bring new or changed data into the local Edge Appliance cache from cloud storage, even if the data has not yet been accessed. If “Auto Cache” is not enabled, only the metadata of new or changed data is brought into the cache, and the data is only brought into the local Edge Appliance cache the next time that data is accessed. For further information on “Auto Cache”, see Nasuni Edge Appliance Administration Guide.

You can select which days of the week to sync data; at what time of day to start and stop syncing data; and the frequency for syncing data: every 1, 5, 10, 25, or 30 minutes, or every 1, 2, 4, 8, 12, or 24 hours for each volume.

Tip: On volumes with Nasuni Global File Lock™ enabled, we typically can reduce the volume's normal snapshot and synchronization frequency because Global File Lock provides file synchronization on existing files independently of the snapshot and synchronization frequency. In most cases, the synchronization frequency can be 5 minutes. However, new file propagation and restore points still depend on the configured snapshot and synchronization frequency.

Warning: Frequent syncs can increase the system load significantly if you have directories with tens of thousands of files but few changes during each snapshot, or large files that require multiple snapshots. In these cases, consider reducing the frequency of syncs.

Use Cases

Use cases tend to fall into a few common categories. Each of these use cases requires different data propagation performance. By choosing the right volume architecture and the right snapshot and sync schedules for your different use cases, you can optimize the speed of data propagation.

Typical use cases include:

  • Local use case: Files are created locally, and primarily used locally. Users experience high-performance access. All files are fully protected in cloud storage by storing snapshots at the frequency you choose.

  • Sync use case: Multiple sites access the same files, but do not typically collaborate on (open and lock) the files at the same time. Files can be created at any site, and can be accessed by any site. For example, 2-4 Edge Appliances could have acceptable synchronization times of 10-20 minutes between sites. Nasuni Global File Lock is not necessary. All files are fully protected in cloud storage by storing snapshots at the frequency you choose.

  • Collaborate use case: Multiple sites collaborate on (open and lock) the same files at the same time. Files can be created at any site, and can be accessed and modified by any site. Synchronization times of 2 minutes or less might be necessary. Nasuni Global File Lock is necessary to prevent multiple users from changing the same file simultaneously. All files are fully protected in cloud storage by storing snapshots at the frequency you choose.

Note that a single enterprise might utilize more than one use case. For example, a large hotel chain might need to distribute reservation data widely (Sync use case), but have its advertising designers in a single location (Local use case). Similarly, an architectural company might need to support designers in different cities collaborating on the same design files (Collaborate use case), but have accounting and financial offices in a single location (Local use case).
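
The distinctions among the three use cases can be summarized as a simple decision rule. The Python sketch below is only a reading of the descriptions above: the function names and the two inputs (number of sites and whether users at different sites open the same files at the same time) are hypothetical simplifications.

```python
from enum import Enum


class UseCase(Enum):
    LOCAL = "Local"
    SYNC = "Sync"
    COLLABORATE = "Collaborate"


def classify_use_case(site_count, simultaneous_editing):
    """Map a data set to a use case, following the descriptions above."""
    if site_count < 1:
        raise ValueError("a data set must be used by at least one site")
    if site_count == 1:
        # Files are created and primarily used at a single site.
        return UseCase.LOCAL
    # Multiple sites: the question is whether they work on the same files
    # at the same time.
    return UseCase.COLLABORATE if simultaneous_editing else UseCase.SYNC


def needs_global_file_lock(use_case):
    """Only the Collaborate use case is described as requiring Global File Lock."""
    return use_case is UseCase.COLLABORATE


# The architectural company example: designers in several cities on one design.
case = classify_use_case(site_count=3, simultaneous_editing=True)
print(case.value, needs_global_file_lock(case))
```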

Managing Use Cases to Improve Data Propagation

A best practice is to identify the use cases in your enterprise. This can also help set expectations for performance times. For example, if an operation typically falls into the Local use case, and Collaboration becomes necessary, users should not expect data propagation to be as fast in the Collaborate use case as it was in the Local use case.

A further best practice is to identify data sets by use case. For example, identifying the set of data necessary for close collaboration, and distinguishing that from other data sets that do not require close collaboration, can help with data propagation performance. Different data sets are typically driven by the application (such as CAD), or by user requirements for their work.

Since Nasuni’s snapshot and sync policies operate at the volume level, it is a best practice to define volumes that are specific to the data sets that you identify for your use cases. Nasuni Edge Appliances can connect to and cache data from multiple cloud storage volumes, giving you the flexibility to do this.

As another example, if three Edge Appliances all use the same data, but other Edge Appliances do not need the same data, then that data could be on a volume only shared by those three Edge Appliances.
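
One way to turn this guidance into a planning step is to group data sets that share both a use case and a set of Edge Appliances, and to treat each group as a candidate volume. The Python sketch below shows that grouping. The data set names, appliance names, and the `plan_volumes` function are all hypothetical, and the result is a planning aid rather than anything that is submitted to the NMC.

```python
from collections import defaultdict


def plan_volumes(data_sets):
    """Group data sets into candidate volumes.

    data_sets is an iterable of (name, use_case, appliances) tuples.
    Data sets with the same use case and the same set of Edge Appliances
    land in the same candidate volume, because snapshot and sync policies
    are set per volume.
    """
    volumes = defaultdict(list)
    for name, use_case, appliances in data_sets:
        volumes[(use_case, frozenset(appliances))].append(name)
    return dict(volumes)


design_sites = {"ea-1", "ea-2", "ea-3"}   # hypothetical appliance names
plan = plan_volumes([
    ("cad-projects", "Collaborate", design_sites),
    ("cad-library", "Collaborate", design_sites),
    ("finance", "Local", {"ea-1"}),
])
for (use_case, appliances), names in plan.items():
    print(use_case, sorted(appliances), names)
```

In this example, the two CAD data sets share one candidate volume connected to three Edge Appliances, while the finance data gets its own volume on a single appliance.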

Having established different volumes that align with the different data sets for the different use cases, you can now specify snapshot schedules that are appropriate for each volume. Data sets that change infrequently might need snapshots only once each day. Data sets with active collaboration might require more frequent snapshots.

Only one volume on an Edge Appliance can snapshot at a time. For this reason, there is an option to select a certain volume to have Prioritized Snapshots. The Prioritized Snapshot state forces the selected volume on the Edge Appliance to be the next volume to obtain the snapshot lock, so this volume on this Edge Appliance gets priority for snapshot processing. If there is no new data on this volume on this Edge Appliance, the state persists until new data is placed on the volume or until the expiration time passes. Once new data is present, the state persists until the snapshot for this volume on this Edge Appliance completes or until the expiration time passes. The expiration time is 24 hours.
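
The lifetime of the Prioritized Snapshot state can be pictured as a small state check: the state ends when the prioritized snapshot completes or when 24 hours have passed, whichever comes first. The Python sketch below models only that rule. The class is hypothetical, and it assumes the 24-hour expiration is measured from the moment the state is set, which this document does not state explicitly.

```python
from datetime import datetime, timedelta

# Expiration time given in the text above.
PRIORITY_EXPIRATION = timedelta(hours=24)


class PrioritizedSnapshotState:
    """Hypothetical model of the Prioritized Snapshot state for one volume."""

    def __init__(self, set_at):
        # Assumption: the expiration clock starts when the state is set.
        self.set_at = set_at
        self.snapshot_completed = False

    def mark_snapshot_completed(self):
        """Record that the snapshot for the prioritized volume has finished."""
        self.snapshot_completed = True

    def is_active(self, now):
        """The state lasts until the snapshot completes or the time expires."""
        expired = now - self.set_at >= PRIORITY_EXPIRATION
        return not self.snapshot_completed and not expired


state = PrioritizedSnapshotState(set_at=datetime(2024, 1, 1, 9, 0))
print(state.is_active(datetime(2024, 1, 1, 15, 0)))   # still waiting for data
print(state.is_active(datetime(2024, 1, 2, 9, 0)))    # 24 hours have passed
```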

Similarly, you can now specify sync schedules that are appropriate for each volume. The goal is to ensure that each site has the most up-to-date versions of any data that has changed. The Collaborate use case would require more frequent syncs than the Sync use case, for example. Since syncs require less system and bandwidth overhead than snapshots, syncs can be more frequent than snapshots.

The next step is to enable Nasuni Global File Lock for those data sets that support close collaboration, where it is necessary to prevent multiple users in different sites from changing the same files. Because Global File Lock is enabled at the directory level, you can specify Global File Lock protection much more finely than a complete volume. Only enable Global File Lock on specific directories where necessary because it can slow down the access time of an Edge Appliance, add Edge Appliance overhead, and add network overhead. Again, it is good to set user expectations: the protection that Global File Lock provides can affect data access performance.

Having configured volumes and directories, you should perform testing to establish that the data propagation behavior is appropriate for each use case. Some adjustments of the snapshot or sync schedules can improve data propagation performance.

To track the progress of snapshots and syncs, you can use the notifications issued at the start and completion of each snapshot and sync.

Configuration Suggestions

You can enable or disable access to a selected "owned" local volume by your remote locations. If remote access to this volume is enabled, you can then select permissions for remote access to this volume. Because the “Read Only” permission has much less overhead than the “Read/Write” permission, consider using the “Read Only” permission when possible.

The default settings that influence data propagation performance have been found to fit most customer situations, and special adjustment is generally not necessary.

However, rare situations do arise where data propagation is not optimal with the default settings. In such cases, the following suggestions can improve data propagation performance.

Edge Appliance Configuration

Volume snapshot schedule:

  • Use 30 minutes for the initial snapshot schedule.

  • If data propagation is consistently completed satisfactorily in less than 30 minutes, reduce the snapshot schedule to 15 minutes.

Volume sync schedule: 5 minutes.
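
The snapshot schedule suggestion above amounts to a simple tuning rule, sketched below in Python. The function is hypothetical and interprets "consistently" as every one of a minimum number of recent measurements completing in under 30 minutes. Both that interpretation and the `min_samples` default are assumptions, and the propagation times would come from your own testing.

```python
INITIAL_SNAPSHOT_INTERVAL_MIN = 30
REDUCED_SNAPSHOT_INTERVAL_MIN = 15
SUGGESTED_SYNC_INTERVAL_MIN = 5


def suggest_snapshot_interval(observed_propagation_min,
                              current_interval_min=INITIAL_SNAPSHOT_INTERVAL_MIN,
                              min_samples=3):
    """Suggest a snapshot interval from measured propagation times (minutes).

    Assumption: "consistently" means at least min_samples measurements, all
    of which completed in under 30 minutes.
    """
    if current_interval_min != INITIAL_SNAPSHOT_INTERVAL_MIN:
        # The suggestion only covers the step from 30 minutes to 15 minutes.
        return current_interval_min
    if len(observed_propagation_min) < min_samples:
        return current_interval_min
    if all(t < INITIAL_SNAPSHOT_INTERVAL_MIN for t in observed_propagation_min):
        return REDUCED_SNAPSHOT_INTERVAL_MIN
    return current_interval_min


print(suggest_snapshot_interval([12, 20, 25]))   # all under 30 minutes
print(suggest_snapshot_interval([12, 35, 20]))   # one slow propagation
```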

Account Configuration Options

There are account configuration parameters, adjustable only by Nasuni personnel, that can help improve the performance of data propagation. For situations where the above considerations do not improve data propagation, adjusting these account configuration parameters could improve performance.

Copyright © 2010-2024 Nasuni Corporation. All rights reserved.