Filesystem Scanning Tools

Overview of Third-Party Filesystem Scanning with Nasuni

Nasuni enables enterprises to store and synchronize files across all locations at any scale. Powered by the Nasuni UniFS® global file system, the Nasuni® file services platform stores unstructured file data in private or public cloud object storage from providers such as Amazon, Dell EMC, IBM, WD ActiveScale, Hitachi Vantara, and Microsoft, while intelligently caching actively used data on virtual or hardware Nasuni Edge Appliances for high-performance access. Nasuni serves a variety of use cases, including NAS/file server consolidation, multi-site collaboration, business continuity, digital transformation, and active archiving.

Third-party tools can help to:

Monitor the content size of directories and volumes.
Protect enterprise data such as sensitive files and emails; confidential customer, patient, and employee data; financial records; strategic and product plans; and other intellectual property.
Detect threats and cyberattacks by analyzing data, account activity, and user behavior.
Support data governance, compliance, classification, and threat analytics.
Scan potential vulnerabilities.
Produce reports on any of the above.

This document discusses deployment considerations and best practices for using third-party tools with the Nasuni cloud-based file architecture.

Caution: Because all connections are authenticated, SSL inspection or decryption is not supported.

Deploy a Dedicated Nasuni Edge Appliance for Scanning

Typically, Nasuni Edge Appliances are deployed at the edge to provide end users with high-performance access for their actively used data. Since the Nasuni UniFS® global file system stores the authoritative copy of all files and metadata in private or public cloud object storage platforms, special considerations are necessary for use cases that require scanning that content.

Nasuni strongly recommends deploying a dedicated Nasuni Edge Appliance with which the third-party tools can interact. By deploying a dedicated Nasuni Edge Appliance along with a third-party tool local to the region where the authoritative copy resides, the need for transferring the entire data set across a wide area network to the edge is eliminated.
This shortens the scans that the third-party tool use case requires.
This architecture also minimizes any egress fees charged by the public cloud provider, because the data transfer between the Nasuni Edge Appliance and the third-party tool occurs within the same region.
Using a dedicated Nasuni Edge Appliance also allows the administrator of the third-party tool to run scans during normal business hours without fear of impacting end-user performance. It also avoids the possibility of the scan forcing the eviction of active data in favor of older data that the tool needs to scan.

Figure 1: Example Enterprise Search Architecture

Architecture and Deployment Considerations

Nasuni Edge Appliance and Third-party Tool Ratio

Traditionally, siloed NAS architectures required a one-to-one relationship between the NAS device and a third-party tool, in order to avoid inefficient scans across wide area networks.

The Nasuni UniFS® global file system eliminates the need to deploy a third-party tool along with every Nasuni Edge Appliance, since changes to data on volumes that are shared across Nasuni Edge Appliances are propagated to all connected appliances.

In addition, every Nasuni Edge Appliance is capable of sending file system audit events via the Nasuni Auditing API back to the dedicated third-party tool. The third-party tool can then analyze these events. If a threat is detected, the third-party tool can then generate administrative alerts and take proactive actions to lock down data or users. These events can also be used by the third-party tool to identify which files and folders have been modified and perform incremental scans of just those items. See the Nasuni Management Console Guide for details about configuring file system auditing.

Public Cloud Networking

The Nasuni Edge Appliance requires connectivity to Active Directory domain controllers. The third-party tool might also require access to domain controllers, as well as to a central management server. If these infrastructure resources are deployed solely on-premises, then a trusted network path, in the form of a VPN or direct connection, must be created between the public cloud and your datacenters. Alternatively, infrastructure resources that already exist in the public cloud can be used by the Edge Appliance and the third-party tool.

Multi-Region Deployments

When data is hosted across multiple cloud service provider regions, it is recommended that a dedicated Nasuni Edge Appliance and third-party tool be deployed in each region. This ensures the best performance, because the Nasuni and third-party tool VMs are close to the data. It also minimizes egress fees. In addition, this arrangement addresses data sovereignty concerns, such as those of the European Union’s General Data Protection Regulation (GDPR), by ensuring that content scanning happens within a particular region.

In a multi-region scenario that does not involve data sovereignty concerns, and where cost takes precedence over scan duration, it can be more cost-effective to use a single Nasuni Edge Appliance and third-party tool pair to scan multiple volumes/regions. Each cloud service provider has different costs associated with data transfers. Consideration should be given to how much data is scanned by the third-party tool and, thus, how much data traverses the cloud service provider’s network. In the case of a third-party tool that only involves the scanning of metadata, egress fees are almost always less than the cost of deploying multiple VMs in each region.
However, when certain third-party tools are used that involve scanning file contents, significantly more data can be involved in each scan. The major cloud service providers offer cost calculators that can be used to determine the break-even point for data being scanned across regions vs. the cost of deploying additional VMs.
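The break-even reasoning above can be sketched as a short calculation. This is a minimal illustration, not a pricing tool: the function names are hypothetical, and the VM cost and egress price used in the usage note are placeholder figures, not published rates. Use your provider's cost calculator for real numbers.

```python
def break_even_scanned_gb(extra_vm_monthly_cost: float, egress_price_per_gb: float) -> float:
    """Return the monthly cross-region scan volume (in GB) at which egress
    fees equal the monthly cost of deploying the additional VMs."""
    if egress_price_per_gb <= 0:
        raise ValueError("egress_price_per_gb must be positive")
    return extra_vm_monthly_cost / egress_price_per_gb


def cheaper_option(scanned_gb_per_month: float,
                   extra_vm_monthly_cost: float,
                   egress_price_per_gb: float) -> str:
    """Compare paying egress for cross-region scans against running an
    additional Edge Appliance and third-party tool pair in the remote region."""
    egress_cost = scanned_gb_per_month * egress_price_per_gb
    if egress_cost < extra_vm_monthly_cost:
        return "single pair"
    return "pair per region"
```

For example, with a placeholder cost of 600 per month for the extra VMs and a placeholder egress price of 0.02 per GB, the break-even point is about 30,000 GB scanned across regions per month. A metadata-only scan that moves 1,000 GB falls well below that point, while a content scan that moves 50,000 GB falls above it.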

Network Architecture

In a multi-region deployment, network traffic can be routed across region-specific secure connections between a regional office and the cloud service provider, or across the cloud service provider’s backbone to a shared secure connection at a single customer location.

The specific network configurations for each scenario vary depending on the cloud service provider. Please consult your provider’s documentation for the latest deployment guidance.

Run Third-Party Metadata Scans After Metadata Is Brought into Cache

Before the initial scan of a Nasuni volume by the third-party tool, ensure that metadata for the volume has been pulled into the cache completely by using the Nasuni File Browser. After the initial scan, metadata changes are minimal, and follow-on scans can automatically trigger the download of the incremental changes.

Important: Performing this operation on a user-facing Edge Appliance might lead to longer synchronization times and slower data propagation. Nasuni recommends only performing this action on an Edge Appliance dedicated to the filesystem scanning workload.

To bring metadata into the cache, follow these steps:

Log in to the Nasuni Management Console (NMC).
Click Volumes.
Click File Browser in the left-hand column.
From the Volume drop-down list, select the volume to scan.
From the Filer drop-down list, select the Nasuni Edge Appliance closest to the third-party tool.
In the Version drop-down list, ensure that “Current Version” is selected.
In the Volume Actions area, click “Bring into Cache”. The “Bring Volume Into Cache” dialog box appears.
Select “Bring Metadata Only”.
Important: If you do not select “Bring Metadata Only”, the Nasuni Edge Appliance starts downloading all of the data on the volume into the cache.
Click “Start Transfer”. This begins the process of copying metadata into the local cache of the Nasuni Edge Appliance.
Monitor the Notifications on the NMC for messages indicating that metadata is being brought into cache and that the job is complete. The message is of the form, “Metadata for entire volume <volume_name> has been successfully brought into cache.”

Note: By default, notifications are not generated when a “Bring into Cache” job starts, continues, or completes. You can ask Nasuni Support to enable these notifications, unless you are using Varonis, because enabling them can cause issues with Varonis.

Important: This message indicates that the Nasuni Edge Appliance has finished downloading the metadata associated with the volume. However, it is possible that some directories might have been skipped. Nasuni Support can review system logs to determine whether any directories have been skipped.
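If notifications are enabled and you collect their text, the completion message quoted above can be matched with a regular expression. The sketch below assumes the messages are already available as a list of strings; how they are retrieved from the NMC is outside its scope, and the function name is hypothetical. A match only shows that the job reported completion; it does not rule out skipped directories.

```python
import re

# Pattern based on the message form quoted in this document.
COMPLETION_PATTERN = re.compile(
    r"Metadata for entire volume (?P<volume>.+) "
    r"has been successfully brought into cache"
)


def completed_volumes(messages):
    """Return the volume names for which a completion message was seen."""
    volumes = []
    for message in messages:
        match = COMPLETION_PATTERN.search(message)
        if match:
            volumes.append(match.group("volume"))
    return volumes
```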

Use the NMC API to Bring Metadata into Cache

You can also bring metadata into the cache using the NMC API. By default, both the metadata and data for the specified path are brought into the cache. To bring only the metadata into the cache, set $MetadataOnly to "true".

Important: The NMC API can be used to pin metadata in the cache, or to enable Auto Cache for metadata. Pinning metadata in the cache and enabling Auto Cache for metadata can affect the amount of data in the cache, and the display of data in the cache. Also, bringing all metadata into the cache adds time to the sync process and might affect user performance. On a dedicated appliance with no users (for example, one used only to change permissions or perform searches), the longer sync times caused by syncing the entire metadata tree do not affect any user-related snapshot or sync changes. The NMC API can also be used to verify that these features have been configured for a directory. Because metadata-only pinning and Auto Cache pinning are currently possible only with the NMC API, directories with such pinning enabled are not displayed in the File Browser of the NMC and the Edge Appliance, nor on the NMC Pinned Folders and NMC Auto Cached Folders pages.

Required Inputs: NMC hostname, username, password, volume_guid, filer_serial, path, metadata only, force

Compatibility: Nasuni 8.5 or higher required

Script Name: BringPathIntoCache.ps1
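For readers who prefer Python to the PowerShell script, the following sketch shows how the required inputs listed above might be assembled into an HTTPS request. It is an illustration under stated assumptions, not the NMC API reference: the endpoint template, the JSON field names, and the "Token" authorization scheme are all assumptions that must be checked against the NMC API documentation, and the token is presumed to have been obtained beforehand with the NMC username and password. The function only builds the request; it does not send it.

```python
import json
import urllib.request


def build_bring_into_cache_request(nmc_hostname, endpoint_template, token,
                                   volume_guid, filer_serial, path,
                                   metadata_only=True, force=False):
    """Build (but do not send) a POST request for a bring-into-cache job.

    endpoint_template is supplied by the caller from the NMC API
    documentation and must contain {volume_guid} and {filer_serial}
    placeholders. The body field names below are assumptions.
    """
    endpoint = endpoint_template.format(volume_guid=volume_guid,
                                        filer_serial=filer_serial)
    body = {"path": path, "metadata_only": metadata_only, "force": force}
    return urllib.request.Request(
        f"https://{nmc_hostname}{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed authorization scheme; confirm in the NMC API docs.
            "Authorization": f"Token {token}",
        },
        method="POST",
    )
```

Keeping metadata_only set to True by default mirrors the guidance in this document: a scan-dedicated appliance usually needs the metadata first, and omitting the option would bring both metadata and data into the cache.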

Use Incremental Scans After the Initial Scan

After the initial scan of the Nasuni volume, it is recommended to configure the third-party tool to perform incremental scans rather than full scans on a fixed schedule (for example, every 24 hours). The incremental scans rely on the stream of events from the Nasuni Edge Appliances to identify new data to scan.

While the bulk of the metadata for the volume is already resident in the cache due to the initial “Bring into Cache” procedure, any new metadata must be downloaded to the cache by the Nasuni Edge Appliance as the third-party tool requires it. The amount of new metadata to download to the cache depends on how much data changes and how frequently the data changes in the scanned volume.
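The idea of deriving an incremental scan list from audit events can be sketched as follows. The event shape used here (a dictionary with "event_type" and "path" keys) and the event type names are hypothetical; the actual fields delivered by Nasuni auditing and consumed by a given third-party tool will differ.

```python
# Hypothetical event types that indicate a file or folder needs rescanning.
CHANGE_EVENT_TYPES = frozenset({"create", "write", "rename"})


def paths_to_rescan(events, change_types=CHANGE_EVENT_TYPES):
    """Return the sorted, de-duplicated paths touched by change events.

    Events without a recognized change type (for example, reads) are
    ignored, so only modified items are queued for the next scan.
    """
    return sorted({
        event["path"]
        for event in events
        if event.get("event_type") in change_types and "path" in event
    })
```

De-duplicating the paths matters because a single file that is written many times between scans produces many events but needs to be scanned only once.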

Scan File Content with Third-Party Tools

Unlike third-party tool processes that only scan metadata, other third-party tool processes can scan the contents of files in order to build a comprehensive search index or identify sensitive information.

In the case of sensitive data discovery, they might copy the relevant files from the Nasuni Edge Appliance to the third-party tool, and then analyze them, based on a set of discovery rules configured by the administrator of the third-party tool. If a change is made to the rules that define the sensitive information, the third-party tool might rescan the data. This can happen due to an update to the ruleset provided by the third-party tool, or due to a configuration change made to the ruleset by the administrator of the third-party tool.

To accommodate the scanning of data, careful consideration must be given to the amount of data to scan. Ideally, the cache of the Nasuni Edge Appliance should be large enough to contain the dataset that the third-party tool is scanning, in addition to any space necessary for the cache to perform its other tasks. As an example, a 40 TB volume might only have 2 TB of data to scan. In this case, the cache of the Nasuni Edge Appliance should include 3 TB for this scanning task, in addition to any space necessary for the cache to perform its other tasks. This would allow the third-party tool dataset to remain in the cache, with some allowance for future growth.
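The sizing example above can be expressed as simple arithmetic. The 50 percent growth allowance in this sketch is inferred from the 2 TB to 3 TB example and is an assumption, not a Nasuni sizing rule; the function names and the "other tasks" figure are likewise illustrative.

```python
def scan_cache_allowance_tb(dataset_tb: float, growth_allowance: float = 0.5) -> float:
    """Cache space to set aside for the scanned dataset, including headroom
    for future growth (0.5 means 50 percent extra)."""
    return dataset_tb * (1 + growth_allowance)


def total_cache_tb(dataset_tb: float, other_tasks_tb: float,
                   growth_allowance: float = 0.5) -> float:
    """Total cache size: the scan allowance plus space for the cache's other tasks."""
    return scan_cache_allowance_tb(dataset_tb, growth_allowance) + other_tasks_tb
```

With a 2 TB dataset, scan_cache_allowance_tb(2.0) gives 3.0 TB, which matches the example in this section. The size of the volume (40 TB in the example) does not enter the calculation, because only the data that the tool scans needs to stay resident.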

Increasing the default number of threads that the third-party tool uses for scans can improve scan speed. Contact the third-party tool's support team for assistance with increasing the thread count.

Regular scans performed by the third-party tool help ensure that the data remains resident in the cache, so it is not necessary to pin specific data to the cache.

Enable “Auto Cache” for the volume, in order to proactively load as much new data created by other Nasuni Edge Appliances as possible.

It might not be possible or practical to specify a cache large enough to contain the entire dataset to be scanned. For example, policy requirements might specify that the entire volume must be scanned by the third-party tool. In such cases, it is critical that the Nasuni Edge Appliance be located as close as possible to the cloud storage provider. This helps to ensure that adequate bandwidth is available for downloading large amounts of data from the object store into the Nasuni Edge Appliance’s cache, as needed. In this scenario, a scan takes additional time to complete, because the Nasuni Edge Appliance must bring required data into the cache, and also evict already-scanned data from the cache to make room for more data, before the third-party tool can perform the specified scans. This frequent rolling-over of the contents of the Nasuni Edge Appliance’s cache would have a negative impact on the end-user experience, further emphasizing the need for a dedicated Nasuni Edge Appliance.

Reserved Instances

Public cloud providers might offer special pricing for reserved instances of virtual machines. This special pricing can provide considerable cost savings over the life of a virtual machine. Consult your cloud provider’s product offering for information about purchasing reserved instances for the dedicated Nasuni Edge Appliance and third-party tool.

Use External Vulnerability Scanners with Nasuni

Most vulnerability scanners work by probing the appliance from the outside, as an attacker would. The scanner connects to open ports, determines what is running, and makes recommendations based on its findings. More recently, offline scanners have been implemented that can extract the Nasuni appliance disk from the virtual machine and read that disk directly. These scans report on the contents of the disk rather than on what might be reachable or executable on a running system.

The primary limitation of this approach is that the results might be misleading. Scanners work by listing all software packages they find and comparing that list against a list of known vulnerabilities. These hits might be incorrect or confusing, such as when the package in question happens to share a version number with a vulnerable version that has already been patched to be safe (such as "1.2.4" versus "1.2.4-patch1"). The package might also not be vulnerable due to a lack of usage by Nasuni. For example, code might be in a chroot "jail", or a vulnerable feature of the package might go unused.

This behavior can result in dozens of simultaneous vulnerabilities that are false positives. Therefore, the recommendation by Nasuni for vulnerability scanning is to utilize external scanners that mimic a real-world attack and that uncover vulnerabilities from an "outside-in" perspective.
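The version-number confusion described above can be illustrated with a small sketch. It does not reproduce the matching logic of any particular scanner; the function names and the idea of a "patched builds" list are assumptions made for illustration.

```python
def upstream_version(package_version: str) -> str:
    """Drop any suffix after the first hyphen: '1.2.4-patch1' -> '1.2.4'."""
    return package_version.split("-", 1)[0]


def naive_is_flagged(installed: str, vulnerable_versions: set) -> bool:
    """Flag a package when its upstream version number is on the
    known-vulnerable list, ignoring any patch suffix."""
    return upstream_version(installed) in vulnerable_versions


def suffix_aware_is_flagged(installed: str, vulnerable_versions: set,
                            patched_builds: set) -> bool:
    """Same check, but skip builds that are known to carry the fix."""
    if installed in patched_builds:
        return False
    return naive_is_flagged(installed, vulnerable_versions)
```

With "1.2.4" on the vulnerable list, the naive check flags "1.2.4-patch1" even though that build has been patched, which is the kind of false positive described above. The suffix-aware check does not flag it, provided that "1.2.4-patch1" is listed as a patched build.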

Caution: Because all connections are authenticated, SSL inspection or decryption is not supported.

Virtual Resource Recommendations for Scanning Workloads

The following specifications are recommendations for optimizing the performance of the virtual Nasuni Edge Appliance for use by a third-party tool. Customers may choose to start with much lower specifications, and only increase the resources if they wish to decrease the scanning times.

Nasuni Virtual Edge Appliance

When scanning only metadata:

8 vCPUs
32 GiB Memory
1 TiB Cache (SSD) providing at least 5000 IOPS
256 GiB COW (SSD)

When scanning full file content:

16 vCPUs
64 GiB Memory
1 TiB Cache (SSD) providing at least 5000 IOPS
256 GiB COW (SSD)
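The two recommendation sets above can be captured as data and compared against a planned virtual machine. This is a convenience sketch only: the dictionary keys and the function name are hypothetical, and because starting with lower specifications is acceptable, any shortfall it reports is informational rather than an error.

```python
# Recommended resources from this section, keyed by scanning workload.
RECOMMENDED = {
    "metadata": {"vcpus": 8, "memory_gib": 32, "cache_gib": 1024,
                 "cache_iops": 5000, "cow_gib": 256},
    "content": {"vcpus": 16, "memory_gib": 64, "cache_gib": 1024,
                "cache_iops": 5000, "cow_gib": 256},
}


def shortfalls(planned: dict, workload: str) -> dict:
    """Return {resource: (planned, recommended)} for every resource where
    the planned VM is below the recommendation for the given workload."""
    recommended = RECOMMENDED[workload]
    return {
        name: (planned.get(name, 0), value)
        for name, value in recommended.items()
        if planned.get(name, 0) < value
    }
```

A VM sized for metadata-only scanning shows shortfalls only in vCPUs and memory when checked against the full-content recommendation, since the cache and COW recommendations are the same for both workloads.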

Third-party Tool

Refer to the configuration documentation of the third-party tool for sizing guidance.

Technical Support

Online self-help resources and Technical Support are available at www.nasuni.com/support.
