SharePoint Search Best Practices

Introduction

Nasuni enables enterprises to store and synchronize files across all locations at any scale. Powered by the Nasuni UniFS® global file system, the Nasuni file data platform stores unstructured data in private or public cloud object storage from providers such as Amazon, Dell EMC, IBM, Western Digital ActiveScale, Hitachi Vantara, and Microsoft, while intelligently caching actively used data on virtual or hardware Nasuni Edge Appliances for high-performance access. Nasuni serves a variety of use cases, including NAS/file server consolidation, multi-site collaboration, business continuity, digital transformation, and active archiving.

SharePoint Search allows you to:

  • See refined and relevant search results with SharePoint’s built-in enterprise search capabilities.

  • Easily see sites you frequent or files you have recently viewed or edited.

  • Take advantage of the type-ahead and smart results features as you search for sites, files, or people.

  • Search for sensitive content and use SharePoint’s search capabilities to support eDiscovery and compliance.

This document discusses deployment considerations, and the best practices around them, for SharePoint Search use cases when paired with the Nasuni cloud-based file architecture.

Deployment

SharePoint Search can be deployed using several different architectures:

  • Standard Search: Involves hosting your own SharePoint server farm, and makes no use of any hosted SharePoint offering, including SharePoint Online.

  • Cloud Hybrid Search: Allows users to search for files and documents in both SharePoint Server and Office 365 simultaneously. Stores content from SharePoint Server and Office 365 in the Office 365 search index. Office 365 customers have access to Cloud Hybrid Search.

  • Hybrid Federated Search: Stores content from SharePoint Server in a separate index from the Office 365 content. Office 365 customers have access to Hybrid Federated Search.

When a user performs a search in the Cloud Hybrid Search model, the user receives search results ranked in a single result block. When a user performs a search in the Hybrid Federated Search model, the user receives search results ranked in two result blocks: one for SharePoint and one for Office 365.

Microsoft generally recommends the use of Cloud Hybrid Search, unless there is sensitive content on-premises that should not be included in the Office 365 index.

Typically, Nasuni Edge Appliances are deployed at the edge to provide end users with high-performance access for their actively used data. Since the Nasuni UniFS global file system stores the authoritative copy of all files and metadata in private or public cloud object storage platforms, special considerations are necessary for use cases that require scanning that content.

Nasuni strongly recommends deploying a dedicated Nasuni Edge Appliance with which the SharePoint Search server interacts. By deploying a dedicated Nasuni Edge Appliance along with a search server local to the region where the authoritative copy resides, the need for transferring the entire data set across a wide area network to the edge is eliminated. This reduces the duration of crawls that the SharePoint Search server performs. This architecture also minimizes any egress fees that the public cloud provider charges, because the data transfer between the Nasuni Edge Appliance and the search server occurs within the same region. Using a dedicated Nasuni Edge Appliance also allows the SharePoint administrator to perform crawls during normal business hours without fear of impacting end-user performance.

Standard Search

The standard search architecture involves the use of only on-premises SharePoint. A SharePoint server farm is deployed in a customer's datacenter and users access the farm directly across private network connections or the public Internet.

Figure 1: Standard Search.

A SharePoint indexing server and an Edge Appliance dedicated to the SharePoint server should be located as close to the object store as possible, in order to provide the best performance for the indexing server. In the case of a public cloud object store, this means deploying the SharePoint Search server and Edge Appliance in the same cloud region as your data. If you are using an on-premises object store, the SharePoint Search server and Edge Appliance should be deployed in a datacenter close to the private object store.

Cloud Hybrid Search

Cloud Hybrid Search can be used by Office 365 customers. It combines a customer-managed SharePoint server instance used for indexing Nasuni volumes, along with a search index hosted within the Office 365 service. In this architecture, a user receives search results from both the Nasuni and Office 365 sources ranked in a single result block.

Figure 2: Cloud Hybrid Search.

An Edge Appliance dedicated to the SharePoint Search server should be located as close to the object store as possible, in order to provide the best performance for the index server. In the case of a public cloud object store, this means deploying the SharePoint Search server and Edge Appliance in the same cloud region as your data. If you are using an on-premises object store, the SharePoint Search server and Edge Appliance should be deployed in a datacenter close to the private object store.

Hybrid Federated Search

Hybrid Federated Search combines an on-premises SharePoint deployment along with a search index hosted within the Office 365 service. Hybrid Federated Search would only be used if you are using a private object store as well as Office 365. This architecture allows you to keep your on-premises data out of the public cloud, namely, Office 365, while maintaining the ability to conduct searches of both on-premises and Office 365 data. In this architecture, a user receives search results ranked in two result blocks: one for the Nasuni volumes and any other on-premises SharePoint data sources; and one for the Office 365 data source.

Figure 3: Hybrid Federated Search.

An Edge Appliance dedicated to the SharePoint Search server should be located as close to the object store as possible, in order to provide the best performance for the index server. Since this use case involves an on-premises object store, the SharePoint Search server and Edge Appliance should be deployed in a datacenter close to the private object store.

Advantages of dedicated Edge Appliance

The use of a dedicated Edge Appliance offers several advantages over scanning an existing Edge Appliance:

  • Minimizes cache thrashing due to differences in access patterns between indexer and end users.

  • Allows for indexing operations during normal business hours with no impact to end users.

  • Independently scales Edge Appliances based on load.

Considerations

Public Cloud Networking

The Nasuni Edge Appliance and SharePoint Search virtual machines require connectivity to infrastructure resources. Both the Edge Appliance and search server require access to Active Directory Domain Controllers. If these infrastructure resources are deployed solely on-premises, then a trusted network path, in the form of a VPN or direct connection, must be created between the public cloud and your datacenters. Alternatively, infrastructure resources that already exist in the public cloud can be used by the Edge Appliance and the search server.

Cloud Service Provider

This architecture does not require you to use Azure as the cloud service provider for the dedicated Edge Appliance and SharePoint Search server. For example, if you are using AWS S3 to store your Nasuni volumes, you should deploy the Edge Appliance and SharePoint Search server in AWS EC2. The SharePoint Search server sends search metadata to the index in Office 365, which does mean that there are egress fees associated with the transfer of this metadata from AWS to Office 365. However, these charges are lower than they would be if the VMs were deployed in Azure, because the Edge Appliance would then have to egress the full data set from AWS as the indexing process occurs.

Multi-Region Deployments

When data is hosted across multiple cloud service provider regions, it is recommended that a dedicated Nasuni Edge Appliance and SharePoint Search server be deployed in each region. This ensures the best performance, because the Nasuni and SharePoint VMs are close to the data. It also minimizes egress fees. In addition, this arrangement addresses data sovereignty concerns, such as those of the European Union’s General Data Protection Regulation (GDPR), by ensuring that content scanning happens within a particular region.

In a multi-region scenario that does not involve data sovereignty concerns, and where cost takes precedence over scan duration, it can be more cost-effective to use a single Nasuni Edge Appliance and SharePoint Search server pair to scan multiple volumes. Each cloud service provider has different costs associated with data transfers. Consideration should be given to how much data is crawled by SharePoint and, thus, how much data traverses the cloud service provider’s network. The major cloud service providers offer cost calculators that can be used to determine the break-even point for data being scanned across regions vs. the cost of deploying additional VMs.
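As a rough illustration of the kind of break-even analysis those calculators perform, the comparison can be sketched in a few lines. The rates and VM cost below are placeholders, not actual provider pricing; substitute figures from your cloud provider's calculator.

```python
# Rough break-even sketch: cross-region crawling vs. deploying a dedicated
# Edge Appliance + SharePoint Search server pair in each region.
# All rates are illustrative placeholders, not real provider pricing.

def breakeven_tb_per_month(vm_pair_cost_per_month: float,
                           transfer_cost_per_gb: float) -> float:
    """TB of data crawled per month above which an additional regional
    VM pair costs less than paying cross-region transfer for every crawl."""
    return vm_pair_cost_per_month / (transfer_cost_per_gb * 1024)

# Example: a hypothetical $600/month VM pair vs. $0.02/GB cross-region transfer.
threshold = breakeven_tb_per_month(600.0, 0.02)
print(f"Break-even at about {threshold:.1f} TB crawled per month")
```

If monthly crawl volume stays well below the threshold, a single shared pair is likely cheaper; above it, the additional regional deployment pays for itself.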

Network Architecture

In a multi-region deployment, the Edge Appliances and SharePoint Search server in each region need to be able to communicate with Active Directory domain controllers. This communication can be routed across region-specific secure connections between a regional office and the cloud service provider, or across the cloud service provider’s backbone to a shared secure connection at a single customer location.

Figure 4: Using region-specific secure connections for trusted traffic.

Figure 5: Using cloud service provider backbone to central, secure connection for trusted traffic.

The specific network configurations for each scenario vary depending on the cloud service provider. Consult your provider’s documentation for the latest deployment guidance.

Search Results Pathing

The path to search results is based on the path that the SharePoint server is configured to crawl. If the path points directly to a share on the dedicated Edge Appliance, then all users need to access data via the dedicated Edge Appliance, which has performance implications for both the users and crawler. Ideally, users should access relevant search results via the closest available Edge Appliance. This can be accomplished via several methods including through the use of path rewriting rules, and DFS or site-specific DNS name resolution.

Path rewriting settings are controlled via the “Server Name Mappings” settings for each Search Service Application. A rule can be created to replace the name of the Edge Appliance being crawled with a different path that sends users to the closest Edge Appliance. For example, file://spfiler1/ can be changed to file://domain.com/corporate where domain.com\corporate is a DFS namespace.

Another potential redirection would be to send users to an Edge Appliance configured for Web Access. In this example, file://spfiler1/ can be changed to https://files.company.com/fs/view where files.company.com is a URL pointing to an Edge Appliance with Web Access enabled for the same shares as being crawled by SharePoint. When a user clicks one of these links in the search results, the user is prompted to log in to Nasuni Web Access. As long as the Web Access session remains open, the user does not have to reauthenticate.

Important: For this redirection to function properly, all of the shares being crawled by SharePoint must have Web Access enabled on the Edge Appliance referenced by the HTTPS URL.
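Conceptually, a Server Name Mappings rule is a prefix substitution applied to each crawled result URL before it is shown to users. The sketch below is illustrative only (SharePoint applies these rules internally); the function name and the mapping values echo the hypothetical paths in the examples above.

```python
# Illustrative prefix substitution, mimicking the effect of a SharePoint
# "Server Name Mappings" rule on crawled result URLs.
def rewrite_result_url(url: str, mappings: dict) -> str:
    for crawled_prefix, user_prefix in mappings.items():
        if url.startswith(crawled_prefix):
            return user_prefix + url[len(crawled_prefix):]
    return url  # no rule matched; the crawled path is returned unchanged

# Map the crawled Edge Appliance path to a Web Access URL.
mappings = {"file://spfiler1/": "https://files.company.com/fs/view/"}
print(rewrite_result_url("file://spfiler1/projects/report.docx", mappings))
# https://files.company.com/fs/view/projects/report.docx
```

The same mechanism applies to the DFS example: mapping file://spfiler1/ to a DFS namespace path lets each user resolve to the closest Edge Appliance.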

Run Crawls After Metadata Is Pulled into Cache

Before the initial crawl of a Nasuni volume by SharePoint, ensure that the metadata for the volume has been completely pulled into the cache by using the Nasuni File Browser.

To bring metadata into the cache, follow these steps:

  1. Log into the Nasuni Management Console (NMC).

  2. Click Volumes.

  3. Click File Browser in the left-hand column.

  4. From the Volume drop-down list, select the volume to scan.

  5. From the Filer drop-down list, select the Nasuni Edge Appliance closest to the SharePoint Search server.

  6. In the Version drop-down list, ensure that “Current Version” is selected.

  7. In the Volume Actions area, click “Bring into Cache”. The “Bring Volume Into Cache” dialog box appears.

  8. Select “Bring Metadata Only”.
    Important: If you do not select “Bring Metadata Only”, the Nasuni Edge Appliance starts downloading all of the data on the volume into the cache.

  9. Click “Start Transfer”. This begins the process of copying metadata into the local cache of the Nasuni Edge Appliance.

  10. Monitor the Notifications on the NMC for messages indicating that metadata is being brought into cache and that the job is complete. The message is of the form, “Metadata for entire volume <volume_name> has been successfully brought into cache.”

    Important: This message indicates that the Nasuni Edge Appliance has finished downloading the metadata associated with the volume. However, it is possible that some directories might have been skipped. Nasuni Support can review system logs to determine whether any directories have been skipped.

Incremental SharePoint Crawls

After the initial full crawl of the Nasuni volume, it is recommended to configure SharePoint to perform incremental crawls. This improves crawl performance and minimizes the amount of data pulled from the object store into cache.

Cache Management

Incremental crawls performed by SharePoint focus on only new and changed files, so it is not necessary to pin specific data to the cache.

Enable “Auto Cache” for the volume, in order to proactively load as much new data created by other Nasuni Edge Appliances as possible in between scheduled SharePoint crawls.

Nasuni version 8.5 added the ability to configure metadata for Pinning and Auto Cache. When metadata is Pinned, the Edge Appliance does not evict it from cache. When Auto Cache is enabled for metadata, the Edge Appliance proactively downloads metadata changes to a volume as they are made by remote Edge Appliances. Both of these features can improve the performance of SharePoint crawls by keeping the metadata cache-resident and as up to date as possible.

Metadata Pinning and Auto Cache are configurable via the NMC API. The settings are specific to the Edge Appliance and should be targeted to the Edge Appliance being crawled by SharePoint. The NMC API is a REST API and many different tools can be used to configure Pinning and Auto Cache.
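For example, because the NMC API is a plain REST API, the same two calls the PowerShell scripts below make — a token request against /api/v1.1/auth/login/ and a POST to the pinned-folders endpoint — can also be assembled with Python's standard library. This is a sketch only: the function names are hypothetical, and actually sending the requests requires a reachable NMC (and appropriate certificate handling).

```python
import json
import urllib.request

def build_login_request(hostname: str, username: str, password: str) -> urllib.request.Request:
    """Build the POST to the NMC login endpoint that returns a session token."""
    body = json.dumps({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"https://{hostname}/api/v1.1/auth/login/",
        data=body,
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )

def build_pin_request(hostname: str, token: str, volume_guid: str,
                      filer_serial: str, folder_path: str) -> urllib.request.Request:
    """Build the POST that pins metadata for a folder on one Edge Appliance."""
    body = json.dumps({"path": folder_path, "mode": "metadata"}).encode()
    url = (f"https://{hostname}/api/v1.1/volumes/{volume_guid}"
           f"/filers/{filer_serial}/pinned-folders/")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
        method="POST",
    )

# Sending (requires a reachable NMC):
#   with urllib.request.urlopen(build_login_request(...)) as resp:
#       token = json.load(resp)["token"]
```

Substituting auto-cached-folders for pinned-folders in the URL yields the Auto Cache variant, mirroring the second PowerShell example.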

PowerShell Example: Metadata Pinning

Metadata Pinning can be configured using scripting languages such as PowerShell. An example script for enabling metadata pinning on a specific path is provided below. Substitute values specific to your use case for the $hostname, $username, $password, $volume_guid, $filer_serial, and $FolderPath variables.

# Pin the metadata for the specified path to the cache

# populate NMC hostname and credentials
$hostname = "insertNMChostnameHere"

# username for AD accounts supports both UPN (user@domain.com) and DOMAIN\\samaccountname formats (two backslashes required).
# Nasuni Native user accounts are also supported.
$username = "username"
$password = "password"
$credentials = '{"username":"' + $username + '","password":"' + $password + '"}'

# specify Edge Appliance and Volume
$volume_guid = "InsertVolumeGuid"
$filer_serial = "InsertFilerSerial"

# Set the path to pin to cache. The path should start with a "/" and is the path as displayed
# in the file browser, not the share path. To pin the metadata for the entire volume, set this to "/".
$FolderPath = "/Insert/path/here"

# Allow untrusted SSL certs
if (-not ("TrustAllCertsPolicy" -as [type])) {
    Add-Type -TypeDefinition @"
using System.Net;
using System.Security.Cryptography.X509Certificates;
public class TrustAllCertsPolicy : ICertificatePolicy {
    public bool CheckValidationResult(
        ServicePoint srvPoint, X509Certificate certificate,
        WebRequest request, int certificateProblem) {
        return true;
    }
}
"@
    [System.Net.ServicePointManager]::CertificatePolicy = New-Object -TypeName TrustAllCertsPolicy
}

# set the correct TLS type (outside the type check so it runs on every invocation)
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

# build JSON headers
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Accept", 'application/json')
$headers.Add("Content-Type", 'application/json')

# construct the login URI
$url = "https://" + $hostname + "/api/v1.1/auth/login/"

# Use credentials to request and store a session token from the NMC for later use
$result = Invoke-RestMethod -Uri $url -Method Post -Headers $headers -Body $credentials
$token = $result.token
$headers.Add("Authorization", "Token " + $token)

# Set the URL for the folder update NMC API endpoint
$CacheUrl = "https://" + $hostname + "/api/v1.1/volumes/" + $volume_guid + "/filers/" + $filer_serial + "/pinned-folders/"

# build the body for the folder update
$body = @{
    path = $FolderPath
    mode = "metadata"
}

# set folder properties
$response = Invoke-RestMethod -Uri $CacheUrl -Method Post -Headers $headers -Body (ConvertTo-Json -InputObject $body)
Write-Output $response | ConvertTo-Json

PowerShell Example: Metadata Auto Cache

Metadata Auto Cache can be configured through the use of scripting languages such as PowerShell. An example script for enabling metadata Auto Cache on a specific path is provided below. Substitute values specific to your use case for the $hostname, $username, $password, $volume_guid, $filer_serial, and $FolderPath variables.

# Enable Auto Cache of the metadata for the specified path
# NOTE: The volume must have Remote Access enabled before enabling Auto Cache

# populate NMC hostname and credentials
$hostname = "insertNMChostnameHere"

# username for AD accounts supports both UPN (user@domain.com) and DOMAIN\\samaccountname formats (two backslashes required).
# Nasuni Native user accounts are also supported.
$username = "username"
$password = "password"
$credentials = '{"username":"' + $username + '","password":"' + $password + '"}'

# specify Edge Appliance and Volume
$volume_guid = "InsertVolumeGuid"
$filer_serial = "InsertFilerSerial"

# Set the path on which to enable Auto Cache. The path should start with a "/" and is the path as displayed
# in the file browser, not the share path. To enable Auto Cache of metadata for the entire volume, set this to "/".
$FolderPath = "/Insert/path/here"

# Allow untrusted SSL certs
if (-not ("TrustAllCertsPolicy" -as [type])) {
    Add-Type -TypeDefinition @"
using System.Net;
using System.Security.Cryptography.X509Certificates;
public class TrustAllCertsPolicy : ICertificatePolicy {
    public bool CheckValidationResult(
        ServicePoint srvPoint, X509Certificate certificate,
        WebRequest request, int certificateProblem) {
        return true;
    }
}
"@
    [System.Net.ServicePointManager]::CertificatePolicy = New-Object -TypeName TrustAllCertsPolicy
}

# set the correct TLS type (outside the type check so it runs on every invocation)
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

# build JSON headers
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Accept", 'application/json')
$headers.Add("Content-Type", 'application/json')

# construct the login URI
$url = "https://" + $hostname + "/api/v1.1/auth/login/"

# Use credentials to request and store a session token from the NMC for later use
$result = Invoke-RestMethod -Uri $url -Method Post -Headers $headers -Body $credentials
$token = $result.token
$headers.Add("Authorization", "Token " + $token)

# Set the URL for the folder update NMC API endpoint
$CacheUrl = "https://" + $hostname + "/api/v1.1/volumes/" + $volume_guid + "/filers/" + $filer_serial + "/auto-cached-folders/"

# build the body for the folder update
$body = @{
    path = $FolderPath
    mode = "metadata"
}

# set folder properties
$response = Invoke-RestMethod -Uri $CacheUrl -Method Post -Headers $headers -Body (ConvertTo-Json -InputObject $body)
Write-Output $response | ConvertTo-Json

Cache Sizing

SharePoint Search crawls the contents of files in order to extract relevant information. It copies files from the Nasuni Edge Appliance to the SharePoint Search server, then analyzes their content and extracts necessary information to populate the search index.

Ideally, the cache of the Nasuni Edge Appliance should be large enough to contain the dataset that SharePoint is crawling, in addition to any space necessary for the cache to perform its other tasks. As an example, a 40 TB volume might only have 2 TB of data to scan. In this case, the cache of the Nasuni Edge Appliance should include 3 TB for this scanning task, in addition to any space necessary for the cache to perform its other tasks. This would allow the SharePoint dataset to remain in the cache, with some allowance for future growth.
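The sizing rule of thumb above can be expressed as a simple calculation. The sketch below is illustrative only: the 50% growth allowance is inferred from the 2 TB-to-3 TB example, not a Nasuni-published factor, and the result is in addition to whatever space the cache needs for its other tasks.

```python
def recommended_scan_cache_tb(scan_dataset_tb: float,
                              growth_allowance: float = 0.5) -> float:
    """Cache space to reserve for the SharePoint crawl dataset, leaving
    headroom for future growth (growth_allowance is an assumed factor)."""
    return scan_dataset_tb * (1 + growth_allowance)

# The example above: 2 TB of crawlable data on a 40 TB volume.
print(recommended_scan_cache_tb(2.0))  # 3.0
```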

It might not be possible or practical to specify a cache large enough to contain the entire dataset to be crawled. In such cases, it is critical that the Nasuni Edge Appliance be located as close as possible to the object store. This helps to ensure that adequate bandwidth is available for downloading large amounts of data from the object store into the Nasuni Edge Appliance’s cache. In this scenario, a SharePoint crawl takes additional time to complete, because the Nasuni Edge Appliance must bring required data into the cache, and also evict already-scanned data from the cache to make room for more data, before SharePoint can perform the specified crawl. This frequent rolling-over of the contents of the Nasuni Edge Appliance’s cache would have a negative impact on the end-user experience, further emphasizing the need for a dedicated Nasuni Edge Appliance.

Reserved Instances

Public cloud providers might offer special pricing for reserved instances of virtual machines. This special pricing can provide considerable cost savings over the life of a virtual machine. Consult your cloud provider’s product offering for information about purchasing reserved instances for the dedicated Nasuni Edge Appliance and SharePoint Search server.

Virtual Resource Recommendations

Hybrid Search

Deploying virtual machines with additional resources showed no increase in crawl performance, because this workflow is constrained by sending data to the Office 365 index.

Nasuni Virtual Edge Appliance

Recommended resources:

  • 8 vCPUs.

  • 16 GiB Memory.

  • 1 TiB Cache (SSD) providing at least 5,000 IOPS.

  • 256 GiB COW (SSD).

SharePoint Search Server

Recommended resources:

  • 16 vCPUs.

  • 32 GiB Memory.

  • Consult Microsoft documentation for disk requirements.

Standard Search

When deploying an architecture that does not involve the Office 365 index, increased crawling performance can be obtained by spreading the crawling load across multiple Edge Appliances.

Nasuni Virtual Edge Appliance

Recommended resources:

  • 8 vCPUs (maximum: 32 vCPUs *).

  • 16 GiB Memory (maximum: 64 GiB *).

  • 1 TiB Cache (SSD) providing at least 5,000 IOPS.

  • 256 GiB COW (SSD).

SharePoint Search Server

Recommended resources:

  • 16 vCPUs (maximum: 32 vCPUs *).

  • 32 GiB Memory (maximum: 64 GiB *).

  • Consult Microsoft documentation for disk requirements.

* Adding resources beyond these does not lead to increased crawling performance.

Technical Support

Online self-help resources and Technical Support are available at www.nasuni.com/support.