Overview
Microsoft Distributed File System (DFS) allows administrators to combine multiple file shares across file servers into a unified namespace. There are two distinct components to DFS: DFS Namespaces (DFS-N) and DFS Replication (DFS-R):
DFS-N provides a hierarchical, unified namespace.
DFS-R provides a replication service to keep the data stored on the underlying file servers in sync.
DFS-R only runs on Windows servers and is, therefore, not compatible with Nasuni.
DFS-N, on the other hand, fits in well with Nasuni’s global file system capabilities. Customers can combine Nasuni and DFS-N technologies to provide users with a single namespace through which their data can be accessed. Since this namespace is abstracted from the underlying Edge Appliances, storage administrators can change the underlying storage while keeping consistent path names for users.
Tip: If the same user has logged into multiple Edge Appliances in a day, this might indicate that DFS is switching which Edge Appliance they are connected to. You can check for such logins on multiple Edge Appliances by examining the audit logs. If this is happening, check to ensure that DFS is configured properly.
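As a rough illustration of the check described in the tip, the sketch below groups login records by user and day and reports anyone seen on more than one Edge Appliance. The record layout (user, appliance, day) and the sample names are assumptions for illustration; real audit log fields will differ and would need to be mapped into this shape first.

```python
from collections import defaultdict


def users_on_multiple_appliances(logins):
    """Return {(user, day): [appliances]} for users seen on 2+ appliances in a day.

    logins: iterable of (user, appliance, day) tuples (hypothetical layout).
    """
    seen = defaultdict(set)
    for user, appliance, day in logins:
        seen[(user, day)].add(appliance)
    return {key: sorted(apps) for key, apps in seen.items() if len(apps) > 1}


# Hypothetical sample records, not taken from a real audit log.
sample = [
    ("jdoe", "ea1", "2024-05-01"),
    ("jdoe", "ea2", "2024-05-01"),
    ("asmith", "ea1", "2024-05-01"),
    ("jdoe", "ea1", "2024-05-02"),
]
print(users_on_multiple_appliances(sample))
```

A non-empty result is only a hint that DFS might be switching targets; a user who legitimately works from two sites in one day would also appear.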
Terminology
DFS link: a folder in the DFS tree structure that points to one or more shared folders.
DFS namespace: a logically structured grouping of shared folders located on different servers; provides a virtual view of shared folders.
domain-based namespace: a namespace hosted by multiple servers where the topology data is stored in Active Directory.
namespace server: a Windows server running the DFS service and hosting a namespace; see also “root target”.
preferred target: the shared folder that a client connects to, based on the referral ordering.
referral: ordered list of targets supplied to a DFS client by a namespace server.
referral ordering: the process of sorting DFS targets for presentation to a DFS client.
root target: a Windows server running the DFS service and hosting a namespace; see also “namespace server”.
site: a grouping of well-connected networks defined by IP subnets; normally tied to the underlying physical structure of Active Directory; also known as “AD site”.
site link: reflection of the inter-site connectivity among Active Directory sites.
site link cost: a control mechanism used to define the relative priority of Active Directory site links.
standalone namespace: a namespace hosted by one server.
target: a shared folder to which a DFS link points.
Requirements
Namespace Servers
DFS namespaces must be hosted by a Windows Server. The namespace component is available on all currently supported versions of Windows Server. In Windows Server 2008, Microsoft introduced a new mode for namespaces called “Windows Server 2008 mode”. This mode added enhanced scalability and support for access-based enumeration (ABE) to namespaces. Any new namespace deployed since Windows Server 2008 defaults to this mode, unless the administrator overrides the setting. Legacy Windows Server 2003 namespaces can be upgraded to the new mode.
Clients
Windows: All currently supported versions of Windows support DFS namespaces.
Important: A number of Windows protected words cannot be used as the DFS path or DFS target path, including: CON, PRN, AUX, CLOCK$, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9. For details, see Naming and referencing shares, directories, files, and metadata.
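A planned DFS path or target path can be screened against the list above before it is created. The sketch below is a minimal check, assuming that names are compared case-insensitively and that only whole path components matter; the sample paths are hypothetical.

```python
# Reserved names copied from the list above.
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "CLOCK$", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def reserved_components(path):
    """Return the components of a UNC-style path that match a reserved name."""
    parts = [p for p in path.replace("/", "\\").split("\\") if p]
    # Assumption: comparison is case-insensitive and applies to whole components only.
    return [p for p in parts if p.upper() in RESERVED_NAMES]


print(reserved_components(r"\\corp.example.com\public\Projects"))  # []
print(reserved_components(r"\\corp.example.com\public\aux"))       # ['aux']
```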
macOS: All currently supported versions of macOS support traversing DFS namespaces. While a macOS client can navigate a DFS namespace, it does not have the same access to utilities for managing or troubleshooting as a Windows client.
Linux: Linux clients can connect to DFS namespaces. Consult distribution documentation for specifics.
Dependencies
Active Directory Sites and Services
DFS-N requires a carefully planned Active Directory site topology in order to function properly. Active Directory sites should be defined to match the underlying physical network topology. Subnets must be assigned to the appropriate sites. Particular attention should be paid to any subnets used by VPN users to ensure that the subnet belongs to the site where the VPN server is located.
Site links among Active Directory sites must also be properly configured. Appropriate costs should be applied to each link since DFS-N leverages this information when ordering referrals. Improper costing might lead to sub-optimal routing of users, namely, users sent across a slow WAN link to a remote Edge Appliance.
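The subnet-to-site assignment can be sanity-checked offline before changing Active Directory. The sketch below maps a client IP address to a site using Python's standard ipaddress module. The subnets and site names are hypothetical, and choosing the most specific matching subnet is an assumption made for this sketch.

```python
import ipaddress

# Hypothetical subnet-to-site assignments, for illustration only.
SITE_SUBNETS = {
    "10.1.0.0/16": "Hub",
    "10.2.0.0/16": "Spoke A",
    "172.16.50.0/24": "Hub",  # VPN pool, assigned to the site of the VPN server
}


def site_for(ip):
    """Return the site of the most specific subnet containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for cidr, site in SITE_SUBNETS.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, site)
    return best[1] if best else None


print(site_for("10.2.3.4"))     # Spoke A
print(site_for("172.16.50.9"))  # Hub
print(site_for("192.168.0.1"))  # None
```

An address that maps to no site (the last case) is the kind of gap that can lead to clients being referred to a distant Edge Appliance.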
Namespace Servers
Namespace servers are Windows servers running the DFS service. Clients connect to a namespace server and browse through the namespace until they reach a DFS link, which is a folder with one or more targets. At that point, the clients are directed to the appropriate underlying target. While only one namespace server is required to host a namespace, it is recommended to deploy multiple namespace servers to provide resiliency and increased performance.
Resiliency
DFS namespaces can be configured as either standalone or domain-based. The configuration for a standalone namespace is stored locally on the namespace server. The configuration for a domain-based namespace is stored within Active Directory. Each namespace type has different requirements for resiliency.
Standalone Namespace
With a standalone namespace, the only method for providing resiliency is to deploy the namespace server on a cluster. This protects against failure of a single server.
Domain-based Namespace
The configuration information for a domain-based namespace is stored within Active Directory. Resiliency can be accomplished by deploying the DFS role across multiple Windows servers and adding each as a namespace server. The individual namespace servers pull the namespace configuration information from Active Directory and create the necessary local resources. This method allows for the deployment of namespace servers in multiple locations, improving resiliency by protecting against both server and site failures.
Performance
When clients first connect to a namespace, they must communicate with a namespace server. As they browse through the namespace, they continue to interact with a namespace server until they come to a DFS link and are handed off to an underlying Edge Appliance. This navigation process is no different than browsing any UNC path via SMB, other than the fact that it only involves metadata, that is, folder traversal. Therefore, any concerns regarding the impact of latency on the user experience apply here. It is best to deploy namespace servers as close to the users as possible. This minimizes latency and makes for a more responsive environment leading to a better user experience.
With a domain-based namespace, this can be accomplished by deploying the DFS role on at least one server in each site hosting DFS clients. This server could be a domain controller or any other Windows server, such as a print server or SCCM distribution point. The DFS Namespaces role can be added via the “Add Roles and Features Wizard”:
After the role is installed, the server can be added as a namespace server by using the “Add Namespace Server…” wizard in the DFS Management console:
Recommended Deployment
In-site Only Targets
To avoid orphaned global locks, “missing” data, users conflicting with themselves, and other problems due to the time it takes for caches to synchronize across Edge Appliances, Nasuni recommends configuring the referral ordering to only return targets within a client’s site.
Open the DFS Management Console.
In the console tree, under the Namespace node, right-click the namespace and click Properties.
On the Referrals tab, select “Exclude targets outside of the client’s site”.
Manual Failover
While DFS does provide options for automatic failover of clients from one target to another, Nasuni recommends that customers develop a manual failover process to provide better control over which target is used. In this model, multiple targets are defined for a folder, but only one target is marked as active, while the remaining targets are manually disabled. In the case of an outage of the primary target, an administrator can then mark a secondary target as active. When the primary target is available again, the administrator can choose an appropriate time, most likely outside of normal working hours, to reactivate the primary target and disable the secondary target.
Advanced Deployment
Data Propagation Concerns
When designing any DFS infrastructure involving data hosted on Nasuni Edge Appliances, consideration must be given to how long it takes data to propagate among the appliances. Since DFS can transparently direct clients to different Edge Appliances, clients might encounter scenarios where the Edge Appliance caches have not yet synchronized. This can manifest as “missing” files, users conflicting with themselves, orphaned global locks, or stale data.
If file system consistency is the paramount concern, then users should be prevented from connecting to appliances outside of their site by using the “Exclude targets outside of the client’s site” referral override setting.
If maintaining access to data in the case of a failure is the paramount concern, then users must be educated regarding the data propagation process. They must understand that, if they are redirected to a new appliance due to a failure of their preferred appliance, their latest data might not have propagated. Further, proper Active Directory site configuration, including appropriate subnet-to-site mapping and site link costs, is required to ensure that clients are connected to the appropriate Edge Appliance not only during normal operations, but also in case of a failure. Any changes to the Active Directory site configuration (for example, updating site link costs, moving subnets to different sites, or adding new sites) must take into account the effect they have on the referral ordering for DFS clients.
Example Deployment
Lowest-cost Routing
DFS-N can be used to connect clients to the closest Edge Appliance while providing a common path for all users. This routing is accomplished by defining multiple targets pointing to the same share on different Edge Appliances.
In the screenshot above, there are two targets for the Projects folder. Note that the Site column is different for each entry. Assuming the default referral ordering settings, clients in the Hub site connect to “ea1”, whereas clients in the Spoke A site connect to “ea2”. This provides optimal performance for each set of clients, because they access their data via the closest Edge Appliance and do not need to traverse slow WAN links to get to their data.
Clients in Spoke E have no local target to connect to. Therefore, they use site costs to determine the closest available target. In this example, the Hub site is linked directly to Spoke E, so Spoke E clients connect to the DFS Target Hub. If that target is unavailable, clients calculate the next lowest cost target: either A, B, or C, all of which have a link cost of 250: Spoke E to Hub (150) + Hub to Spoke A, B, or C (100) = 250. If the default referral ordering rules are in place, each client in Spoke E receives the following referral list:
Rank | Target | Reason |
0,1,2 | Random ordering of DFS Target A, DFS Target B, or DFS Target C, for example: 0: DFS Target B 1: DFS Target A 2: DFS Target C | The cost to connect to Spoke A, B, or C from Spoke E is 250 (150 to hub + 100 to Spoke A, B, or C = 250); the ordering of A, B, and C is random because the cost is the same and there are no referral overrides in place |
3 | DFS Target D | The cost to connect to Spoke D from Spoke E is 300 (150 to hub + 150 to Spoke D = 300) |
In this scenario, clients in Spoke E might connect to different targets, so particular attention must be paid to the consequences of the data propagation process (see above).
For an option to mitigate the random ordering of Targets A, B, and C, see the “Last Among Targets” section below.
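The cost arithmetic in this example can be reproduced with a short sketch. It assumes that the cost between two sites is the lowest sum of site link costs along any path, which is how the figures above are derived, and uses the link costs quoted in the text. The function names are illustrative only.

```python
import heapq
from collections import defaultdict

# Site links and costs quoted in the example above.
SITE_LINKS = [
    ("Spoke E", "Hub", 150),
    ("Hub", "Spoke A", 100),
    ("Hub", "Spoke B", 100),
    ("Hub", "Spoke C", 100),
    ("Hub", "Spoke D", 150),
]


def site_costs(source):
    """Lowest cumulative link cost from source to each reachable site (Dijkstra)."""
    graph = defaultdict(list)
    for a, b, cost in SITE_LINKS:
        graph[a].append((b, cost))
        graph[b].append((a, cost))
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, site = heapq.heappop(queue)
        if cost > best[site]:
            continue  # stale queue entry
        for neighbour, link_cost in graph[site]:
            total = cost + link_cost
            if total < best.get(neighbour, float("inf")):
                best[neighbour] = total
                heapq.heappush(queue, (total, neighbour))
    return best


def cost_tiers(source, target_sites):
    """Group target sites by cost, cheapest tier first."""
    costs = site_costs(source)
    tiers = defaultdict(list)
    for site in target_sites:
        tiers[costs[site]].append(site)
    return [(cost, sorted(sites)) for cost, sites in sorted(tiers.items())]


for cost, sites in cost_tiers(
    "Spoke E", ["Hub", "Spoke A", "Spoke B", "Spoke C", "Spoke D"]
):
    print(cost, sites)
# 150 ['Hub']
# 250 ['Spoke A', 'Spoke B', 'Spoke C']
# 300 ['Spoke D']
```

Sites within one tier are sorted alphabetically here only to make the output repeatable; with default referral ordering, targets of equal cost are returned in random order.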
Automatic Failover
DFS-N can be used to provide protection against data unavailability due to the failure or unreachability of an Edge Appliance. When a DFS folder has multiple targets, clients connect to a preferred target based on the referral ordering.
In the screenshot above, assuming that the default referral ordering rules are in effect and that clients can connect to targets outside of their site, clients in Spoke A prefer to connect to ea2. If ea2 becomes unavailable for some reason, those clients connect to ea1 instead.
When ea2 becomes available again, clients in Spoke A do not reconnect to it immediately. When a Spoke A client updates its referral (due to the referral cache TTL expiring), it checks whether ea1 is still in the list. If it is, ea1 remains as the active target, even though there is a lower cost alternative (ea2) available. This is referred to as client “stickiness.”
When a Spoke A client goes through an event such as a logout or sleep/resume, it reconnects to ea2.
Client Failback
If the option “Clients fail back to preferred targets” is set at the namespace root or on the DFS folder (DFS folders inherit the failback setting from the root), then the clients reconnect to ea2 as soon as it is available.
While it might appear that it is preferable for clients to fail back to their normally preferred (local) Edge Appliance as soon as possible, you must consider the implications to data availability due to data propagation and global locks. When the failure of ea2 happens, and clients are reconnected to ea1, they are likely to experience missing data, because the files they have most recently worked on have not propagated to ea1 yet. As the users start working on ea1, they create new data on it. When ea2 becomes available again, if the users immediately fail back to ea2, they can once again experience missing data due to the delay in data propagation from ea1 to ea2. To avoid this second occurrence of data propagation induced problems, the clients can be left connected to ea1 until they go through a logoff/logon or sleep/resume cycle, at which point they reconnect to ea2. This is more likely to afford time for ea1 and ea2 to sync so that the same data is available across both Edge Appliances.
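The difference between client stickiness and failback can be modeled with a small function. This is a simplified sketch of the behavior described above, not a reproduction of the Windows DFS client logic; the function name and parameters are hypothetical.

```python
def next_active(current, referral, available, failback=False):
    """Pick the active target when a client refreshes its referral.

    current:   the target the client is using now (None after a logoff/logon).
    referral:  targets in rank order, rank 0 first.
    available: set of targets that currently respond.
    failback:  models the "Clients fail back to preferred targets" option.
    """
    usable = [t for t in referral if t in available]
    if not usable:
        return None
    if not failback and current in usable:
        return current   # stickiness: keep the working target
    return usable[0]     # otherwise take the highest-ranked reachable target


referral = ["ea2", "ea1"]  # Spoke A client: ea2 is local, ea1 is remote

print(next_active("ea2", referral, {"ea1"}))                        # ea1
print(next_active("ea1", referral, {"ea1", "ea2"}))                 # ea1
print(next_active("ea1", referral, {"ea1", "ea2"}, failback=True))  # ea2
print(next_active(None, referral, {"ea1", "ea2"}))                  # ea2
```

The four calls correspond to: ea2 failing, ea2 returning while the client stays on ea1, the same situation with failback enabled, and a reconnect after a logoff/logon or sleep/resume cycle.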
Last Among Targets
It is possible to limit the failover options so that clients either connect to their local appliance or only to a hub appliance. Using the example diagram above, the referral ordering can be modified in this fashion:
Enable the “Exclude targets outside of the client’s site” setting.
Mark the DFS Hub Target as “Last among All Targets”.
With these overrides in place, a client in Spoke A receives the following ordered referral:
Rank | Target |
0 | DFS Target A |
1 | DFS Hub Target |
This means that the client normally connects to DFS Target A, its local appliance, but, if there is a failure, it only connects to the DFS Hub Target. This technique can be used to mitigate issues with users in the same site connecting to different remote targets in the case of a failure. This avoids confusion related to data propagation delays.
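The effect of combining the two overrides can be sketched as follows. The model is deliberately simplified: targets are (name, site, last-among-all flag) tuples, a target flagged "Last among All Targets" is assumed to stay in the referral even when out-of-site targets are excluded (as the table above shows), and cost ordering of out-of-site targets is not modeled.

```python
def build_referral(client_site, targets, exclude_outside_site=True):
    """Return target names in referral order for a client in client_site.

    targets: list of (name, site, last_among_all) tuples (hypothetical layout).
    """
    in_site = [n for n, site, last in targets if site == client_site and not last]
    outside = [n for n, site, last in targets if site != client_site and not last]
    pinned_last = [n for n, site, last in targets if last]
    middle = [] if exclude_outside_site else outside
    return in_site + middle + pinned_last


TARGETS = [
    ("DFS Target A", "Spoke A", False),
    ("DFS Target B", "Spoke B", False),
    ("DFS Target C", "Spoke C", False),
    ("DFS Hub Target", "Hub", True),
]

print(build_referral("Spoke A", TARGETS))
# ['DFS Target A', 'DFS Hub Target']
```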
Troubleshooting
Confirming a client is in the correct Active Directory site
To confirm that a client is in the correct Active Directory site, run this command from an elevated command prompt on the client workstation:
nltest /dsgetsite
The result should match the site to which the client’s subnet is assigned in Active Directory Sites and Services. For example, if a client is physically located in Site A but has an IP address from a subnet assigned to Site B, the command reports Site B for that client. This would cause the client to attach incorrectly to the Nasuni Edge Appliance in Site B, instead of to the local Edge Appliance in Site A.
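When many workstations need checking, the command can be wrapped in a script. The sketch below assumes that the first non-empty line of the nltest output is the site name, followed by a completion message; the helper names and the sample site name are hypothetical, and current_site() only works on a Windows client where nltest is available.

```python
import subprocess


def parse_site(output):
    """Return the first non-empty line of the output, assumed to be the site name."""
    for line in output.splitlines():
        if line.strip():
            return line.strip()
    return None


def current_site():
    """Run `nltest /dsgetsite` on a Windows client and return the reported site."""
    result = subprocess.run(
        ["nltest", "/dsgetsite"], capture_output=True, text=True, check=True
    )
    return parse_site(result.stdout)


def in_expected_site(expected, output):
    """Compare the reported site with the expected one, ignoring case."""
    actual = parse_site(output)
    return actual is not None and actual.lower() == expected.lower()


# Hypothetical captured output, for illustration only.
sample = "Spoke-A\nThe command completed successfully\n"
print(in_expected_site("Spoke-A", sample))  # True
```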
Windows
DFS Tab
You can view the list of referrals for a DFS folder by opening the properties for the folder via Windows Explorer. On the DFS tab of the Properties window, you find a list of targets, as well as an indication of which target is currently active.
In the screenshot above, there are two targets for the Projects folder: \\ea1\Projects and \\ea2\Projects. The Active column indicates to which target the client is currently connected; in this example, the ea2 target is marked “Yes” indicating that the client is connected to that host. The active target is determined by the referral ordering settings and the availability of the underlying targets. A user with administrator rights can override the active target by selecting another target from the list and clicking the “Set Active” button.
The example above illustrates the view of a DFS Link. If you were to look at the properties of a DFS folder that does not contain any targets, the view would indicate to which namespace server the client is connected. At this level, there is no interaction yet with the Nasuni layer.
dfsutil.exe
The dfsutil.exe program can be used to manage DFS namespaces, servers, and clients from the command line. It is not installed by default on clients, but it can be added to the client by installing the DFS management tools or by copying it from a namespace server.
Viewing the Referral Cache
On the client side, dfsutil is useful for examining the client’s referral cache. Running “dfsutil cache referral” returns results like the example below.
C:\Users\jdoe>dfsutil cache referral
2 entries...
Entry: \DC\public\Projects
ShortEntry: \DC\public\Projects
Expires in 0 seconds
UseCount: 0 Type:0x1 ( DFS )
0:[\ea2\Projects] AccessStatus: 0 ( ACTIVE TARGETSET )
1:[\ea1\Projects] ( TARGETSET )
In this example, we see an entry for the DFS link named Projects. There are two targets for the link: \ea1\Projects and \ea2\Projects. Each target has a rank associated with it. This rank is based on the referral ordering settings for the namespace. In this example, \ea2\Projects has a rank of 0 and is marked as “ACTIVE” because it is in the same Active Directory site as the client. The second target, \ea1\Projects, has a rank of 1 and is not currently active.
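Output in this shape can be turned into structured data when collecting it from many clients. The parser below is a sketch based only on the sample shown above; other dfsutil versions may format their output differently, and the function name is illustrative.

```python
import re

# Sample copied from the example output above.
SAMPLE = r"""Entry: \DC\public\Projects
ShortEntry: \DC\public\Projects
Expires in 0 seconds
UseCount: 0 Type:0x1 ( DFS )
0:[\ea2\Projects] AccessStatus: 0 ( ACTIVE TARGETSET )
1:[\ea1\Projects] ( TARGETSET )
"""

# rank:[target] ... ( FLAGS )
TARGET_LINE = re.compile(r"^(\d+):\[(.+?)\].*?\((.*)\)$")


def parse_referral_cache(text):
    """Return {entry_path: [(rank, target, is_active), ...]}."""
    entries, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Entry:"):
            current = line.split(":", 1)[1].strip()
            entries[current] = []
            continue
        match = TARGET_LINE.match(line)
        if current is not None and match:
            rank, target, flags = match.groups()
            entries[current].append((int(rank), target, "ACTIVE" in flags.split()))
    return entries


for entry, targets in parse_referral_cache(SAMPLE).items():
    for rank, target, active in targets:
        print(entry, rank, target, "active" if active else "standby")
```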
Flushing the Referral Cache
The client’s referral cache has a default TTL of 300 seconds for namespace folders and 1800 seconds for DFS links. (These can be overridden by an administrator.) The dfsutil program can be used to flush the referral cache by running “dfsutil cache referral flush” from an elevated command prompt.
Setting the Active Target
dfsutil can be used to manipulate the active target for a DFS link by running “dfsutil client property state <dfspath> active”. This accomplishes the same result as using the “Set Active” button in the DFS folder Properties.
Client Fail Back
You can check whether the “Clients fail back to preferred targets” override is set by examining the referral cache. Run “dfsutil cache referral” and look for the text “DFS FAILBACK_ENABLED” in the output (line 2 of the excerpt below):
Expires in 51 seconds
UseCount: 2 Type:0x8001 ( DFS FAILBACK_ENABLED )
0:[\ea2\Projects] AccessStatus: 0 ( ACTIVE TARGETSET )
1:[\ea1\Projects] ( TARGETSET )
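The same check can be scripted. The sketch below looks for the FAILBACK_ENABLED flag on the Type line of captured output; it assumes the line format shown in the excerpt above, and the function name is illustrative.

```python
import re

# Matches e.g. "Type:0x8001 ( DFS FAILBACK_ENABLED )" and captures the flag list.
TYPE_FLAGS = re.compile(r"Type:\s*0x[0-9A-Fa-f]+\s*\(([^)]*)\)")


def failback_enabled(referral_output):
    """True if any Type line in the output lists the FAILBACK_ENABLED flag."""
    for line in referral_output.splitlines():
        match = TYPE_FLAGS.search(line)
        if match and "FAILBACK_ENABLED" in match.group(1).split():
            return True
    return False


print(failback_enabled("UseCount: 2 Type:0x8001 ( DFS FAILBACK_ENABLED )"))  # True
print(failback_enabled("UseCount: 0 Type:0x1 ( DFS )"))                      # False
```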
macOS
Unlike Windows, macOS does not provide a built-in graphical utility to display referral ordering and the currently active paths, but both can be achieved using the Mac Terminal.
List referral ordering
The macOS terminal utility “smbutil” can be used to list DFS referral ordering, using this syntax:
# smbutil dfs <DFS path>
For example:
# smbutil dfs smb://support.nasuni.net/dfs-demo/homefolders
Example smbutil dfs results:
List Active DFS Path
While macOS doesn’t list the active DFS path, you can use the macOS terminal to see the IP addresses the client is accessing over port 445 (the default SMB port). The following command lists SMB connections:
# netstat -an | grep ESTABLISHED | grep ".445 "
Example netstat output:
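To reduce such output to a list of SMB servers, the lines can be post-processed. The sketch below assumes the usual macOS netstat column layout, in which the foreign address is the fifth column and the port is appended to the address after a final dot. The sample lines are hypothetical.

```python
def smb_peers(netstat_output, port="445"):
    """Return sorted remote addresses with an ESTABLISHED connection to port."""
    peers = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        # Foreign address looks like 10.10.20.5.445 (address, dot, port).
        host, _, remote_port = fields[4].rpartition(".")
        if remote_port == port:
            peers.add(host)
    return sorted(peers)


# Hypothetical sample lines, for illustration only.
SAMPLE = """\
tcp4       0      0  192.168.1.25.51234     10.10.20.5.445         ESTABLISHED
tcp4       0      0  192.168.1.25.51300     10.10.20.9.443         ESTABLISHED
tcp4       0      0  192.168.1.25.51410     10.10.20.7.445         TIME_WAIT
"""
print(smb_peers(SAMPLE))  # ['10.10.20.5']
```

The resulting addresses can then be compared with the addresses of the Edge Appliances to infer which target is in use.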
Appendix: Using Microsoft Distributed File System (DFS) to Perform a Minimally Disruptive Update of NEAs
Overview
Microsoft Distributed File System (DFS) provides flexibility for multiple SMB servers with common shares to be distributed locally or across geographic Active Directory sites. Nasuni’s global file system is a supported target of Microsoft’s DFS capabilities, and has proven useful in failover situations. Because an administrator can issue DFS namespace changes to steer clients to specific available Nasuni Edge Appliances (NEAs), DFS is also a viable option for planned maintenance events.
The intent of this section is to ensure that, during a planned maintenance event, Nasuni Edge Appliance snapshots and syncs remain consistent, minimizing the risk of conflicts and other file integrity issues.
To leverage DFS capabilities so that planned disruptions provide the greatest protection of data synchronization, a prescribed process involving both DFS and Nasuni administration is detailed here. The scope of the procedure focuses on initially configuring DFS for a scenario of updating targets belonging to a single site that is within reach of all clients, using target referral manipulation and CIFS client resetting to help force clients to change to a standby NEA. Scalable options are possible with a variety of DFS designs. Consult Nasuni’s DFS Configuration Guide for additional information.
The duration of a planned minimally disruptive update (MDU) partly depends on the process of draining active connections and performing final synchronizing of the volume.
Desired Outcomes of Procedure
The following are the desired outcomes of the procedure:
Windows Clients
Completely idle or disconnected Windows clients should automatically begin operations on the failover target after they contact the DFS namespace server and are issued the updated referral. When they begin to perform I/O under failover, normal behavior is expected.
Connected clients with intermittent activity:
Should automatically switch new activity to the failover, after the referral cache is expired and the target changes are learned.
If not automatically switched, the clients might become briefly affected during the client reset, but Windows has been observed to recover automatically. Example: An open Explorer window to the Nasuni share.
Clients with read or write operations “in-flight” are expected to continue under the previous DFS referral order until they are interrupted. The Windows CIFS client should then change to the new preferred target and renegotiate a connection to the standby NEA using the original namespace.
Higher level application recovery behavior can vary, depending on its architecture and the type of SMB operations taking place. User actions might be needed to remedy the interruption.
Non-Windows Clients
Linux: Depending on Linux distribution and CIFS client, results can vary from automatically switching to needing to re-connect.
macOS: After being reset, the macOS 12 Finder was observed to hang for approximately 30 seconds until an error was displayed. After the error was dismissed with the “Ignore” button, new Finder activity could resume on the failover target NEA.
Nasuni Global Locking
The act of disconnecting clients on a given NEA also releases any file locks that a client has placed. The release of the lock also propagates to Nasuni’s Global Locking service.
The outcome from forcefully releasing a lock varies, depending on the workflow.
For example, an MS Office document can continue to be edited and eventually be saved to either the original NEA or the secondary during the MDU. However, a time window is created where the document can be re-opened by another user, risking potential conflict or data corruption.
Important: Nasuni’s Global Locking technology can introduce challenges to restarting an interrupted new write operation. The function of global locking places an exclusive lock on files being created. Therefore, previous interruptions leave a file artifact and a lock that blocks the re-try of the save until the lock can clear.
Requirements for Procedure
The following are required for performing the procedure:
One or more shared volumes between NEAs participating in an active-standby capacity.
A DFS namespace hosted by a Windows Server in at least “Windows Server 2008” mode.
A DFS share with referrals in an active-standby approach for the group or zone of Edge Appliances, which CIFS clients are referencing as their SMB server. See configuration example below.
DFS cache duration should be relatively low, and deemed acceptable for clients to learn of a record update during the MDU.
In preparation for the event, DFS cache duration can be optionally set to a lower than normal value, and then returned to its previous setting when the maintenance window is over.
It is recommended that the effective failback mode for DFS clients is enabled. See configuration example below.
Overview of Failover and Update Procedure
Design and deploy a DFS namespace with multiple NEAs within a site. See configuration example.
Optional: Consider first performing the update on the standby NEAs.
Optional: Before the maintenance window, lower the DFS cache duration setting for the shares. The change would need to occur at an early enough time to account for previous referral cache values to also expire.
Perform a snapshot, or ensure a recent snapshot has completed, on the active NEA, then synced to the standby NEA.
On the active NEA, make the shares Read Only.
Making the active NEA shares Read Only prevents clients that attempt to re-home to the primary NEA, after having their connection reset, from being able to write new data.
On the DFS namespace server, disable the shares referral target to begin moving clients to the standby appliance.
Reset CIFS clients on the previously active NEA.
Perform a final snapshot and sync on the active NEA volumes.
Complete the active NEA update.
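For the optional step of lowering the DFS cache duration, the lead time can be worked out with simple arithmetic: a client that cached a referral just before the change keeps it for the full previous duration. The sketch below assumes that worst case; the maintenance window time is hypothetical, and 1800 seconds is used as the previous cache duration purely as an example.

```python
from datetime import datetime, timedelta


def latest_ttl_change(window_start, previous_ttl_seconds):
    """Latest time to lower the cache duration so old referrals expire in time."""
    return window_start - timedelta(seconds=previous_ttl_seconds)


# Hypothetical maintenance window starting at 22:00.
window_start = datetime(2024, 5, 4, 22, 0)
print(latest_ttl_change(window_start, 1800).isoformat())  # 2024-05-04T21:30:00
```

In practice the change would be made earlier than this bound to allow for the namespace configuration itself to reach all namespace servers.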
Overview of Failback after Update
Revert the primary NEA shares to Read/Write.
Perform or ensure a recent snapshot is completed on the standby NEA and synced to the active NEA.
Re-enable the disabled referral target to begin moving clients back to the active NEA.
Reset CIFS clients on the standby NEA.
Perform a final snapshot and sync on the standby NEA volumes.
Failover and Update Procedure
Optional: Update on the standby NEAs. Follow the steps of the update procedure in the Edge Appliance Administration Guide.
Optional: Before the maintenance window, lower the DFS cache duration setting for the shares.
From the DFS Management console on the namespace server, adjust the cache on the namespace settings, or on the shares.
Perform or ensure a recent snapshot is completed on the active NEA, then synced to the standby NEA, by following these steps:
On the NMC, click Volumes.
Expand the relevant Volumes to view the time of the Last Snapshot.
Manually perform the latest snapshot as needed.
On the active NEA, make the shares Read Only, by following these steps:
From the list of Volumes, select the Volumes shared between the active-standby NEAs.
Click Shares.
Select shares to edit on the active NEA, then click Edit.
Important: Ensure that only shares on the NEA targeted for client disconnection and the MDU are selected.
Check the Read Only box.
Click Update Share.
On the DFS namespace server, disable the shares referral target to begin moving clients to the standby appliance, by following these steps:
From the DFS Management console on the namespace server, disable the referral target of the active NEA.
Reset the CIFS clients on the previously active NEA, by following these steps:
On the NMC, click Filers.
Click CIFS Clients.
Select Reset All Clients.
Select the active NEA and then click Reset All Clients.
Repeat step 3 to perform a final snapshot and sync on the active NEA volumes.
Complete the active NEA update procedure in the Edge Appliance Administration Guide.