S3 Edge

Nasuni S3 Edge is a feature developed for the Nasuni Edge Appliance to provide support for S3 protocol access to volume data. The S3 protocol is in addition to the SMB, NFS, and FTP protocols already available. While most customers use the SMB protocol to access Nasuni volumes, S3 is a newer, more efficient protocol to access the same volumes and to read and write data.

The benefits of Nasuni S3 Edge include the following:

Access to data might require other traditional NAS storage protocols such as SMB or NFS. Nasuni provides multiple protocol flexibility with the front-end protocol support of S3 to facilitate a wide range of applications and use cases.
Because classification and compliance are important in many file management scenarios, Nasuni S3 Edge supports an extended number of tags (up to 20).
Provides the capability of improving the write performance to Nasuni Edge Appliances (NEA) using S3 protocol when compared to traditional NAS storage protocols.

Overview of Nasuni S3 Edge

The S3 protocol is used for interfacing with object storage over a network, by using buckets, keys, and operations. Even though the S3 API (Application Programming Interface) is developed and released by Amazon, it has implementations within a wide variety of storage systems. The level of complexity for these solutions varies from basic functions (create, update, and delete) to full S3 compatibility.

S3 is an HTTP REST API. It is an API that uses HTTP requests to get, put, post, and delete data. S3 supports the REST API as described in S3 Edge API.

Nasuni S3 Edge

Nasuni S3 Edge is a solution for enabling S3 protocol front-end access to UniFS volumes. Nasuni S3 Edge integrates the S3 protocol into the Nasuni Edge Appliance (NEA) in order to support your goals, particularly around write performance. Nasuni S3 Edge is implemented as a new web service that provides an S3 interface concentrating on write throughput.

Any backend storage option is supported. The S3 aspect of this feature is front-end access, so that only the client or application needs to "speak" S3 to NEA.

You can run a mix of S3-enabled volumes and non-S3-enabled volumes on the same appliance.

Features supported

Nasuni S3 Edge enables you to perform all of the following actions:

Utilize the standard S3 API methods, including GET, PUT, and DELETE.
Support both path style and virtual hosted style access.
Upload files up to 5 TB.
Configure S3 Buckets at any point in the file system tree of any existing volume.

Note: Only NTFS Exclusive volumes and Public volumes are currently supported.

For buckets configured on volumes:
Create new directories at any place in the Bucket's file system.
Upload new files to any place in the Bucket's file system.
List the contents of any directory in the Bucket, including the top level of the bucket itself.
Recursively list the contents of any directory in the Bucket, including sub-directories.
Retrieve the contents of a whole file in the Bucket.
Retrieve a sub-range of bytes from a file in the Bucket.
Delete files and (empty) directories from a Bucket.
Create any number of Access Key and Secret Key pairs on an Edge Appliance, which can be used to authorize access to all the Buckets on that Edge Appliance.
Specify a set of up to 20 Object Tags (key value pairs) to place on any file or directory in the Bucket.
Retrieve the set of Object Tags on a file or directory.
Remove all the Object Tags from a file or directory, leaving the actual object intact.

Important use cases

Nasuni aims to support S3 workloads, including the following real-world use cases:

Accelerating the sharing of large DICOM images and instrument data for faster diagnosis of patient conditions.
Accessing a large data set stored in an S3 bucket in Boston from England. This requires the S3 API on an Edge Appliance to take advantage of local caching.
Distributing game builds to multiple devices as an alternative to a content delivery network (CDN) that is slow to populate.
Faster sharing of medication and prescription information.
Implementing multiple protocol access to integrate with legacy applications.
Leveraging artificial intelligence (AI) and machine learning (ML) tools that further automate processes.
Modernizing data centers by converting workloads from SMB to S3 for performance and manageability improvements.
Reducing complexity by standardizing on a single platform that replaces a private cloud solution.
Reducing NEAs for smaller contract manufacturers and suppliers.
Reducing data duplication across regions by avoiding expensive replication.
Sharing CT scans of defective devices from facilities around the world.

Requirements and limitations

The following requirements or limitations apply to Nasuni S3 Edge:

An NEA running 9.14.5 or later is required.
The S3 Edge feature license must be enabled on your account to use S3. To enable the S3 feature license for your account, contact your Account Manager.
S3 protocol access is configured using your Nasuni account (https://account.nasuni.com) and enabled on a per-appliance level.
Buckets and end-users access keys and secret keys are configured in account.nasuni.com (In the S3 Edge tab in the Account page).
Supports standard update and recovery Edge Appliance features.
Only s3v4 (Amazon S3 Signature Version 4) authentication and authorization is supported.
Only NTFS Exclusive volumes and Public volumes are currently supported. Any file appearing on an NTFS Exclusive volume that exists within an S3 bucket is fully accessible by the S3 user.
All end-user access keys that access your Nasuni volumes are recorded as a single user (“Scube”) in logs. ("Scube" is Nasuni's implementation of the S3 API originally developed by Amazon. "Scube" is short for “S cubed” or S³.)
Multiple protocol volumes are supported. Nasuni S3 Edge supports multiprotocol access to the same data via SMB, NFS, FTP, and S3. To enable multiprotocol with NFS, FTP, and S3, you must use Public volumes. To enable multiprotocol with SMB and S3, you can use either Public or NTFS Exclusive volumes. In either case, you must create or have an existing volume first. Afterwards, you can configure S3 protocol access for that volume.
If you have used this feature before general availability, you must redeploy your implementation for production use.

Understanding S3 Edge

This section describes how the service works.

Implementation

Nasuni designed and implemented a service that “talks” S3 via HTTPS so that requests using standard core S3 clients and libraries can be used. Cyberduck is an example of such a client. An example of a library is boto3.

The NEA is able to accept S3 API requests directed to it. S3 requests are authenticated via header signature inspection by s3v4 (Amazon S3 Signature Version 4), followed by reading or writing to the UniFS volume hosting the destination bucket.

The Cyberduck and AWS CLI clients, as well as the boto3 library, are automatically recognized by the S3 service. They can be used without special configuration.

SSL

S3 uses the NEA SSL certificate. It is recommended to install a certificate from a known authority if not already in use.

Otherwise, by default, NEAs operate with a self-signed certificate, which does not validate as a CA-supplied certificate. If this is the case, you might want to disable SSL verification for your S3 clients to not receive extraneous errors.

Activating S3 Edge

By default, S3 Edge is dormant.

To enable the S3 Edge license for your account, first contact your Account Manager and discuss the use case you would like to pursue.

After the license is enabled, to configure the S3 Edge feature, you must use your account at account.nasuni.com.

Note: The information entered through nasuni.com is encrypted so that only NEAs can read it.

API documentation

The following documentation on APIs is available:

Nasuni S3 Edge API documentation.

Configuring S3 Edge

This section explains how to configure the S3 Edge service.

Configuration flow

The system diagram shown below illustrates how the S3 configuration is established onto the NEA. The green arrows depict how the S3 service is set up. The yellow arrow depicts how you write the configuration in the form of JSON data and send it to the NEAs with S3 enabled. The blue arrows show the end result: the end-user talking S3 to the S3 Edge Server after the configuration credentials have been saved.

Configuration flow.

This section focuses primarily on the yellow arrow and describes how the desired configuration is saved onto the appliance.

Considerations

The following are caveats to keep in mind as you fill out a new S3 configuration:

Bucket configuration is per-volume: Most Nasuni access points are configured on particular appliances, and are not shared across appliances. However, an S3 bucket configured on a volume is available on every appliance that shares that volume and has the S3 service enabled.
Bucket names must be unique: Bucket names must be unique among all bucket definitions in the S3 Edge configuration NOC form.
All keys are equivalent: Any keys defined in the configuration file are available on all S3-enabled appliances, and have read/write access to all buckets.

Configuration procedure

Note: It is assumed that Nasuni personnel have already enabled the related S3 feature license for your account.

To configure S3, follow these steps:

To start a new S3 Configuration, log on to account.nasuni.com.
Click “S3 Edge”.
On the “S3 Edge” page, there is a Configuration text box.
In the Configuration text box, provide a JSON payload detailing the buckets and keys that reflect the desired configuration. (For more information on that JSON, see Configuration Structure below.)
There is one S3 configuration file for all volumes in an account.
Configuration information includes the following:
- S3 buckets: Defines the existing volumes and starting directory paths within the volumes that shall be accessed via S3.
- S3 access keys/secret keys: Provides user access to defined S3 buckets, similar to username and password.
- SDDL (Security Descriptor Definition Language) information (for NTFS Exclusive volumes): Applies a template NTACL (file permissions) to S3-created files and directories so that they are accessible on NTFS Exclusive volumes; otherwise, SMB users won't be able to see S3-created files and directories.
Click Save.
The form uses the JSON to generate an encryption string that is made available in the account config INI file for the NEAs to download.
Note: JSON syntax is checked to ensure that the information is in the right format. If the JSON syntax is accurate, a green banner appears at the top of the page.
However, the values provided are not verified. You must ensure that information, such as the volume name and path, is accurate.
Tip: Copy and save the configuration text. Nasuni does not store the information in plaintext.
When the operation is completed, a green banner appears that says, “S3 Edge Configuration changes saved. Changes can take up to an hour to take effect.”
Note the “Last Updated” timestamp below the “Encrypt” button.
You can use this to cross-check the times, and i, and itamp found in the response headers sent from the S3 Edge Server.
Any changes to the S3 configuration can take up to 60 minutes to take effect.
After the changes are committed, the configuration process downloads the encrypted JSON, decrypts it, checks for any changes made, and writes the new configuration to the S3 database.
It is important to note that the configuration process does not affect the operation of the S3 Edge Service. After the new configuration arrives at the database, it takes at most 30 seconds for the server to acquire the new credentials and use them to provide access.
On the same page, in the “Enable S3 Edge on appliances” section, there is a list of the Edge Appliances on your account, including the Serial Number and Description.
Enable S3 Edge on appliances by following these steps:
1. In the list, find those Edge Appliances on which you want to enable the S3 protocol.
2. For each of the Edge Appliances on which you want to enable the S3 protocol, enable the S3 Edge box to the right of the Edge Appliance’s Serial Number.
3. When finished, click Save.
4. When the operation completes, a green banner appears that says, “S3 Edge for selected serial numbers is successfully Saved.”
Use the NMC to refresh the licenses for the applicable appliances:
1. On the Filers tab, click Refresh License. The Refresh Subscription License page appears.
2. From the list of Edge Appliances, select the Edge Appliances whose license you want to refresh.
3. Click Update Filers. The Refresh Subscription License dialog box appears.
4. Click Refresh License.
  This manually refreshes the license for the selected NEAs and pulls down the latest S3 configuration.
To verify that the new configuration has taken effect, you can query the S3 Edge Server and inspect the response headers that come back in every request-response cycle.
1. You can verify in several ways:

Using Cyberduck, follow these steps:
i. In Cyberduck, "Open Connection". The Open Connection dialog box appears.

ii. From the connection type drop-down list, select "Amazon S3".
iii.In the Server text box, enter the IP of your Edge Appliance.
iv. Ensure that the Port is 443.
v. In the Access Key ID text box, enter the Access Key ID.
vi. In the Secret Access Key text box, enter the Secret Access Key.
vii. Click Connect.
A list of configured S3 buckets appears, along with the response.
viii. Verify that the bucket names are what you expect.
ix. Continue with step b below.

Using the AWS CLI, you can use the IP address in a command of the form:
- aws --no-verify-ssl --endpoint-url https://172.31.42.3/ s3 cp .\MYFILE.txt s3://Bucket_Public_Test/MYFILE.txt“

Note: --no-verify-ssl is only needed for self-signed certificates; signed certificates do not require this flag.

Continue with step b below.

Using the boto3 library, use commands such as these:

>>> import boto3
>>> client = boto3.session.Session(
aws_access_key_id=<ACCESS_KEY>,
aws_secret_access_key=<SECRET_KEY>,
).client('s3', endpoint_url=<URL of NEA>, verify=False)

>>> client.list_buckets()

{'ResponseMetadata': {'HTTPStatusCode': 200,
'HTTPHeaders': {'date': 'Tue, 07 Nov 2023 14:50:17 GMT',
'server': 'Apache',
'strict-transport-security': 'max-age=31536000',
'vary': 'User-Agent',
'nasuni-s3-config': '2023-11-06T21:04:24Z', # Last Successful config timestamp
'nasuni-s3-config-failed': '2023-11-06T21:02:23Z',
'nasuni-nea-version': '9.14.5',
'nasuni-s3-version': '1.0.0',
'content-type': 'application/xml',
'content-length': '303',
'x-frame-options': 'SAMEORIGIN',
'x-content-type-options': 'nosniff',
'content-security-policy': "report-uri /fs/cspalerts; default-src 'none'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline' 'unsafe-eval'; font-src 'self' 'report-sample'; img-src 'self' data: 'report-sample'; connect-src 'self' 'report-sample'; form-action 'self' 'report-sample'; media-src 'self' 'report-sample'; child-src 'self' 'report-sample'; base-uri 'self' 'report-sample'; frame-src 'self' 'report-sample'; frame-ancestors 'self'; "},
'RetryAttempts': 0},
....

Continue with step b below.

b. There are two entries in the response header that can be inspected to verify that the configuration has changed:

nasuni-s3-config: Timestamp of the latest successful configuration. You can verify the timestamp reported by the form with this value.
nasuni-s3-config-failed: Timestamp of the last configuration to have failed. Does not appear if there is no failure in the latest-submitted configuration.

c. Verify that the timestamp you recorded matches the timestamp in the response.

If the end-user can query buckets from the S3 Edge Server with keys that reflect the new configuration, then the configuration process is successful.
If not, then there might have been some issue along the way that requires attention. See Troubleshooting configuration on page 20.
After a bucket is set up on an existing volume, the end-user can connect to the S3 service using their preferred S3 client (such as Cyberduck).

Configuration Structure

The data is specified as standard JSON and should follow the outlined structure below. The JSON data structure is currently two simple object types, grouped in organizing lists. For Buckets, each object in the list should contain a name, path, and volume attribute. For Keys, a name, access, and secret attribute is required.

Bucket names should follow the following rules:

Bucket names must be between 3 (minimum) and 63 (maximum) characters long.
Bucket names can consist only of lowercase letters, numbers, periods (.), and hyphens (-).
Bucket names must begin and end with a letter or number.
Bucket names must not contain two adjacent periods.
Bucket names must not start with the prefix xn--.

This is the general schema:

{
"buckets": [
{
"name": "bucket-name",
"path": "VOL_PATH",

"mpu_assembly_type": "local"
"volume": "VOL_NAME"
}
],
"keys": [
{
"name": "ACCESS_KEY",
"access": "ACCESS_KEY",
"secret": "SECRET_KEY"
}
]
}

Here is an example with two buckets on the same volume, and two keys:

{
"buckets": [
{
"name": "first-bucket",
"path": "s3/first",
"volume": "my_volume_1"
},
{
"name": "other-bucket",
"path": "s3/other",
"volume": "my_volume_1"
}
],
"keys": [
{
"name": "automation",
"access": "my-access-key",
"secret": "fs5d&g2heHE#@r25"
},
{
"name": "george",
"access": "my-access-key-2",
"secret": "gj50*(gwlsn9jv^i"
}
]
}

Note: JSON syntax is checked to ensure that the information is in the right format. However, the values provided are not verified. You must ensure that information, such as the volume name and path, is accurate.

SDDL Configuration

SDDL (Security Descriptor Definition Language) strings are optional values for S3 Edge configuration information when using NTFS Exclusive volumes. SDDL strings represent NTACL permissions. The permissions applied by specifying the SDDL strings govern how SMB users can interact with the files and directories created via S3 Edge operations. This allows SMB users to view S3 uploaded files. To obtain the appropriate SDDL strings to apply to S3 Edge configuration, follow these steps:

Create a file within a directory with the permissions that you want applied to your S3 uploaded resources.
In Windows PowerShell, use the "Get-Acl" command and the “Format-List” cmdlet to print out the two SDDL strings related to that file and directory.
1. Examples:
  1. Get-Acl '.\File.pptx' | Format-List -Property Sddl
  2. Get-Acl '.\Directory' | Format-List -Property Sddl
2. The Get-Acl command reference is available at: Get-Acl. The Format-List cmdlet reference is available at: Format-List .
Add those two SDDL strings to the account.nasuni.com S3 Edge configuration for each specific S3 bucket as needed with the corresponding keys “file_sddl” and “dir_sddl”, and click Save.

Note: For successful configuration, you must configure both the file and directory SDDL strings.

For more information on SDDL, see Security Descriptor Definition Language for Conditional ACEs documentation.

Here is an example for an S3 configuration with both file and directory SDDLs configured:

{

"buckets": [

{

"name": "bucket-1",

"path": "/path/to/somewhere",

"volume": "vol-1",

"file_sddl": "O:S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464G:S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464D:PAI(A;OICIIO;GA;;;CO)(A;OICIIO;GA;;;SY)(A;;0x1301bf;;;SY)(A;OICIIO;GA;;;BA)(A;;0x1301bf;;;BA)(A;OICIIO;GXGR;;;BU)(A;;0x1200a9;;;BU)(A;CIIO;GA;;;S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464)(A;;FA;;;S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464)(A;;0x1200a9;;;AC)(A;OICIIO;GXGR;;;AC)(A;;0x1200a9;;;S-1-15-2-2)(A;OICIIO;GXGR;;;S-1-15-2-2)",

"dir_sddl": "O:S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464G:S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464D:PAI(A;OICIIO;GA;;;CO)(A;OICIIO;GA;;;SY)(A;;0x1301bf;;;SY)(A;OICIIO;GA;;;BA)(A;;0x1301bf;;;BA)(A;OICIIO;GXGR;;;BU)(A;;0x1200a9;;;BU)(A;CIIO;GA;;;S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464)(A;;FA;;;S-1-5-80-956008885-3418522649-1831038044-1853292631-2271478464)(A;;0x1200a9;;;AC)(A;OICIIO;GXGR;;;AC)(A;;0x1200a9;;;S-1-15-2-2)(A;OICIIO;GXGR;;;S-1-15-2-2)"

}

"keys": [

{

"name": "key-1",

"access": "access-1",

"secret": "secret-1"

}

]

}

Multipart Upload Configuration

S3 Edge supports multipart upload for files as of the 10.0 NEA release and later. Multipart upload allows you to upload a single file to S3 Edge as a set of parts thereby maximizing the bandwidth by uploading multiple parts in parallel. Also, multipart upload provides network resiliency against network errors which helps avoid upload restarts in comparison to single PUT object upload.

Starting the in the 10.1 NEA release and later, Nasuni allows two different types of multipart upload to S3 Edge:

Local assembly
Cloud assembly

Local assembly for multipart upload is used by default for multipart uploads and works by ingesting all parts of a given file into cache for an edge appliance and stitching/assembling together the parts into one file within cache before acknowledging the S3 client. This assembly type is especially useful when backend cloud bandwidth is limited compared to frontend bandwidth to the Nasuni edge appliance. Note that this type of assembly requires sufficient free space in the cache to hold twice the size of the uploaded file.

Cloud assembly for multipart upload can be configured on a per-bucket basis for S3 Edge and works by committing all transferred parts of a given file to the backend cloud of the edge appliance and stitching together the final file in the cloud before acknowledging the S3 client. This assembly type is recommended when there is sufficient backend cloud bandwidth compared to frontend bandwidth and/or there is a need to upload files larger than the cache size.

Below is an example of configuring buckets with different multipart upload assembly types in S3 Edge Configuration:

{
"buckets": [
{
"name": "bucket-1",
"path": "s3/first",
"volume": "vol_1"

"mpu_assembly_type": "local" # Note: “local” does not need to be specified, it is set by default
},
{
"name": "bucket-2",
"path": "/",
"volume": "vol_2"

"mpu_assembly_type": "cloud"
}

{
"name": "bucket-3",
"path": "/s3",
"volume": "vol_3” # Since “mpu_assembly_type” is not specified, “local” assembly will be used by default
}
]
}

DNS configuration required for virtual hosted style access

S3 Edge supports the common path style access to buckets as well as the more modern virtual hosted style of bucket access. Using the virtual hosted style makes it easy to completely customize the URL of your S3 resources through DNS resolution. Also, S3 services that require virtual hosted style requests to S3 resources are supported.

When accessing an S3 bucket using the virtual hosted style, the name of the bucket being accessed becomes the hostname of the URL, whereas, with path style, the name of the bucket being accessed would be the first component of the URL's path.

For example, these two URLs are equivalent, but use different access styles:

Path style access:

https://myappliance.example-domain.com/my-bucket/file.txt

Virtual hosted style access:

https://my-bucket.s3.myappliance.example-domain.com/file.txt

No configuration is required to enable virtual hosted style access on the appliance, and all configured buckets are accessible using either style. However, the appliance must be addressable via the bucket name followed by .s3. .
This means that the client DNS must resolve <BUCKET-NAME>.s3.<FQDN> to an IP address in use by the appliance for each bucket to be accessed. The name of the appliance does not have to be part of the fully qualified domain name, and no particular DNS record type is required.

Note: If validation is desired, a wildcard SSL certificate must be installed on the appliance, because each bucket has a separate domain name.

Example of use

If you set the hostname to:

bucket-public.s3.myhost.com

when you open the connection, the specified bucket is accessed directly, without requiring navigation:

Troubleshooting configuration

There are several error cases that might arise during the configuration flow. If the JSON is not valid, it is rejected, and no reconfiguration takes place. The JSON might be malformed, have a bad schema, or some other issue. Any issue that occurs in the backend is reported via the standard Nasuni notification system, and the last loaded configuration remains in operation.

Here are some of the failure cases and how to address them:

Symptom	Cause	Remediation
Notification was posted on UI.	Notification contains details on what went wrong.	If the issue is JSON-related, fix the issue. Otherwise, report notification to Support.
Configuration has not changed (last known good configuration remains in use).	JSON might have been invalid, or there might be an unexpected issue.	Try a simpler configuration and see if that works. Check UI and report on any new notifications pertaining to the issue. Check response headers.
Connection refused from S3 Edge Server.	Server might not be up, or some unexpected issue.	Try reverting to previous configuration and see if that works. Otherwise, notify Support.
Not seeing S3 configuration changes.	Not enough time has passed.	Wait 60 minutes or so. OR Refresh the license using the NMC.
Error with S3 configuration.	The directory might not exist.	Verify that directory exists.

Accessing S3 Edge

This section explains how to connect to, and manipulate files with, S3 Edge.

SSL Support

S3 uses the NEA SSL certificate. It is recommended to install a certificate from a known authority if not already in use.

Using the Cyberduck client

Cyberduck is an open-source client that allows many different protocols, such as FTP and WebDAV. It also supports cloud storage such as Swift, Amazon S3, Backblaze, and Microsoft Azure. For S3, Cyberduck allows you to browse directly into S3 buckets where you can add and remove files. Cyberduck works on Windows and Mac. It allows you to interface with S3 through a simple client and provides the basic functionality you need.

Accessing a Nasuni volume via the S3 protocol is done from the root of the HTTPS site for these S3 clients: Cyberduck and AWS CLI. Other clients require a custom endpoint URL and must specify "/s3/" as part of the S3 access request.

With Cyberduck, a generic Amazon S3 profile can be used to access a Nasuni volume via S3.

Supported workflows

Nasuni tests a set of common workflows with Cyberduck. The tested workflows are not exhaustive, but serve to ensure that common operations work for general use cases.

Creating Directories: The creation of new directories in a bucket.
Uploading Files: The uploading of single-part files to a bucket. Note that Cyberduck automatically tries to use multipart uploads when a file larger than 100 MB is uploaded.
Downloading Files: The downloading of files of any supported size.
Listing Files: The listing of directories of any supported size.

Configuring the Cyberduck multipart.upload parameter

By default, Cyberduck utilizes multipart when uploading files greater than 100 MB in size. If using NEA versions before 10.0, Nasuni S3 Edge does not support multipart upload, so changes must be made from the application/client perspective to ensure that the file size limitation can be bypassed. By changing the multipart.upload parameter to be a size greater than 100 MB, you can upload files larger than 100 MB. Also see Multipart Uploads.

Note: Even without the use of multipart uploads, any size files up to 5 TB can be uploaded.

To change the multipart.upload parameter in Cyberduck, create a new Cyberduck profile by copying the below configuration text into a text file and rename the extension to ”.cyberduckprofile“ after saving it or find the existing profile on disk to update it. Then modify the value for the “s3.upload.multipart.threshold” parameter:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">

<dict>

<key>Default Nickname</key>

<string>Nasuni Edge Appliance (s3)</string>

<key>Description</key>

<string>Nasuni S3 Edge</string>

<key>Protocol</key>

<key>Scheme</key>

<string>https</string>

<key>Vendor</key>

<string>Nasuni</string>

<key>Properties</key>

<array>

s3.upload.multipart.threshold={bytes}

</array>

</dict>

</plist>

* Change {bytes} to a value greater than the largest file size you intend to upload through Cyberduck. For example, the value “10737418240” equates to 10GiB.

Using the Boto3 client

The AWS SDK for Python (also called Boto3) provides a Python API for AWS infrastructure services. Using Boto3, you can build applications on top of Amazon S3. For details on installing and configuring Boto3, see Quic kstart.

S3 uses the NEA SSL certificate. It is recommended to install a certificate from a known authority if not already in use.

Here is an example of creating a Boto3 client:

import boto3

# Change these variables to suit your environment

NEA_ADDR = 'myfiler.example.org'

A_KEY = 'my_access_key'

S_KEY = 'my_secret_key'

# Create an S3 client

s3_edge = boto3.client(

's3',

endpoint_url=f'https://{NEA_ADDR}/',

aws_access_key_id=A_KEY,

aws_secret_access_key=S_KEY,

verify=False

)

# Use S3 client to list available buckets

print(s3_edge.list_buckets())

Recursive List Support

Recursive list support enables a single list command to list all files in a given directory, as well as in any sub-directories. This can be useful in cases such as the following:

Listing all files within migration tools to ensure that all files have migrated successfully to the destination.
Creating a script to review or copy all files within a given directory for automated processes.

Details

When a recursive listing operation is requested, the frontend replies with the first page of results (assuming there are multiple pages of results) while work continues in the background to generate additional pages of results.

Each page can have up to 1000 results (keys).The default number for each page is 1000 results. The minimum number for each page is 200 results.

If the number of results of a recursive list is above 1000 (or above the set maximum), then the application or client must use the “continuation-token” parameter to get subsequent pages of results as needed.

Note: From the perspective of an end-user view, the application automatically lists the contents and gets the next page of results.

To perform a recursive listing operation, you can use either of the following example commands:

Using curl:

GET /{bucket}?list-type=2

Using the AWS CLI:

GET s3api list-objects-v2

Example output using curl

<?xml version="1.0" encoding="UTF-8"?>

<Name>marchnasunifiler-0</Name>

<NextContinuationToken>c3RhZmYvZGF0YS8wMjM3NWNlZi1hNWQ1LTQ3Y2MtYTY2NS0zMjZjMTViMzk5NTQtMC8xLnVuaThjRjIwRjBkLTE2OTg5NzUyODYtMTQyW21pbmlvX2NhY2hlOnYyLHJldHVybjpd</NextContinuationToken>

Example output using AWS CLI command

Command:

aws --no-verify-ssl --endpoint-url https://1.2.3.4 s3api list-objects-v2 --bucket bucket-public --page-size 200 --output table

Output:

Troubleshooting

The following can be useful with recursive list requests:

A configurable time limit is available to end recursive list requests that take too long. The default time limit is 300 seconds. If there are issues, contact Nasuni Support.
You cannot manually stop a recursive listing operation after it has started. The operation continues to run in the background until it completes or hits the time limit mentioned above. This is usually unnoticeable.

Multipart Upload (MPU) support

Multipart upload works by splitting up larger files into smaller pieces and sending the resulting pieces in parallel across the network to the destination. In certain scenarios, multipart upload can improve upload performance where appropriate backend cloud bandwidth is available.

This feature is available automatically after you update to NEA version 10.0.

Assembly Types

Starting in version 10.0.4, there are two multipart assembly types you can use. They are configured on a per-bucket basis. Two Buckets on the same Volume can each have different assembly types, even if they are within the same file hierarchy.

The assembly type is set in the JSON configuration by adding the "mpu_assembly_type" key to your Bucket, with a value of either "local" or "cloud". If the key is not present for a Bucket, "cloud" is the default.

The two assembly types are:

Cloud Assembly Mode: When a new part is uploaded, it is written to the NEA's cache and then also immediately fast-pushed to the cloud. When the upload is finalized, the new file is created using the parts that are now in the cloud, and a new file appears on cache which points to all of the parts in their correct sequence. Cloud assembly's completion speed depends heavily on the NEA-to-cloud bandwidth available.
Local Assembly Mode: When a new part is uploaded, it is written to the NEA's cache and is NOT fast-pushed to the cloud. When the upload is finalized, the new file is created by copying the local-stored part data in the correct sequence to a brand new file. After the file is created and placed into the filesystem, the old parts files are removed from disk. As such, local assembly temporarily requires extra cache space equal to the final file's total size. Local assembly's completion speed depends mostly on the client-to-NEA bandwidth available.
Local Assembly Mode storage duplication risk
Local Assembly works by copying the data of the parts to the new file in the Volume's directory structure, which requires all of the parts to stay on disk as you upload them. This can lead to data duplication in your cloud storage if a snapshot occurs on your Volume while a Local MPU is in progress, when some parts have been uploaded but the operation has not finalized. This can be mitigated by choosing an infrequent snapshot interval, especially if the snapshots are configured to happen during quiet hours.

Benefits of multipart upload support

The benefits of multipart upload support include the following:

Enables more frontend bandwidth to be dedicated to file transfer. A PUT object generally cannot saturate the frontend bandwidth with a single connection.
For the cloud assembly type, files greater than the size of the cache can be uploaded and assembled in parallel.
For the local assembly type, you must have a minimum of the file's size, plus a small buffer for metadata, available on the cache when a new multipart upload is created. When preparing to complete the upload, you must have an additional available space of the original file's size for the assembly to complete.
Enables error handling on transmission errors.
Enables pause and resume during file transfers.
For the cloud assembly type, persists data to the cloud as part of the upload process.
For the local assembly type, all of the file's data must be uploaded to the NEA and remain there until the CompleteMultipartUpload call. After the final file has appeared in the file system hierarchy after the Complete call, it can be persisted to the cloud as part of an ordinary Volume snapshot cycle.
Can see files from other appliances sooner (assuming S3 Edge is enabled or SDDLs are configured correctly).
Uploads survive a recovery process and can resume after the appliance is restored.

Overall recommendations

These are the overall recommendations:

If performance is the main use case of MPU, ensure a ratio of at least 4:1 or more for frontend to backend bandwidth.
If transfer reliability and file uploads larger than the cache are required, utilize MPU.
If you want to use S3 clients without modifying parameters to specifically disable MPU or change MPU thresholds, utilize MPU.
If the above recommendations do not fit your use case, it is recommended to utilize PutObject.

Details of Cloud Assembly MPU

When a multipart upload is initiated to S3 Edge, the following processing occurs:

The client initiates the upload.
The file is separated into parts between 5 MB and 5 GB in size.
These parts are queued for transfer.
The parts are transferred in parallel (usually 5-10 at a time) across the frontend network.
Each part, as it is written to the cache, is also fast pushed to the cloud over the backend network.
The client initiates the completion of the final file.
After all of the parts have been saved successfully and pushed to the cloud, assembly of the parts into one final file occurs in the cloud.
The client receives acknowledgement that the file upload has completed.

In the event of a network outage, this process greatly helps by being able to restart the process for a particular part, rather than needing to re-upload the entire file.

Details of Local Assembly MPU

When a multipart upload is initiated to S3 Edge, the following processing occurs:

The client initiates the upload.
The file is separated into parts between 5 MB and 5 GB in size.
These parts are queued for transfer.
The parts are transferred in parallel (usually 5-10 at a time) across the frontend network.
Each part is written to the cache and remains there until the completion phase.
The client initiates the completion of the final file.
The data in the parts is copied in sequence to a new file, and then the final file is moved into its new place in the file system.
The client receives acknowledgement that the file upload has completed.

In the event of a network outage, the entire file must be re-uploaded.

When to use which upload types

When to use PutObject?

When you don't have a specific reason to use multipart uploads. This is the standard way of uploading files through S3 Edge, and is usually what you need if you have any doubts.
When there is high bandwidth from the client to the NEA, but comparatively lower bandwidth from the NEA to its cloud storage backend, PutObject is often a great choice. If you are uploading many small files, or small to medium files on an intermittent basis, it's best to use PutObject to write a file to the NEA's cache in one operation and allow UniFS's snapshots to persist them to the cloud as it normally would. This usage model is the closest comparison to that of the other available protocols you can enable on a Volume. If you have a reliable connection to the NEA and are thus not worried about disconnection during transfer, you may find PutObject to be a good choice even for larger files.

When to use Local Assembly Multipart Uploads (Local MPU)?

When you have a very high client-to-NEA bandwidth, and want to write a large file to your Volume and have it available to read as quickly as possible. Multipart uploads will upload parts in parallel, which will maximize the usage of available bandwidth. Local Assembly does not write the parts to the cloud as part of its operation, so the creation of the final file takes place entirely on the NEA, making it faster to complete the operation and have the file available.

Note: if your use case primarily relies on Local MPU, you may consider setting the snapshot frequency on your Volume to be infrequent. This can help prevent the parts files of in-progress MPUs from being duplicated to the cloud, keeping storage costs down. See the section on Local Assembly data duplication risks for more information.

When to use Cloud Assembly Multipart Uploads (Cloud MPU)?

Low client-to-NEA bandwidth compared to NEA-to-cloud, such as when the client is not close to the NEA, the network on which the client and NEA exist is experiencing high traffic, or the data is being programmatically generated on-the-fly.
Why: Multipart is recommended for the ability to retry parts, since disconnections mid-transfer could occur. Cloud assembly is recommended because, since you can send data approximately as fast as the NEA can write to the cloud, you won't realize a performance benefit over Local MPU. Since the multipart upload will take as long in either case, by using Cloud Assembly, you will not need to worry about a snapshot duplicating data in the cloud.
When you are uploading very large files that may cause the cache to fill. When the parts are immediately persisted, they can be evicted from the cache as needed to make room for parts uploaded later.
When the client-to-NEA connection is unreliable, and the Volume's snapshot frequency is high. MPU is recommended for the ability to retry parts. Cloud assembly is recommended to avoid the possibility of data duplication. This, however, can cause the Complete phase to take noticeably longer if the upload-parts phase goes well.

Cloud MPU timing considerations

Since every part of a multipart upload must be pushed to the cloud before final assembly, the end-to-end timing of multipart upload depends on the speed with which all parts can be pushed to the cloud and finally assembled into one final file.

Here is a real example of what could occur with the multipart upload of a 3 GB file:

Took 13 seconds to upload the 3 GB file to S3 Edge from the client.
Took 42 seconds to push all of the parts to the cloud and assemble the final file.
Therefore, from the client perspective, it took 42 seconds for the upload to complete.

Thus, the actual upload is very quick (13 seconds), but the client might not see the final completion until some time later.

Parts storage

Parts for ongoing multipart uploads are saved on the cache disk, in a hidden location on the root of the volume (/.volume-internal). This directory is owned by root and has permissions that exclude non-root users. This directory contains all of the active multipart metadata, and all part files data for multipart upload requests.

Interoperability with other Nasuni features

Global File Lock (GFL)

GFL is not supported with multipart upload and an error message occurs if it is tried: "Locking enabled on specified object".
Other S3 Edge operations (such as GET and PUT) continue to work on GFL-enabled directories as if optimized GFL is enabled, regardless of the actual mode.

Antivirus (AV)

If AV is enabled, each part that is uploaded is scanned by AV individually.
For this reason, it is possible for AV to not detect a virus that is broken into multiple pieces.

Directory Quotas

Directory quotas are not supported with multipart upload and an error message occurs: "A quota is enabled on specified object"

Performance considerations

In order to fully take advantage of multipart upload performance, follow these general guidelines:

Ensure that the backend cloud bandwidth is as close to the available frontend network bandwidth as possible. The recommended ratio is at least 4:1 ratio (Frontend vs. Backend bandwidth).
In internal performance tests, significant improvement is seen for high-latency scenarios (>50ms latency) between the client and S3 Edge.
Otherwise, for low-latency scenarios, you might not see improved performance from the client perspective for multipart upload compared to single PUT performance.

S3 Edge and Nasuni Edge Appliances

This section explains how S3 Edge interacts with Nasuni Edge Appliances.

Monitoring S3 traffic

You can monitor S3 traffic with the Edge Appliance UI. With the Edge Appliance UI, the display is on the Network Activity chart.

Because the S3 protocol is through port 443, which has been defined as the port for Mobile Access traffic, the S3 protocol traffic appears as the “Mobile Transmit” traffic for outgoing data and as the “Mobile Receive” traffic for incoming data.

S3 and Global File Lock

The purpose of the Global File Lock feature is to prevent conflicts when two or more users attempt to change the same file on different Nasuni Edge Appliances. If you enable the Global File Lock feature for a directory and its descendants, any files in that directory or its descendants can only be changed by one user at a time. Any other users cannot change the same file at the same time.

In contrast, S3 does not have a concept of locking (or collaboration) on files. The S3 protocol interacts with any GFL-enabled directory using Optimized mode behavior, regardless of what mode is set on a directory.

This means that some care must be taken in how S3 interacts with other protocols (such as SMB) that do perform locking.

Files on a single NEA (shared volume)

When an SMB client and an S3 client both attempt to write to a file, regardless of whether Global File Lock is enabled or disabled, the last client to write to the file wins. No conflict file is created.

Files across two separate NEAs (sharing the volume)

The situation is more complicated if there are two separate NEAs. These are the pertinent cases with Global File Lock turned on:

If the SMB client has locked the file, and the S3 client attempts to write to the file, the write fails and the S3 client receives a 409 error.
If the SMB client has written the file and removed any lock on the file, and the S3 client attempts to read the file, the S3 client receives the latest version of the file that the SMB client wrote.
If the S3 client has written the file, and the SMB client attempts to read the file, the SMB client receives the latest version of the file that the S3 client wrote.

If GFL is turned off with two separate NEAs, this is the expected behavior:

If the SMB client has opened a file, and an S3 client attempts to write to the file, the last client to write to the file wins. Conflict file is created.
If the SMB client writes to a file, and removed any lock on the file, and an S3 client attempts to read the file, the S3 client must wait for data propagation to complete before reading the latest version of the file.
If the S3 client writes a file, and SMB client attempts to read the file, the SMB client must wait for data propagation to complete before reading the latest version of the file.

S3 and case-insensitive volumes

S3 is a case-sensitive protocol, namely, the filename abc.txt is different from the filename ABC.txt. However, Nasuni supports both case-sensitive volumes and case-insensitive volumes (where abc.txt is the same filename as ABC.txt).

Nasuni has determined that there are no data loss or data unavailable scenarios that can occur with S3 and case-insensitive volumes. For example, if there is an existing file named "ABC.txt" on a share, and S3 tries to upload a file named "abc.txt" to the same share, the upload does not succeed, and a generic error message is returned.

If writing distinct file names in a given directory, there are no interoperability issues using S3 with case-insensitive volumes.

Auditing

S3 processes are available for the usual auditing features.

All S3 users (access keys) appear in the Audit logs as user “scube”. ("Scube" stands for "S cubed" meaning S3.) Here is an example (from audit.csv):

2024-03-06 16:44:14.041293,Read,Read Directory,/Test,,scube,scube,1000,,Internal,,,

2024-03-06 16:44:14.200028,Create,Create Directory,/Test/RTA,,scube,scube,1000,,Internal,,,

Checksum

Nasuni S3 Edge does not support streaming checksums. Such streaming checksums are turned on by default in AWS CLI v2.23.0 and later, as well as in Boto3 (Python v1.36.0 and later). For details, see Data Integrity Protections for Amazon S3.

When an S3 client attempts to use streaming checksums with S3 Edge, an error message is reported, such as “Length Required” or HTTP code 411. If you are using AWS CLI v2.23.0 or later, or Boto3 (Python v1.36.0 or later), to communicate with S3 Edge, you must turn off the checksum validation by changing the default value from “WHEN_SUPPORTED” to “WHEN_REQUIRED” when sending S3 requests to Nasuni S3 Edge.

Troubleshooting

Here are some suggestions for dealing with issues that might arise in the use of S3 with a Nasuni Edge Appliance (NEA).

Situation	Cause	Solution
Received 503 service unavailable error.	S3 service is not running.	Check your NEA and S3 configuration for your account to ensure it is accurate. Also, try refreshing your NEA license.
Received 301 redirect error.	Utilizing an unknown client that did not direct to s3/ path.	Change user agent to include “NasuniS3” or add s3/ prefix to URL path and retry S3 command.
Network disconnection.	Network connection drops while upload is occurring.	Any data that was previously uploaded remains in the location specified. Upload file again.
Upload error “POST requests are not allowed on the specified resource.”	Potentially trying to upload a file using multipart parameter.	Depending on the S3 client, either disable the usage of multipart or modify the threshold whereby multipart usage is utilized. For Cyberduck, this threshold is 100 MB. See Configuring the Cyberduck multipart.upload parameter section as an example.
S3 Edge service no longer available after Disaster Recovery (DR) process	Known issue whereby S3 Edge with SDDL information does not automatically get re-populated after DR.	Re-join AD domain, then re-apply same S3 Edge configuration in account.nasuni.com in the S3 Edge tab and refresh the NEA license(s).