SNMP Best Practices

Why SNMP monitoring is important

The Simple Network Management Protocol (SNMP) is an Internet Standard protocol for collecting and organizing information about managed devices on IP networks. SNMP is widely used in network management for network monitoring, and with third-party products that collect and report log data, such as Splunk.

Using SNMP, you can monitor your unique workloads. This is valuable in helping you establish what is “normal” in your environments and your volumes. With this information, you can then configure Notifications and alerts accordingly.

This process is necessary because, in our experience, there are few universal rules about what is normal for every customer. For one customer, or for one volume for one customer, having 1 GB of unprotected data might be unusual. For another, the threshold might be 1 TB. For one system, the average load might usually be low, while for another, it might usually be high.

There are also some general recommendations. For example, if the Last Snapshot occurred more than 24 hours ago on an active system, this should produce an alert.
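The 24-hour rule above can be expressed as a small check. This is a minimal sketch, assuming the last-snapshot time arrives as a string in YYYY-MM-DD HH:MM:SS form (the format this document lists for the last-snapshot times) and that it is in the same time zone as the monitoring host. The function name and the constant are hypothetical, not part of any Nasuni tool.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout: YYYY-MM-DD HH:MM:SS
SNAPSHOT_FORMAT = "%Y-%m-%d %H:%M:%S"


def snapshot_is_stale(last_snapshot, now, max_age=timedelta(hours=24)):
    """Return True if the last snapshot finished more than max_age before now."""
    finished = datetime.strptime(last_snapshot, SNAPSHOT_FORMAT)
    return now - finished > max_age
```

A monitoring job could call snapshot_is_stale(value, datetime.now()) for each volume and raise an alert when it returns True.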

The overall strategy is to implement SNMP, then monitor the systems for 2-4 weeks to observe what is usual for them. After that, you can configure the Notifications and alerts based on that knowledge.
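One way to turn the observation period into an alert threshold is sketched below. It assumes you have collected a list of numeric samples for one metric on one system, and it uses "mean plus k standard deviations" as the rule. That rule and the default k=3 are illustrative assumptions, not a Nasuni recommendation; a percentile or a fixed margin might suit your workload better.

```python
import statistics


def baseline_threshold(samples, k=3.0):
    """Return mean + k * population standard deviation of the observed samples."""
    mean = statistics.fmean(samples)
    spread = statistics.pstdev(samples)
    return mean + k * spread


def exceeds_baseline(value, samples, k=3.0):
    """Return True if a new reading is above the threshold derived from samples."""
    return value > baseline_threshold(samples, k)
```

Because each system has its own sample history, each system gets its own threshold, which matches the point that there are few universal rules about what is normal.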

SNMP ports

As the SNMP agent, Nasuni receives requests on UDP port 161 from the third-party SNMP manager used for system monitoring. Nasuni sends agent responses back to the source port on the third-party SNMP manager. The third-party SNMP manager receives notifications (including Traps and InformRequests) on SNMP destination port 162.

You cannot change port 161 or port 162.
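To confirm that notifications can reach the SNMP manager host on port 162, a bare UDP listener is often enough. The sketch below only receives raw datagrams and does not decode SNMP. The function names are hypothetical, and binding to port 162 typically requires elevated privileges on the manager host.

```python
import socket

SNMP_AGENT_PORT = 161  # the Edge Appliance receives requests on this port
SNMP_TRAP_PORT = 162   # the SNMP manager receives notifications on this port


def open_trap_socket(bind_address="0.0.0.0", port=SNMP_TRAP_PORT):
    """Open a UDP socket on the trap port (may need administrator rights)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_address, port))
    return sock


def receive_one(sock, timeout=5.0, max_size=65535):
    """Wait for one datagram and return (sender_ip, raw_payload)."""
    sock.settimeout(timeout)
    payload, (sender_ip, _sender_port) = sock.recvfrom(max_size)
    return sender_ip, payload
```

If receive_one raises socket.timeout, nothing arrived within the timeout, which usually points to a firewall or routing issue between the appliance and the manager.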

SNMP with Nasuni

You can configure SNMP monitoring of Nasuni Edge Appliances. Nasuni provides two ways to configure SNMP monitoring:

  • You can enable SNMP traps, which send information to destinations that you provide.
    Notification messages are sent as SNMP traps.

  • You can use apps that pull SNMP information, using the definitions in the NASUNI-FILER-MIB.

You can configure either or both.

The Nasuni Edge Appliance supports monitoring via SNMP v1, v2c, and v3. It exposes the standard SNMPv1 MIB (management information base), as well as the NASUNI-FILER-MIB, SNMPv2-MIB, HOST-RESOURCES-MIB, UCD-SNMP-MIB, UCD-DISKIO-MIB, and IF-MIB. Both 32-bit and 64-bit SNMP network counters are supported.

Important: Data is updated at most once per minute. Some values, such as filerTotalUnprotectedData, might take 20 minutes or longer to be updated.

Note: Nasuni automatically provides the EngineID value.

Data available in SNMP updates include the following:

  • Network information, such as:

    • Inbound and outbound traffic by type and by port

  • Volume information, such as:

    • Size

    • Time of last snapshot

  • Local cache information, such as:

    • Total space, used space, and free space

    • Unprotected data

    • Cache hit/miss rate

  • CPU performance information, such as:

    • Percent utilization

    • Load averages

  • Memory usage information, such as:

    • Memory and swap utilization

  • Disk performance information, such as:

    • Number of disk reads and writes per disk

    • Bytes read and written per disk

  • Client information, such as:

    • Number of connected CIFS clients

  • Snapshot and sync information, such as:

    • Number of merge conflicts

    • Snapshot success (version) count per volume

    • Times for snapshots (start, end, delta) per volume

  • Trap information for any event that would generate an email alert

The following are some helpful SNMP metrics:

  • From UCD-SNMP-MIB:

    • memAvailReal: The amount of real/physical memory currently unused or available.

    • memTotalFree: The total amount of memory free or available for use on this host. This value typically covers both real memory and swap space or virtual memory.

    • ssCpuRawIdle: The number of 'ticks' (typically 1/100s) spent idle.

  • From HOST-RESOURCES-MIB:

    • hrSWRunPerfCPU: The number of centi-seconds of the total system's CPU resources consumed by this process. Note that, on a multi-processor system, this value may increment by more than one centi-second in one centi-second of real (wall clock) time.
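Because ssCpuRawIdle is a running tick counter, a single reading says little; the idle percentage comes from the difference between two polls. The sketch below assumes you poll ssCpuRawIdle together with the other raw CPU counters from UCD-SNMP-MIB, that the counters you pass in do not overlap, and that they are 32-bit counters that can wrap. The function names and the sample values are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter readings, allowing for one wrap-around."""
    return (current - previous) % modulus


def idle_percent(previous, current):
    """Percent of ticks spent idle between two polls.

    previous and current map counter names (for example "ssCpuRawIdle")
    to raw tick counts. Returns None if no ticks elapsed.
    """
    deltas = {name: counter_delta(previous[name], current[name]) for name in current}
    total = sum(deltas.values())
    if total == 0:
        return None
    return 100.0 * deltas["ssCpuRawIdle"] / total


# Illustrative readings from two polls (made-up numbers).
first = {"ssCpuRawUser": 100, "ssCpuRawNice": 0, "ssCpuRawSystem": 50, "ssCpuRawIdle": 850}
second = {"ssCpuRawUser": 150, "ssCpuRawNice": 0, "ssCpuRawSystem": 70, "ssCpuRawIdle": 1780}
print(idle_percent(first, second))
```

CPU utilization is then 100 minus the idle percentage over the same interval.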

Recommendations for SNMP metrics to track

Nasuni Edge Appliances support industry-standard MIBs that allow most management systems to pull information about physical hardware, such as CPU and RAM usage. Beyond that, the NASUNI-FILER-MIB provides information regarding Nasuni-specific features.

Recommended items to track at a minimum include:

Each entry below lists an SNMP metric and the reasons to monitor it, grouped by MIB.

  • From NASUNI-FILER-MIB:

    • filerVersion: Useful metadata about individual Edge Appliances. This information is also available in the NMC.

    • filerUptime: Useful metadata about individual Edge Appliances. This information is also available in the NMC.

    • filerUpdateAvailable: Useful metadata about individual Edge Appliances. This information is also available in the NMC.

    • filerPlatformName: If you have several different types of Edge Appliances, this is useful to include in your management system.

    • filerTotalUnprotectedData: After you reach a steady state, you can establish a range within which this value should fall. If the value exceeds this range, that could indicate a problem worth investigating. Perhaps snapshots are failing, or someone is loading a large amount of inappropriate data. One customer caught a CryptoLocker infection because this value ballooned unexpectedly.

    • filerReadHits, filerReadMisses: These two values are useful for evaluating the performance of the Edge Appliance’s cache. A large number of misses means that the Edge Appliance is reaching out to the cloud to pull down data. This might indicate that the cache size is too small, or that pinning a folder could improve user experience.

    • filerCloudOut, filerCloudIn: Cloud transmit/receive bits/second for the last 1 minute. These values are handy for tracking data sent to/from the cloud for trending purposes.

    • filerClientOut, filerClientIn: Client transmit/receive bits/second for the last 1 minute. These values are handy for tracking data sent to/from clients for trending purposes.

    • volumeTableDescription, volumeTableUnprotectedData: These values provide more detailed insight into the filerTotalUnprotectedData value above. Using them allows you to narrow down the location of unexpected unprotected data growth.

    • filerCacheTotal, filerCacheUsed, filerCacheFree: Use these counters to monitor the current status of the cache for an Edge Appliance.

    • filerTotalShareLocks: Total number of SMB locks on the Edge Appliance.

    • filerTotalShareClients: Total number of share clients connected to this Edge Appliance. Note: If a client is connected to multiple shares, the client is included multiple times in the total.

    • volumeTableLastSnapshotStart, volumeTableLastSnapshotEnd, volumeTableLastSnapshotDuration: Use these values to verify that snapshots are running regularly. They report, respectively, the date and time the last snapshot started (YYYY-MM-DD HH:MM:SS), the date and time the last snapshot ended (YYYY-MM-DD HH:MM:SS), and the duration of the last snapshot.

  • From UCD-SNMP-MIB:

    • laLoad: The 1-, 5-, and 15-minute load averages (one per row). Load average is a measurement of how many tasks are waiting in a kernel run queue (not just CPU time but also disk activity) over a period of time. You may want to alert if the 15-minute load average stays above the number of processors (vCPU on a VM) on the Edge Appliance. This is a good indicator of end-user performance issues.
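Two of the recommendations above reduce to simple arithmetic: the cache hit ratio from filerReadHits and filerReadMisses, and the load-average rule for laLoad. The sketch below assumes the hit and miss figures cover the same time interval, and it treats "stays above" as "above for the last few consecutive polls". The function names and the min_consecutive default are hypothetical choices, not values from Nasuni.

```python
def cache_hit_ratio(read_hits, read_misses):
    """Fraction of reads served from the cache, or None if there were no reads."""
    total = read_hits + read_misses
    if total == 0:
        return None
    return read_hits / total


def load_stays_high(load_15min_samples, cpu_count, min_consecutive=3):
    """True if the most recent min_consecutive 15-minute load averages
    are all above the number of processors."""
    if len(load_15min_samples) < min_consecutive:
        return False
    recent = load_15min_samples[-min_consecutive:]
    return all(load > cpu_count for load in recent)
```

For example, 900 hits and 100 misses give a hit ratio of 0.9. A ratio that trends downward over time is the signal to look at cache size or folder pinning.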

Configuring SNMP monitoring

To configure SNMP monitoring (on the Edge Appliance):

Configuration → SNMP Monitoring

To configure SNMP monitoring (on the NMC):

Filers → SNMP Settings

To continue configuring SNMP monitoring (showing Edge Appliance procedure; the NMC procedure is similar):

  1. The SNMP configuration page appears.
     

  2. To enable SNMP v1 and v2c monitoring, select Enable v1,v2c Support.
    If you enable SNMP v1 and v2c monitoring, in the Community Name text box, enter the SNMP community name for the Nasuni Edge Appliance. The default community name is public. Changing the community’s name from the default improves security.

  3. To enable SNMP v3 monitoring, select Enable v3 Support.
    If you enable SNMP v3 monitoring, enter a Username and Password for SNMP v3 authorization.
    Important: Do not use passwords that start with "default".

  4. If you enable SNMP monitoring, in the System Location text box, enter the physical location of the Nasuni Edge Appliance.

  5. If you enable SNMP monitoring, in the System Contact text box, enter the contact information of the person responsible for SNMP monitoring for the Nasuni Edge Appliance.

  6. If you enable SNMP monitoring, in the Trap Addresses text box, enter a list of IP addresses or hostnames listening for SNMP traps, separated by commas. With SNMP traps, Nasuni sends the information to the provided destinations.
    If you do not want to listen for SNMP traps, leave this blank. For example, if you use apps that can pull SNMP information, you do not use traps, but you can use the definitions in the NASUNI-FILER-MIB with your app.
    If you enter any trap addresses, you can send a test trap by clicking Send Test Trap.

  7. Click Save SNMP Settings. The SNMP monitoring settings are saved for this Nasuni Edge Appliance.
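After saving the settings, you can check that the Edge Appliance answers SNMP v2c requests. The sketch below assumes the Net-SNMP snmpget command is installed on the monitoring host and builds a request for sysUpTime.0 from the SNMPv2-MIB. The host address and community name in the usage note are placeholders, and the helper names are hypothetical.

```python
import subprocess

SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"  # SNMPv2-MIB::sysUpTime.0


def build_snmpget_command(host, community, oid=SYS_UPTIME_OID):
    """Build the argument list for a Net-SNMP v2c GET request."""
    return ["snmpget", "-v", "2c", "-c", community, host, oid]


def poll(host, community, oid=SYS_UPTIME_OID, timeout=10):
    """Run snmpget and return its output; raises if the command fails."""
    result = subprocess.run(
        build_snmpget_command(host, community, oid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()
```

Calling poll("192.0.2.10", "example-community") would query a placeholder address with a placeholder community name; substitute the values you entered on the SNMP configuration page.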

Trap Addresses and Engine IDs

On the NMC, you can view Trap Addresses and Engine IDs. Go to Filers → SNMP Settings.

If SNMP is enabled, in the Trap Addresses column, a list of IP addresses or hostnames listening for SNMP traps appears. The SNMP Engine IDs are located in the Trap Addresses column.

Note: Trap Addresses are visible only if using SNMP v3.

Important: The Engine ID is derived from the Edge Appliance serial number. For example, if the serial number were “52d1e618bb1a4de9”, then the Engine ID would be “0x52d1e618bb1a4de9”.
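The derivation in the example above amounts to prefixing the serial number with "0x". A minimal sketch, with a hypothetical function name:

```python
def engine_id_from_serial(serial_number):
    """Return the Engine ID string for an Edge Appliance serial number."""
    return "0x" + serial_number.strip()


print(engine_id_from_serial("52d1e618bb1a4de9"))  # prints 0x52d1e618bb1a4de9
```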

Changes in SNMP reporting of memory

For each SNMP name, the reported value depends on the version:

  • memAvailReal:

    • Version 8.7 and before: memfree

    • Version 8.8 to 9.2: memfree + buffers + cached + sreclaimable

    • Version 9.3 and later: memfree

  • memTotalFree:

    • Version 8.7 and before: memfree + swapfree

    • Version 8.8 to 9.2: memfree + swapfree + buffers + cached + sreclaimable

    • Version 9.3 and later: memfree + swapfree

  • memCached:

    • Version 8.7 and before: cached

    • Version 8.8 to 9.2: cached + sreclaimable

    • Version 9.3 and later: cached + sreclaimable

where

buffers = memory in the buffer cache (relatively temporary storage for raw disk blocks)

cached = memory in the pagecache (Diskcache and Shared Memory)

memfree = amount of physical memory not used by the system

sreclaimable = the part of the Slab (In-kernel data structures cache) that might be reclaimed (such as caches)

swapfree = remaining swap space available

For details, see https://access.redhat.com/solutions/406773.
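If your dashboards span several versions, it can help to see the three reporting rules side by side. The sketch below restates the formulas in this section as one function, assuming the version is given as a (major, minor) tuple and the memory figures as a dictionary keyed by the names defined above. The function name and the sample numbers are hypothetical.

```python
def reported_memory(version, mem):
    """Return the values reported for memAvailReal, memTotalFree, and memCached.

    version: (major, minor) tuple, for example (9, 3).
    mem: dict with keys memfree, swapfree, buffers, cached, sreclaimable.
    """
    reclaimable = mem["buffers"] + mem["cached"] + mem["sreclaimable"]
    if version <= (8, 7):
        avail_real = mem["memfree"]
        cached = mem["cached"]
    elif version <= (9, 2):  # versions 8.8 to 9.2
        avail_real = mem["memfree"] + reclaimable
        cached = mem["cached"] + mem["sreclaimable"]
    else:  # version 9.3 and later
        avail_real = mem["memfree"]
        cached = mem["cached"] + mem["sreclaimable"]
    return {
        "memAvailReal": avail_real,
        "memTotalFree": avail_real + mem["swapfree"],
        "memCached": cached,
    }


# Made-up figures, in arbitrary units, to show how the same state is reported.
sample = {"memfree": 100, "swapfree": 50, "buffers": 10, "cached": 200, "sreclaimable": 30}
for version in [(8, 7), (9, 0), (9, 3)]:
    print(version, reported_memory(version, sample))
```

The practical consequence is that a memAvailReal threshold tuned on Version 8.8 to 9.2 is likely to fire far more often after an upgrade to 9.3 or later, because buffers and cache are no longer counted as available.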

Copyright © 2010-2024 Nasuni Corporation. All rights reserved.