Why SNMP monitoring is important
The Simple Network Management Protocol (SNMP) is an Internet Standard protocol for collecting and organizing information about managed devices on IP networks. SNMP is widely used in network management for network monitoring, and with third-party products that collect and report log data, such as Splunk.
Using SNMP, you can monitor your unique workloads. This is valuable in helping you establish what is “normal” in your environments and your volumes. With this information, you can then configure Notifications and alerts accordingly.
This process is necessary because, in our experience, there are few universal rules about what is normal for every customer. For one customer, or for one volume for one customer, having 1 GB of unprotected data might be unusual. For another, the threshold might be 1 TB. For one system, the average load might usually be low, while for another, it might usually be high.
There are also some general recommendations. For example, if the Last Snapshot occurred more than 24 hours ago on an active system, this should produce an alert.
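The 24-hour recommendation above can be sketched as a small check. This is only an illustration: the function name and the way the last-snapshot time is obtained are assumptions, and the timestamp is assumed to already be a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the general recommendation: 24 hours.
MAX_SNAPSHOT_AGE = timedelta(hours=24)


def snapshot_is_stale(last_snapshot, now=None, max_age=MAX_SNAPSHOT_AGE):
    """Return True when the last snapshot is older than max_age.

    last_snapshot and now are timezone-aware datetimes (an assumption of
    this sketch; convert whatever your monitoring tool returns first).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_snapshot > max_age
```

On a system that is known to be idle, a longer `max_age` might be more appropriate.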
The overall strategy is to implement SNMP, then monitor the systems for 2-4 weeks to observe what is usual for them. After that, you can configure the Notifications and alerts based on that knowledge.
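One possible way to turn the observation period into an alert threshold is sketched below. The rule used here (mean plus three standard deviations) is an assumption chosen for illustration, not a Nasuni recommendation, and the function names are hypothetical.

```python
import statistics


def baseline_threshold(samples, k=3.0):
    """Return mean + k * sample standard deviation of the observed values.

    samples: readings collected during the 2-4 week observation period.
    k: how many standard deviations above the mean counts as unusual
       (an assumption of this sketch; tune it for your environment).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    return statistics.fmean(samples) + k * statistics.stdev(samples)


def is_unusual(value, samples, k=3.0):
    """Return True when value exceeds the baseline threshold."""
    return value > baseline_threshold(samples, k)
```

The same function can be applied per volume or per Edge Appliance, since what is normal differs from one to the next.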
SNMP ports
As the SNMP agent, Nasuni receives requests on UDP port 161 from the third-party SNMP manager used for system monitoring. Nasuni sends agent responses back to the source port on the third-party SNMP manager. The third-party SNMP manager receives notifications (including Traps and InformRequests) on SNMP destination port 162.
You cannot change port 161 or port 162.
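To confirm that trap datagrams actually reach the manager host, a bare UDP listener can help. The sketch below only reports the sender and the raw payload; it does not decode SNMP. The function names are hypothetical, and binding to port 162 usually requires elevated privileges.

```python
import socket


def open_trap_socket(host="0.0.0.0", port=162):
    """Bind a UDP socket on the port where the manager receives traps."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, timeout=5.0):
    """Wait for one datagram and return (sender_ip, payload_bytes).

    Raises socket.timeout (an alias of TimeoutError on recent Python
    versions) if nothing arrives within the timeout.
    """
    sock.settimeout(timeout)
    payload, (sender_ip, _sender_port) = sock.recvfrom(65535)
    return sender_ip, payload
```

If nothing arrives after a test trap is sent, a firewall between the Edge Appliance and the manager is a likely place to look.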
SNMP with Nasuni
You can configure SNMP monitoring of Nasuni Edge Appliances. Nasuni provides two ways to configure SNMP monitoring:
You can enable SNMP traps, which send information to destinations that you provide. Notification messages are sent as SNMP traps.
You can use apps that pull SNMP information, using the definitions in the NASUNI-FILER-MIB.
You can configure either or both.
The Nasuni Edge Appliance supports monitoring via SNMP v1, v2c, and v3. The Nasuni Edge Appliance exposes the standard SNMPv1 MIB (management information base), as well as the NASUNI-FILER-MIB, SNMPv2-MIB, HOST-RESOURCES-MIB, UCD-SNMP-MIB, UCD-DISKIO-MIB, and IF-MIB. Both 32-bit and 64-bit SNMP network counters are supported.
Important: Data is updated at most once per minute. Some values, such as filerTotalUnprotectedData, might take 20 minutes or longer to be updated.
Note: Nasuni automatically provides the EngineID value.
Data available in SNMP updates include the following:
Network information, such as:
Inbound and outbound traffic by type and by port
Volume information, such as:
Size
Time of last snapshot
Local cache information, such as:
Total space, used space, and free space
Unprotected data
Cache hit/miss rate
CPU performance information, such as:
Percent utilization
Load averages
Memory usage information, such as:
Memory and swap utilization
Disk performance information, such as:
Number of disk reads and writes per disk
Bytes read and written per disk
Client information, such as:
Number of connected CIFS clients
Snapshot and sync information, such as:
Number of merge conflicts
Snapshot success (version) count per volume
Times for snapshots (start, end, delta) per volume
Trap information for anything that would generate an email alert
The following are some helpful SNMP metrics:
From UCD-SNMP-MIB:
memAvailReal: The amount of real/physical memory currently unused or available.
memTotalFree: The total amount of memory free or available for use on this host. This value typically covers both real memory and swap space or virtual memory.
ssCpuRawIdle: The number of 'ticks' (typically 1/100s) spent idle.
From HOST-RESOURCES-MIB:
hrSWRunPerfCPU: The number of centi-seconds of the total system's CPU resources consumed by this process. Note that, on a multi-processor system, this value may increment by more than one centi-second in one centi-second of real (wall clock) time.
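Raw tick counters such as ssCpuRawIdle only become meaningful when two readings are compared. The sketch below estimates the idle fraction between two polls. It assumes a tick of 1/100 s, a 32-bit counter that may wrap once between polls, and a counter that accumulates across all CPUs; verify these against your MIB before relying on the result. The function names are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32
TICKS_PER_SECOND = 100  # assumption: one tick is 1/100 s


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter readings, allowing for one wrap."""
    return (current - previous) % modulus


def idle_fraction(prev_idle, cur_idle, elapsed_seconds, cpu_count):
    """Estimate the fraction of CPU time spent idle between two polls.

    Assumes the idle counter is summed over all CPUs, so the available
    tick budget is elapsed_seconds * TICKS_PER_SECOND * cpu_count.
    """
    if elapsed_seconds <= 0 or cpu_count <= 0:
        raise ValueError("elapsed_seconds and cpu_count must be positive")
    idle_ticks = counter_delta(prev_idle, cur_idle)
    capacity = elapsed_seconds * TICKS_PER_SECOND * cpu_count
    # Clamp, because poll timing is never exact.
    return min(1.0, idle_ticks / capacity)
```

For example, 9,000 idle ticks over 60 seconds on a 2-CPU system gives an idle fraction of 0.75, or roughly 25 percent utilization.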
Recommendations for SNMP metrics to track
Nasuni Edge Appliances support industry-standard MIBs that allow most management systems to pull information about physical hardware, such as CPU and RAM usage. Beyond that, the NASUNI-FILER-MIB provides information regarding Nasuni-specific features.
Recommended items to track at a minimum include:
SNMP Metric | MIB | Reasons to monitor |
---|---|---|
filerVersion | NASUNI-FILER-MIB | Useful metadata about individual Edge Appliances. Information is also available in the NMC. |
filerUptime | NASUNI-FILER-MIB | Useful metadata about individual Edge Appliances. Information is also available in the NMC. |
filerUpdateAvailable | NASUNI-FILER-MIB | Useful metadata about individual Edge Appliances. Information is also available in the NMC. |
filerPlatformName | NASUNI-FILER-MIB | If you have several different types of Edge Appliances, this would be useful to include in your management system. |
filerTotalUnprotectedData | NASUNI-FILER-MIB | After you reach a steady state, you can establish a range within which this value should fall. If it exceeds this range, that could be an indication of a problem worth investigating. Perhaps snapshots are failing, or someone is loading a large amount of inappropriate data. One customer caught an infection of CryptoLocker because this value ballooned unexpectedly. |
filerReadHits, filerReadMisses | NASUNI-FILER-MIB | These two values are useful for evaluating the performance of the Edge Appliance’s cache. If there are a lot of misses, that means that the Edge Appliance is reaching out to the cloud to pull down data. This might indicate that the cache size is too small, or that pinning a folder could improve user experience. |
filerCloudOut, filerCloudIn | NASUNI-FILER-MIB | Cloud transmit/receive bits/second for the last 1 minute. These values are handy for tracking data sent to/from the cloud for trending purposes. |
filerClientOut, filerClientIn | NASUNI-FILER-MIB | Client transmit/receive bits/second for the last 1 minute. These values are handy for tracking data sent to/from clients for trending purposes. |
volumeTableDescription | NASUNI-FILER-MIB | These values provide more detailed insight into the Unprotected Data value above. Using them allows you to narrow down the location of unexpected unprotected data growth. |
filerCacheTotal | NASUNI-FILER-MIB | In order to monitor the current status of the cache for an Edge Appliance, you can use these counters. |
filerTotalShareLocks | NASUNI-FILER-MIB | Total number of SMB locks on the Edge Appliance. |
filerTotalShareClients | NASUNI-FILER-MIB | Total number of share clients connected to this Edge Appliance. Note: If a client is connected to multiple shares, the client is counted once for each share. |
volumeTableLastSnapshotStart, volumeTableLastSnapshotEnd, volumeTableLastSnapshotDuration | NASUNI-FILER-MIB | Verify that snapshots are running regularly. |
laLoad | UCD-SNMP-MIB | The 1-, 5-, and 15-minute load averages (one per row). Load average is a measurement of how many tasks are waiting in a kernel run queue (not just CPU time but also disk activity) over a period of time. You may want to alert if the 15-minute load average stays above the number of processors (vCPU on a VM) on the Edge Appliance. This is a good indicator of end-user performance issues. |
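Two of the checks described in the table can be sketched as follows. The sketch assumes that filerReadHits and filerReadMisses are cumulative values, so that the ratio is computed from the difference between two polls; if they are reported differently on your system, adjust accordingly. The function names are hypothetical.

```python
def cache_hit_ratio(prev_hits, prev_misses, cur_hits, cur_misses):
    """Return the cache hit ratio between two polls, or None if no reads.

    Assumes both values only increase between polls (cumulative counts).
    """
    hits = cur_hits - prev_hits
    misses = cur_misses - prev_misses
    total = hits + misses
    if total <= 0:
        return None
    return hits / total


def load_stays_high(load_15min_samples, cpu_count):
    """Return True when every 15-minute load sample exceeds cpu_count.

    load_15min_samples: consecutive readings of the 15-minute laLoad row.
    cpu_count: number of processors (vCPUs on a VM).
    """
    return bool(load_15min_samples) and all(
        sample > cpu_count for sample in load_15min_samples
    )
```

A persistently low hit ratio might point to a cache that is too small, as noted in the table, while a sustained high load average is a reason to look at end-user performance.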
Configuring SNMP monitoring
To configure SNMP monitoring (on the Edge Appliance):
Configuration → SNMP Monitoring
To configure SNMP monitoring (on the NMC):
Filers → SNMP Settings
To continue configuring SNMP monitoring (showing Edge Appliance procedure; the NMC procedure is similar):
The SNMP configuration page appears.
To enable SNMP v1 and v2c monitoring, select Enable v1,v2c Support.
If you enable SNMP v1 and v2c monitoring, in the Community Name text box, enter the SNMP community name for the Nasuni Edge Appliance. The default community name is public. Changing the community name from the default improves security.
To enable SNMP v3 monitoring, select Enable v3 Support.
If you enable SNMP v3 monitoring, enter a Username and Password for SNMP v3 authorization.
Important: Do not use passwords that start with "default".
If you enable SNMP monitoring, in the System Location text box, enter the physical location of the Nasuni Edge Appliance.
If you enable SNMP monitoring, in the System Contact text box, enter the contact information of the person responsible for SNMP monitoring for the Nasuni Edge Appliance.
If you enable SNMP monitoring, in the Trap Addresses text box, enter a list of IP addresses or hostnames listening for SNMP traps, separated by commas. With SNMP traps, Nasuni sends the information to the provided destinations.
If you do not want to listen for SNMP traps, leave this blank. For example, if you use apps that can pull SNMP information, you do not use traps, but you can use the definitions in the NASUNI-FILER-MIB with your app.
If you enter any trap addresses, you can send a test trap by clicking Send Test Trap.
Click Save SNMP Settings. The SNMP monitoring settings are saved for this Nasuni Edge Appliance.
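After saving the settings, one way to confirm that v1/v2c polling works is to query a standard OID with the Net-SNMP `snmpget` command. The sketch below assumes that `snmpget` is installed on the monitoring host; the host name in the usage note is a placeholder, and the OIDs shown are the UCD-SNMP-MIB entries for the 15-minute load average and memAvailReal.

```python
import subprocess

# Numeric OIDs from UCD-SNMP-MIB.
OIDS = {
    "laLoad.3": "1.3.6.1.4.1.2021.10.1.3.3",       # 15-minute load average
    "memAvailReal.0": "1.3.6.1.4.1.2021.4.6.0",
}


def build_snmpget(host, community, oid):
    """Build an snmpget command line for an SNMP v2c query.

    -Oqv asks snmpget to print only the value.
    """
    return ["snmpget", "-v", "2c", "-c", community, "-Oqv", host, oid]


def poll(host, community, oid, timeout=10):
    """Run snmpget and return its output as a stripped string."""
    result = subprocess.run(
        build_snmpget(host, community, oid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()
```

For example, `poll("edge-appliance.example.com", "my-community", OIDS["laLoad.3"])` would return the 15-minute load average as text. Because data is updated at most once per minute, polling more often than that is unlikely to yield new values.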
Trap Addresses and Engine IDs
On the NMC, you can view Trap Addresses and Engine IDs. Go to Filers → SNMP Settings.
If SNMP is enabled, in the Trap Addresses column, a list of IP addresses or hostnames listening for SNMP traps appears. The SNMP Engine IDs are located in the Trap Addresses column.
Note: Trap Addresses are visible only if using SNMP v3.
Important: The Engine ID is derived from the Edge Appliance serial number. For example, if the serial number were “52d1e618bb1a4de9”, then the Engine ID would be “0x52d1e618bb1a4de9”.
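The derivation in the example above amounts to prefixing the serial number with "0x". A minimal sketch follows, assuming that serial numbers are always hexadecimal strings like the one shown; the function name is hypothetical.

```python
import string


def engine_id_from_serial(serial_number):
    """Return the Engine ID string for an Edge Appliance serial number.

    Assumes the serial number consists only of hexadecimal digits,
    as in the documented example.
    """
    serial = serial_number.strip()
    if not serial or any(c not in string.hexdigits for c in serial):
        raise ValueError("serial number must be a non-empty hex string")
    return "0x" + serial
```

This can be useful when pre-registering SNMP v3 Engine IDs in a trap receiver before the Edge Appliance sends its first trap.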
Changes in SNMP reporting of memory
SNMP name | Reported in Version 8.7 and before | Reported in Version 8.8 to 9.2 | Reported in Version 9.3 and later |
---|---|---|---|
memAvailReal | memfree | memfree + buffers + cached + sreclaimable | memfree |
memTotalFree | memfree + swapfree | memfree + swapfree + buffers + cached + sreclaimable | memfree + swapfree |
memCached | cached | cached + sreclaimable | cached + sreclaimable |
where
buffers = memory in the buffer cache (relatively temporary storage for raw disk blocks)
cached = memory in the pagecache (Diskcache and Shared Memory)
memfree = amount of physical memory not used by the system
sreclaimable = the part of the Slab (In-kernel data structures cache) that might be reclaimed (such as caches)
swapfree = remaining swap space available
For details, see https://access.redhat.com/solutions/406773.
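If you compare memory readings across Edge Appliance versions, it can help to compute what each version would have reported from the same underlying values. The sketch below encodes the table above; the function name and the (major, minor) version tuple are assumptions of this example, and all inputs are expected in the same unit.

```python
def reported_memory(version, memfree, swapfree, buffers, cached, sreclaimable):
    """Return the values reported for a given (major, minor) version.

    The result is a dict with the keys memAvailReal, memTotalFree,
    and memCached, following the table of reporting changes.
    """
    reclaimable = buffers + cached + sreclaimable
    if version <= (8, 7):
        return {
            "memAvailReal": memfree,
            "memTotalFree": memfree + swapfree,
            "memCached": cached,
        }
    if version <= (9, 2):  # Version 8.8 to 9.2
        return {
            "memAvailReal": memfree + reclaimable,
            "memTotalFree": memfree + swapfree + reclaimable,
            "memCached": cached + sreclaimable,
        }
    return {  # Version 9.3 and later
        "memAvailReal": memfree,
        "memTotalFree": memfree + swapfree,
        "memCached": cached + sreclaimable,
    }
```

Thresholds on memAvailReal that were tuned on Version 8.8 to 9.2 might therefore need to be lowered after an upgrade to Version 9.3 or later, because the reported value no longer includes buffers and cache.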
Copyright © 2010-2024 Nasuni Corporation. All rights reserved.