vStorage APIs for Array Integration ( VAAI )

What is VAAI?

VAAI is a set of APIs and SCSI commands that offload certain I/O-intensive functions from the ESXi host to the storage platform for more efficient performance.
VAAI was introduced in vSphere 4.1 to offload the following operations:
~
◎ Full Copy
What is it? Hardware-accelerated copying of data by performing all duplication and migration operations on the array.
Benefits?  Faster data movement via Storage vMotion; faster VM creation and deployment from templates; faster VM cloning.  Reduces server CPU cycles, memory, IP and SAN network bandwidth, and storage front-end controller I/O.
~
◎ Block Zero
What is it?  Hardware-accelerated zero initialization.
Benefits?  Greatly reduces the host I/O required for common tasks, such as creating new VMs.  Especially beneficial when creating FT-enabled VMs (whose disks must be fully zeroed up front) or when performing routine application-level block zeroing.
~
◎ Hardware-assisted locking
What is it?  Improved locking controls on VMFS.
Benefits?  More VMs per datastore.  Shorter boot times when many VMs power on simultaneously.  Faster VM migration.
What is new in 5.0?
Enhancements for environments that use array-based thin provisioning.  Specifically:
~
◎ Dead Space Reclamation
What is it?  The ability to reclaim blocks on a thin-provisioned LUN on the array when a virtual disk is deleted or migrated to a different datastore.  Historically, the blocks used prior to the migration were still reported as “in use” by the array.
Benefits? More accurate reporting of disk space consumption and reclamation of the unused blocks on the thin LUN.
~
◎ Out-of-space conditions
What is it?  If a thin-provisioned datastore reaches 100 percent capacity, only the virtual machines that require extra blocks of storage are temporarily paused, allowing admins to allocate additional space to the datastore.  Virtual machines on the datastore that don’t need additional space continue to run.
Benefits?  Prevents some catastrophic scenarios encountered with storage oversubscription in thin-provisioned environments.
~
Configuring / Verifying VAAI Full Copy/Block Zero
In the vSphere Client: Hosts and Clusters > Configuration tab > (Software) Advanced Settings > DataMover
Full Copy = DataMover.HardwareAcceleratedMove (1 = Enabled; 0 = Disabled)
Block Zero = DataMover.HardwareAcceleratedInit (1 = Enabled; 0 = Disabled)
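If you’d rather work from the ESXi shell, the same options can be checked and set with esxcli on 5.0 (a quick sketch; swap in HardwareAcceleratedInit to do the same for Block Zero):
# esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
# esxcli system settings advanced set -o /DataMover/HardwareAcceleratedMove -i 1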
~
Configuring / Verifying VAAI Hardware-Assisted Locking
In the vSphere Client: Hosts and Clusters > Configuration tab > (Software) Advanced Settings > VMFS3
Hardware-Assisted Locking = VMFS3.HardwareAcceleratedLocking (1 = Enabled; 0 = Disabled)
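Same deal from the ESXi shell (a sketch):
# esxcli system settings advanced set -o /VMFS3/HardwareAcceleratedLocking -i 1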
~
VAAI Dead Space Reclamation
This one can be a little bit involved.  There are various resources addressing this topic, all of which are referenced at the end of this post.
~
(In a nutshell)
Step 1 – Verify Hardware Acceleration (VAAI) is supported 
Hosts and Clusters > Configuration tab > (Hardware) Storage > select the datastore and review the Hardware Acceleration status in the details pane (not supported in my dinky home lab).
~
Step 2 – Get the NAA id of the device backing the datastore:
~ # esxcli storage vmfs extent list
Example output:
naa.60a98000572d54724a346a6170627a52
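For context, the full extent list output also shows the volume name and VMFS UUID; the NAA id is the Device Name minus the partition number. A mocked-up example (volume name and UUID are made up, not from my lab):
Volume Name  VMFS UUID                            Extent Number  Device Name                           Partition
-----------  -----------------------------------  -------------  ------------------------------------  ---------
datastore1   4e26f26a-9fe2d872-1c11-005056ab64f2              0  naa.60a98000572d54724a346a6170627a52          1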
~
Step 3 – Get VAAI status:
esxcli storage core device list -d naa.60a98000572d54724a346a6170627a52
Example:
# esxcli storage core device list -d naa.60a98000572d54724a346a6170627a52
naa.60a98000572d54724a346a6170627a52
   Display Name: NETAPP Fibre Channel Disk (naa.60a98000572d54724a346a6170627a52)
   Has Settable Display Name: true
   Size: 51200
   Device Type: Direct-Access
   Multipath Plugin: NMP
   Devfs Path: /vmfs/devices/disks/naa.60a98000572d54724a346a6170627a52
   Vendor: NETAPP
   Model: LUN
   Revision: 8020
   SCSI Level: 4
   Is Pseudo: false
   Status: on
   Is RDM Capable: true
   Is Local: false
   Is Removable: false
   Is SSD: false
   Is Offline: false
   Is Perennially Reserved: false
   Thin Provisioning Status: yes
   Attached Filters: VAAI_FILTER
   VAAI Status: supported
   Other UIDs: vml.020033000060a98000572d54724a346a6170627a524c554e202020
~
Step 4 – Check if the array supports the UNMAP primitive for dead space reclamation
esxcli storage core device vaai status get -d naa.60a98000572d54724a346a6170627a52
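If the array supports UNMAP, the Delete Status line will read supported. A hedged example of what the output looks like (not captured from my lab; the plugin name will vary by array):
naa.60a98000572d54724a346a6170627a52
   VAAI Plugin Name: VMW_VAAIP_NETAPP
   ATS Status: supported
   Clone Status: supported
   Zero Status: supported
   Delete Status: supported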
~
Step 5 – Run the UNMAP primitive command
VMWARE NOTES:

Caution – We expect customers to use this primitive during their maintenance window, since running it on a datastore that is in-use by a VM can adversely affect I/O for the VM. I/O can take longer to complete, resulting in lower I/O throughput and higher I/O latency.

A point I would like to emphasize is that UNMAP performance is driven entirely by the storage array. Even the recommendation that vmkfstools -y be issued in a maintenance window is mostly based on the effect of UNMAP commands on the array’s handling of other commands.

There is no way of knowing how long an UNMAP operation will take to complete. It can be anywhere from a few minutes to a couple of hours depending on the size of the datastore, the amount of content that needs to be reclaimed, and how well the storage array handles the UNMAP operation.

To run the command, change directory to the root of the VMFS volume that you wish to reclaim space from. The command is run as:

vmkfstools -y <% of free space to unmap>
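For example, to hand back 60 percent of the free space on a datastore (the datastore name here is made up; do this in a maintenance window):
# cd /vmfs/volumes/datastore1
# vmkfstools -y 60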

~
Step 6 – Verify
Verify using esxtop: press u (disk device view), then f to edit fields and toggle o and p (the VAAI statistics fields); review the DELETE, DELETE_F and MBDEL/s columns.
For this one I recommend reviewing the article put together by Paudie O’Riordan, the last of the references at the end of this post.
~
Out of Space Conditions / Thin Provisioning Stun
I can’t find a setting for this, so I am assuming that if VAAI is supported by the array, the OOS/TPS behavior will apply.  I will keep digging on this one.  This snippet from a VMware Community blog clarifies the feature to my satisfaction (at least we know what to expect):
~
That’s all I got peeps.  Live long and prosper.
Gabe@networkdojo.net
Sources:

VMware Network I/O Control ( NetIOC )

Cliff notes for NetIOC.  You can find a most excellent white paper describing this feature in 25 glorious pages here:  VMware Network I/O Control, Architecture, Performance and Best Practices.

Prerequisites for NetIOC

NetIOC is only supported with the vNetwork Distributed Switch (vDS).

NetIOC Feature Set
NetIOC provides users with the following features:
• Isolation: ensure traffic isolation so that a given flow will never be allowed to dominate over others, thus preventing drops and undesired jitter
• Shares: allow flexible networking capacity partitioning to help users deal with overcommitment when flows compete aggressively for the same resources
• Limits: enforce a traffic bandwidth limit on the overall vDS set of dvUplinks
• Load-Based Teaming: efficiently use the vDS set of dvUplinks for networking capacity
NetIOC Traffic Classes
The NetIOC concept revolves around resource pools that are similar in many ways to the ones already existing for CPU and Memory.
NetIOC classifies traffic into six predefined resource pools as follows:
• vMotion
• iSCSI
• FT logging
• Management
• NFS
• Virtual machine traffic

Shares
A user can specify the relative importance of a given resource-pool flow using shares that are enforced at the dvUplink level. The underlying dvUplink bandwidth is then divided among resource-pool flows based on their relative shares in a work-conserving way, meaning that unused capacity is redistributed to other contending flows rather than going to waste. The network flow scheduler (illustrated in Figure 1 of the white paper) is the entity responsible for enforcing shares and is therefore in charge of the overall arbitration under overcommitment. Each resource-pool flow has its own dedicated software queue inside the scheduler, so packets from a given resource pool won’t be dropped due to high utilization by other flows.
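A quick back-of-the-napkin example (my numbers, not the white paper’s): give VM traffic 100 shares and vMotion and NFS 50 each. If all three are contending on a saturated 10GbE dvUplink, VM traffic gets roughly 100/200 of it (about 5Gbps) and vMotion and NFS about 2.5Gbps each. If NFS goes idle, its slice is redistributed, so VM traffic climbs to roughly 6.7Gbps and vMotion to about 3.3Gbps.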
Limits
A user can specify an absolute shaping limit for a given resource-pool flow using a bandwidth capacity limiter. As opposed to shares, which are enforced at the dvUplink level, limits are enforced on the overall vDS set of dvUplinks, which means that a flow of a given resource pool will never exceed the specified limit for the vDS on a given vSphere host.
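Example (again, my numbers): set a 2,000Mbps host limit on the vMotion pool and vMotion traffic is capped at roughly 2Gbps for the entire vDS on that host, regardless of how many dvUplinks are in the team or how idle they are.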
Load-Based Teaming (LBT)
vSphere 4.1 introduced a load-based teaming (LBT) policy that ensures vDS dvUplink capacity is optimized. LBT avoids the situation, common with other teaming policies, in which some of the dvUplinks in a DV Port Group’s team sit idle while others are completely saturated simply because the teaming decision is statically determined. LBT reshuffles port binding dynamically, based on load and dvUplink usage, to make efficient use of the available bandwidth. LBT only moves ports to dvUplinks configured for the corresponding DV Port Group’s team, and it does not consider shares or limits when rebinding ports from one dvUplink to another. LBT is not the default teaming policy in a DV Port Group, so it is up to the user to configure it as the active policy. LBT will only move a flow when the mean send or receive utilization on an uplink exceeds 75 percent of capacity over a 30-second period, and it will not move flows more often than every 30 seconds.



Configuring NetIOC

NetIOC is configured through the vSphere Client in the Resource Allocation tab of the vDS, found under Home > Inventory > Networking. NetIOC is enabled by clicking “Properties…” on the right side of the panel and then checking “Enable network I/O control on this vDS” in the pop-up box.

Editing NetIOC Settings

That’s all folks.  I’m out.

Gabe@networkdojo.net