Looking at the Write Cache in Storage Spaces Direct

If you’ve ever dealt with a SAN or storage admin before, you’ll know that they usually have a huge passion for cache stats. That’s because, for years, the secret sauce for accelerating cheap storage has been to stick a small amount of expensive but super fast flash in front of your slower spinning disks or, in recent years, your cheaper low-endurance SSDs.

Because of this, it has always been a good idea to keep an eye on how your cache is doing, making sure that cache misses stay low and that your write cache isn’t overallocated.

And Storage Spaces Direct (S2D) is no different today, especially in Hybrid (SSD+HDD) deployments.

Today, we’ll focus in on the Write Cache portion of S2D’s caching technology, as it’s easy to forget about and overlook.

Now if you’re running Windows Server 2019, the new Cluster Performance History makes this very easy for you.
If you want to see the total write cache usage for the cluster:

Get-ClusterPerf -ClusterSeriesName PhysicalDisk.Cache.Size.Dirty,PhysicalDisk.Cache.Size.Total

Or if you want to see the per node usage:

Get-ClusterNode | Get-ClusterPerf -ClusterNodeSeriesName PhysicalDisk.Cache.Size.Dirty,PhysicalDisk.Cache.Size.Total
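
If you’d rather see those two series as a single percentage, here’s a minimal sketch. It assumes each object returned by Get-ClusterPerf exposes Series and Value properties and that both series are reported in the same unit. Get-DirtyCachePercent is a hypothetical helper name, not a built-in cmdlet, and the mock values are made up so the snippet runs without a cluster.

```powershell
# Hypothetical helper: turns a dirty/total pair of samples into a percentage.
# Assumes the input objects carry 'Series' and 'Value' properties.
function Get-DirtyCachePercent {
    param(
        [Parameter(Mandatory)]
        [object[]]$Sample
    )
    $dirty = ($Sample | Where-Object Series -eq 'PhysicalDisk.Cache.Size.Dirty' |
        Select-Object -First 1).Value
    $total = ($Sample | Where-Object Series -eq 'PhysicalDisk.Cache.Size.Total' |
        Select-Object -First 1).Value
    # Nothing sensible to report without a non-zero total
    if (-not $total) { return $null }
    [math]::Round(($dirty / $total) * 100, 2)
}

# Made-up sample values standing in for real cluster output
$mock = @(
    [pscustomobject]@{ Series = 'PhysicalDisk.Cache.Size.Dirty'; Value = 32GB }
    [pscustomobject]@{ Series = 'PhysicalDisk.Cache.Size.Total'; Value = 128GB }
)
Get-DirtyCachePercent -Sample $mock
```

On a real cluster you would pass the output of the Get-ClusterPerf command above as -Sample instead of the mock objects.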

This is all well and good if you’ve already made the move to Windows Server 2019, but what if you’re still on Windows Server 2016?
The good news is that the counters still exist, but they’re not as easy to read as they are in Windows Server 2019.
To get the same information, we need to look at the following Performance Counters:

  • Cluster Storage Cache Stores
      • Cache Pages Bytes
      • Cache Pages Dirty

Cache Pages Bytes is pretty self-explanatory: it’s the size of the cache. Cache Pages Dirty, however, looks like some alien number that doesn’t match up to anything. That’s partly true, but only because we’re still missing a key piece of information: how big is a cache page?

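Finding that is as simple as running (Get-ClusterS2D).CachePageSizeKBytes. With this, we can take our dirty counter value and multiply it by the cache page size to work out the write cache allocation. Keep in mind that the page size is reported in kilobytes, so multiply by 1,024 as well if you want the answer in bytes.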
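
To make that arithmetic concrete, here’s a small sketch. ConvertTo-CacheBytes is a hypothetical helper name, and the page count and the 16 KB page size are illustrative values only, so check what your own cluster reports.

```powershell
# Hypothetical helper: converts a count of cache pages into bytes.
# PageSizeKB is the value of (Get-ClusterS2D).CachePageSizeKBytes.
function ConvertTo-CacheBytes {
    param(
        [Parameter(Mandatory)][int64]$PageCount,
        [Parameter(Mandatory)][int64]$PageSizeKB
    )
    # Pages x page size in KB x 1024 bytes per KB
    $PageCount * $PageSizeKB * 1KB
}

# Illustrative values: 2,000,000 dirty pages at an assumed 16 KB per page
$DirtyBytes = ConvertTo-CacheBytes -PageCount 2000000 -PageSizeKB 16
$DirtyBytes                          # 32768000000
[math]::Round($DirtyBytes / 1GB, 2)  # roughly 30.5 GB of dirty write cache
```

On a live node, the page count would come from the Cache Pages Dirty counter and the page size from Get-ClusterS2D.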

Now that’s not the easiest thing to check when you’re in a hurry, or across multiple hosts and clusters, so I’ve created a small function to wrap it all up for you.

Function Get-ClusterWriteCacheUsage {
    [cmdletbinding()]
    param(
        # Clusters to query
        [string[]]$ClusterName="localhost",
        # Show only the total per node
        [switch]$TotalOnly
    )
    begin {
        # Establish helper functions
        Function Format-Bytes {
            Param (
                $RawValue
            )
            $i = 0 ; $Labels = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
            # Divide down until the value fits its unit, stopping at the largest label
            While ( $RawValue -Ge 1024 -and $i -Lt ($Labels.Count - 1) ) { $RawValue /= 1024 ; $i++ }
            # Return
            [String][Math]::Round($RawValue,2) + " " + $Labels[$i]
        }
    }
    Process {
        # Start processing supplied cluster names
        Foreach ($Name in $ClusterName) {
            # Confirm the cluster exists
            try{
                $Cluster = (Get-Cluster -Name $Name -ErrorAction Stop).Name
            }catch{
                throw "Cluster '$Name' can't be found"
            }
            # Query for Nodes
            $Nodes = (Get-ClusterNode -Cluster $Cluster)
            # Query for Cluster Cache Page Size
            [int64]$PageSize = (Get-ClusterS2D -CimSession $Cluster).CachePageSizeKBytes * 1KB
            # Loop through each node to get performance counters
            Foreach ($Node in $Nodes) {
                $NodeName = $Node.Name
                # Determine if we need all instances or just _total
                if ($TotalOnly) {
                    $Query = "_Total"
                }
                else {
                    $Query = "*"
                }
                $DirtyCounter = "\\$NodeName\Cluster Storage Cache Stores($Query)\Cache Pages Dirty"
                $SizeCounter = "\\$NodeName\Cluster Storage Cache Stores($Query)\Cache Pages"
                # Collect the actual counter data
                $Data = (Get-Counter -Counter $DirtyCounter -ComputerName $NodeName).CounterSamples
                $Data += (Get-Counter -Counter $SizeCounter -ComputerName $NodeName).CounterSamples
                # Determine the names of the disks queried
                $Instances = ($Data | Sort-Object InstanceName).InstanceName | Get-Unique
                # Find matching data for each disk and return it formatted
                Foreach ($Instance in $Instances) {
                    # get Matching Data
                    $CacheSize = (
                        $Data |Where-Object {
                            ($_.InstanceName -eq $Instance) -and
                            ($_.Path -ilike "*\cache pages")
                        }
                        ).RawValue * $PageSize
                    $CacheUsage = (
                        $Data |Where-Object {
                            ($_.InstanceName -eq $Instance) -and
                            ($_.Path -ilike "*\cache pages dirty")
                        }
                        ).RawValue * $PageSize
                    # Format data into a PS Object and return it
                    [pscustomobject][ordered]@{
                        ComputerName    = $NodeName
                        Instance        = $Instance
                        WriteCacheUsage = Format-Bytes -RawValue $CacheUsage
                        CacheSize       = Format-Bytes -RawValue $CacheSize
                    }
                }
            }
        }
    }
}

Hopefully this can help some of you out until you can upgrade to Windows Server 2019!