If you’ve ever dealt with a SAN or Storage guy before, you’ll know that they usually have a huge passion for cache stats. This is because the secret sauce of accelerating cheap storage for years has been to stick a small amount of expensive but super fast flash in front of your slower spinning disk, or in recent years, your cheaper low endurance SSDs.
Because of this, it has always been a good idea to keep an eye on how your cache is doing, making sure that the cache miss rate stays low and that your write cache isn’t overallocated.
Quite often the best information on new technology is actually found in blog posts and not actual documentation, and while the documentation for Storage Spaces Direct from Microsoft is great, some of the real gems are in the pre-GA blogs they put up.
So below, I hope to keep up a list of essential blog posts from both Microsoft and independent bloggers for those of you who wish to really understand what’s happening under the hood!
UPDATE(2017-09-19): Microsoft have officially recognized the bug and have a KB describing the symptoms and workaround much like the below. See here: https://support.microsoft.com/en-us/help/4043361/disks-in-maintenance-mode-status-after-september-cumulative-update-kb
I was patching our dev cluster the other day and came across a new issue when applying the latest September Cumulative Update (KB4038782), and it seems others on the internet have hit this issue as well.
Background
First, a bit of background on the expected behaviour when performing maintenance:
I’ve been deploying a few Storage Spaces Direct (S2D) clusters lately, and I noticed a slight misconfiguration that can occur on deployment.
Normally when deploying S2D, the disk types in the nodes are detected and the fastest disk type (usually NVMe or SSD) is assigned to the cache, the next fastest is used for the Performance Tier, and the slowest is used for the Capacity Tier. So if you have NVMe, SSD and HDD, you would end up with an NVMe Cache, an SSD Performance Tier and an HDD Capacity Tier.
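To check how the disks were actually assigned after deployment, something like the following can be run on one of the cluster nodes. This is a minimal sketch, not a definitive check: it assumes the standard Storage module cmdlets are available, that cache devices report a Usage of Journal, and that NVMe devices may appear with a MediaType of SSD and a BusType of NVMe, so the bus type is included in the grouping.

```powershell
# Sketch: summarise physical disks by bus type, media type and usage.
# Assumption: disks claimed for the S2D cache show Usage = Journal,
# while capacity devices show Usage = Auto-Select.
Get-PhysicalDisk |
    Group-Object -Property BusType, MediaType, Usage |
    Sort-Object -Property Name |
    Format-Table -AutoSize Count, Name

# Sketch: list the storage tiers and the media type each one is built on.
Get-StorageTier |
    Format-Table -AutoSize FriendlyName, MediaType, ResiliencySettingName
```

If the fastest disks do not show up as Journal, or a tier reports an unexpected media type, that is a hint that detection did not go the way described above.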
We’ve all had the case where a volume is running hot on the cluster and you spend ages wrestling with perf counters, trying to find the VM that’s causing your storage to burn. Well, let me introduce you to a magical new command in Windows Server 2016: Get-StorageQoSFlow.
This miracle command gives you insight into all the VHD(X)s running on your cluster, revealing IOPS, latency and bandwidth stats for each of them without the need for a large-scale monitoring solution.
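As a rough illustration of hunting for a noisy VM, the flows can be sorted by IOPS and trimmed to the top few. This is a sketch under assumptions rather than a recommended report: it assumes it is run on a node of a Windows Server 2016 cluster with Storage QoS active, the cut-off of ten is arbitrary, and only a handful of the available properties are shown.

```powershell
# Sketch: show the ten busiest virtual disk flows, highest IOPS first.
# The number 10 is an arbitrary choice for illustration.
Get-StorageQosFlow |
    Sort-Object -Property StorageNodeIOPS -Descending |
    Select-Object -First 10 |
    Format-Table -AutoSize InitiatorName, StorageNodeIOPS, StorageNodeLatency, Status, FilePath
```

Piping a single flow to Get-Member lists the remaining properties, including the bandwidth counters, if a different sort key is more useful.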
As many of you would have seen, Windows Server 2016 has been officially launched, with evaluation media available and General Availability slated for later this month.
One of the great new features in this release is Storage Spaces Direct, a software-defined storage solution. There is already plenty of information on Microsoft Docs about how to get it up and running, but I thought I’d share some of the operational tasks that aren’t so obvious, starting with expanding volumes.