So you’ve set up an Azure Stack HCI cluster and everything’s running great, but there is this nagging feeling in the back of your mind. It’s a hybrid setup, with some type of flash cache sitting in front of spinning disk, and you start to wonder how hard you’re pushing that cache and how long it will last. Thankfully, Windows Server 2019 includes many built-in tools and commands to help work out just that!
If you’re running a Storage Spaces Direct (S2D) cluster, you might have noticed some instability in recent months, specifically when patching and performing maintenance. Well, you’re in luck, because five days ago Microsoft released a new KB article that helps explain why you might have seen issues. The scenario targeted by the Microsoft article is S2D clusters running the May (KB4103723) or later patch levels, where you experience Event ID 5120 during patching or maintenance, leading to things like CSV timeouts, VM pauses, or even VM crashes.
If you’ve been anywhere near Twitter or any tech blogs and news sites recently, you will have noticed that Microsoft have dropped their first cut of the next Long-Term Servicing Channel OS, Windows Server 2019, into the Windows Insider ring for people like you and me to start testing. Now, most people (like me) don’t have a huge amount of spare hardware sitting around for times like this, especially for testing things like Storage Spaces Direct (S2D).
Hi all. Quite often the best information on new technology is found in blog posts rather than in the official documentation, and while Microsoft’s documentation for Storage Spaces Direct is great, some of the real gems are in the pre-GA blogs they put up. So below, I hope to keep up a list of essential blog posts from both Microsoft and independent bloggers for those of you who wish to really understand what’s happening under the hood!
UPDATE (2017-09-19): Microsoft have officially recognized the bug and have a KB describing the symptoms and workaround much like the below. See here: https://support.microsoft.com/en-us/help/4043361/disks-in-maintenance-mode-status-after-september-cumulative-update-kb
I was patching our dev cluster the other day and came across a new issue when applying the latest September Cumulative Update (KB4038782), and it seems others on the internet have hit this issue as well.
Background
First, a bit of background on the expected behaviour when performing maintenance:
We’ve all had the case where a volume is running hot on the cluster and we spend ages wrestling with perf counters, trying to find the VM that’s causing the storage to burn. Well, let me introduce you to a magical new command in Windows Server 2016: Get-StorageQoSFlow. This miracle command can give you insight into all the VHD(x)s running on your cluster, revealing IOPS, latency and bandwidth stats for them all without the need for large-scale monitoring solutions.
As many of you will have seen, Windows Server 2016 has been officially launched, with evaluation media available and General Availability slated for later this month. One of the great new features in this release is Storage Spaces Direct, a software-defined storage solution. There is already plenty of information on TechNet about getting it up and running, but I thought I’d share some of the operational tasks that aren’t so obvious, starting with expanding volumes.