OS Patching... SQL Server DOWN!!!!!!! Yes, I still like software patching
All software has bugs, so patch your SQL Servers regularly, and by that I mean every 1, 3, 6, 12, or even 18 months. Older versions of SQL Server, like 2012, may not get frequent updates, while current versions seem to get monthly updates.
Take 5 minutes to hop over to any unofficial build chart listing the known Service Packs (SPs) and Cumulative Updates (CUs) for the version of SQL Server you are on, and look through the hotfixes in each cumulative update, starting with the newest and working your way down. Find a situation you may or may not have hit yet but don't want to, and you now have a reason to patch!
Today was a fun day, in that OS patching of our SQL Servers caused multiple SQL Server Failover Cluster Instances (FCIs) to go down at 8am. So what happened?
1. We are running two-node SQL Server 2016 clusters on Windows Server 2016 with Storage Spaces Direct (S2D), each node with its own enclosure.
2. They all use a Cloud Witness.
3. They were all last patched in June with the May updates and have been in a patching freeze since then, with no patches or reboots.
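With two nodes and a Cloud Witness, each cluster has three quorum votes. Here is a minimal sketch of the majority arithmetic, assuming the simple model where every node and the witness hold exactly one vote (it ignores Windows Server features such as dynamic quorum, which can adjust votes at runtime). The function names are hypothetical:

```python
# Sketch: simple majority quorum for a two-node cluster with a Cloud Witness.
# Assumption: every node and the witness each hold exactly one vote
# (dynamic quorum / dynamic witness adjustments are ignored here).


def total_votes(nodes: int, has_witness: bool) -> int:
    """Count one vote per node plus one for the witness, if present."""
    return nodes + (1 if has_witness else 0)


def has_quorum(total: int, online: int) -> bool:
    """The cluster keeps quorum only while strictly more than half the votes are online."""
    return online * 2 > total


if __name__ == "__main__":
    votes = total_votes(nodes=2, has_witness=True)  # 3 votes in total
    # Passive node rebooting: active node + Cloud Witness remain, 2 of 3.
    print(has_quorum(votes, online=2))  # True
    # Passive node rebooting AND witness unreachable, 1 of 3.
    print(has_quorum(votes, online=1))  # False
```

By this arithmetic a passive-node reboot alone should never cost the cluster its quorum, so an outage during that reboot points at something other than voting.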
The current passive nodes were patched and rebooted during the day with zero issues. The role swap (yes, I have to call it that, because saying "fail over" is considered a negative action and raises alarms and red flags) occurred as planned at 1am, and things continued as expected, swapping the active and passive nodes. Then, again as expected, we started patching the new passive nodes, which needed a reboot after the first round of patches. That is when the problems started: the active node dropped the S2D disks, SQL Server went offline, and the application went down with it, causing an outage.
So why did this actually happen? A passive node shouldn't affect the active node as long as the cluster still holds more than 50% of the votes. What actually happened is that the node had one of the Windows Server 2016 cumulative updates released between May 8, 2018 (KB4103723) and October 9, 2018 (KB4462917) installed, and we were hit by a bug introduced in those updates: when the node was paused, the S2D disks did not go into maintenance mode as they had previously. As a result, the clustered shared disk went offline, and SQL Server, which kind of needs that disk, went offline with it.
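A quick way to flag nodes exposed to this bug is to compare the release date of each node's newest installed cumulative update against the affected range given above. This is a minimal sketch: the function names are hypothetical, and it assumes you already have the release dates of the installed CUs from your own patch inventory. Because CUs are cumulative, it treats the newest installed one as the code the node is actually running.

```python
from datetime import date
from typing import Iterable

# Affected range from this post: Windows Server 2016 CUs released from
# May 8, 2018 (KB4103723) to October 9, 2018 (KB4462917), inclusive.
AFFECTED_FIRST = date(2018, 5, 8)
AFFECTED_LAST = date(2018, 10, 9)


def in_affected_window(cu_release: date) -> bool:
    """True if a CU released on this date falls inside the affected range."""
    return AFFECTED_FIRST <= cu_release <= AFFECTED_LAST


def node_at_risk(installed_cu_releases: Iterable[date]) -> bool:
    """Hypothetical check: flag a node whose newest installed CU is in the window.

    Assumption: CUs are cumulative, so only the most recent one matters.
    """
    newest = max(installed_cu_releases, default=None)
    return newest is not None and in_affected_window(newest)


if __name__ == "__main__":
    print(in_affected_window(date(2018, 5, 8)))   # True, first CU in the range
    print(node_at_risk([date(2018, 4, 10), date(2018, 6, 12)]))  # True
```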
With all of that insight, we now have a process in place to confirm that the paused passive node really is paused and that its disks are in maintenance mode before it is rebooted. Additionally, disabling live dumps helps mitigate the impact of live dump generation on systems with lots of memory, which describes most SQL Servers, or at least the ones in my case. All of this information can be found with a quick Google search using the errors from Failover Cluster Manager and the event logs.
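The pre-reboot check can be sketched as a simple gate that refuses to continue unless the node is paused and every one of its disks reports maintenance mode. This is a minimal sketch, assuming you have already collected the node state and each disk's status (for example from the `Get-ClusterNode` and `Get-PhysicalDisk` PowerShell cmdlets). The `DiskStatus` record and `reboot_blockers` function are hypothetical, and the exact status strings are assumptions to verify against your own output.

```python
from dataclasses import dataclass
from typing import List

# Assumption: these strings mirror the node State and disk OperationalStatus
# values reported on Windows Server 2016 S2D; check them in your environment.
PAUSED_STATE = "Paused"
MAINTENANCE_STATUS = "In Maintenance Mode"


@dataclass
class DiskStatus:
    """Hypothetical record for one physical disk in the node's enclosure."""
    friendly_name: str
    operational_status: str


def reboot_blockers(node_state: str, disks: List[DiskStatus]) -> List[str]:
    """Return the reasons NOT to reboot; an empty list means the gate passes."""
    problems: List[str] = []
    if node_state != PAUSED_STATE:
        problems.append(f"node state is {node_state!r}, expected {PAUSED_STATE!r}")
    if not disks:
        # No disk data is treated as unsafe rather than as "nothing to check".
        problems.append("no disks reported for this node")
    for disk in disks:
        if disk.operational_status != MAINTENANCE_STATUS:
            problems.append(
                f"{disk.friendly_name} is {disk.operational_status!r}, not in maintenance mode"
            )
    return problems


if __name__ == "__main__":
    disks = [DiskStatus("Disk1", "In Maintenance Mode"), DiskStatus("Disk2", "OK")]
    for reason in reboot_blockers("Paused", disks):
        print("Do not reboot:", reason)
```

Failing closed (treating missing data as a blocker) is the design choice here: in this outage, the dangerous case was a node that looked paused while its disks had quietly stayed online.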
Search terms included:
"Cluster physical disk resource failed periodic health check"
"Two Node S2D cluster passive node rebooted and goes down"
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj6T2CYPIVf-17CgbiCI82C9QbGEGlwtOZ7RAuFbyccfkmN2oB3BQaSEkP6eS4gupnVog_RfCRK9OH4bn23qGRZwcpU4KANcDPkrM0_nShG_nmGNeb8tiRZ8EmsHDplxma6uLMbOZyqj2Q/s1600/clusteer+2.jpg)