Friday, July 12, 2019

hard drive - How to repair mirror set in Windows Storage Pool?




I have a storage pool with 2x2 drives in it. This is an "upgraded" Storage Pool being hosted by Windows 10 1809. One of the drives has started throwing errors -- SMART shows over a thousand recently-logged read errors, and Windows has thrown about a hundred disk events into the log over the past week. For some reason, Storage Spaces insists that the drive is still healthy, but my gut feeling is that it should be replaced. But, I haven't been able to find any solid documentation on how.



With an old-fashioned RAID-1 array, I'd just pull the bad disk, put a new disk in of at least the same size, and then tell it to rebuild the mirror using the new disk. After both drives get replaced with ones of a larger capacity, most systems will then typically let me extend the volume to the size shared by the two drives. But I haven't been able to find any explicit documentation whatsoever of how Storage Spaces handles this situation, whether it is the same or different.



One of the things I've read is that if Storage Spaces detects any sort of error in a Storage Pool, it actually makes the entire Storage Pool permanently read-only. This sounds pretty wacky to me, but I thought I'd mention it in case someone reading this can confirm or deny.



The best I've been able to put together from the bits and pieces I've found is that what I'm supposed to do is:





  1. Install two new disks.

  2. Add these disks in tandem to the Storage Pool as mirrored elements.

  3. Tell the Storage Pool that I want to "retire" the mirror pair that contains the bad drive.

  4. Wait for Storage Pool to evict all the data from those drives by moving it into the blank space in the new drives.

  5. Tell Storage Pool that I don't want the mirror set of old drives to be in the pool any more.



Is this correct? What happens if one of the drives in one of those mirrored pairs goes missing? Does the Storage Pool continue to read/write data from that drive's counterpart, just like a degraded RAID-1 array? And, is there a more direct way to say, "Yeah, just rebuild that particular pair with this new drive"?



I've seen screenshots of the step 3 I described above, with a link button "Prepare for removal" to the right of each of the physical disks, but when I navigate to that same screen on my system, that link doesn't appear for me. Does it only appear when there is enough space elsewhere in the storage pool for it to be possible to relocate the data? Is there something you have to do to make it appear? Does it only appear for certain configurations?




I am surprised that I haven't been able to find any clear information walking through this scenario of wanting to replace a failing drive that's part of a mirror set. In my prior experience it is a fairly common occurrence.


Answer



Okay, so for anyone in a similar situation to me madly Googling for answers, I thought I'd relay my experience.



One of the problems I had to deal with was that I don't have enough SATA ports on this motherboard to do this properly, so the steps I followed include working around that minor headache as well. If you have the SATA ports for it, you don't need to pull the ailing drive until after the Storage Pool has prepared it for removal. There is perhaps an argument for pulling it early anyway, given that the drive, if it's like mine, has started showing signs of being untrustworthy.



Anyway, it looks like the way Storage Spaces handles repairs of mirror sets is as I had surmised in the question: I did not see any way to tell the system, "Make this a pair of disks again by rebuilding onto this new one." It looks like with Storage Spaces, everything operates in units of pairs of disks.



First, for anyone feeling nervous about whether the mirror is in fact providing the type of data redundancy you're looking for, I can relay this experience: I shut down my computer, disconnected the misbehaving drive, and started it back up, and the Storage Pool was still fully responsive and usable. Storage Spaces, after a few moments, detected that the disk was missing and politely asked me to plug it back in. :-P




As for how to deal with the fact that the mirror set now has a big hole in it, I attached two brand new drives. In my case, I had to steal a cable from my optical drive for one of them. For the other one, removing the misbehaving drive was a necessity, because I needed its SATA port too.



When the system came back up with these new drives installed, Storage Spaces presented an option for the pool to "Add drives". Clicking this automatically identified the two new drives and allowed them to be added to the pool. Adding them automatically triggered a repair process that, as I write this, is still running (and surprisingly fast, too: in the past 15 minutes it has already reached 12%).
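Taking those two numbers at face value, here is a back-of-the-envelope estimate of how long the rest of the repair might take. It assumes the repair keeps the same average rate from start to finish, which may well not hold, and the function name is just something made up for this sketch.

```python
def estimate_remaining_minutes(elapsed_minutes: float, percent_done: float) -> float:
    """Linear extrapolation: assumes the repair continues at its average rate so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be greater than 0 and at most 100")
    # If percent_done took elapsed_minutes, the full 100% takes proportionally longer.
    total_minutes = elapsed_minutes * 100.0 / percent_done
    return total_minutes - elapsed_minutes


# 12% done after 15 minutes, as reported above.
remaining = estimate_remaining_minutes(15, 12)
print(f"about {remaining:.0f} minutes left")  # about 110 minutes left
```

So at that pace the whole repair would be a roughly two-hour job, not an overnight one.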



I can also confirm that the "Prepare for removal" option which was missing for me before did in fact magically appear as soon as there was free space in the pool into which the data could be relocated. The missing drive did not get a "Prepare for removal" option, presumably because, seeing as how it's missing, there's no way the system could relocate the data from that drive. (Fortunately, being a mirror set, the same data is available on other drives.) But it did get a "Remove drive" option. I tried clicking this and was told that the drive could not be removed because there was still data attributed to it. However, going through these steps explicitly changed the status of the missing drive to "Preparing for removal".



[Screenshot: physical drives list, showing "Preparing for removal" on the missing drive]



Note that the "usage" of the missing drive is lower than that of the other drives. As the repair proceeds, this percentage is progressively dropping.
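One way to picture why "Prepare for removal" only showed up once the new drives were in the pool is the toy model below. It is only a guess at the rule, not how Storage Spaces actually decides: it checks whether the other connected drives have enough free space to take over a drive's data, and it ignores the constraint that the two copies of mirrored data must sit on different drives. The drive names and sizes are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Drive:
    name: str
    capacity_gb: int
    used_gb: int
    present: bool = True  # False models a drive that has been unplugged

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


def can_prepare_for_removal(pool: List[Drive], target: Drive) -> bool:
    """Simplified guess: the target's data must fit in the free space of the other drives."""
    if not target.present:
        # A missing drive cannot be read, so nothing can be moved off it;
        # its data has to be rebuilt from the surviving mirror copy instead.
        return False
    spare_gb = sum(d.free_gb for d in pool if d is not target and d.present)
    return spare_gb >= target.used_gb


# Hypothetical pool: four nearly full 2 TB drives.
pool = [Drive(f"old-{i}", 2000, 1900) for i in range(1, 5)]
print(can_prepare_for_removal(pool, pool[0]))  # False: only 300 GB free elsewhere

# Add two empty 4 TB drives and ask again.
pool += [Drive("new-1", 4000, 0), Drive("new-2", 4000, 0)]
print(can_prepare_for_removal(pool, pool[0]))  # True: plenty of room now
```

Under this model the missing drive never qualifies, which matches what I saw: it got "Remove drive" but not "Prepare for removal".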




It is my expectation that once the repair is complete, its usage will be at 0% and it will be possible to detach this drive logically from the pool. Furthermore, given my paucity of SATA ports, I will be telling Storage Spaces to prepare the failed drive's partner for removal as well, and then I'll be able to remove it, swap some cables around, and have my optical drive back.
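Everything above was done by clicking through the Storage Spaces UI. For anyone who would rather script it, the PowerShell Storage module has cmdlets (`Add-PhysicalDisk`, `Set-PhysicalDisk`, `Repair-VirtualDisk`, `Get-StorageJob`, `Remove-PhysicalDisk`) that look like they map onto the same steps. The sketch below is untested and deliberately runs nothing: it only assembles the command lines so they can be reviewed first. The pool, virtual disk and drive names are placeholders, and whether retiring a disk behaves exactly like "Prepare for removal" is an assumption.

```python
from typing import List


def replacement_commands(pool: str, virtual_disk: str, retiring_disk: str) -> List[str]:
    """Return PowerShell command lines, in order, for swapping out one drive.

    Nothing is executed here. All three arguments are friendly names, which are
    often not unique (identical drive models share one), so in practice the
    disk may need to be selected some other way, e.g. by serial number.
    """
    return [
        # 1. Add every disk Windows reports as poolable (the new drives).
        f'Add-PhysicalDisk -StoragePoolFriendlyName "{pool}" '
        f'-PhysicalDisks (Get-PhysicalDisk -CanPool $true)',
        # 2. Mark the suspect drive as retired so no new data is placed on it.
        f'Set-PhysicalDisk -FriendlyName "{retiring_disk}" -Usage Retired',
        # 3. Ask the space to rebuild its mirror copies onto the remaining drives.
        f'Repair-VirtualDisk -FriendlyName "{virtual_disk}"',
        # 4. Check on the repair; re-run until no jobs are left.
        'Get-StorageJob',
        # 5. Once the repair has finished, drop the old drive from the pool.
        f'Remove-PhysicalDisk -StoragePoolFriendlyName "{pool}" '
        f'-PhysicalDisks (Get-PhysicalDisk -FriendlyName "{retiring_disk}")',
    ]


# Placeholder names; substitute whatever Get-StoragePool, Get-VirtualDisk
# and Get-PhysicalDisk report on the actual machine.
for line in replacement_commands("Storage pool", "Storage space", "OLD-DISK"):
    print(line)
```

Printing the commands instead of running them is on purpose: with a pool that is already degraded, I would want to read each line before pasting it into an elevated PowerShell prompt.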



If there are no follow-ups to this answer, you can assume that all of this succeeded as planned. If I run into unexpected situations removing the drives that are being replaced, I will follow up on this answer.

