Monday, January 21, 2019

troubleshooting - What happened to my Hard Drive?


I bought a WD MyBook 500GB to back up my two internal drives that I have had for over 4 years each. The backup went smoothly over Firewire at a respectable speed and everything was fine for months and months.


Fast forward to a month ago. I was using the external drive when all of a sudden there was a click and it powered off. I tried turning it back on but it would click on spinup and power off again and again. Obviously this is making me panic as I know that the internal drives that I backed up are going to die any day. I also know that the probability of being able to back up those drives onto another one is lower than ever.


So here I am at the mercy of WD. I know that I absolutely need to be able to access the data on the MyBook. I decided to take it apart and remove the SATA drive within. When I opened it I realized that the hard drive itself was so hot that the metal burned me on contact. Obviously hard drives should not be so hot that you can boil an egg on them. I thought that since my computer has fans blowing at the hard drive bays, I could put the drive inside my computer. For those not familiar with the MyBook external drive, the enclosure has no cooling whatsoever.


I was able to put the drive inside my computer and everything was running great, until a couple days ago. I downloaded 20 gigs of data that I needed to transfer to this 500GB drive that I got out of the MyBook. I started the transfer and then saw the approximate copy time was 4 hours. The transfer rate was 1.5MB/s (Megabytes, not Megabits). Obviously this is ridiculously slow over the SATA interface even on the low RPM WD Green drive.
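As a sanity check on that estimate, the copy time follows directly from the size and the rate. A minimal sketch, assuming 1 GB = 1024 MB (the function name is purely illustrative):

```python
def transfer_hours(size_gb: float, rate_mb_per_s: float) -> float:
    """Return the hours needed to copy size_gb gigabytes at rate_mb_per_s."""
    size_mb = size_gb * 1024  # assumption: binary gigabytes
    seconds = size_mb / rate_mb_per_s
    return seconds / 3600


hours = transfer_hours(20, 1.5)
print(f"{hours:.1f} hours")
```

That works out to a little under 4 hours, so the estimate shown by the copy dialog matches the measured rate.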


My next move was to try to figure out if maybe there is some issue with Windows 7, or with drivers, or maybe anything else that is creating a bottleneck on the transfer. I did a reboot into Ubuntu with a LiveCD, mounted the necessary drives, and started the transfer. Exact same situation, 1.5MB/s and not a kilobyte faster.
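For anyone who wants to repeat this kind of check without a benchmark tool, a rough sequential write test can be scripted and run under either OS. This is only a sketch: the function name and defaults are made up for illustration, it measures a single pass, and it is not what produced the numbers in this post.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory: str, total_mb: int = 100, chunk_mb: int = 1) -> float:
    """Write about total_mb of data to a temp file in directory and return MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    chunks = max(1, total_mb // chunk_mb)
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the temporary file
    return (chunks * chunk_mb) / elapsed
```

Calling `sequential_write_mb_per_s("/path/on/the/drive")` with a directory on the drive under test gives a figure that can be compared between operating systems. The `os.fsync` call matters: without it the result mostly reflects the OS write cache rather than the disk.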


I rebooted back into Windows and ran a benchmark on both my main OS drive and the WD drive having this issue. The results are as follows:


OS Drive (Best of 5 Tests, 100MB Test Data Size)


Sequential Read: 49.01 MB/s
Sequential Write: 48.28 MB/s
Random Read 512K Chunks: 19.51 MB/s
Random Write 512K Chunks: 23.97 MB/s
Random Read 4K Chunks: 0.298 MB/s
Random Write 4K Chunks: 1.132 MB/s

Faulty WD Drive (Best of 5 Tests, 100MB Test Data Size)


Sequential Read: 41.82 MB/s
Sequential Write: 1.060 MB/s
Random Read 512K Chunks: 25.83 MB/s
Random Write 512K Chunks: 1.250 MB/s
Random Read 4K Chunks: 0.395 MB/s
Random Write 4K Chunks: 1.575 MB/s
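Putting the two sets of figures side by side makes the pattern easier to see. A small sketch that divides each WD result by the matching OS drive result (the dictionary and function names are illustrative; the numbers are the ones listed above):

```python
# Benchmark figures copied from the two runs above, in MB/s.
os_drive = {
    "Sequential Read": 49.01,
    "Sequential Write": 48.28,
    "Random Read 512K": 19.51,
    "Random Write 512K": 23.97,
    "Random Read 4K": 0.298,
    "Random Write 4K": 1.132,
}
wd_drive = {
    "Sequential Read": 41.82,
    "Sequential Write": 1.060,
    "Random Read 512K": 25.83,
    "Random Write 512K": 1.250,
    "Random Read 4K": 0.395,
    "Random Write 4K": 1.575,
}


def relative_speed(test: str) -> float:
    """Return the WD drive's speed as a fraction of the OS drive's speed."""
    return wd_drive[test] / os_drive[test]


for test in os_drive:
    print(f"{test}: {relative_speed(test):.2f}x")
```

On these figures the sequential and 512K writes come out at roughly 2% and 5% of the OS drive, while every read test and the 4K random write are on par with it or better.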

Obviously something is horribly wrong. The read speed is completely fine, probably what it always was. The write speed, on the other hand, is abysmal. What could possibly cause the write speed to slow down so much and not affect the read speed? It is worth noting that once the data is written at this very low speed, it is written correctly. There is no file corruption. If I run a S.M.A.R.T. Quick Test (using the Western Digital Data LifeGuard Diagnostics software), I get this error message:


Quick Test on drive 1 did not complete!
Status code = 07 (Failed read test element), Failure Checkpoint = 97 (Unknown Test)
SMART self-test did not complete on drive 1!


But on the actual main screen of the utility it says that the drive SMART Status is PASS.


That was a lot of information but hopefully someone knowledgeable with Hard Drives can help me with a couple questions.


1. What is going on? Did the heat in the external enclosure damage the drive?
2. Why can I still use the drive without any problems other than a slow write speed?
3. What is going on with this SMART Status and what does that Quick Test result mean?
4. Should I expect this drive to die on me any second?


Pretty much any other input anyone has would be great. I really want to diagnose what happened and unfortunately I don't have that much experience with Hard Drive issues.


Answer




What is going on? Did the heat in the external enclosure damage the drive?



It is impossible to say for sure, as we cannot tell whether the heat is a symptom or a cause. I would tend to support your thesis, though, as heat affects the magnetic properties of materials.


Although it is unlikely that your HD got anywhere close to the Curie temperature, heat can still weaken the magnetic properties of materials, just as cold improves them. It might be (but this is just a hypothesis) that the magnetic properties of the disk surface or of the write heads have weakened.


Also, heat may have deformed the physical shape of some components, making them less effective (for example, a write head that now sits further away from the disk surface).


A test you could do is to wrap your HD in a watertight plastic bag, immerse it in a bowl of crushed ice (even better: mix the crushed ice with salt, which brings the temperature down to around -21°C) and repeat the tests there. You might notice an increase in performance.


Incidentally, this is a technique that, through contraction of the materials, is also useful for unsticking moving parts (which does not seem to be your problem, as a stuck moving part normally means no read and no write capability at all).


Another common cause of disk failure is vibration. Vibration brings loss of precision in the moving parts, wear on the joints, misalignment between the heads and the disk surface, and so on. If something is now preventing the disk from spinning smoothly, you would certainly get extra heat generated both by the friction and by the increased power the motor draws to keep the rotation at the same speed. In this scenario heat would be a symptom rather than the cause of your problems.



Why can I still use the drive without any problems other than a slow write speed?



By way of metaphor: for the same reason you go faster on a well-lubricated bicycle than on a rusty one. Modern hard drives are smart enough to compensate for hardware problems, so unless a core component is broken they will find a way to keep running (this is so because HDs become obsolete very quickly, and if they stopped working at the first write error or corrupted sector, you would be changing your HD every few weeks). That compensation, such as retrying a failed write or remapping a bad sector, takes time, which would be consistent with writes that are correct but slow.



What is going on with this SMART Status and what does that Quick Test result mean?



Unless you find some official documentation, the answer to this question can only be inferred. Pick your favourite explanation: marketing reasons (so you do not immediately notice defects!), human error (it is just a bug, and it should report "not passed"), or design ("pass" means the HD is still usable, and the failing test signals that a non-essential subsystem is broken).



Should I expect this drive to die on me any second?



Again: you can never know for sure. I still have a 5 GB unit from the 90s up and running, for example. But consider this: you would normally keep backups of a totally healthy HD because it might fail all of a sudden. Now you have an HD showing visible signs of bad health, heating up like crazy, with degraded performance and failing tests... if I were you, I would definitely hope for the best but prepare for the worst!


Hope this helps, and if you try the cryo trick (the ice thing) I would be very interested to know the outcome. Best of luck!

