Monday, April 1, 2019

ubuntu - Home server hard drive: 186k start-stop cycles in 325 days?


I set up a home server about a year ago, using Ubuntu server (10.04 LTS at the moment), four disks in RAID 5 for storage (WD Green 1.5 TB) and a laptop drive for the OS.


Today the output of smartctl, a command-line utility for checking the SMART attributes of a hard drive, tells me that the primary OS drive has had roughly 186,000 start-stop cycles in 325 days and may be nearing the end of its lifespan.



The smartctl output is in "normalized values", in this case a number between 200 and 000, where 200 is "brand new" and 000 means "worn out". My disk gets 001.



So I wonder what happened: 186k start/stop cycles in 7820 hours is about one start/stop per 2.5 minutes around the clock. This seems somewhat excessive for a computer that sees actual use once or twice per day. (The RAID disks are normal, averaging to one start/stop per day, as expected.)
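
The estimate above can be checked with a few lines of Python. This is only a sanity check on the arithmetic, using the raw Start_Stop_Count and Power_On_Hours values from the smartctl output in this post; the variable names are illustrative.

```python
# Raw SMART values reported for /dev/sda in this post.
start_stop_count = 185875   # attribute 4, Start_Stop_Count
power_on_hours = 7820       # attribute 9, Power_On_Hours

# Average interval between start/stop events, in minutes.
minutes_per_cycle = power_on_hours * 60 / start_stop_count

# Whole days of power-on time.
full_days_powered_on = power_on_hours // 24

print(f"one start/stop every {minutes_per_cycle:.1f} minutes")
print(f"{full_days_powered_on} full days of power-on time")
```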


Does anyone have similar experiences, or pointers to what might be the issue here?


Specifically I'd like to know



  • Why the massive start/stop count? Do I have some sort of configuration issue? Could there be a background service that is causing trouble?

  • Could having a laptop disk as the OS drive be part of the problem? Can anyone confirm or deny this?


Here is the /etc/hdparm.conf configuration


/dev/sda {
apm = 127
spindown_time = 120
}

and the most relevant parts of smartctl --attributes /dev/sda:


smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   001   001   000    Old_age   Always       -       185875
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       7820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       109
193 Load_Cycle_Count        0x0032   118   118   000    Old_age   Always       -       246833
194 Temperature_Celsius     0x0022   107   098   000    Old_age   Always       -       36
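
For anyone who wants to track these counters over time, here is a minimal sketch of pulling the raw values out of an attribute table like the one above. It assumes the ten-column layout shown here and raw values that are plain integers; the function name parse_smart_attributes is made up for this example, and other smartctl versions or drives may format the table differently.

```python
def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows of a smartctl attribute table."""
    attributes = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least 10 columns and start with a numeric attribute ID.
        # Rows whose raw value is not a plain integer are skipped (an assumption
        # made here for simplicity).
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attributes[fields[1]] = int(fields[9])
    return attributes


# A few rows copied from the output in this post.
sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
4 Start_Stop_Count 0x0032 001 001 000 Old_age Always - 185875
9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7820
193 Load_Cycle_Count 0x0032 118 118 000 Old_age Always - 246833
"""

attrs = parse_smart_attributes(sample)
print(attrs)
```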

As I generally prefer my drives to last more than a year, any advice is appreciated.


Update


Apparently the "apm = 127" setting in hdparm.conf was the problem. With the settings commented out I get the default, 254, and the disk never spins down at all.


That's not quite what I was looking for either; I'll have to see if I can find a middle ground somewhere. Still, the problem from the title of this post is solved. Thanks for your help.


Some more detail for the next person with similar problems:



apm is the Advanced Power Management level, a value from 1 to 255. Higher values mean "more performance", lower values "more power saving", and 255 means "disabled".


I had picked 127 as the "highest performance that still allows disk spindown" according to the hdparm man pages, as I wanted the disk to go to sleep when the server was not in use.
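
The APM ranges described in the hdparm man page can be summarized in a small helper. This is a sketch of the documented ranges only (1 to 127 permit spin-down, 128 to 254 do not, 255 disables APM); the function name describe_apm is made up for this example, and what a particular drive actually does at a given level is up to its firmware.

```python
def describe_apm(level):
    """Classify an hdparm -B / apm value according to the man page ranges."""
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    if level == 255:
        return "APM disabled"
    if level <= 127:
        return "spin-down permitted"
    return "spin-down not permitted"


for level in (127, 128, 254, 255):
    print(level, describe_apm(level))
```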


What it got me was the manufacturer's 20-second default spindown time for this particular drive (a WD Scorpio Blue), a fair enough default for a laptop running on batteries.


With the OS writing to disk all the time (system logs and such, whether or not the computer is in actual use), the disk would barely fall asleep before being woken again, and I got the start/stop-every-20-seconds behavior. My attempt at increasing the spindown time (I had set it to 10 minutes) was apparently ignored by the drive.
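
For reference, the spindown_time value is not a number of seconds but an encoded timeout. The sketch below follows the encoding described in the hdparm man page for the -S option and confirms that 120 corresponds to 10 minutes; the function name spindown_seconds is made up for this example, and, as noted above, the drive may not honor the value at all.

```python
def spindown_seconds(value):
    """Decode an hdparm -S / spindown_time value into seconds.

    Returns None when the timeout is disabled (value 0).
    """
    if value == 0:
        return None                        # standby timeout disabled
    if 1 <= value <= 240:
        return value * 5                   # multiples of 5 seconds
    if 241 <= value <= 251:
        return (value - 240) * 30 * 60     # 1 to 11 units of 30 minutes
    if value == 252:
        return 21 * 60                     # 21 minutes
    if value == 255:
        return 21 * 60 + 15                # 21 minutes 15 seconds
    # 253 is a vendor-defined period and 254 is reserved.
    raise ValueError("no fixed timeout defined for this value")


print(spindown_seconds(120) / 60, "minutes")
```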


At some point I had installed laptop-mode, which caches disk reads/writes in memory, so the OS was only supposed to write to disk every couple of hours.


The primary problem in this case was that laptop-mode stopped working after an upgrade - it is still listed as a service to start at bootup, but it no longer starts. And I had more or less forgotten about it and didn't think of checking.


At least I know where to look now, thanks again for your input.



Answer



Some things to check for:



  • Is the problem occurring now? Sample the drive, wait a day, sample again, and see if the count increases noticeably (say, once for every 2.5 minutes in a day).

  • Is the problem occurring for all of the disks, or just one?

  • What is the power configuration for the computer? Power saving, or no? Spin down the disks, or no? Check hdparm -B and hdparm -S (and read the man page for information on how to interpret the data)
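
The first check above (sample, wait, sample again) boils down to a simple rate calculation. Here is a minimal sketch; the function name minutes_per_cycle and the second reading are hypothetical, made up for illustration, and only the first reading comes from the question.

```python
def minutes_per_cycle(count_before, count_after, hours_between):
    """Average minutes between start/stop events across two samples.

    Returns None if the counter did not increase.
    """
    delta = count_after - count_before
    if delta <= 0:
        return None
    return hours_between * 60 / delta


# First reading is from the question; the second is a hypothetical
# reading taken 24 hours later.
rate = minutes_per_cycle(185875, 186445, 24)
print(f"one start/stop every {rate:.1f} minutes")
```

A result of a few minutes per cycle would mean the problem is still occurring; a counter that barely moves in a day would point to something that has since been fixed.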


If the problem isn't happening now, I do recall a bug that was reported related to hard disks spinning down and up repeatedly in Ubuntu, but it may have been a while ago. You might investigate that, see if maybe it was fixed in an upgrade.


If the problem is only for one disk, you have to ask what is special about that disk.


If the settings above don't match your needs, they may be related to the problem, or even the culprit.

