I have a system with two 2 TByte SATA disks configured as a RAID1 array.
There are times when the CPU is waiting for I/O more than 20% of the time (output from sar), e.g.:
09:25:01 CPU %user %nice %system %iowait %steal %idle
09:35:01 all 57,65 0,00 6,53 25,54 0,05 10,23
15:45:01 all 0,90 0,00 1,47 54,90 0,06 42,68
15:55:04 all 1,74 0,00 1,58 88,52 0,10 8,06
16:25:03 all 0,59 0,00 0,38 24,14 0,05 74,84
23:45:05 all 2,45 0,00 1,43 31,56 0,05 64,50
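To pick out such intervals from a longer sar log, a minimal Python sketch could parse the text and keep only the samples above the 20% threshold. It assumes the output looks exactly like the excerpt above (one header line, whitespace-separated columns, a decimal comma as printed in this locale); the function name high_iowait is made up for illustration.

```python
# Lines copied from the sar excerpt above (decimal comma, as printed there).
SAR_LINES = """\
09:25:01 CPU %user %nice %system %iowait %steal %idle
09:35:01 all 57,65 0,00 6,53 25,54 0,05 10,23
15:45:01 all 0,90 0,00 1,47 54,90 0,06 42,68
15:55:04 all 1,74 0,00 1,58 88,52 0,10 8,06
16:25:03 all 0,59 0,00 0,38 24,14 0,05 74,84
23:45:05 all 2,45 0,00 1,43 31,56 0,05 64,50
"""


def high_iowait(text: str, threshold: float = 20.0) -> list[tuple[str, float]]:
    """Return (timestamp, %iowait) for every sample whose %iowait exceeds threshold."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    col = header.index("%iowait")  # locate the column by name, not by position
    hits = []
    for line in lines[1:]:
        fields = line.split()
        value = float(fields[col].replace(",", "."))  # decimal comma -> float
        if value > threshold:
            hits.append((fields[0], value))
    return hits


for ts, iowait in high_iowait(SAR_LINES):
    print(f"{ts}: {iowait:.2f}% iowait")
```

Raising the threshold (e.g. high_iowait(SAR_LINES, 50.0)) narrows the list down to the worst periods.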
I collected additional information using atop, which shows that disk I/O on one of the RAID disks is at its upper limit (disk sda, 90% busy), e.g.:
MDD | md1 | busy 0% | | read 10174 | write 425 | | KiB/r 6 | KiB/w 7 | MBr/s 1.2 | | MBw/s 0.1 | avq 0.00 | | avio 0.00 ms |
DSK | sda | busy 90% | | read 9091 | write 507 | | KiB/r 6 | KiB/w 7 | MBr/s 0.9 | | MBw/s 0.1 | avq 1.14 | | avio 5.65 ms |
DSK | sdb | busy 18% | | read 1082 | write 507 | | KiB/r 11 | KiB/w 7 | MBr/s 0.2 | | MBw/s 0.1 | avq 1.39 | | avio 6.82 ms |
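To compare many such samples, the atop lines can be split into fields. The sketch below is a hypothetical helper (parse_atop_disk_line is not part of atop); it assumes the `|`-separated layout shown above, where each non-empty cell is a label followed by a value.

```python
def parse_atop_disk_line(line: str) -> dict[str, str]:
    """Split an atop MDD/DSK line into {'kind', 'name', <label>: <value>}.

    Values stay strings (e.g. '90%', '5.65 ms'); convert the ones you need.
    """
    cells = [c.strip() for c in line.split("|")]
    cells = [c for c in cells if c]  # drop the empty padding columns
    record = {"kind": cells[0], "name": cells[1]}
    for cell in cells[2:]:
        label, _, value = cell.partition(" ")  # 'busy 90%' -> ('busy', '90%')
        record[label] = value
    return record


SDA = ("DSK | sda | busy 90% | | read 9091 | write 507 | | KiB/r 6 | KiB/w 7 "
       "| MBr/s 0.9 | | MBw/s 0.1 | avq 1.14 | | avio 5.65 ms |")
rec = parse_atop_disk_line(SDA)
print(rec["busy"], rec["read"], rec["avio"])  # 90% 9091 5.65 ms
```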
The atop man page states:
Such line shows the name (e.g. VolGroup00-lvtmp for a logical volume
or sda for a hard disk), the busy percentage i.e. the portion of time
that the unit was busy handling requests (busy), the number of read
requests issued (read), the number of write requests issued
(write), the number of KiBytes per read (KiB/r), the number of
KiBytes per write (KiB/w), the number of MiBytes per second
throughput for reads (MBr/s), the number of MiBytes per second
throughput for writes (MBw/s), the average queue depth (avq) and
the average number of milliseconds needed by a request (avio) for
seek, latency and data transfer.
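Taken together, these definitions allow the sda line to be cross-checked. The following sketch assumes that throughput = requests × size per request / interval and that avio is the busy time divided by the number of requests. The atop sample interval is not shown above, so the 60 s used here is an assumption; the read figures imply roughly 59 s, which is consistent with it.

```python
# Figures copied from the atop DSK line for sda.
READS, WRITES = 9091, 507
KIB_PER_READ = 6       # KiB/r
MBR_PER_S = 0.9        # MBr/s as displayed (rounded to one decimal)
AVIO_MS = 5.65         # avio
ASSUMED_INTERVAL_S = 60.0  # assumption: atop interval not given in the post


def implied_interval_s(reads: int, kib_per_read: float, mib_per_s: float) -> float:
    """Interval length implied by throughput = requests * size / interval."""
    return reads * kib_per_read / 1024 / mib_per_s


def implied_busy(requests: int, avio_ms: float, interval_s: float) -> float:
    """Fraction of the interval the disk was busy if each request took avio_ms."""
    return requests * avio_ms / 1000 / interval_s


print(f"implied interval: {implied_interval_s(READS, KIB_PER_READ, MBR_PER_S):.1f} s")
busy = implied_busy(READS + WRITES, AVIO_MS, ASSUMED_INTERVAL_S)
print(f"busy over {ASSUMED_INTERVAL_S:.0f} s: {busy:.0%}")
```

Under these assumptions the 90% busy figure follows from the request count and avio alone, independent of how many bytes were moved.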
For RAID1, information can be read in parallel from both disks, but according to the md man page this is not done for a single stream of sequential input, which explains why the second disk is not fully used.
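The md1 line lists almost exactly as many reads as sda and sdb combined (10174 vs. 9091 + 1082 = 10173), so each array read appears to be served by a single mirror. A short Python sketch of the split, using only the counts from the atop sample (read_shares is an illustrative name):

```python
MD1_READS = 10174
MEMBER_READS = {"sda": 9091, "sdb": 1082}


def read_shares(member_reads: dict[str, int]) -> dict[str, float]:
    """Fraction of all member-disk reads handled by each mirror."""
    total = sum(member_reads.values())
    return {disk: n / total for disk, n in member_reads.items()}


print(f"member reads: {sum(MEMBER_READS.values())} vs md1 reads: {MD1_READS}")
for disk, share in read_shares(MEMBER_READS).items():
    print(f"{disk}: {share:.0%} of member reads")
```

In this sample sda served about 89% of the reads and sdb about 11%.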
Looking at the MBr/s and MBw/s entries for sda, the disk appears to be 90% busy while transferring
0.9 + 0.1 MiByte/s = 1 MiByte/s = 8 Mibit/s.
However, current disks are expected to deliver on the order of 1000 Mbit/s, which is roughly 100 times more (neglecting the conversion from Mibit to Mbit).
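The same numbers can also be read as a per-request latency limit rather than a bandwidth limit. This sketch converts the observed rate to Mbit/s and then estimates the throughput if requests are served strictly one after another, assuming the atop averages of 5.65 ms and about 6 KiB per request. The avq of about 1.1 suggests roughly one outstanding request, but any overlap from command queueing is ignored here.

```python
MIB = 1024 * 1024


def mib_per_s_to_mbit_per_s(mib_per_s: float) -> float:
    """Convert MiB/s (binary) to Mbit/s (decimal, as used for link speeds)."""
    return mib_per_s * MIB * 8 / 1_000_000


def latency_bound_mib_per_s(avio_ms: float, kib_per_request: float) -> float:
    """Throughput if requests are served one at a time, avio_ms each."""
    requests_per_s = 1000 / avio_ms
    return requests_per_s * kib_per_request / 1024


observed = mib_per_s_to_mbit_per_s(0.9 + 0.1)
print(f"observed: {observed:.2f} Mbit/s, {1000 / observed:.0f}x below 1000 Mbit/s")
print(f"latency bound: {latency_bound_mib_per_s(5.65, 6):.2f} MiB/s")
```

With small requests the per-request time, not the media transfer rate, dominates, so about 1 MiB/s is what these averages allow at near-full utilisation.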
The disks are (output of hdparm -I /dev/sda):
/dev/sda:
ATA device, with non-removable media
Model Number: TOSHIBA DT01ACA200
Serial Number: 54A8UH4GS
Firmware Revision: MX4OABB0
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
Used: unknown (minor revision code 0x0029)
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 3907029168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 1907729 MBytes
device size with M = 1000*1000: 2000398 MBytes (2000 GB)
cache/buffer size = unknown
Form Factor: 3.5 inch
Nominal Media Rotation Rate: 7200
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: disabled
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
Advanced Power Management feature set
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
Media Card Pass-Through
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
* URG for READ_STREAM[_DMA]_EXT
* URG for WRITE_STREAM[_DMA]_EXT
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* unknown 119[7]
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* NCQ priority information
Non-Zero buffer offsets in DMA Setup FIS
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
In-order data delivery
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
not supported: enhanced erase
320min for SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000039ffac402a6
NAA : 5
IEEE OUI : 000039
Unique ID : ffac402a6
Checksum: correct
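Two fields of this output can be turned into numbers that relate to the atop figures: the rotation rate gives the average rotational latency, and the LBA48 sector count reproduces the two device-size lines. The sketch assumes the usual half-revolution average and ignores seek and transfer time entirely, so the requests-per-second figure is only a rough rotation-only estimate for one outstanding request at a time.

```python
RPM = 7200                    # "Nominal Media Rotation Rate"
LBA48_SECTORS = 3907029168    # "LBA48 user addressable sectors"
LOGICAL_SECTOR_BYTES = 512    # "Logical Sector size"


def avg_rotational_latency_ms(rpm: int) -> float:
    """On average the platter must turn half a revolution to reach a sector."""
    return 60_000 / rpm / 2


def capacity(sectors: int, sector_bytes: int) -> tuple[int, int]:
    """Size in 'MBytes' for M = 1024*1024 and M = 1000*1000, as hdparm prints it."""
    total = sectors * sector_bytes
    return total // (1024 * 1024), total // (1000 * 1000)


lat = avg_rotational_latency_ms(RPM)
print(f"average rotational latency: {lat:.2f} ms")
print(f"rotation-only estimate: {1000 / lat:.0f} requests/s")
print(capacity(LBA48_SECTORS, LOGICAL_SECTOR_BYTES))
```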
Is the atop output or its man page wrong, are the hard disks badly underperforming compared to the expected value, or is there a misunderstanding on my side?
Or, more broadly: is my system really limited by disk I/O capacity?