Smokes your problems, coughs fresh air.

Tag: disk

Broken disk went undetected, but did corrupt data

I had a disk in one of my servers that was starting to give ATA errors in the syslog. Contrary to what you might think, ATA errors are fairly common, so I didn’t immediately sound the alarm. However, this disk turned out to be corrupting data. While I was upgrading Debian 6 to 7, the file system became read-only. Rebooting dropped me into a recovery shell, and e2fsck asked me millions of questions.

In the end, I had to recreate the FS and restore from backup.

For the record, this was the error in question (although this error can also be harmless):

[2013-09-01 01:32:19]  ata1: lost interrupt (Status 0x51)
[2013-09-01 01:32:19]  ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[2013-09-01 01:32:19]  ata1.01: failed command: READ DMA EXT
[2013-09-01 01:32:19]  ata1.01: cmd 25/00:00:3f:43:9c/00:04:05:00:00/f0 tag 0 dma 524288 in
[2013-09-01 01:32:19]           res 40/00:00:11:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
[2013-09-01 01:32:19]  ata1.01: status: { DRDY }
[2013-09-01 01:32:19]  ata1: soft resetting link
[2013-09-01 01:32:20]  ata1.00: configured for UDMA/133
[2013-09-01 01:32:20]  ata1.01: configured for UDMA/33
[2013-09-01 01:32:20]  ata1: EH complete

The port is ata1.01. In other words, sdb: first controller, second disk (a nice mix-up of zero-based and one-based counters; at first I thought it was sda).
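To avoid misreading the port and device numbers by eye, the log lines can be picked apart mechanically. The following is a minimal sketch, not a complete log parser: the function names `parse_ata_line` and `failing_devices` are made up for this example, and it only assumes the `ataP.DD:` prefix format visible in the log above. It deliberately does not map the result to a device name such as sdb, since that mapping depends on the system.

```python
import re

# Matches device-level prefixes such as "ata1.01:" (port 1, device 1).
# Port-only lines such as "ata1: soft resetting link" have no ".DD" part
# and are intentionally not matched.
ATA_DEVICE = re.compile(r"\bata(\d+)\.(\d+):\s*(.*)")


def parse_ata_line(line):
    """Return (port, device, message) for a device-level ATA line, else None."""
    match = ATA_DEVICE.search(line)
    if match is None:
        return None
    port, device, message = match.groups()
    return int(port), int(device), message


def failing_devices(lines):
    """Collect the (port, device) pairs that logged a 'failed command' line."""
    found = set()
    for line in lines:
        parsed = parse_ata_line(line)
        if parsed and parsed[2].startswith("failed command"):
            found.add(parsed[:2])
    return found


if __name__ == "__main__":
    sample = [
        "[2013-09-01 01:32:19]  ata1: lost interrupt (Status 0x51)",
        "[2013-09-01 01:32:19]  ata1.01: failed command: READ DMA EXT",
        "[2013-09-01 01:32:20]  ata1.00: configured for UDMA/133",
    ]
    print(failing_devices(sample))
```

Note that the port number counts from one while the device number counts from zero, which is exactly the mix-up described above.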

The disk was a Western Digital WDC WD5000ABYS-01TNA0.

Disabling Intellipark on the WD15EARS

I just got two Western Digital 1.5TB WD15EARS disks. This drive has a feature called Intellipark, which parks the heads after the disk has not been used for a while. This is supposedly a power-saving feature. But, as someone explains, it can also severely decrease the lifetime of your drive.

To disable it, download WDIDLE3 and use it to turn the idle timer off. This needs to be done from a DOS environment. DOS boot disks should be available for download.

USAGE:
WDIDLE3 [/S[<Timer>]] [/D] [/R] [/?]
where:
/S[<Timer>] Set timer, units in seconds. Default=8.0 (8.0 seconds).
Resolution is 0.1 seconds from 0.1 to 12.7 seconds.
Resolution is 30 seconds from 30 seconds to 300000 seconds.
Note, times between 12.8 and 30 seconds will be set to 30 seconds.
/D          Disable timer.
/R          Report current timer.
/?          This help info.
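The resolution rules in the help text are easier to follow as a small calculation. This is only a sketch that takes the help text above at face value: `effective_timer` is a hypothetical helper, not part of WDIDLE3, and rounding to the nearest step is an assumption, since the help text does not say how the tool rounds.

```python
def effective_timer(seconds, max_seconds=300000):
    """Estimate the timer value that /S<seconds> would end up with.

    Ranges are taken from the WDIDLE3 help text as quoted above.
    Rounding to the nearest step is an assumption, not documented behaviour.
    """
    if seconds < 0.1 or seconds > max_seconds:
        raise ValueError("timer outside the range given in the help text")
    tenths = round(seconds * 10)
    if tenths <= 127:
        # 0.1 to 12.7 seconds: resolution of 0.1 seconds.
        return tenths / 10
    if seconds <= 30:
        # 12.8 to 30 seconds: set to 30 seconds.
        return 30.0
    # Above 30 seconds: resolution of 30 seconds.
    return float(min(max_seconds, round(seconds / 30) * 30))


if __name__ == "__main__":
    for requested in (8, 12.8, 20, 100, 300):
        print(requested, "->", effective_timer(requested))
```

For example, a request of 20 seconds falls in the gap between the two ranges and would end up as 30 seconds.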

Disabling it actually causes it to go even more berserk. In a few hours, SMART logged 5000 load cycles, and the drive was making very funny noises all the time. Instead, I set it to 300 seconds, which effectively disables it.
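To keep an eye on the load cycles, the count can be read from the attribute table that `smartctl -A` prints. A minimal sketch, assuming the usual ten-column attribute layout and that the drive reports attribute 193 as Load_Cycle_Count; the sample text and the numbers in it are made up for illustration and are not readings from my drives.

```python
def load_cycle_count(smartctl_output):
    """Return the raw Load_Cycle_Count from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == "193" and fields[1] == "Load_Cycle_Count":
            return int(fields[9])
    return None


def cycles_per_hour(count_before, count_after, hours):
    """Average number of load cycles per hour between two readings."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return (count_after - count_before) / hours


if __name__ == "__main__":
    # Hypothetical sample output, for illustration only.
    sample = (
        "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE\n"
        "  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12\n"
        "193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       5000\n"
    )
    print(load_cycle_count(sample))
```

Taking two readings some hours apart and passing them to `cycles_per_hour` shows quickly whether a timer change actually calmed the drive down.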

Someone also made a boot CD with the tool available (wgetting it works, clicking doesn’t). I don’t think the link will be there forever though. (can I attach zips to a blog post?)

The forum post I linked to also has info about TLER, which I’ll get back to.

© 2024 BigSmoke
