Skip to content

Broken disk went undetected, but did corrupt data

I had a disk in one of my servers that was starting to give ATA errors in the syslog. Contrary to what you might think, ATA errors are fairly common, so I didn’t immediately sound the alarm. However, this disk turned out to be corrupting data. During upgrading Debian 6 to 7, the file system became read-only. Rebooting gave me a recovery shell and e2fsck gave me millions of questions.

In the end, I had to recreate the FS and restore from backup.

For the record, this was the error in question (although, this error can also be harmless):

[2013-09-01 01:32:19]  ata1: lost interrupt (Status 0x51)
[2013-09-01 01:32:19]  ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[2013-09-01 01:32:19]  ata1.01: failed command: READ DMA EXT
[2013-09-01 01:32:19]  ata1.01: cmd 25/00:00:3f:43:9c/00:04:05:00:00/f0 tag 0 dma 524288 in
[2013-09-01 01:32:19]           res 40/00:00:11:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
[2013-09-01 01:32:19]  ata1.01: status: { DRDY }
[2013-09-01 01:32:19]  ata1: soft resetting link
[2013-09-01 01:32:20]  ata1.00: configured for UDMA/133
[2013-09-01 01:32:20]  ata1.01: configured for UDMA/33
[2013-09-01 01:32:20]  ata1: EH complete

Port is ata 1.1. In other words, sdb; first controller, second disk (nice mixup of zero and one based counters; at first I thought it was sda).

The disk was a Western Digital WDC WD5000ABYS-01TNA0.

    No Comments ( Add comment / trackback )