I had a disk in one of my servers that was starting to log ATA errors to the syslog. Contrary to what you might think, ATA errors are fairly common, so I didn’t immediately sound the alarm. However, this disk turned out to be corrupting data. While I was upgrading from Debian 6 to 7, the file system went read-only. Rebooting dropped me into a recovery shell, and e2fsck asked me millions of questions.
In the end, I had to recreate the FS and restore from backup.
For the record, this was the error in question (although this error can also be harmless):
```
[2013-09-01 01:32:19] ata1: lost interrupt (Status 0x51)
[2013-09-01 01:32:19] ata1.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[2013-09-01 01:32:19] ata1.01: failed command: READ DMA EXT
[2013-09-01 01:32:19] ata1.01: cmd 25/00:00:3f:43:9c/00:04:05:00:00/f0 tag 0 dma 524288 in
[2013-09-01 01:32:19] res 40/00:00:11:00:00/00:00:00:00:00/10 Emask 0x4 (timeout)
[2013-09-01 01:32:19] ata1.01: status: { DRDY }
[2013-09-01 01:32:19] ata1: soft resetting link
[2013-09-01 01:32:20] ata1.00: configured for UDMA/133
[2013-09-01 01:32:20] ata1.01: configured for UDMA/33
[2013-09-01 01:32:20] ata1: EH complete
```
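If you want to spot messages like these without reading the whole syslog, a minimal sketch is to group the "failed command" lines by libata device name. The regex below is an assumption based only on the message format shown above; other kernel versions may word their messages differently, and the function name `failed_commands` is made up for this example.

```python
import re
from collections import defaultdict

# Matches e.g. "ata1.01: failed command: READ DMA EXT" and captures the
# device name ("ata1.01") and the command ("READ DMA EXT").
# Assumption: the message format matches the log excerpt above.
FAILED = re.compile(r"\b(ata\d+\.\d+): failed command: (.+?)\s*$")


def failed_commands(lines):
    """Group libata 'failed command' messages by ataN.DD device name."""
    result = defaultdict(list)
    for line in lines:
        m = FAILED.search(line)
        if m:
            result[m.group(1)].append(m.group(2))
    return dict(result)
```

You can pass it an open file object directly, for example `failed_commands(open("/var/log/syslog"))` on a Debian system, since iterating a file yields its lines (trailing newlines are stripped by the `\s*$` part of the pattern).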
The port is ata1.01; in other words, sdb: first controller, second disk (a nice mix-up of zero- and one-based counters; at first I thought it was sda).
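Rather than working out the mapping by hand, you can let sysfs do it. In a name like `ata1.01`, the port number (`ata1`) counts from one while the device number (`.01`) counts from zero, so `ata1.01` is the second device on the first port. The sketch below resolves each `/sys/block/sd*` symlink and looks for an `ataN` path component. It assumes a PATA-style master/slave layout where the SCSI target id in the `H:C:T:L` path component equals the ATA device number. That holds for setups like the one above but is not guaranteed for every controller (port multipliers, for instance). The helper names are hypothetical.

```python
import glob
import os
import re

# Matches libata device names such as "ata1.01" (port 1, device 01).
ATA_NAME = re.compile(r"^ata(\d+)\.(\d+)$")


def parse_ata_name(name):
    """Split 'ata1.01' into (port, device).

    The port number is one-based (ata1 is the first port); the device
    number is zero-based (00 = first/master, 01 = second/slave).
    """
    m = ATA_NAME.match(name)
    if m is None:
        raise ValueError(f"not an ataN.DD name: {name!r}")
    return int(m.group(1)), int(m.group(2))


def match_block_device(name, resolved):
    """Return the block device whose sysfs path matches an ataN.DD name.

    `resolved` maps block device names (e.g. "sdb") to the real path of
    /sys/block/<name>. Assumption (hypothetical layout): the path has an
    "ataN" component and an "H:C:T:L" component whose target T equals the
    ATA device number, as in PATA master/slave setups.
    """
    port, dev = parse_ata_name(name)
    for block, path in sorted(resolved.items()):
        parts = path.split("/")
        if f"ata{port}" not in parts:
            continue
        for part in parts:
            hctl = part.split(":")
            if len(hctl) == 4 and all(x.isdigit() for x in hctl) and int(hctl[2]) == dev:
                return block
    return None


def resolve_sysfs():
    """Map each /sys/block/sd* entry to its resolved sysfs device path."""
    return {os.path.basename(p): os.path.realpath(p) for p in glob.glob("/sys/block/sd*")}


if __name__ == "__main__":
    # Prints the matching device name, or None if nothing matches.
    print(match_block_device("ata1.01", resolve_sysfs()))
```

Keeping the path matching separate from the sysfs lookup makes it easy to check against a plain `ls -l /sys/block/sd*` listing before trusting it on a real machine.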
The disk was a Western Digital WDC WD5000ABYS-01TNA0.