Good day,
Recently my machine started freezing inexplicably during I/O operations. I looked at the logs, and messages like these keep appearing:
[ 8.242182] ata1.00: exception Emask 0x0 SAct 0x40000 SErr 0x0 action 0x0
[ 8.242187] ata1.00: irq_stat 0x40000008
[ 8.242191] ata1.00: failed command: READ FPDMA QUEUED
[ 8.242199] ata1.00: cmd 60/08:90:f8:10:c4/00:00:03:00:00/40 tag 18 ncq 4096 in
[ 8.242199] res 51/40:08:10:10:c4/00:00:04:00:00/40 Emask 0x409 (media error) <F>
[ 8.242202] ata1.00: status: { DRDY ERR }
[ 8.242204] ata1.00: error: { UNC }
[ 8.262969] ata1.00: configured for UDMA/133
[ 8.262985] sd 0:0:0:0: [sda] Unhandled sense code
[ 8.262988] sd 0:0:0:0: [sda]
[ 8.262990] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 8.262993] sd 0:0:0:0: [sda]
[ 8.262994] Sense Key : Medium Error [current] [descriptor]
[ 8.262997] Descriptor sense data with sense descriptors (in hex):
[ 8.262999] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 8.263008] 04 c4 10 10
[ 8.263012] sd 0:0:0:0: [sda]
[ 8.263014] Add. Sense: Unrecovered read error - auto reallocate failed
[ 8.263016] sd 0:0:0:0: [sda] CDB:
[ 8.263018] Read(10): 28 00 03 c4 10 f8 00 00 08 00
[ 8.263026] end_request: I/O error, dev sda, sector 63181048
[ 8.263043] ata1: EH complete
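As a sanity check, the sector in the `end_request` line can be recovered from the Read(10) CDB printed just above it. Here is a minimal sketch, assuming the standard READ(10) layout (opcode 0x28, a big-endian 32-bit LBA in bytes 2-5, a 16-bit transfer length in bytes 7-8); the helper name `decode_read10` is my own:

```python
import struct

def decode_read10(cdb_hex):
    """Return (start LBA, block count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    # Byte 0: opcode, byte 1: flags, bytes 2-5: LBA (big-endian),
    # byte 6: group number, bytes 7-8: transfer length, byte 9: control.
    lba, _group, length = struct.unpack(">IBH", cdb[2:9])
    return lba, length

lba, blocks = decode_read10("28 00 03 c4 10 f8 00 00 08 00")
print(lba, blocks)  # 63181048 8
```

The decoded LBA is the same 63181048 that the kernel reports, and 8 blocks of 512 bytes is the 4096-byte request shown in the `cmd` line.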
And this is what appears at the moment of the "freeze":
[ 658.548085] ata1.00: failed command: WRITE FPDMA QUEUED
[ 658.548088] ata1.00: cmd 61/08:98:a0:62:01/00:00:04:00:00/40 tag 19 ncq 4096 out
[ 658.548088] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 658.548089] ata1.00: status: { DRDY }
[ 658.548090] ata1.00: failed command: WRITE FPDMA QUEUED
[ 658.548093] ata1.00: cmd 61/08:a0:b8:dd:01/00:00:04:00:00/40 tag 20 ncq 4096 out
[ 658.548093] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 658.548095] ata1.00: status: { DRDY }
[ 658.548096] ata1.00: failed command: READ FPDMA QUEUED
[ 658.548099] ata1.00: cmd 60/20:a8:98:7b:0c/00:00:01:00:00/40 tag 21 ncq 16384 in
[ 658.548099] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 658.548101] ata1.00: status: { DRDY }
[ 658.548102] ata1.00: failed command: READ FPDMA QUEUED
[ 658.548105] ata1.00: cmd 60/08:b0:a8:99:2d/00:00:02:00:00/40 tag 22 ncq 4096 in
[ 658.548105] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 658.548106] ata1.00: status: { DRDY }
[ 658.548108] ata1.00: failed command: READ FPDMA QUEUED
[ 658.548111] ata1.00: cmd 60/08:b8:b8:b0:08/00:00:04:00:00/40 tag 23 ncq 4096 in
[ 658.548111] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 658.548112] ata1.00: status: { DRDY }
[ 658.548113] ata1.00: failed command: READ FPDMA QUEUED
[ 658.548116] ata1.00: cmd 60/40:c0:08:b5:08/00:00:04:00:00/40 tag 24 ncq 32768 in
[ 658.548116] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 658.548118] ata1.00: status: { DRDY }
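To see which sectors the timed-out commands were aimed at, the `cmd` lines can be decoded. This is a rough sketch, assuming the field order libata prints (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the feature fields carry the sector count and `nsect` carries the tag shifted left by 3; the function name `decode_ncq_taskfile` is made up for illustration:

```python
NCQ_COMMANDS = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}

def decode_ncq_taskfile(line):
    """Decode the 'cmd' taskfile of an NCQ command from a libata log line."""
    token = line.split("cmd ", 1)[1].split()[0]
    command, low, high, _device = token.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    op = int(command, 16)
    if op not in NCQ_COMMANDS:
        raise ValueError("not an NCQ read/write command")
    # 48-bit LBA: the "hob" (high order byte) fields form the upper 24 bits.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    return {
        "op": NCQ_COMMANDS[op],
        "tag": nsect >> 3,                      # NCQ tag lives in bits 7:3
        "sectors": hob_feature << 8 | feature,  # sector count for NCQ
        "lba": lba,
    }

line = ("[ 658.548088] ata1.00: cmd 61/08:98:a0:62:01/00:00:04:00:00/40 "
        "tag 19 ncq 4096 out")
print(decode_ncq_taskfile(line))
```

For the line above this gives tag 19, 8 sectors (4096 bytes) and LBA 67199648, which agrees with the `tag 19 ncq 4096` part the kernel prints itself.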
uname -a
Linux Anton-Notebook 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
SSD: Corsair Force 3
*-disk
description: ATA Disk
product: Corsair Force 3
physical id: 0.0.0
bus info: scsi@0:0.0.0
logical name: /dev/sda
version: 5.05
serial: 1240791900009707002F
size: 447GiB (480GB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 sectorsize=512 signature=0000c10b
*-volume:0
description: EXT4 volume
vendor: Linux
physical id: 1
bus info: scsi@0:0.0.0,1
logical name: /dev/sda1
logical name: /
logical name: /var/lib/docker/aufs
version: 1.0
serial: 25dc11da-4e2a-45fb-af0b-fd21f8747506
size: 37GiB
capacity: 37GiB
capabilities: primary journaled extended_attributes large_files huge_files dir_nlink recover extents ext4 ext2 initialized
configuration: created=2013-01-30 03:32:28 filesystem=ext4 lastmountpoint=/ modified=2014-09-06 11:16:25 mount.fstype=ext4 mount.options=rw,noatime,discard,data=ordered mounted=2014-09-06 11:16:25 state=mounted
*-volume:1
description: Windows NTFS volume
physical id: 2
bus info: scsi@0:0.0.0,2
logical name: /dev/sda2
version: 3.1
serial: 1e90-29a4
size: 94MiB
capacity: 100MiB
capabilities: primary bootable ntfs initialized
configuration: clustersize=4096 created=2013-01-30 01:05:11 filesystem=ntfs label=Зарезервировано системой state=clean
*-volume:2
description: Windows NTFS volume
physical id: 3
bus info: scsi@0:0.0.0,3
logical name: /dev/sda3
version: 3.1
serial: 82f5e4af-88c8-b54c-a6b9-7f9f55b0277c
size: 93GiB
capacity: 93GiB
capabilities: primary ntfs initialized
configuration: clustersize=4096 created=2013-01-30 01:05:30 filesystem=ntfs state=clean
*-volume:3
description: Extended partition
physical id: 4
bus info: scsi@0:0.0.0,4
logical name: /dev/sda4
size: 316GiB
capacity: 316GiB
capabilities: primary extended partitioned partitioned:extended
*-logicalvolume
description: Linux filesystem partition
physical id: 5
logical name: /dev/sda5
logical name: /media/data
capacity: 316GiB
configuration: mount.fstype=ext4 mount.options=rw,noatime,discard,data=ordered state=mounted
root@kali:~# smartctl -a /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14-kali1-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: Corsair Force 3 SSD
Serial Number: 1240791900009707002F
LU WWN Device Id: 0 000000 000000000
Firmware Version: 5.05
User Capacity: 480,103,981,056 bytes [480 GB]
Sector Size: 512 bytes logical/physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ACS-2 revision 3
Local Time is: Sat Sep 6 05:48:04 2014 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 48) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x0021) SCT Status supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 070 070 050 Pre-fail Always - 2589867935029
5 Reallocated_Sector_Ct 0x0033 100 100 003 Pre-fail Always - 16
9 Power_On_Hours 0x0032 092 092 000 Old_age Always - 42348377545808
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 1
171 Unknown_Attribute 0x0032 000 000 000 Old_age Always - 0
172 Unknown_Attribute 0x0032 000 000 000 Old_age Always - 0
174 Unknown_Attribute 0x0030 000 000 000 Old_age Offline - 66
177 Wear_Leveling_Count 0x0000 000 000 000 Old_age Offline - 1
181 Program_Fail_Cnt_Total 0x0032 000 000 000 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 000 000 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 000 000 000 Old_age Always - 1098
194 Temperature_Celsius 0x0022 030 063 000 Old_age Always - 30 (Min/Max 30/63)
195 Hardware_ECC_Recovered 0x001c 120 120 000 Old_age Offline - 2655541
196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 16
201 Soft_Read_Error_Rate 0x001c 120 120 000 Old_age Offline - 2655541
204 Soft_ECC_Correction 0x001c 120 120 000 Old_age Offline - 2655541
230 Head_Amplitude 0x0013 100 100 000 Pre-fail Always - 100
231 Temperature_Celsius 0x0013 100 100 010 Pre-fail Always - 0
233 Media_Wearout_Indicator 0x0000 000 000 000 Old_age Offline - 4925
234 Unknown_Attribute 0x0032 000 000 000 Old_age Always - 4951
241 Total_LBAs_Written 0x0032 000 000 000 Old_age Always - 4951
242 Total_LBAs_Read 0x0032 000 000 000 Old_age Always - 6149
SMART Error Log not supported
SMART Self-test Log not supported
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
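A note on the huge raw values for attributes 1 and 9: this smartctl build reports the drive as not in its database, so it prints the 48-bit raw fields as single numbers. Below is a small sketch that splits them into an upper 16-bit and a lower 32-bit part. The interpretation is an assumption on my side (as far as I know, SandForce-based drives pack two counters into one raw field, with hours in the lower 32 bits of Power_On_Hours); the arithmetic itself is exact:

```python
def split_raw48(raw):
    """Split a 48-bit SMART raw value into (upper 16 bits, lower 32 bits)."""
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value does not fit in 48 bits")
    return raw >> 32, raw & 0xFFFFFFFF

# Raw values copied from the attribute table above.
print(split_raw48(2589867935029))   # Raw_Read_Error_Rate -> (603, 2655541)
print(split_raw48(42348377545808))  # Power_On_Hours      -> (9860, 7248)
```

The lower part of Raw_Read_Error_Rate, 2655541, is the same number shown for attributes 195, 201 and 204, and 7248 hours (about 302 days) would be a plausible power-on time if the packing assumption holds.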