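fnOS (飞牛) NAS: Analyzing a Reallocated_Sector_Ct (Reallocated Sector Count) Alert on a Hard Drive
This morning the NAS raised a Reallocated_Sector_Ct alert.

One spare sector has already been used. These four drives were bought on Pinduoduo as enterprise surplus stock, advertised with 0 power-on hours.
So they have not been in service for long, yet a bad sector has already appeared. I need to analyze the drive first and then ask the seller what is going on.
Analysis
SSH into the fnOS backend and use the serial number to find the matching disk device in Linux: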
    # Print "device: serial" for every drive that smartctl can see
    sudo smartctl --scan | awk '{print $1}' | while read -r dev; do
        printf '%s: ' "$dev"
        sudo smartctl -i "$dev" | awk '/Serial Number/ {print $3}'
    done
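The loop above prints one `device: serial` pair per drive. To pick a single drive out of that listing, a small filter can be sketched as follows. The function name `find_dev_by_serial` is hypothetical, and the first sample serial is made up for illustration.

```shell
# Hypothetical helper: read "device: serial" lines on stdin (the format
# printed by the loop above) and print the device whose serial equals $1.
find_dev_by_serial() {
    awk -v want="$1" -F': *' '$2 == want { print $1 }'
}

# Sample listing for illustration only; EXAMPLE0001 is a made-up serial.
printf '%s\n' '/dev/sda: EXAMPLE0001' '/dev/sdc: ZC15GDG1' |
    find_dev_by_serial ZC15GDG1
```

On a real system the output of the scan loop would be piped into the function instead of the sample lines.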

The affected drive is ZC15GDG1, which corresponds to /dev/sdc.
Next, look at the full SMART information:
root@NAS:~# smartctl -a /dev/sdc
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.12.18-trim] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Enterprise Capacity 3.5 HDD
Device Model: ST4000NM0265-2DC107
Serial Number: ZC15GDG1
LU WWN Device Id: 5 000c50 0b0028f77
Add. Product Id: DELL(tm)
Firmware Version: DB34
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database 7.3/5770
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Fri Jul 4 08:54:48 2025 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 249) Self-test routine in progress...
90% of test remaining.
Total time to complete Offline
data collection: ( 90) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 403) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x50bd) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x010f 080 064 044 Pre-fail Always - 90376105
3 Spin_Up_Time 0x0103 088 088 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 11
5 Reallocated_Sector_Ct 0x0133 100 100 010 Pre-fail Always - 1
7 Seek_Error_Rate 0x000f 082 060 045 Pre-fail Always - 176707062
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 791
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 5
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0 0 0
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 057 047 040 Old_age Always - 43 (Min/Max 29/46)
191 G-Sense_Error_Rate 0x0032 099 099 000 Old_age Always - 3548
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 17
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 159
194 Temperature_Celsius 0x0022 043 053 000 Old_age Always - 43 (0 28 0 0 0)
195 Hardware_ECC_Recovered 0x001a 025 001 000 Old_age Always - 90376105
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 1
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 767h+41m+22.645s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 11700990479
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 12071513596
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Self-test routine in progress 90% 791 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
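Only a handful of attribute rows in this report matter for a bad-sector question. As a rough sketch, the raw value of a single attribute can be pulled out of `smartctl -A` output with awk. The helper name `smart_raw` is an assumption, and the sample rows are copied from the report above.

```shell
# Hypothetical helper: print the RAW_VALUE (10th column) of the attribute
# named $1, reading `smartctl -A` output on stdin.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Two attribute rows copied from the report above.
sample='  5 Reallocated_Sector_Ct   0x0133   100   100   010    Pre-fail  Always       -       1
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0'

printf '%s\n' "$sample" | smart_raw Reallocated_Sector_Ct    # prints 1
printf '%s\n' "$sample" | smart_raw Current_Pending_Sector   # prints 0
```

Against the live drive this would be `sudo smartctl -A /dev/sdc | smart_raw Reallocated_Sector_Ct`.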
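Based on the full smartctl -a /dev/sdc report, here is a detailed analysis of the Seagate ST4000NM0265 (serial number ZC15GDG1), followed by recommended actions:
1. Key SMART attribute analysis
| ID | Attribute | Value | Threshold | Raw value | Assessment |
|---|---|---|---|---|---|
| 5 | Reallocated_Sector_Ct | 100 | 10 | 1 | One sector has been remapped. The normalized value (100) is still far above the threshold (10), so this is not yet a serious threat. |
| 197 | Current_Pending_Sector | 100 | 0 | 0 | No pending sectors, so there are no new suspected bad sectors. |
| 198 | Offline_Uncorrectable | 100 | 0 | 0 | No uncorrectable sectors; the hardware is in good condition. |
| 199 | UDMA_CRC_Error_Count | 200 | 0 | 0 | No cable or interface errors, which rules out a connection problem. |
| 190 | Airflow_Temperature_Cel | 57 | 40 | 43 (°C) | Slightly warm at 43 °C; better cooling is advisable (aim for 40 °C or below). |
| 1 | Raw_Read_Error_Rate | 80 | 44 | 90376105 | A large raw value is normal for Seagate drives and is no cause for concern. |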
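Why a large raw value can be harmless is easier to see with a worked decode. The sketch below assumes the commonly reported community interpretation of Seagate's 48-bit raw values for Raw_Read_Error_Rate and Seek_Error_Rate: the upper 16 bits count errors and the lower 32 bits count operations. This layout is not confirmed by the report itself, and the function name `decode_seagate_raw` is hypothetical.

```shell
# Assumption (community-reported, not confirmed by the source): Seagate packs
# the 48-bit raw value as <16-bit error count><32-bit operation count>.
decode_seagate_raw() {
    raw=$1
    errors=$(( raw >> 32 ))               # upper 16 bits
    operations=$(( raw & 0xFFFFFFFF ))    # lower 32 bits
    echo "errors=$errors operations=$operations"
}

decode_seagate_raw 90376105     # Raw_Read_Error_Rate from the report
decode_seagate_raw 176707062    # Seek_Error_Rate from the report
```

Under that assumption both values decode to an error count of 0, because each raw value is smaller than 2^32.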
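2. Current drive status summary
- Health: PASSED (the SMART overall-health self-assessment).
- Bad sectors: only 1 reallocated sector, with no pending sectors.
- Temperature: 43 °C (cooling should be improved).
- Power-on time: 791 hours (about 33 days), so the drive is fairly new.
- Test in progress: the report shows an Extended offline (long) self-test that is 10% complete; wait for it to finish.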
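3. Recommended next steps
(1) Wait for the long test to finish
The drive is currently running an Extended offline (long) self-test with 90% remaining:
    # Check the test progress and, once it has finished, the result in the self-test log
    sudo smartctl -l selftest /dev/sdc
    # After the test completes, check the overall health verdict
    sudo smartctl -H /dev/sdc
- If the self-test completes without error, the health verdict is still PASSED, and Reallocated_Sector_Ct has not increased, the drive can stay in service.
- If the test fails or new bad sectors appear, back up the data and consider replacing the drive.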
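The progress is also encoded in the "Self-test execution status" byte, shown as 249 in the report. In the ATA SMART data layout, the upper four bits of that byte hold the status code (15 means a test is in progress, 0 means the last test completed without error or no test was ever run), and the lower four bits give the remaining work in steps of 10%. A minimal sketch of the decode, with a hypothetical function name:

```shell
# Decode the SMART self-test execution status byte.
# Upper nibble: status code; lower nibble: percent remaining / 10.
decode_selftest_status() {
    byte=$1
    code=$(( byte >> 4 ))
    remaining=$(( (byte & 15) * 10 ))
    if [ "$code" -eq 15 ]; then
        echo "in progress, ${remaining}% remaining"
    elif [ "$code" -eq 0 ]; then
        echo "completed without error (or never run)"
    else
        echo "ended with status code $code (aborted or failed)"
    fi
}

decode_selftest_status 249    # value from the report above
```

The current byte can be read from the "Self-test execution status" line of `sudo smartctl -c /dev/sdc`.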
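(2) Monitor the temperature
- Check the cooling around the drives (whether the fans work and whether the drives have enough spacing).
- The temperature can be watched in real time with:
    watch -n 5 'sudo smartctl -A /dev/sdc | grep Temperature_Celsius'
(3) Check the SMART status regularly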
Check the key attributes about once a week:
    sudo smartctl -A /dev/sdc | grep -e "Reallocated_Sector_Ct" -e "Current_Pending_Sector" -e "Temperature_Celsius"
(4) Back up important data (as a precaution)
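Even though the problem is not serious right now, back up the data anyway, since a reallocated sector can be an early sign of wear:
    # Back up the data to another disk with rsync (dd is an alternative for whole-device copies)
    sudo rsync -avh /path/to/data /mnt/backup/
4. Does the drive need to be replaced immediately?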
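- No immediate replacement needed: there is only 1 reallocated sector and no other errors, so the drive can remain in use.
- Warning signs: act immediately if any of the following appears:
  - Reallocated_Sector_Ct keeps increasing.
  - Current_Pending_Sector > 0.
  - Frequent I/O errors or degraded performance.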
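The first two warning signs can be checked mechanically. The sketch below compares the current reallocated count against a recorded baseline (1 in this case) and tests the pending count. The function name and argument order are assumptions for illustration. I/O errors and performance drops are not covered.

```shell
# needs_attention BASELINE_REALLOC CURRENT_REALLOC CURRENT_PENDING
# Succeeds (exit 0) if the reallocated count grew past the baseline
# or if any sector is pending.
needs_attention() {
    [ "$2" -gt "$1" ] || [ "$3" -gt 0 ]
}

# Values from the report: baseline 1, current 1, pending 0.
if needs_attention 1 1 0; then echo "act now"; else echo "ok"; fi

# Example of a worsened drive: reallocated count rose from 1 to 3.
if needs_attention 1 3 0; then echo "act now"; else echo "ok"; fi
```

With the values from the report the first check prints `ok`; the second, hypothetical case prints `act now`.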
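5. Optimization suggestions
- Improve cooling: make sure the drives are well ventilated and add a fan if necessary.
- Avoid frequent spin-up and spin-down cycles, which increase Start_Stop_Count (currently 11, within the normal range).
- Enable automatic SMART monitoring (example configuration):
    # Add to /etc/crontab (runs every Sunday at 00:00; entries in /etc/crontab need a user field, here root)
    echo "0 0 * * 0 root smartctl -H -A /dev/sdc | mail -s 'SMART Report sdc' your@email.com" | sudo tee -a /etc/crontab
Summary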
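- Current state: the drive is in good condition, with only 1 historical bad sector and a slightly elevated temperature.
- Recommended actions: wait for the long test to finish → improve cooling → monitor regularly → back up the data.
- Going forward: keep a close eye on Reallocated_Sector_Ct and on temperature changes.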
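To follow those two values over time, each check can be reduced to one CSV row and appended to a log. This is only a sketch: the function name `smart_csv_row`, the column order, and the log path in the usage note are all assumptions.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print one CSV
# row "date,reallocated,pending,temperature"; the date is passed as $1.
smart_csv_row() {
    awk -v day="$1" '
        $2 == "Reallocated_Sector_Ct"  { realloc = $10 }
        $2 == "Current_Pending_Sector" { pending = $10 }
        $2 == "Temperature_Celsius"    { temp = $10 }
        END { printf "%s,%s,%s,%s\n", day, realloc, pending, temp }
    '
}

# Three attribute rows copied from the report in this post.
rows='  5 Reallocated_Sector_Ct 0x0133 100 100 010 Pre-fail Always - 1
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 043 053 000 Old_age Always - 43 (0 28 0 0 0)'

printf '%s\n' "$rows" | smart_csv_row 2025-07-04
```

A weekly job could run `sudo smartctl -A /dev/sdc | smart_csv_row "$(date +%F)" >> /var/log/smart-sdc.csv`, where the log path is a placeholder.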