How to Check Disk Health with `smartctl` on Arch Linux

Monitoring the health of your storage devices is a crucial aspect of system administration and desktop maintenance. Disks, whether HDDs or SSDs, are prone to failure over time due to mechanical wear or flash memory degradation. Luckily, most modern drives support SMART (Self-Monitoring, Analysis, and Reporting Technology), a monitoring system included in the firmware of storage devices. On Arch Linux, the smartmontools package and its utility smartctl provide the tools you need to assess disk health.

In this article, we’ll explore how to install and use smartctl to monitor your disks on Arch Linux. We’ll go over key SMART attributes, interpret common outputs, and schedule regular health checks.


What is SMART?

SMART is a system built into most modern HDDs and SSDs that continuously monitors various parameters of the drive, such as reallocated sectors, temperature, read/write errors, and more. These metrics help predict impending drive failure and inform the user about overall disk reliability.

However, SMART only works if you proactively check it. That’s where smartctl comes in.


Installing smartmontools on Arch Linux

Before using smartctl, ensure the necessary tools are installed. On Arch Linux, this is simple:

sudo pacman -S smartmontools

This installs smartctl, which is the command-line utility used to interact with SMART-enabled devices, and the smartd daemon, which can automate SMART monitoring and send alerts.


Checking if a Disk Supports SMART

Not all disks support SMART, especially older or cheap USB drives. To check if a device supports SMART, run:

sudo smartctl -i /dev/sdX

Replace /dev/sdX with your actual device (like /dev/sda or /dev/nvme0n1).

Example output (abridged; field names and the “SMART support” lines vary with drive type and smartctl version):

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.1-arch1-1] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung SSD 980 NVMe Series
Device Model:     Samsung SSD 980 1TB
Firmware Version: 5B2QGXA7
...
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

If SMART support is available but not enabled, you can enable it with:

sudo smartctl -s on /dev/sdX
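
If you want to script this check, the following is a minimal sketch. It assumes the ATA-style "SMART support is:" lines shown in the example output; the function name smart_state is hypothetical, and drives that do not print those lines (NVMe devices, for instance) are reported as "unknown".

```shell
# smart_state: read `smartctl -i` output on stdin and print one of
# "enabled", "disabled", "unsupported", or "unknown".
# Assumption: the output contains ATA-style "SMART support is:" lines.
smart_state() {
    awk '
        /^SMART support is:/ {
            if ($0 ~ /Unavailable/)   state = "unsupported"
            else if ($0 ~ /Enabled/)  state = "enabled"
            else if ($0 ~ /Disabled/) state = "disabled"
        }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Usage (needs root and a real device):
#   sudo smartctl -i /dev/sdX | smart_state
```

The function only parses text, so it can be tried on saved smartctl output without touching a disk.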

Running a Basic Health Check

To get a quick summary of the drive’s health:

sudo smartctl -H /dev/sdX

Example output:

SMART overall-health self-assessment test result: PASSED

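A “PASSED” result means the drive’s own self-assessment found no attribute at or below its failure threshold. It is not a guarantee that the drive is healthy, so it is still worth reviewing the detailed attributes. If the result is “FAILED!”, back up your data immediately and plan to replace the drive. If no verdict is printed at all, SMART may be disabled or unsupported on that device.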
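
In scripts, the exit status of smartctl is often easier to act on than the printed verdict, because it is a bitmask. The sketch below decodes it; the bit meanings are paraphrased from the smartctl(8) man page, and the function name decode_smartctl_status is hypothetical.

```shell
# decode_smartctl_status STATUS: print one line per bit set in a
# smartctl exit status (bit meanings paraphrased from smartctl(8)).
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok: no problems reported"
        return 0
    fi
    bit=0
    for msg in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed" \
        "SMART status: DISK FAILING" \
        "prefail attributes at or below threshold" \
        "attributes were at or below threshold in the past" \
        "error log contains errors" \
        "self-test log contains errors"
    do
        # Test bit number $bit of the status value
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $msg"
        fi
        bit=$((bit + 1))
    done
}

# Usage (needs root and a real device):
#   sudo smartctl -H /dev/sdX >/dev/null; decode_smartctl_status $?
```

A status of 8, for example, has only bit 3 set, which corresponds to a failing health verdict.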


Getting Detailed SMART Data

For more detail than a single verdict, print the drive’s attribute table:

sudo smartctl -A /dev/sdX

You’ll see a list of SMART attributes. Here’s an example snippet from an HDD:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7529
194 Temperature_Celsius     0x0022   046   054   000    Old_age   Always       -       46 (Min/Max 22/54)

Let’s break down the important columns:

  • ID: Attribute identifier.
  • ATTRIBUTE_NAME: Description of the parameter.
  • VALUE: Normalized value (range 1–253; most drives start at 100 or 200, and higher is better).
  • WORST: Lowest normalized value recorded so far.
  • THRESH: Failure threshold; the attribute is considered failed when VALUE drops to or below it.
  • TYPE: Indicates whether it is a pre-fail or old-age metric.
  • WHEN_FAILED: If this field is populated, the value crossed its threshold.
  • RAW_VALUE: The actual value reported by the drive; interpretation varies by manufacturer.
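
The relationship between VALUE and THRESH can be checked mechanically. The following is a sketch that assumes the ten-column ATA attribute table shown above; the function name at_or_below_threshold is hypothetical. Attributes with a threshold of 0 are skipped, on the assumption that they are informational and cannot trip.

```shell
# at_or_below_threshold: read `smartctl -A` output on stdin and print
# every attribute whose normalized VALUE is at or below its THRESH.
# Assumption: ATA-style table with columns
#   ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
at_or_below_threshold() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            value  = $4 + 0   # "004" becomes 4
            thresh = $6 + 0
            if (thresh > 0 && value <= thresh)
                printf "%s (id %s): value %d <= threshold %d [%s]\n",
                       $2, $1, value, thresh, $7
        }
    '
}

# Usage (needs root and a real device):
#   sudo smartctl -A /dev/sdX | at_or_below_threshold
```

No output means no attribute has reached its threshold, which matches what a PASSED verdict reports.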

Key SMART Attributes to Watch

While many SMART attributes are recorded, some are more critical than others:

1. Reallocated_Sector_Ct

Indicates how many sectors have been moved due to read/write errors. A non-zero value may signal impending failure.

2. Current_Pending_Sector

Sectors that are waiting to be reallocated. If these increase, data integrity may be compromised.

3. Offline_Uncorrectable

Indicates sectors that cannot be read during an offline scan. Should be zero.

4. Temperature_Celsius

Higher temperatures can shorten disk lifespan. Ideal values vary by model, but 30–50°C is typical.

5. Power_On_Hours

Tracks how long the drive has been powered on. It is a rough indicator of age; for SSDs, the wear attributes below say more about remaining life.

6. Wear_Leveling_Count (on SSDs)

Estimates the amount of wear experienced. SSDs have a finite number of program/erase cycles.

7. Media_Wearout_Indicator (on some Intel SSDs)

Starts at 100 and decreases to 0 as the device wears out.
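
For the sector-related attributes, the raw count is usually what matters. The sketch below flags non-zero raw values for attribute IDs 5, 197, and 198, which are the IDs commonly used for Reallocated_Sector_Ct, Current_Pending_Sector, and Offline_Uncorrectable. It assumes the raw value is a plain number in the tenth column; some vendors encode it differently. The function name nonzero_critical_raw is hypothetical.

```shell
# nonzero_critical_raw: read `smartctl -A` output on stdin and print
# sector-related attributes (IDs 5, 197, 198) with a non-zero raw value.
# Assumption: RAW_VALUE is a plain integer in column 10.
nonzero_critical_raw() {
    awk '
        ($1 == 5 || $1 == 197 || $1 == 198) && NF >= 10 {
            raw = $10 + 0
            if (raw > 0)
                print $2 ": raw value " raw
        }
    '
}

# Usage (needs root and a real device):
#   sudo smartctl -A /dev/sdX | nonzero_critical_raw
```

Running this periodically and comparing the numbers over time shows whether the counts are stable or growing.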


Running Self-Tests

SMART drives support built-in self-tests to validate their health.

Short Test (~1-2 minutes)

sudo smartctl -t short /dev/sdX

To check the results:

sudo smartctl -l selftest /dev/sdX

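Long Test (tens of minutes to several hours)

A more thorough test that scans the entire disk surface, so its duration grows with drive capacity. smartctl prints an estimated completion time when the test starts: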

sudo smartctl -t long /dev/sdX

Note: Tests run in the background. You’ll need to wait until the test completes before viewing results.

Example output of self-test log:

# 1  Short offline       Completed without error       00%     13456

If the test shows errors, it may be time to back up data and replace the disk.
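
To check the most recent result from a script, something like the following sketch may help. It assumes the ATA self-test log layout shown above, where entry "# 1" is the newest; the function name latest_selftest and its three output words are hypothetical. Anything other than "Completed without error" is reported as "attention", which also covers tests that are still running or were aborted.

```shell
# latest_selftest: read `smartctl -l selftest` output on stdin and
# classify the newest entry (the line starting with "# 1").
# Prints "passed", "attention", or "no entries".
latest_selftest() {
    awk '
        /^# *1 / {
            found = 1
            ok = /Completed without error/   # match against the whole line
            exit                             # END still runs after exit
        }
        END {
            if (!found)  print "no entries"
            else if (ok) print "passed"
            else         print "attention"
        }
    '
}

# Usage (needs root and a real device):
#   sudo smartctl -l selftest /dev/sdX | latest_selftest
```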


Checking SMART on NVMe SSDs

NVMe drives report health through a different log format than SATA drives. The smartctl commands are the same, but the device name and the output fields differ:

Basic Info

sudo smartctl -i /dev/nvme0n1

Health Summary

sudo smartctl -H /dev/nvme0n1

Detailed Health Report

sudo smartctl -a /dev/nvme0n1

Typical NVMe attributes include:

  • Percentage Used: How much of the drive’s lifespan has been consumed.
  • Data Units Written: Useful for tracking SSD wear.
  • Media and Data Integrity Errors: Should be zero.
  • Temperature: Operating temp of the controller.
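
The fields listed above can be pulled out of the report with a small filter. This is a sketch that assumes the "Name:   value" layout smartctl uses for the NVMe health log; the function name nvme_wear_summary and the key names it prints are hypothetical.

```shell
# nvme_wear_summary: read `smartctl -a` output for an NVMe device on
# stdin and print selected health fields as key=value lines.
# Assumption: fields appear as "Field Name:   value".
nvme_wear_summary() {
    awk -F': +' '
        $1 == "Percentage Used"                 { print "percentage_used=" $2 }
        $1 == "Media and Data Integrity Errors" { print "media_errors=" $2 }
        $1 == "Data Units Written"              { print "data_units_written=" $2 }
    '
}

# Usage (needs root and a real device):
#   sudo smartctl -a /dev/nvme0n1 | nvme_wear_summary
```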

Scheduling Regular Checks

You can use smartd for automatic health monitoring. To enable the daemon:

  1. Edit the configuration file:
sudo nano /etc/smartd.conf

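Add a line like the one below. If the file contains an active DEVICESCAN line, comment it out or place your entry above it, because smartd ignores everything after DEVICESCAN: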

/dev/sdX -a -o on -S on -s (S/../.././03|L/../../6/04) -W 4,40,45 -m your@email.com

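This config:

  • Monitors all SMART properties: health status, attributes, and the error and self-test logs (-a).
  • Enables automatic offline testing and attribute autosave (-o on -S on).
  • Runs short tests daily at 3 AM.
  • Runs long tests every Saturday at 4 AM.
  • Logs temperature changes of 4°C or more, logs when the drive reaches 40°C, and warns at 45°C (-W 4,40,45).
  • Sends email alerts on failures (-m).

  2. Enable and start the daemon: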
sudo systemctl enable smartd
sudo systemctl start smartd

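To test email notifications, temporarily append -M test to your line in /etc/smartd.conf and restart smartd; the daemon then sends a single test message at startup. Check the daemon’s logs with: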

journalctl -u smartd
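
The -s schedule in the smartd.conf line is a regular expression that smartd compares against strings of the form T/MM/DD/d/HH (test type, month, day, weekday with 1 = Monday, hour). The sketch below mimics that comparison with grep so you can try a pattern before restarting the daemon. It assumes the pattern must match the whole string, and the function name schedule_matches and the sample dates are hypothetical.

```shell
# schedule_matches REGEX STRING: succeed if STRING (T/MM/DD/d/HH)
# matches REGEX as a whole-line extended regular expression.
# This approximates smartd's matching; it is not smartd's own code.
schedule_matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

regex='(S/../.././03|L/../../6/04)'

# Short test at hour 03 on a hypothetical Wednesday (weekday 3)
schedule_matches "$regex" 'S/03/15/3/03' && echo "short test due"

# Long test at hour 04 on the same Wednesday: weekday is not 6
schedule_matches "$regex" 'L/03/15/3/04' || echo "no long test"

# Long test at hour 04 on a hypothetical Saturday (weekday 6)
schedule_matches "$regex" 'L/03/18/6/04' && echo "long test due"
```

The dots in the pattern match any single character, which is why the short test fires every day regardless of month, day, or weekday.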

Troubleshooting

  • SMART not available: Some USB enclosures block SMART passthrough. Consider connecting the drive via SATA.
  • SMART enabled but no data: Try the -d option to specify the device type (for example, sat for a SATA drive behind a USB bridge):
sudo smartctl -a -d sat /dev/sdX
  • Access denied: Run as root (sudo).
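
When you are unsure which device type to pass with -d, smartctl --scan lists the devices it detects along with the type it would use. The sketch below turns that listing into ready-to-run health-check commands. It assumes scan lines of the form "/dev/sda -d scsi # comment"; the function name scan_to_health_commands is hypothetical.

```shell
# scan_to_health_commands: read `smartctl --scan` output on stdin and
# print a `smartctl -H` command for each detected device.
# Assumption: each line looks like "/dev/sda -d scsi # /dev/sda, SCSI device".
scan_to_health_commands() {
    # Drop the trailing "# comment" and any empty lines, then split
    # each line into the device path and the remaining options.
    sed -e 's/ *#.*$//' -e '/^$/d' | while read -r dev opts; do
        echo "smartctl -H $opts $dev"
    done
}

# Usage (needs root):
#   sudo smartctl --scan | scan_to_health_commands
```

The function only prints the commands, so you can review them before running any of them with sudo.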

Conclusion

Using smartctl on Arch Linux is a powerful and reliable way to monitor the health of your storage devices. Whether you’re a system administrator or a curious desktop user, checking SMART data regularly can help prevent data loss and ensure optimal performance.

To summarize:

  • Install smartmontools.
  • Use smartctl -H for a quick health check and smartctl -A for detailed attributes.
  • Run periodic self-tests.
  • Monitor critical attributes like reallocated sectors, pending sectors, and temperature.
  • Enable smartd for automated monitoring and alerts.

With just a few commands, you gain deep insights into your storage devices and can act before failures occur. Don’t wait for a disaster. Start monitoring your drives today.