Sunday, February 21, 2016

What to do with a new disk?

Today two replacement SATA disks arrived from my favorite supplier. Reason enough to briefly summarize what I do when I get fresh disks. Maybe someone else can learn from the DOA mistakes of my youth, when I trusted that a new disk would just work, only to find that when I needed it, all it would do was "click click click" and that was that.

If you go through more disks than the average person, for example because you run a bunch of RAID arrays, I would first recommend that you get yourself a suitable docking station. Here's what I use:

UNITEK Dual Bay USB Docking Station

I got mine from newegg.com of course. There are plenty of alternatives, such as little USB-to-SATA adapters or hot-swap bays that mount in your machine's case, but none of those beat a decent docking station for convenience and versatility. (Note that I never use the "clone" feature of that thing, although I hear that it works fine.)

So unpack your new disks and do a quick physical inspection. If your supplier is decent at all, the packaging will be so good that it's extremely unlikely that you'll get something that's mechanically broken on the outside, so a glance is usually enough. Then slap them into your docking station and power it up. Open a terminal and do a quick check with dmesg:

[51407.603023] usb 1-1: new high-speed USB device number 4 using ehci-pci
[51407.718674] usb 1-1: New USB device found, idVendor=152d, idProduct=2551
[51407.718678] usb 1-1: New USB device strings: Mfr=1, Product=11, SerialNumber=3
[51407.718679] usb 1-1: Product: USB Mass Storage
[51407.718681] usb 1-1: Manufacturer: JMicron
[51407.718682] usb 1-1: SerialNumber: 00000000000000
[51407.719191] usb-storage 1-1:1.0: USB Mass Storage device detected
[51407.719353] scsi host6: usb-storage 1-1:1.0
[51408.142938] usbcore: registered new interface driver uas
[51409.223243] scsi 6:0:0:0: Direct-Access     HDD                       0000 PQ: 0 ANSI: 2 CCS
[51409.224849] scsi 6:0:0:1: Direct-Access     HDD                       0000 PQ: 0 ANSI: 2 CCS
[51409.225220] sd 6:0:0:0: Attached scsi generic sg6 type 0
[51409.225355] sd 6:0:0:1: Attached scsi generic sg7 type 0
[51409.229108] sd 6:0:0:0: [sdf] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[51409.229476] sd 6:0:0:1: [sdg] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
[51409.230360] sd 6:0:0:0: [sdf] Write Protect is off
[51409.230365] sd 6:0:0:0: [sdf] Mode Sense: 28 00 00 00
[51409.231358] sd 6:0:0:1: [sdg] Write Protect is off
[51409.231363] sd 6:0:0:1: [sdg] Mode Sense: 28 00 00 00
[51409.232350] sd 6:0:0:0: [sdf] No Caching mode page found
[51409.232355] sd 6:0:0:0: [sdf] Assuming drive cache: write through
[51409.233613] sd 6:0:0:1: [sdg] No Caching mode page found
[51409.233616] sd 6:0:0:1: [sdg] Assuming drive cache: write through
[51409.286473] sd 6:0:0:0: [sdf] Attached SCSI disk
[51409.287473] sd 6:0:0:1: [sdg] Attached SCSI disk
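
If your dmesg is noisier than mine, a small filter can pull out just the lines that name each new disk and its size. This is only a sketch: the function name new_disks is made up, and the pattern assumes the kernel prints the "logical blocks" line in the same shape as in the output above.

```shell
# List newly attached disks as "device size", one per line.
# Reads dmesg-style text on stdin, so it also works on saved output.
# (new_disks is a hypothetical helper name, not a standard tool.)
new_disks() {
    # Matches lines such as:
    #   sd 6:0:0:0: [sdf] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
    sed -n 's/.*\[\(sd[a-z]*\)\] [0-9]* [0-9]*-byte logical blocks: (\(.*\))$/\1 \2/p'
}
```

Something like `dmesg | tail -n 40 | new_disks` should then print one line per disk, for example "sdf 1.00 TB/931 GiB" for the first disk above.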

Alright, looks like both disks were recognized when the docking station powered up: sdf and sdg, 1 TB each. Good! Now go ahead and check the details with smartctl:

# smartctl -i /dev/sdf -d sat
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.1.12-gentoo] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST1000DM003-1SB10C
Serial Number:    Z9A0GYZ0
LU WWN Device Id: 5 000c50 08774c950
Firmware Version: CC43
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Feb 20 17:34:31 2016 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
# smartctl -i /dev/sdg -d sat
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.1.12-gentoo] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Blue
Device Model:     WDC WD10EZEX-00WN4A0
Serial Number:    WD-WMC6Y0F4UPT5
LU WWN Device Id: 5 0014ee 0aec80cac
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Feb 20 17:34:57 2016 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Good! Notice that I had to use the "-d sat" option to tell smartctl that there's really a SATA drive hiding behind all the USB stuff. (It took me a while to realize that I could do that; I used to think that SMART just doesn't work at all over USB.)
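
With more than one disk in the dock, it can be handy to boil that information section down to one line per disk. Here is a minimal sketch, assuming the field labels look exactly like the ones in the output above; smart_summary is a name I made up for illustration.

```shell
# Summarize `smartctl -i` output (read from stdin) as
# "model | serial | SMART state".
# (smart_summary is a hypothetical helper name.)
smart_summary() {
    awk -F': *' '
        /^Device Model:/     { model = $2 }
        /^Serial Number:/    { serial = $2 }
        /^SMART support is:/ { state = $2 }  # two such lines; the last one wins
        END { print model " | " serial " | " state }
    '
}
```

A loop such as `for d in /dev/sdf /dev/sdg; do smartctl -i "$d" -d sat | smart_summary; done` would then print one summary line per disk. The device names are the ones from my machine; yours will likely differ.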

What you want to be looking for is the "SMART support is:" line. It's almost universally true today that SMART will be enabled by default, unlike back in 2002. But it's still good to check. In case it's not enabled, enable it. In case your disk doesn't support SMART at all, well, why did you order it? To enable SMART you'd say something like

# smartctl -s on /dev/sdf -d sat

but again, hopefully you won't have to. Alright, after all this prep work, we finally get to the point of all this: You want to run the basic SMART tests that all modern drives support. Note that especially the long test can take a really long time, so do this when you're sure you won't need the docking station for something else. First run the short tests:

# smartctl -t short /dev/sdf -d sat
# smartctl -t short /dev/sdg -d sat

Yes, you can easily run these in parallel because the disk is doing its own testing, your machine only told it to get going. For a 1 TB disk, the short test takes about a minute, but if you're impatient, you can check on the progress of the test as follows:

# smartctl -a /dev/sdf -d sat
...
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
...

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...

Self-test execution status:      ( 246)    Self-test routine in progress... 60% of test remaining.
...

There's a lot more output than that; I just put "..." in its place to keep things simple. (You can get even more output with -x instead of -a if you really want.) After waiting for your minute, you can check on the outcome of the test with the same command. Toward the bottom of the output you'll hopefully find a line like the following:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline       Completed without error       00%         0         -

This indicates that the short test succeeded. Check the other disk as well, then get ready for the long test:

# smartctl -t long /dev/sdf -d sat
# smartctl -t long /dev/sdg -d sat

Same procedure as before, except that this time you can expect to wait for about two hours for a 1 TB disk. Hopefully the long test also works out just fine.
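
If you'd rather not keep checking by hand for two hours, the polling can be scripted. The following is a rough sketch, not a polished tool: both function names are made up, the five-minute interval is arbitrary, and the parsing assumes the "% of test remaining" wording shown in the progress output earlier (grep -o is assumed to be available, as it is with GNU grep).

```shell
# Print the percentage still remaining if a self-test is in progress,
# and nothing otherwise. Reads `smartctl -a` output on stdin.
# (selftest_remaining is a hypothetical helper name.)
selftest_remaining() {
    grep -o '[0-9][0-9]*% of test remaining' | head -n 1 | sed 's/%.*//'
}

# Poll a disk until no self-test is in progress any more, then show the
# newest entry of the self-test log.
# (wait_for_selftest is a hypothetical helper name.)
wait_for_selftest() {
    dev=$1
    while :; do
        pct=$(smartctl -a "$dev" -d sat | selftest_remaining)
        [ -n "$pct" ] || break
        echo "$dev: $pct% of test remaining"
        sleep 300   # arbitrary interval; the disk does the work by itself
    done
    smartctl -l selftest "$dev" -d sat | grep '^# *1 '
}
```

Run it as root, for example `wait_for_selftest /dev/sdf`, and look for "Completed without error" in the final line.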

And there you have it, the minimum amount of testing I do on replacement disks these days before I put them on the shelf as I wait for a RAID array to fail. Of course if you have plans to encrypt the data on these disks you can do more "testing" by filling them up with random data now before you shelve them.
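
For the random fill, one common approach is plain dd reading from /dev/urandom. This is a sketch under a few assumptions: fill_random is a made-up name, the iflag=fullblock and status=progress options assume a reasonably recent GNU dd, and the optional second argument exists only so you can try it on a scratch file first. It destroys everything on the target, so triple-check the device name.

```shell
# Overwrite TARGET with random data. DESTROYS everything on TARGET.
# Usage: fill_random TARGET [MIB]
# (fill_random is a hypothetical helper name; assumes GNU dd.)
fill_random() {
    target=$1
    count_mib=${2:-}   # optional: stop after this many MiB (for dry runs)
    # Without a count, dd runs until the device is full and then exits
    # with "No space left on device"; that is expected here.
    dd if=/dev/urandom of="$target" bs=1M iflag=fullblock status=progress \
        ${count_mib:+count="$count_mib"}
}
```

A dry run like `fill_random /tmp/scratch.img 16` writes 16 MiB to a file; the real thing would be `fill_random /dev/sdf` as root, where /dev/sdf is just the name the disk got on my machine.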

3 comments:

  1. Wow, it took me a long time to finally figure out how to get it to work on OSX. But I did it! :)

  2. Sadly, OSX doesn't support SMART over USB unless you install a third party driver. The whole story is here: https://alexschroeder.ch/wiki/2016-02-21_Disk_Checking
    I'm running the long tests right now. :)
