DIY server: Good SATA HDDs

Are you sure you're not describing Western Digital drives? A few years back WD decided to curse their 'eco green' range with an 8.3 second head unload timeout, which basically resulted in drives reaching the original 300,000 load/unload cycle limit within six months of run time. They revised the limit to 600,000 cycles, but that just meant a drive would still hit the new limit after one year's worth of run time.

The only saving grace was that they supplied a little DOS utility called wdidle3.exe which you could run off a DOS-bootable disk (floppy, CD or USB pen drive) to raise the timeout to a maximum of 300 seconds or even disable it completely (although selecting that option never worked for me whilst it apparently did for others. Go figure!).

You'd have thought that WD would have learnt their lesson from the anguished cries of outrage by many of their 'valued customers' but apparently not if the 4TB WD RED I bought late last year is any guide. I didn't think I'd be having to dig my wdidle3 floppy disk out of the pile ever again but I'm glad I did.

I had a notion to check the setting on this brand new drive (zero PoH figure) just in case, and was surprised to see it set to the old default of 8.3 seconds. I just set it to the maximum of 300 seconds rather than try to disable it altogether, since a 5 minute head parking timeout hugely reduces the rate of head parking events yet still provides the benefit it was _meant_ to provide. God knows which idiot thought (and _still_ thinks!) an 8.3 second timeout is a sensible default.

So, a word to the wise. If you choose a Western Digital drive, run the wdidle3 utility to check for the 8.3 second default and adjust it to the maximum of 300 seconds. You can try the disable option, but don't be surprised if it has no effect.
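
If the drive is already in a live system, you can see whether the timeout is chewing through cycles without booting into DOS at all: the Load_Cycle_Count SMART attribute tells the story. Here's a minimal sketch (Python, assuming smartmontools is installed; the /dev/ada0 device node is just an illustration, substitute your own):

  #!/usr/bin/env python3
  # Quick look at head parking activity via SMART, assuming smartmontools
  # is installed. /dev/ada0 is only an example device node.
  import re
  import subprocess

  DEVICE = "/dev/ada0"

  # 'smartctl -A' prints the SMART attribute table for the drive.
  output = subprocess.run(["smartctl", "-A", DEVICE],
                          capture_output=True, text=True).stdout

  attrs = {}
  for line in output.splitlines():
      # Attribute lines end with the raw value, e.g.
      #   9 Power_On_Hours    ... 3869
      # 193 Load_Cycle_Count  ... 168982
      m = re.match(r"\s*\d+\s+(\S+)\s+.*\s(\d+)\s*$", line)
      if m:
          attrs[m.group(1)] = int(m.group(2))

  hours = attrs.get("Power_On_Hours")
  cycles = attrs.get("Load_Cycle_Count")
  print("Power on hours :", hours)
  print("Load cycles    :", cycles)
  if hours and cycles:
      print("Cycles per hour: %.1f" % (cycles / hours))

Going by the figures earlier in this thread, a drive left on the 8.3 second default works out at something like 60-70 cycles an hour; one that's been tamed with wdidle3 should barely move.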

Incidentally, I spotted this high head unload count figure in one of a pair of 2TB Samsung SpinPoints in July last year, where it had topped a million cycles. The other drive had clocked a mere 160,000 cycles.

The difference must have been down to my experimenting with the power saving levels about a year earlier. Although they had both been set to the same levels, for some reason, one of them was running on a different power management regime.

I surmised that this was an aspect of their functioning that may have needed a full power-down reset to invoke the programmed change, so I set both drives to the minimum power saving level and did a full shutdown and power-on reset.

After that, the head unload cycle counters for both drives were stopped dead in their tracks (but not before the damage had already been done to the unit that had clocked in excess of a million such cycles - the other stopped at 168982 where it sits to this day).

The Million Cycle drive had started showing a climbing MZER (Multi-Zone Error Rate) figure before I spotted the excessive head unload count value. Although I did manage to replace it before the SMART monitoring daemon started emailing error reports, subsequent testing revealed it was 'One Sick Puppy'. The excessive head unload cycles had finally taken their toll.

So, yet another word to the wise. Keep an eye on the head unload count whenever you change the power saving level settings on non-Western-Digital drives (on the WD drives, the head unload timeout doesn't seem to alter with such power management changes, only by use of the wdidle3 utility).
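
To put numbers on 'keeping an eye on it', it only takes a couple of lines to project when a drive will hit its rated load/unload limit at its current parking rate. A rough sketch (Python; the 600,000 figure is the revised WD rating mentioned above, and the sample inputs are purely illustrative):

  # Rough projection of when a drive hits its rated load/unload limit,
  # assuming the historical parking rate carries on unchanged.
  RATED_CYCLES = 600_000          # revised WD rating mentioned above

  def hours_to_limit(load_cycles, power_on_hours):
      rate = load_cycles / power_on_hours      # cycles accumulated per hour
      return (RATED_CYCLES - load_cycles) / rate

  # Illustrative figures: 300,000 cycles in six months (~4,380 hours) of run
  # time - roughly the behaviour described above for the 8.3 second default.
  remaining = hours_to_limit(load_cycles=300_000, power_on_hours=4_380)
  print("about %.0f hours (%.0f days) to the rated limit"
        % (remaining, remaining / 24))

Which squares with the point above that the revised 600,000 rating only bought another six months or so.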

"Everone Knows" about the default 8.3 seconds timeout head unload issue with Western Digital drives right enough but they're not the only make of drive that employs head unloading as part of its built in power management regime which, unlike the WD drives can be altered through the standard power management settings so it's one 'gotcha' to look out for with _any_ make of drive.

HTH & HAND :-)

Reply to
Johny B Good

Something worth pointing out is that this is historical data - ie the drives are 2-3 years old. Hitachi was bought by Western Digital, but forced to sell off their consumer 3.5" production to Toshiba in 2012. So a 'Hitachi' consumer drive bought today is actually a WD, while a Toshiba 3.5" consumer drive is a Hitachi inside.

Something else interesting about the Toshiba drives is they have as standard some NAS-style features which other manufacturers try to use to differentiate (green/red/blue/black models). On the other hand, Toshiba's warranty support is peculiar (unless you buy a retail boxed version, you just have your statutory right to return to the vendor - you can't return to Toshiba).

Theo

Reply to
Theo Markettos

FYI I ordered from PC World Business and it was 20 quid cheaper than that, plus TopCashBack. I paid for the cheaper 3-7 day delivery service (6 quid) and it arrived next day (yesterday).

Big 'cashback' banner on the PCWB site, so shouldn't be a problem. Cashback was straightforward last time around.

Theo

Reply to
Theo Markettos

I agree. I've had a 4TB unit running in the NAS4Free box these past 1821 + 1024 + 1024 hours. That's what? 3869 hours, about 161 days (just over 5 months), with no problems other than bit 10 in the PoH counter register resetting to zero a few hundred hours after being set to 1.
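
For anyone wanting to sanity-check that arithmetic: bit 10 is worth 2^10 = 1024 hours, so each time it spontaneously clears, the reported figure drops back by 1024. A trivial bit of Python using the figures above:

  # Bit 10 of the PoH counter is worth 2**10 = 1024 hours, so every time it
  # spontaneously clears, the reported figure drops back by 1024.
  BIT10 = 1 << 10              # 1024

  reported = 1821              # what the SMART data currently shows
  drop_events = 2              # times bit 10 has been seen to clear so far

  actual = reported + drop_events * BIT10
  print(actual)                # 3869
  print(actual / 24)           # ~161 days, just over five months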

You can take a look in uk.comp.homebuilt for the gory detail under the thread title of "WDC WD40EFRX smart log POH anomaly (has yours clocked 2500 hours or more yet?)" that I posted on 12 Jan 2014 if you're interested.

I suppose I could RMA it but, aside from the hassle of copying the 2.6TB's worth of data elsewhere, it would be just my luck to land up with a replacement that suffers infant mortality. I've got the current drive safely past that point, statistically speaking, so it just doesn't seem worth the risk for the sake of a flaky PoH counter.

It does seem rather surprising that no mention of TLER was made. It doesn't matter to me since I don't care for the adventure known as RAID. It is, after all, only a home server box I'm running, where the vast bulk of the data is, when all is said and done, of the disposable kind, with the much smaller critical data duplicated on different drives.

In any case, I couldn't afford the loss of capacity entailed in such an adventure (I'm already coveting the arrival of 5 and 6TB drives at less than "Bleeding Edge" prices to keep up with my ever increasing storage requirements).

Maybe, just maybe, I'll be able to afford the use of RAIDZ in a few years' time. For now, I may as well enjoy the other benefit of a JBOD setup, which allows capacity upgrades one drive at a time without any need to match drive capacities.

I can understand your 'anxiety' about missing out on an opportunity to make good use of the cashback offer but HP seem to have been renewing their cashback offers since forever.

It kind of goes against the advice that it's better to save up the cash and wait till you can purchase the whole kit in one hit, rather than buy the parts piecemeal over a protracted period as your finances allow.

That was advice I gave to a customer several years ago whose plan had been to spread the purchasing of the parts for a home built PC project over the best part of a year.

If you're merely considering a month or two between the server box and the first of the disk drives, then doing so to avoid losing out on the cashback offer makes perfectly good sense, of course. In my customer's case it didn't make any sense at all (certainly not when the purchasing is spread over a 12 month period before you can even begin to make use of the bits).

Reply to
Johny B Good

One would hope they're not shipped to you in the 'state of undress' as pictured. :-)

Reply to
Johny B Good

I was... until when I just went and looked at the mechanism in question ;-)

Yes, sorry about that - it was a WD Green not a Hitachi Green (I also had a problem with one of those, but that was in a Synology NAS and it was different!).

However my point remains the same really - every manufacturer has produced some real turkeys from time to time!

Yup, alas I found out about the fix (which IIRC they actually released for a different drive originally) after encountering the problem on mine.

Just had a look at the stats on the ones in my NAS (model WDC WD20EFRX-68AX9N0)

PoH on one shown as 9970, with Load Cycle Count of 0...

(though the other drive shows a PoH of 1617, which, given that they were both installed at the same time a tad over a year ago and have the same start/stop count and power cycle count, seems a little odd!)

IIRC low power use was the justification for the stupidly short time used on the "green" drives.

Reply to
John Rumm

See my other reply to your post on this - but I seem to have the same issue on one of mine (although not bit 10 in this case)

One of those Backblaze style 180TB boxes seems ever more attractive...

Reply to
John Rumm

I actually witnessed my SAN doing that. I'd just upgraded the controller firmware, then the firmware on a couple of disks.

Then it decided to do a frantic copy from disk 6 to one of the hot spares. I kinda expected something was up, and sure enough, once the copy was finished, it promptly failed disk 6.

Wonder if my new firmware flagged something that was not previously flagged as a potential death in the making...

Reply to
Tim Watts

I guess all FS types have their drawbacks. Personally I consider ZFS to be the best option for me.

Reply to
Mark

True. But my personal experience of Seagate drives is not great so I would avoid them anyway[1].

Reply to
Mark

And if I have nothing on which to run DOS?

Reply to
Tim Streater

Fascinating, thanks.

Reply to
Huge

Then temporarily connect the disk to something that can.

Reply to
John Rumm

The general rule of thumb for changes to built in test software seems to be "make the limits wider"!

Reply to
John Rumm

Indeed. ZFS is great, but like you say, nothing is perfect :-)

We love ZFS and have many many TB of it around... it's probably the biggest thing we are missing migrating from Solaris to RedHat :-(

Darren

Reply to
D.M.Chapman

That's true enough. I suppose which is best largely depends on whether the manufacturer is prepared to hold their hand up and "Do The Right Thing" or just play their customers for suckers as was the case with Fujitsu's infamous MPG range a decade or so back.

That's the problem with hidden power saving features. At a minimum (default, no less!) of 8.3 seconds, not only does it risk premature failure, it's also rather counter-productive.

If the designers had had their thinking caps on when deciding the possible range of timeout values, they would have made sure it was impossible to set it lower than 30 seconds (60 seconds would be an even better 'minimum').

Perhaps my 4TB RED isn't quite so unique in having a 'flipping' bit in the POH count register after all. Setting the drive for maximum spin down power savings shouldn't stop the counter even when the drive is in deepest hibernation. Rather spookily, it looks like it might also be the same bit 10 that's flipping to zero (although, thinking about it, it would have to be bit 11 for the count to get so far past the 1024 mark).

If what's happening to that drive is the same as what's happening to mine, it might not be a radioactive speck in the works as I'd recently surmised. It's beginning to look like a firmware bug (some sort of 'race condition' going on somewhere, either a coding bug or possibly even a hardware bug).

If my guess _is_ correct, it looks to me as though the one that had clocked 9970 hours could have been installed almost a week after the 'ground hog day' one.

In view of the fact that one of those two drives remains unaffected, I'd be inclined to suspect a hardware bug if they're both on the same controller firmware version. It might be worth checking this, because if it's a firmware bug it may be possible to download a firmware update to fix the problem without having to invoke the RMA procedure and all that that entails.

If, like me, you've got the drive on a regular 'short test' schedule (mine's weekly), you should be able to spot what's going on by the 'lifetime hours' counter values skipping backwards every so often somewhere around the 2500 mark.

If you haven't already, I'd recommend that you schedule the drive to do a 'short test' once a week to observe what's going on with the POH counter.
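
Since the self-test log records the drive's own lifetime-hours figure against every test, the backwards skips show up in it without any manual bookkeeping. A minimal sketch of how you might spot them (Python again, assuming smartmontools; /dev/ada0 is only an illustrative device node):

  # Scan the SMART self-test log for the lifetime-hours counter going
  # backwards between consecutive tests.
  import re
  import subprocess

  DEVICE = "/dev/ada0"

  log = subprocess.run(["smartctl", "-l", "selftest", DEVICE],
                       capture_output=True, text=True).stdout

  # Entries look roughly like:
  # # 1  Short offline  Completed without error  00%  3869  -
  # The figure after the 'Remaining %' column is the drive's lifetime hours
  # when that test ran; newest entries come first.
  hours = [int(m.group(1))
           for m in re.finditer(r"^#\s*\d+\s+.*?\d+%\s+(\d+)", log, re.MULTILINE)]
  hours.reverse()                      # oldest first

  for earlier, later in zip(hours, hours[1:]):
      if later < earlier:
          print("Counter skipped backwards: %d -> %d (lost %d hours)"
                % (earlier, later, earlier - later))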

I hope you don't mind but, just for the record, I'd like to reiterate the situation (fingers in ears now!)...

S P O I L E R

S P A C E

SO, BETWEEN YOU AND ME, IT SEEMS WE BOTH HAVE WESTERN DIGITAL DRIVES (2TB AND 4TB UNITS) WHICH APPEAR TO BE SUFFERING GROUND HOG DAY SYNDROME RE THE POH COUNTER DROPPING BACK BY 1024 AROUND THE 2500 HOUR MARK.

Apologies for the 'shouting', but it seems that there might well be others looking for confirmation of the same peculiar fault. This might draw enough 'silent sufferers' out of their dark closets to get the attention of Western Digital's customer care department.

It was a false premise that such a short timeout as 8.3 seconds could provide any net benefit in its power saving function, which is why I suggested a minimum of 30 or 60 seconds would have been a more sensible choice.

The worst of it was that it only saved a few hundred mW at best (something like 200 or 300). Obviously someone must have fallaciously equated shorter timeouts with greater power saving and decided to set the default to its absolute minimum. If the timeout setting had been restricted to a more reasonable 30 second minimum, there wouldn't have been any problem (four years or so at 30 seconds before reaching the 600,000 mark).

Reply to
Johny B Good

Tough! Or else buddy up with someone who does.

Reply to
Johny B Good

Perhaps it was chosen to be short enough, with modern "chatty" multitasking operating systems, to actually kick in from time to time. Otherwise people would claim it's a non-feature because the system never leaves the disk alone for long enough to allow it to activate.

Either way, daft enough on a mechanism that does not support a significantly larger reload capability.

Hmmm, could be bit 13 on mine...

10011011110010 = 9970
00011001010001 = 1617

To be fair I can't recall which went in first - but I doubt there was more than a day between them going in. (I would have pulled one of the green drives, installed the first red, and let it resynch, and then swapped the other when that was done. The resynch was probably less than 12 hours.)
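
For what it's worth, those two figures do hang together with a single drop of bit 13 (worth 8192 hours) on the lower-reading drive, leaving a residual gap of about 161 hours - call it a week - between the two counters. A scrap of Python showing the working, using only the figures quoted above:

  # Working through the two PoH figures above, assuming bit 13 (2**13 = 8192
  # hours) has cleared once on the drive with the suspiciously low reading.
  poh_ok = 9970                        # counter that looks sane
  poh_odd = 1617                       # counter with the low reading

  print(format(poh_ok, "014b"))        # 10011011110010  (bit 13 set)
  print(format(poh_odd, "014b"))       # 00011001010001  (bit 13 clear)

  corrected = poh_odd + (1 << 13)      # add the lost 8192 hours back
  print(corrected)                     # 9809
  gap = poh_ok - corrected
  print("residual: %d hours (~%.1f days)" % (gap, gap / 24))   # 161 h, ~6.7 days

It's that week-ish residual that sits oddly with the drives going in within a day of each other, of course.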

Disk 1

Model: WDC WD20EFRX-68AX9N0 Serial: WD-WMC300512982 Firmware: 80.00A80

Disk 2

Model: WDC WD20EFRX-68AX9N0 Serial: WD-WMC300489513 Firmware: 80.00A80

So not firmware related... quite a large jump in SN though for drives bought together...

There is only a limited amount I can do with them in the NAS alas... so I shall just have to revisit from time to time and see what the PoH is doing...

Reply to
John Rumm

Those are nearline SATA drives, and unlikely to be representative of what you might expect from a consumer/desktop drive. They probably have a 5 year life, but that info is usually only available to the OEM.

That's what nearline SATA is designed for (unlike consumer/desktop drives).

At this point, ZFS on Linux is probably somewhat behind Solaris, the Illumos distros (even OpenSolaris) and FreeBSD in that respect. We get a lot of our ZFS business from folks who have rolled their own ZFS-on-Linux servers, love the features (they become absolutely essential once you've used them), but need better stability and performance.

I don't know if this applies to all ZFS distros, but ZFS disables its performance handling of the drive write caches if it's not using the whole drive exclusively (as this could mess up the other user(s) of the drive). Having multiple zpools on a drive also counts as not having exclusive use of the drive (as they don't sync their transaction commits at the same instant).

Reply to
Andrew Gabriel

This is true... Although they are not insanely expensive.

But overall I'm mostly convinced by the WD REDs - although I will not be able to mix makes, as no one else seems to do a comparable drive.

I shall definitely play - how I decide will probably be based on performance tests and how I like the admin and warnings when I manually fail a disk then replace it.

Thanks Andrew - that is well worth knowing! I do have the option to put a couple of tiny USB sticks in (there are *loads* of ports including one on the mobo) and RAID-1 them just for / and /usr

Reply to
Tim Watts
