How old are the disks in your NAS?

blackstrat@lemmy.fwgx.uk · 1 day ago

How old are the disks in your NAS?

deadcatbounce@reddthat.com · edit-2 23 hours ago

Don’t fill a copy-on-write fs more than about 80%; it really slows down and struggles because new data is written to a new place before the old stuff is returned to the pool. Just sayin’.

I wouldn’t worry if you’re backed up. The SMART values and daemon will tell you if one is about to die.
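
A quick way to keep an eye on the 80% guideline is a small fill-level check. This is only a minimal sketch: the mount point `/tank` and the helper names are assumptions for illustration, and `shutil.disk_usage` reports what the OS sees for that mount, which on a copy-on-write filesystem may differ from the pool's own accounting.

```python
import shutil

FILL_LIMIT = 0.80  # the ~80% guideline mentioned above


def fill_ratio(used_bytes: int, total_bytes: int) -> float:
    """Return the used fraction of a filesystem (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def check_mount(path: str, limit: float = FILL_LIMIT) -> bool:
    """Return True if the filesystem at `path` is fuller than `limit`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return fill_ratio(usage.used, usage.total) > limit


if __name__ == "__main__":
    mount = "/tank"  # hypothetical mount point; substitute your own
    try:
        over = check_mount(mount)
    except OSError as exc:  # e.g. the path does not exist on this machine
        print(f"could not check {mount}: {exc}")
    else:
        state = "above" if over else "within"
        print(f"{mount} is {state} the {FILL_LIMIT:.0%} guideline")
```

Run from cron or a timer, something like this could nag you before the pool gets into the slow zone.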

detonator9798@lemmy.one · 1 day ago

I have a SAMSUNG HD103UJ 1TB I’m about to retire, not because it’s bad, but only because I’m replacing it with bigger HDDs. It’s a bit sad to be honest, these Samsungs are rock solid!

142578 hours (16 years)! 🤘

walden@sub.wetshaving.social · edit-2 23 hours ago

Wow. W Bush was president (or Obama depending on month).

Edit: yep, W. Bush. Oct 6th 2008, so Obama hadn’t even been elected yet.

mbirth@lemmy.ml · 1 day ago

According to my Synology:

WD40EFRX-68WT0N0 - 86,272 hours
WD40EFRX-68WT0N0 - 86,207 hours
WD40EFRX-68N32N0 - 34,417 hours
ST4000VN006-3CW104 - 10,054 hours

AMillionMonkeys@lemmy.world · 1 day ago

According to my Synology:

Where are you finding this data? It’s not Info Center -> Storage…

SandroHc@lemmy.world · 1 day ago

Look into the S.M.A.R.T. reports of each drive.

On Synology DSM 7.x: Storage Manager › HDD/SSD › Health Info › S.M.A.R.T. › S.M.A.R.T. Attribute › Details › Power_On_Hours

mbirth@lemmy.ml · 1 day ago

The power-on hours are shown directly on the Health Info page, no need to click through to the SMART attributes.

MNByChoice@midwest.social · 1 day ago

I tend to buy two at a time. Some are months old, others three years old.

Professionally, I have seen drives over 10 years always on at low utilization without issue. (The data was easily replaceable.)

crammed in to my case in a hideous way

Heat is a killer. Check them regularly.

blackstrat@lemmy.fwgx.uk · 23 hours ago

They’re in a drafty garage. This time of year I keep them spinning to stop them freezing 🤣

jet@hackertalks.com · 1 day ago

3-2-1 - Cattle not pets - If your data is backed up at multiple sites, the death of one site shouldn’t overwhelm you, and it should give you time to recover.

If your primary site drives are getting above their designed lifetime, rotate them out, sure - but they could be used as part of the backup architecture elsewhere (like a live offsite sync location with enough tolerance for 2 disk failures to account for the age).

3 copies of your data; 2 types of media; 1 copy offsite.

taiidan@slrpnk.net · 1 day ago

I mean if it’s homelab, it’s ok to be pets. Not everything has to be commoditized for the whims of industry.

TechnicallyColors@lemm.ee · 1 day ago

“Cattle not pets” in this instance means you have a specific plan for the random death of a HDD (which RAIDZ2 basically already handles), and because of that you can work your HDDs until they are completely dead. If your NAS is a “pet” then your strategy is more along the lines of taking extra-good care of your system (e.g. rotating HDDs out when you think they’re getting too old, not putting too much stress on them) and praying that nothing unexpected happens. I’d argue it’s not really “okay” to have pets just because you’re in a homelab, as you don’t really have to put too much effort into changing your setup to be more cynical instead of optimistic, and it can even save you money since you don’t need to worry about keeping things fresh and new.

“In the old way of doing things, we treat our servers like pets, for example Bob the mail server. If Bob goes down, it’s all hands on deck. The CEO can’t get his email and it’s the end of the world. In the new way, servers are numbered, like cattle in a herd. For example, www001 to www100. When one server goes down, it’s taken out back, shot, and replaced on the line.”

~from https://cloudscaling.com/blog/cloud-computing/the-history-of-pets-vs-cattle/

taiidan@slrpnk.net · 9 hours ago

I get that. But I think the quote refers to corporate infrastructure. In the case of a mail server, you would have automated backup servers that kick-in and you would simply pull the rack of the failed mail server.

Replacing drives based on SMART messages (pets) means you can do the replacement on your own time and make sure you can do resilvering or whatever on your schedule. I think that is less burdensome than having a drive fail when you’re quite busy and being stressed about the system running in a degraded state until you have time to replace the drive.

TechnicallyColors@lemm.ee · edit-2 9 hours ago

I don’t think ‘cattle not pets’ is all that corporate, especially w/r/t death of the author. For me, it’s more about making sure that failure modes have (rehearsed) plans of action, and being cognizant of any manual/unreplicable “hand-feeding” that you’re doing. Random and unexpected hardware death should be part of your system’s lifecycle, and not something to spend time worrying about. This is also basically how ZFS was designed from a core level, with its immense distrust for hardware allowing you to connect whatever junky parts you want and letting ZFS catch drives that are lying/dying. In the original example, uptime seems to be an emphasized tenet, but I don’t think it’s the most important part.

RE replacements on scheduled time, that might be true for RAIDZ1, but IMO a big selling point of RAIDZ2 is that you’re not in a huge rush to get resilvering done. I keep a cold drive around anyway.

SayCyberOnceMore@feddit.uk · 22 hours ago

Yep, numbering’s the key.

When you create NAS01, you know there’s going to be a NAS02 one day

Talaraine@fedia.io · 1 day ago

When one server goes down, it’s taken out back, shot, and replaced on the line.

And then Skynet remembers…

nesc@lemmy.cafe · 23 hours ago

4x 8TB. They had 8.5k hours on them when I got them four years ago, and they have been running non-stop since.

NeoNachtwaechter@lemmy.world · 1 day ago

6 years old and running perfectly fine.

I have 5 WD RED disks in a RAIDZ1 config. In the first year I was experimenting with the sleep or spindown options. Then I read that drives live longer if they run constantly. Now they are spinning 24/7.

The additional SSD has broken and been replaced 2x during these years.

blackstrat@lemmy.fwgx.uk · 1 day ago

Yeah flat out spinning is definitely better for reliability.

walden@sub.wetshaving.social · edit-2 1 day ago

I’m glad you asked because I’ve sort of been meaning to look into that.

I have 4 8TB drives that have ~64,000 hours (7.3 years) powered on.
I have 2 10TB drives that have ~51,000 hours (5.8 years) powered on.
I have 2 8TB drives that have ~16,800 hours (1.9 years) powered on.

Those 8 drives make up my ZFS pool. Eventually I want to ditch them all and create a new pool with fewer drives. I’m finding that 45TB is overkill, even when storing lots of media. The most data I’ve had is 20TB and it was a bit overwhelming to keep track of it all, even with the *arrs doing the work.

To rebuild it with 4 x 16TB drives, I’d have half as many drives, reducing power consumption. It’d cost about $1300. With double parity I’d have 27TB usable. That’s the downside to larger drives, having double parity costs more.

To rebuild it with 2 x 24TB drives, I’d have 1/4 as many drives, reducing power consumption even more. It’d cost about $960. I would only have single parity with that setup, and only 21TB usable.

Increasing to 3 x 24TB drives, the cost goes to $1437 with the only benefit being double parity. Increasing to 4*24TB gives double parity, 41TB, and costs almost $2k. That would be overkill.

Eventually I’ll have to decide which road to go down. I think I’d be comfortable with single parity, so 2 very large drives might be my next move, since my price per kWh is really high, around $0.33.

Edit: one last option, and a really good one, is to keep the 10TB drives, ditch all of the 8TB drives, and add 2 more 10TB drives. That would only cost $400 and leave me with 4 x 10TB drives. Double parity would give me 17TB. I’ll have to keep an eye on things to make sure it doesn’t get full of junk, but I have a pretty good handle on that sort of thing now.
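
The capacity figures above can be roughly sanity-checked with a few lines of arithmetic. This is a sketch under stated assumptions: it treats usable space as (drives minus parity drives) times drive size, converts vendor terabytes to the tebibytes most tools report, and ignores filesystem overhead entirely, so the results come out a little higher than the numbers quoted in the comment. The helper name `usable_tib` is made up for illustration.

```python
TB = 10**12   # drive vendors use decimal terabytes
TIB = 2**40   # many tools report binary tebibytes


def usable_tib(drives: int, size_tb: float, parity: int) -> float:
    """Usable space in TiB for (drives - parity) data drives, no overhead."""
    if parity >= drives:
        raise ValueError("parity drives must be fewer than total drives")
    return (drives - parity) * size_tb * TB / TIB


# The layouts weighed in the comment above.
layouts = [
    ("4 x 16TB, double parity", 4, 16, 2),
    ("2 x 24TB, single parity", 2, 24, 1),
    ("3 x 24TB, double parity", 3, 24, 2),
    ("4 x 24TB, double parity", 4, 24, 2),
    ("4 x 10TB, double parity", 4, 10, 2),
]
for name, drives, size_tb, parity in layouts:
    tib = usable_tib(drives, size_tb, parity)
    print(f"{name}: {tib:.1f} TiB before overhead")
```

For example, 4 x 16TB with double parity works out to about 29.1 TiB before overhead, and 4 x 10TB with double parity to about 18.2 TiB.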

fmstrat@lemmy.nowsci.com · edit-2 1 day ago

As someone who runs 3 large arrays with 8TB, 16TB, and 21TB drives respectively, know that:

RAIDZ1 will cause tons of fear when a disk fails if you’re used to Z2. Don’t change.
When a disk goes, the larger the disk, the slower the rebuild time, and the more taxing it is on the other disks. With Z1, if another fails during the rebuild, you’re SOL.

Less disks is simpler, but more disks is safer. 6 disks is the perfect sized array IMO. If you don’t need more space, I’d buy a 2TB hot spare and call it a day. But if space is a concern, Z2 with 4 disks.

Edit: Those three arrays mirror each other in different locations, and the fear was still there when the Z1 had an issue. Mostly due to the headache, but still.

blackstrat@lemmy.fwgx.uk · 1 day ago

The reason I went RAIDZ2 in my current setup was because of the number of disks increasing the chance of multi failures. But with fewer disks that goes down. I’m not at all worried about data loss, as I said I have good backups so I can always restore. So if the remaining disk dies during a rebuild, that’s unfortunate, but it only affects my uptime, not my data.

fmstrat@lemmy.nowsci.com · 19 hours ago

Hate to be that guy, but those maths aren’t mathing.

Less drives does not equal less chance of multiple failures. The statistical failure rate of one drive has no impact on another. In fact, analysis of Backblaze’s data showed that larger drives were more prone to failure (platter density vs platter count).

blackstrat@lemmy.fwgx.uk · 14 hours ago

Who has more chance of a single disk failing today: me with 6 disks, or Backblaze with their 300,000 drives?

Same thing works with 6 vs 2.

fmstrat@lemmy.nowsci.com · 5 hours ago

Backblaze of course, but we aren’t talking about the probability of seeing a failure, but of one of your disks failing, and more importantly, data loss. A binomial probability distribution is a simplified way to see the scenario.

Let’s pretend all disks have a failure rate of 2% in year one.

If you have 2 disks, your probability of each disk failing is 2%. The first disk in that array is 2%, and the second is 2%. If 2 disks fail in Z1, you lose data. This isn’t a 1% (half) chance, because the failure rate of one disk does not impact the other, however the risk is less than 2%.

So we use a binomial probability distribution to get more accurate, which would be .02 prob in year one with 2 trials, and 2 failures making a cumulative probability of .0004 for data loss.

If you have 6 disks, your probability of each disk failing is also 2%. The first disk in that array is 2%, the second is 2%, so on and so forth. With 6 disk Z2, three must fail to lose data, reducing your risk further (not to .08%, but lower than Z1).

So with a binomial probability distribution, this would be .02 prob with 6 trials, and 3 failures making a cumulative probability of .00015 for data loss.

That’s a significantly smaller risk. The other interesting part is that the difference in probability of one disk failing in a 6 disk array versus a 2 disk array is not 3x, but is actually barely any difference at all, because the 2% failure rate is independent. And this doesn’t even take into account that large disks have a greater failure rate to start.

I’m not saying mirroring two larger disks is a bad idea, just that there are tradeoffs and the risk is much greater.
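
The binomial numbers above can be reproduced with a short sketch. It assumes, as the comment does, a flat 2% first-year failure rate and fully independent failures (same-batch drives may not behave that way), and it ignores rebuild windows. The helper name `prob_at_least` is made up for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k of n disks fail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


P_FAIL = 0.02  # illustrative 2% first-year failure rate from the comment

two_disk_loss = prob_at_least(2, 2, P_FAIL)  # both disks of a 2-disk array
six_disk_loss = prob_at_least(3, 6, P_FAIL)  # 3 or more disks of a 6-disk Z2

print(f"2 disks, both fail:       {two_disk_loss:.5f}")
print(f"6-disk Z2, 3+ fail:       {six_disk_loss:.5f}")

# For reference: the chance of seeing at least one failure at all.
print(f"2 disks, 1+ fail:         {prob_at_least(1, 2, P_FAIL):.4f}")
print(f"6 disks, 1+ fail:         {prob_at_least(1, 6, P_FAIL):.4f}")
```

This gives 0.00040 and 0.00015 for the two data-loss cases. The per-disk rate stays at 2% whatever the array size, while the chance of seeing at least one failure somewhere in the array goes from about 0.0396 with 2 disks to about 0.1142 with 6.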

vegetaaaaaaa@lemmy.world · 1 day ago

$ for i in /dev/disk/by-id/ata-WD*; do sudo smartctl --all "$i" | grep Power_On_Hours; done
  9 Power_On_Hours          0x0032   030   030   000    Old_age   Always       -       51534
  9 Power_On_Hours          0x0032   033   033   000    Old_age   Always       -       49499

𝕽𝖚𝖆𝖎𝖉𝖍𝖗𝖎𝖌𝖍@midwest.social · 1 day ago

Once a year or so, I re-learn how to interpret SMART values, which I find frustratingly obtuse. Then I promptly forget.

So one’s almost 6 y/o and the other is about 5½?

vegetaaaaaaa@lemmy.world · edit-2 11 hours ago

One has a total powered-on time of 51534 hours, and the other 49499 hours.
As for their actual age (manufacturing date), the only way to know is to look at the sticker on the drive, or find the invoice, can’t tell you right now.
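
For anyone else who forgets how to read these lines: the last column is the raw value, which for Power_On_Hours is normally the hour count. A minimal sketch that pulls it out and converts it to years, using the two lines posted above as input. It assumes the raw value is a plain integer (some drives report it in other formats), and the helper names are made up for illustration.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours, averaging in leap years


def power_on_hours(attribute_line: str) -> int:
    """Extract the raw value (last column) from a Power_On_Hours line."""
    fields = attribute_line.split()
    if len(fields) < 10 or fields[1] != "Power_On_Hours":
        raise ValueError("not a Power_On_Hours attribute line")
    return int(fields[-1])


def hours_to_years(hours: int) -> float:
    """Convert powered-on hours to years of continuous running."""
    return hours / HOURS_PER_YEAR


lines = [
    "  9 Power_On_Hours          0x0032   030   030   000    Old_age   Always       -       51534",
    "  9 Power_On_Hours          0x0032   033   033   000    Old_age   Always       -       49499",
]
for line in lines:
    hours = power_on_hours(line)
    print(f"{hours} h is about {hours_to_years(hours):.1f} years powered on")
```

That comes out to about 5.9 and 5.6 years of powered-on time, which says nothing about when the drives were manufactured.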

blackstrat@lemmy.fwgx.uk · 23 hours ago

Seagate “raw read error rate” is a terrifyingly big number if everything is hunky dory.

felbane@lemmy.world · 1 day ago

Ultimately it’s a matter of personal choice and risk tolerance.

The Z1 will be simpler and have larger capacity, but if you have a drive fail you’ll need to quickly get it replaced or risk having to rebuild/restore if the mirror drive follows the first one to the grave.

Your Z2 setup right now can have two drives fail and still be online, and having a wider spread of power-on hours is usually a good thing in terms of failure probability.

I manage a large (14,000±) number of on-site RAID1 arrays in various environments and there is definitely a trend for drives shipped at the same time to fail at roughly the same time. It’s common enough that we often intentionally swap drives out before shipping a new unit to the customer site.

On my homelab, I’m much more tolerant of risk since I have trust in my 3-2-1 backup solution and if my NAS goes down it’s not going to substantially affect anything while I wait for a drive replacement.

tobogganablaze@lemmus.org · edit-2 1 day ago

My first batch (6x 20TB) is at 8611 hours.

The 2nd batch (3x 20TB) is at 5612 hours

blackstrat@lemmy.fwgx.uk · 1 day ago

So a year ago you spent over 3k on disks?

tobogganablaze@lemmus.org · 1 day ago

Yeah and a new 12 bay NAS to put them all in. I had a 2 bay that I expanded with a bunch of USB drives before, but that was starting to get really messy. Basically took my entire thirteenth salary.

thejml@lemm.ee · edit-2 1 day ago

Mine are 3x 27k and 1x 47k. I just started replacing them… not because they’re old or have any issues, just because they’re becoming too small. Going from 4 to 8 tb disks and transferring the old ones to an external raid enclosure for backups.

Actually brings up a question I had… what do people think about refurbished drives for a NAS?

Jondar@lemmy.world · 1 day ago

I just went all refurbished on my new drives. Time will tell. Oldest one has about 8 months runtime on it.

I went with 5x recertified Seagate Exos 20TB, and one recertified IronWolf Pro 20TB.

thejml@lemm.ee · 1 day ago

Nice, we’ll all look out for an update in a year!

I try to mix brands and lots (buy a few from one retailer and some from another). I used to work for a storage/NAS company and we had many incidents where we’d fill a 12 or 24 drive RAID with drives right from the same order and had multiple drives die within hours of each other. That usually isn’t enough time for replacement/resilvering.

SayCyberOnceMore@feddit.uk · edit-2 22 hours ago

Yep, seen a similar thing with servers…

A few years ago I built up a system with ~ 20 servers. Powered them all up and did all the RAID initialisation (RAID5 across 6-8 disks per server IIRC)

One server basically needed all its disks replacing and some of the others needed a disk or 2 replaced - within a month!

Since replacing those disks and building all those arrays I’m happy to build a NAS / server, let it bed-in for a while and if nothing fails I’ll just keep powering up & down my NAS as needed and I’ll run the drives until they die…

ocean@lemmy.selfhostcat.com · 1 day ago

Second hand so I’m sure ancient.

sugar_in_your_tea@sh.itjust.works · edit-2 1 day ago

About 10k power on hours. That’s honestly a little surprising since I’ve had them for 7 years or so, but it’s only been on 24/7 for the last year or two (used to just turn on when watching a movie or something).

From those hours, I should expect a few more trouble free years.

My OS drive is >30k hours since it used to be my desktop boot drive (tiny 120GB SATA SSD). I’ve been thinking about upgrading to NVMe, since my desktop NVMe is getting a little full (500GB), and it could also make for a nice cache. It’s nowhere near dying though, with ~16TBW, so I’m in no hurry.