It looks like the source of the bug has been identified and fixed:
https://github.com/openzfs/zfs/pull/15579/commits/679738cc408d575289af2e31cdb1db9e311f0adf
[2.2] dnode_is_dirty: check dnode and its data for dirtiness #15579
Ah heck I just updated my NAS VM to FreeBSD 14.
Anyone running FreeBSD 14, make sure vfs.zfs.bclone_enabled is set to 0.
… FreeBSD 14, …
Not only 14 …
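For reference, the tunable can be checked and flipped from a root shell. This is just a quick sketch assuming a FreeBSD box; the sysctl.conf line is what makes it stick across reboots:

# check the current value
sysctl vfs.zfs.bclone_enabled
# disable block cloning on the running system
sysctl vfs.zfs.bclone_enabled=0
# make it persistent across reboots
echo 'vfs.zfs.bclone_enabled=0' >> /etc/sysctl.conf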
Anyone have any ideas for checking for this issue in existing backups?
The script at #15526 can somewhat check for a hole in the first 4K bytes of a file, but it gives false positives. If the script produces a syntax error on the last line, replace /bin/sh with /bin/bash (or wherever bash lives on your system).
I used it on part of my collection and found several zeroed-out files, but I strongly suspect they were full of zeroes before they ever hit ZFS; at least some of the files from 2009 were already all zeroes. The script gave multiple false positives (and one true positive on a fully zeroed file) on .iso files, which I suspect simply lack a boot record.
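For anyone curious what such a check amounts to: this is not the script from #15526, just a rough shell sketch of the same idea (assumes GNU coreutils), and it has the same false-positive problem on files that legitimately begin with zeroes:

# flag files whose first 4 KiB are entirely zero bytes
# (empty files and files that genuinely start with zeroes are also flagged)
for f in "$@"; do
  if [ "$(head -c 4096 "$f" | tr -d '\0' | wc -c)" -eq 0 ]; then
    echo "possible hole: $f"
  fi
done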
Thank you. I’ve been keeping an eye on the thread to see if any consensus emerges on how the corruption manifests itself. It appears possible that a portion of a file could be zeroed out and then have new data written over it, giving the impression that all is well even though the file is still corrupt. The best method seems to be a list of checksums from known-good files, but that requires prior action that may or may not have been taken (most people never anticipated this and thus have no such list).
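For anyone who wants to start keeping such a list going forward, plain coreutils is enough; a minimal sketch (the dataset path is just a placeholder):

# record checksums for every file under the dataset
find /tank/data -type f -exec sha256sum {} + > manifest.sha256
# later, re-verify and print only the files that no longer match
sha256sum -c manifest.sha256 | grep -v ': OK$'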
I was able to copy a zipped 400 GB dump from a torrent and checksum it before and after the move; no failures so far, at least at the beginning.
It appears the issue arises more when a ZFS file system is being used in a primary nature; e.g., reading and writing to it directly as a part of some active operation. Are you using it as a backup/archive, or as a primary partition where your OS and applications are writing to it directly? If it’s the former, it would seem you’re much more unlikely to encounter the issue.
modinfo zfs | grep version
To quickly get the version installed.
zfs --version also does the trick.
That did not work for me on Ubuntu, but did on my Debian/Proxmox box:
proxmox:
zfs-0.8.3-pve1
zfs-kmod-0.8.3-pve1
ubuntu:
version: 0.6.5.6-0ubuntu26
srcversion: 0968F94158D646E259D86B5
vermagic: 4.4.0-142-generic SMP mod_unload modversions retpoline
Looks like I’m using an ancient version and am OK?
I also use a version close to that, 0.something. See absolutely no reason to upgrade. It just works. It’s the version that has the fast scrub already.
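If zfs --version errors out on an older install, the loaded module version can also be read straight from sysfs (a sketch, assuming ZFS on Linux; on Debian/Ubuntu the package manager works too):

cat /sys/module/zfs/version
dpkg -l | grep zfs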
This is why you ALWAYS need INDEPENDENT backups. You can think all day long about detecting bitrot and how well you’re protected against X drive failures, but then something comes at you from the side and messes up your data in a way you hadn’t foreseen.
Wait. Are you trying to say that RAID is not a backup?