Checksum Errors On 1st Scrub

That’ll do it.

Is this something that is going to clear up or do I need to take action? Thanks.

Now you know how to check the space usage when block cloning is involved. Use the CLI commands. The GUI reports funny numbers, as you can see.
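For reference, something along these lines should pull those numbers on OpenZFS 2.2 or newer (replace largie with your own pool name):

zpool list -o name,size,capacity,allocated,free,bcloneratio,bcloneused,bclonesaved largie
zfs list -o space largie

The first shows the pool-wide block-cloning counters; the second breaks USED down into snapshots, the dataset itself, refreservation, and children.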

I don’t know how you are using that server, but BCLONE_RATIO was 1685.00x.

What would cause this? Other than the scheduled snapshots, I haven’t done anything to the server but wait for the scrub to finish. I certainly didn’t mess with any block-cloning features.

Are you copying data between datasets? Read the second link, or do some searching for ZFS and block cloning.

Just noticing something here, quick question.

Did you intentionally make a pool that mixes 4 TB and 18 TB drives?

@mango, were you running a disk benchmark, or some sort of performance-analysis tool?

I think so. It’s listed in the first post. My guess is they started with 4 TB mirrors and kept adding mirror vdevs to grow the pool.

And look at that BCLONE_RATIO: 1685.00x.

Yes, I intentionally created a mixed drive pool. I read that it was fine but TrueNAS does throw a warning on the pool because of it. I plan to swap those drives out eventually.

I didn’t run any sort of benchmark tool. I’ve just been waiting for the scrub to finish.

Data does get copied between datasets in different pools via replication. I replicate about 100 GB from an SSD pool to this pool (largie). I also have a download pipeline in place that downloads files to a scratch-disk pool and then moves the data to the largie pool. That’s been in place for 10 months, so it’s likely not the issue. But there’s no copying happening between datasets on the same pool.

I’ll review the links provided earlier but I haven’t done anything funky here that I know of.

I deleted all snapshots and replicated data from all pools for good measure; no dice. Oh, and my scrub progress now shows 105%, with ‘no estimated completion time’.
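For what it’s worth, something like this should confirm whether any snapshots are still hanging around on the pool (largie):

zfs list -t snapshot -r largie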

Yeah… That, sadly, happens sometimes. It is indeed still scrubbing. There was previously another post that had more details, but steady as she goes & it’ll eventually finish.

Just to clarify, the pool size and usage information continues to grow during this scrub. I have not added any new data and it now shows the pool size as 104TB and 84% used.

Is there something about the scrub causing this since no other activity is happening on the system?

Did you turn off your replication and download pipeline while all this scrubbing is going on?

Just let the scrub happen.

A scrub should in no possible way be adding any new data to the pool. But considering that usage is growing, that is likely why the estimated progress is above 100%.

No clue what is adding new data to your pool - it ain’t the scrub though.

If you’re really paranoid & 10,000% sure you don’t have anything that should be actively writing during this scrub, then I’d consider shutting down all Apps, VMs, and SMB/NFS services to try to isolate wtf is going on. If it doesn’t help, I mean, you can at least turn them back on easy enough.
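One more angle that might help: watching live pool I/O should show whether anything is actually writing while everything is shut down. Something like this (assuming the pool is still largie), refreshing every 5 seconds:

zpool iostat -v largie 5

If the write columns sit at or near zero with all services stopped, then whatever is inflating the usage numbers isn’t new writes landing on the pool.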

Beyond that? I’m out of ideas.

Easiest is to just run the commands from the previous post and see what numbers have changed.

I guess I’ll just let the scrub finish and go from there. The CLI reports no time estimate but the GUI says 9 hours and keeps climbing :person_shrugging:

Oh and here’s the latest:

NAME     SIZE    CAP  ALLOC   FREE  BCLONE_RATIO  BCLONE_USED  BCLONE_SAVED
largie  52.7T    68%  36.0T  16.7T      2297.00x        23.1G         51.7T

NAME    AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
largie  16.6T  87.7T        0B    104K             0B      87.7T

The scrub finished with just 1 checksum error, so the good news is that the RAM was definitely the core issue here.

Unfortunately, the pool usage information is still showing inflated numbers, even after a reboot. The bclone values are showing the same as before.

ChatGPT calls this a “bclone explosion” :rofl:, and says RAM instability could have polluted the clone metadata. It recommends moving the data to a new dataset to clear the metadata links, which I think is a fair plan regardless of how it was caused.

I don’t have enough storage in the pool to zfs send to another dataset, so I’ll have to use my Synology to copy files back and forth.

Any thoughts? Thanks.

I think this may call for you to file a Report a Bug in the TrueNAS GUI. The smiley icon in the upper right is Feedback / Report a Bug. Include the debug dump file and a link to this forum thread. BCLONE_RATIO was 1685.00x and is now 2297.88x.

I would not do anything ChatGPT says unless it’s confirmed by the experienced users on the forums.

What exact RAM are you running? To get 96 GB, the support site shows this, if I have the correct motherboard and drop-downs.

You never copied any files or folders? Never replicated one dataset to another within the pool?


I asked ChatGPT the same question and it came up with a completely different answer that has nothing to do with block-cloning.

It just makes stuff up, rather than admitting it doesn’t know.


Did the latest scrub ever finish?

@SmallBarky Is this a GUI bug or an issue with the dataset and BCLONE_RATIO? I have two sets of this RAM:

G.Skill Flare X5 48GB (2 x 24GB) DDR5-5200 PC5-41600 CL40 Dual Channel Desktop Memory Kit F5-5200J4040A24GX2-FX5

@winnielinnie Yes, the scrub finished. I frequently move data to this dataset but I have not replicated anything between datasets in the same pool.

Judging from the reactions here, it seems like I’d need a crazy workload to cause this type of bclone madness.

I am running mirror vdevs and have some regrets about going that route instead of raidz2. I might take this opportunity to wipe the pool and recreate it with raidz2.

Can you be more specific about what data you are writing, e.g. file types, sizes, etc.? (If not, that’s OK.)

Does the source system also show very high block cloning ratios?