What is Dragonfish doing to the boot drive?

I’ve had an NVMe drive as my boot drive for years. Its wear level was slowly decreasing as expected: after 8k hours it was at around 90% left. Then, after I installed the first beta of Dragonfish, it lost 30% in 2 months at a fairly consistent rate.

I know the S.M.A.R.T. indicators are not all that precise and whatnot, but the point stands: the indicator was moving as expected, and after Dragonfish it’s off the charts.

Ignore the spikes; those are from when either the TrueNAS or the monitoring host was rebooting.

Any idea what’s going on?

How do you monitor the wear level of your boot NVMe drive?

The graph is from LibreNMS, which gathers smartctl data each hour. I also use multireport.sh, which emails me a daily summary; that’s actually where I first noticed it.
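
If anyone wants to spot-check the same numbers by hand, here’s a minimal sketch (assuming the drive sits at /dev/nvme0; for NVMe, smartctl reports a standardized “Percentage Used” field in the health log, unlike the vendor-specific SATA attributes):

```
# Pull the wear estimate from the NVMe SMART/health log:
# 0% = new, 100% = rated endurance spent (it can exceed 100).
smartctl -a /dev/nvme0 | grep -i "percentage used"

# "Data Units Written" (units of 512,000 bytes) is useful for
# tracking actual write volume between snapshots.
smartctl -a /dev/nvme0 | grep -i "data units written"
```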


Before the switch to Dragonfish, did you have your syslog written to another pool? AFAIK, since Dragonfish this is no longer possible and it is always written to the system dataset.


I’m not really sure, but I might have.
I checked my other TrueNAS box and it shows the same trend: rapid wear-out of the boot drive since the Dragonfish install. This is fairly unacceptable; at this rate it’s going to kill my boot drive in a few months.

Can I somehow disable the local syslog? I send the logs to a remote syslog server anyway and only ever look at them there.
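
To see how much is actually hitting the boot device, a quick sketch (assuming the default pool name boot-pool; adjust if yours differs):

```
# Sample read/write bandwidth on the boot pool: one line every
# 60 seconds, 5 samples. Sustained writes of even a few hundred
# KB/s add up to several GB per day on a small boot SSD.
zpool iostat boot-pool 60 5
```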


Try a reboot and set System Settings - Advanced - Storage - Swap Size to 0.
Then check that Swap Utilization is zero under Reporting - Memory.
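
If you’d rather verify from a shell, these are plain Linux commands (not TrueNAS-specific):

```
# List active swap devices; empty output means swap is off.
swapon --show

# The "Swap:" row shows total/used/free swap.
free -h
```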

24.04 introduced a new ARC memory allocation method; maybe the wear is caused by the resulting swap issue.

And the next update will remove swap.

Thanks, I’ll give that a go.

That’s not what that setting does.

But if you’re using Dragonfish 24.04.0, you should disable lru_gen (or upgrade to 24.04.1).

I’ve got the lru_gen workaround in post-init. How do I upgrade to 24.04.1? I’m on 24.04.0 and the Check Updates button says there are no updates. Do I have to switch trains?

It’s due any day now.


Regarding the suggestion to turn off swap (Investigation of slow UI / RAM / SWAP issues): isn’t this a bad thing? I know people generally dislike swap, but they also misunderstand it; a gig of memory that only gets used twice a day is better sat in swap than in RAM. But my point is more about what TrueNAS will do if I run swapoff -a and it happens to run out of RAM.


Which SMART attribute is that percentage based on? My SSD has an attribute Remaining_Lifetime_Perc, and it has a raw value of 47. If that means its life is half over, I’ll be shocked. :flushed: Power_on_Hours has a raw value of 488. That can’t be actual hours, because it’s certainly been in there longer than that.

I can’t say for sure. My understanding is that the values alone mean nothing, as many manufacturers do not follow standards (e.g. one drive may report how long it thinks it has left in that attribute, another may report something else related to drive health on a different value scale, and yet another may report nothing at all). What’s important is monitoring them over long periods of time and watching for sudden spikes, which can be indicative of an issue.
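
For anyone who wants to eyeball the raw table that monitoring tools scrape, a minimal sketch (assuming a SATA SSD at /dev/sda; as noted above, attribute names and scales are vendor-specific):

```
# Print the vendor SMART attribute table. Compare the VALUE and
# RAW_VALUE columns across periodic snapshots rather than trusting
# any single reading in isolation.
smartctl -A /dev/sda
```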

Sounds like Perc is percent and your drive is middle-aged :wink:

Drives loop their power-on hours counter after 65K or so (a 16-bit counter wraps at 65,536 hours). Could yours be around 7.5 years old?

You’ll hit an OOM condition and crash.
However, this shouldn’t really be happening anyway, because a large portion of RAM is supposed to be used for the ARC cache. If you start hammering the memory, ARC will resize dynamically as needed, even in extreme circumstances.
See: ARC Memory Allocation on Dragonfish TrueNAS 24 and Other Issues - #11 by essinghigh
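
If you want to watch ARC sizing for yourself, a small sketch reading the OpenZFS kstats (standard path on SCALE; the “size” figure should shrink under memory pressure instead of pushing the system into swap):

```
# Current ARC size plus its target and ceiling, in bytes.
grep -E "^(size|c|c_max) " /proc/spl/kstat/zfs/arcstats
```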

In any case, you don’t need to swapoff. You can just disable lru_gen:
echo n > /sys/kernel/mm/lru_gen/enabled
This is the fix that iX will be using. They will also be defaulting new installs to not have swap, but disabling lru_gen is the actual fix for the unnecessary swapping seen with Dragonfish .0.
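
To make that stick across reboots on .0, one approach (matching the post-init idea mentioned above) is a post-init script under System Settings - Advanced - Init/Shutdown Scripts:

```
# Disable the multi-generational LRU; takes effect immediately.
echo n > /sys/kernel/mm/lru_gen/enabled

# Verify: reading the flag back should show 0x0000 when disabled.
cat /sys/kernel/mm/lru_gen/enabled
```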


I dunno. My understanding is that RAW values may not be what you think they should be. Why would they make it straightforward when they could confuse you instead? I know it doesn’t increment Power_on_Hours by actual hours; it changes much more slowly, but I haven’t tried to calculate it out.

I bought the SSD new and installed it as the boot drive some time in 2019. So, at most 5.4 years.

I really wish they’d bring it back. I see no need to wear out boot drives with this, especially as I’ve also noticed a significant amount of writes to boot drives.

Edit: I went ahead and created a suggestion; maybe a few other folks care and iX will bring the option back: [NAS-129237] - iXsystems TrueNAS Jira


I voted for it, but I know it will be shot down.

They already stated their reasons for fixing the syslog to the boot-pool, and they don’t want to give users the option to change its location (which we had enjoyed for the longest time). :frowning_face:

I can’t actually find the quote anymore, but I remember posting about it in the old forums, where the exact wording from iX was:

As part of the system reporting and debug improvements, system logs now exclusively write to the TrueNAS boot device.

Sadly, the documentation I linked to quoting this has since been updated, and I can’t find any further quotes regarding this change.

Is uhhh… that about it as far as the official reasoning goes?

I just updated both machines to 24.04.1, and it feels like back in ye olden times when I replaced my first HDD with an SSD. The UI is super snappy and actually usable on the Apps screen. Fingers crossed I won’t see another 15% of boot drive wear gone in a month’s time.
