Continuing the discussion on TrueNAS Virtualization Plans for 25.04.2

The software status page is only good if people read it - and they are NOT given a link in a warning when they click to upgrade i.e. when they click, they don’t get a pop-up that says “This version of TrueNAS is not yet rated as ready for use by Conservative risk-averse users because it contains ‘experimental’ technology that is not yet ready for production use. Do you still want to upgrade?” with the normal checkbox they have to tick if they want to click the “Yes - Proceed” button.

Nor (assuming that they say yes to that) do they get a further prompt “You are running VMs - after this upgrade they will stop working. Do you want to proceed with the upgrade?” also with the normal checkbox they have to tick if they want to click the “Yes - Proceed” button.

IMO these are the level of warnings that should be given if there is a risk that the upgrade is unsuitable for them.
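
To make the suggestion concrete, the two prompts described above could be sketched as a simple pre-upgrade gate. This is purely illustrative: `UpgradeContext`, `required_warnings` and `may_proceed` are hypothetical names invented for this sketch, not TrueNAS middleware APIs, and how the flags would actually be populated is an assumption.

```python
from dataclasses import dataclass


@dataclass
class UpgradeContext:
    """Hypothetical snapshot of the system when the user clicks Upgrade."""
    target_version: str
    target_rated_for_conservative: bool  # assumed to come from the software status page
    has_libvirt_vms: bool                # assumed to be detected on the running system
    target_supports_libvirt: bool


def required_warnings(ctx: UpgradeContext) -> list[str]:
    """Return the blunt warnings the user must tick a checkbox for, in order."""
    warnings = []
    if not ctx.target_rated_for_conservative:
        warnings.append(
            f"{ctx.target_version} is not yet rated as ready for Conservative users. "
            "Do you still want to upgrade?"
        )
    if ctx.has_libvirt_vms and not ctx.target_supports_libvirt:
        warnings.append(
            "You are running VMs - after this upgrade they will stop working. "
            "Do you want to proceed with the upgrade?"
        )
    return warnings


def may_proceed(ctx: UpgradeContext, acknowledged: list[bool]) -> bool:
    """The upgrade proceeds only if every required warning was explicitly ticked."""
    needed = required_warnings(ctx)
    return len(acknowledged) == len(needed) and all(acknowledged)
```

A user with VMs upgrading to a release without libvirt would have to tick both boxes; a user with no VMs moving to a release rated for Conservative use would see no extra prompts at all.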

And I suggested exactly the same approach for the DF → EE upgrade for users that had TrueCharts apps (and indeed curated a list of Forum posts at that time from people who upgraded because they had no idea it would screw up their apps).

But iX didn’t listen, and they repeated yet again the same unthinking approach which put users’ production systems at needless risk.

You have NOT however responded to my suggestion that it should never have been needlessly removed in the first place. There was literally no need to remove it - the two technologies co-exist perfectly nicely because they both use KVM.

Perhaps it was just thoughtlessness - though if you had listened to and properly considered the previous comments and learned from them, you wouldn’t have done this. But if it wasn’t thoughtlessness, then unfortunately it has the appearance that you wanted to force users onto the Incus virtualisation so that they acted as guinea pigs for the technology - and that same reasoning is also a rational explanation as to why you don’t give more blunt warnings.

Since parallel running and better warnings were both explicitly suggested to you regarding EE app migrations, the appearance is that you don’t listen and don’t learn. If this appearance is wrong, then please explain why.

And IMO if you don’t want this to be the conclusion that gets drawn, then you need to:

  • Be more open to listening to justified criticism from the community and to learn from it; and / or
  • Be willing to involve the community in decision making on technology adoption before you develop and deliver risky functionality (so that you get told explicitly by the community to deliver parallel running and NOT to rip out the libvirt functionality); and / or
  • Be much more risk averse on users’ behalf and by doing so re-earn their trust that you will not put their production services at risk without them having been explicitly and bluntly warned about it.

(And please let’s not revisit the issue of your marketing folks falsely calling a marketing survey to non-technical users an invitation to join a “user technical advisory council” or similar - if you want to have a community technical advisory council, then form one with people who can understand the different competing pressures you face in real life as developers and who can rationally argue for a different development approach which will avoid these issues.)

Edit: What I will add is a comment from the most recent T3 where you said that you wanted to make the UI technology independent so that you can swap technologies underneath without there being any impact on users. Unfortunately:

  1. iX doesn’t have a good track record (and maybe no track record) of achieving painless technology transitions.
  2. Such a suggestion implies adopting a lowest common denominator approach to functionality (rather than delivering the best technical solution possible with a specific technology)
  3. This approach also implies the ability to determine the lowest common denominator across technologies that are not yet used and even those that might not yet exist

So why is this approach better than delivering the best possible solution with e.g. libvirt and then later delivering the best possible solution with Incus, making it painless NOT by adopting a lowest common denominator, but instead by adopting a strategy of parallel running and user triggered migrations that are NOT tied to being run at the same time as the TrueNAS version upgrade?

When a new version (fish) is BETA or RC.1 it indicates the whole release is still being validated. Even when it gets to Release, we still recommend using the software Status advice based on a self-assessment.

Software quality improves with large numbers of users/system configs and bug fixes which come in the form of software updates. The nature of TrueNAS software is that it runs on an enormously diverse set of hardware with very diverse use-cases. User skills and expectations also vary widely. Nothing substitutes for mass testing. 25.04 has 80,000+ systems and so we have found most of the significant bugs; some will be fixed in 25.04.2. We decided to classify the lack of libvirt KVM as a bug and fix it in version 25.04.2.

My guess is there will be some hot patches after that new update is tested. Only after testing in the field does the software status page get updated.

When we apply “experimental” to a feature, it does mean that the rest of the system is operating as expected. If the feature (instances) is not used, we expect no side effects. We think about 85% of Fangtooth users are not using instances.

We structure the software so that where features do get used they have low impact on other aspects of the system. I have not heard of Instances causing issues with other features.

HOWEVER, if using an experimental feature like instances, there is an expectation of more than normal bug fixes and so the user may have to do more software updates and may have to do some work at the CLI level to address specific issues. So, this expectation is not aligned with a Conservative or even General user.

“Experimental” also flags it is not recommended for long term production. For example, we would use that warning if we were not sure how easy future updates will be. Until we have the full list of issues we need to fix, we don’t know how major future changes will need to be.

We always attempt to migrate from previous versions (e.g the transition of Apps from K8s to Docker), but we clearly can’t claim 100% success. Data in host paths has generally always migrated.

Regarding mature services, our primary goal is to make sure that a conservative user using mature services who follows our advice should have relatively few issues. They may skip a version…

An early adopter who tries out each software version and update will have more issues, but will help us improve the software more quickly.

In future, we’d like to provide that update advice in the UI…but ironically, it will only help future users not current users getting to that state. Current users have to read the software status page.

I agree 100% with everything that you said here. Realistically you can never deliver bug-free software.

So it wasn’t a mistake to remove it, but rather a bug as a consequence of its removal - and apparently not what the community might actually like to hear i.e. an admission that:

  1. It could have been left in in the first place to allow parallel running and users migrating at a time of their choosing; and
  2. It should have been left in in the first place to allow parallel running, and thus reduce risk and make life easier for virtualisation users; and
  3. It (or similar) will be left in to allow parallel running for at least one major release whenever similar situations occur in the future.

If that is the case, then why am I not being recommended to move to 25.04.1 (since I don’t use virtualisation and don’t want to use LXCs)?

I don’t have a pressing need to upgrade, but equally I am pretty confident that I would upgrade successfully, and equally confident that I could revert back again if I had problems. So if you think that it is stable, and I think it is stable enough to upgrade to despite being very risk averse, then why are you not recommending me to go and why are you applying the same criteria to people who do use virtualisation as to those that don’t?

However, from 25.04 (Fangtooth) Version Notes | TrueNAS Documentation Hub:

Manual Migration Required

Due to configuration incompatibilities between the previous libvirt implementation (TrueNAS 24.10 and earlier) and Incus in TrueNAS 25.04, existing VM configurations do not transfer automatically during the upgrade. However, TrueNAS retains storage zvols, allowing you to manually recreate the previous VM configurations and get them back online.

So

  1. “We always attempt to migrate from previous versions” is not true in this case; and
  2. Perhaps I misunderstood, but aren’t there driver changes needed too, which are not even mentioned? e.g. see GUIDE: How to install/migrate Windows VM to Fangtooth/Incus using Virtio drivers

I note the language used here. “In future, we’d like to provide that update advice in the UI” is not the same as “In future, we will provide that update advice in the UI”.

And of course, had you listened to the similar complaints/advice/discussions given when EE upgrades were happening, the better “update advice” could indeed have been provided for the EE → FT update - but you didn’t and they weren’t.

Perfectly good suggestion, but it can only happen in future software releases… unless we also invent a time machine. We created the software status page because it helped address the issues of previous, current and future software.

The place to put this is in Feature Requests and get upvotes from people who want it.

No time machine needed as I suggested exactly this 9 months+ ago, and had you listened you could have delivered it.

As for suggesting this should go in feature requests, surely doing something that will protect users production services from being screwed up is something that shouldn’t be equated to e.g. Extend SCALE SNMP or AI Image Generation App Request or SUGGESTION: Don't use "folder" icons for datasets ?

And what exactly are the statistics for Feature Requests - TrueNAS Community Forums being accepted and delivered? On this page there are 350 requests and of these 21 are shown as “accepted” or “implemented” - so that is a 6% acceptance rate.
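
For anyone who wants to check the arithmetic, here is a minimal sketch using only the two counts quoted above (the variable names are mine):

```python
total_requests = 350           # requests listed on the Feature Requests page
accepted_or_implemented = 21   # shown as "accepted" or "implemented"

acceptance_rate = accepted_or_implemented / total_requests
print(f"{acceptance_rate:.0%}")  # prints 6%
```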

And on Topics tagged feature-request there are a further 16 that are not being considered because they are not in the correct category and so cannot be voted upon.

These are the types of reasons why we labelled it as Experimental…

I think the situation is that in some cases driver changes were needed… I don’t know what %.

No, we want all Feature Requests to go here and get user votes. It provides a systemic process and allows us to be held accountable.

We do not have a process for reviewing all comments on the forums.

We do need to encourage all the Community to use the tools we provide. If we discourage people from using the software status page, we will have many more issues. So, I have to be boring and refer to it all the time.

…which you ignore if you like.

You’re accountable to nobody. 70+ votes for a feature request? Nothing requires you to respond.

So you confirm that your statement “We always attempt to migrate from previous versions” isn’t true for “experimental” releases even though without it anyone with virtual machines who migrates ends up with them broken?

Yet this is NOT mentioned at all in the Release Notes.

Don’t make me laugh. What accountability is there actually for iX not implementing something in the list? Do I need to provide a definition of the word “accountability” so that we can see whether it is met or not?

See Hitchhiker’s Guide to the Galaxy, where Earth dwellers were encouraged to visit the intergalactic planning office on Alpha Centauri where the plans for the hyperspatial bypass were available for review. It wasn’t the planning officer’s fault that Earth didn’t have the hyperspatial broadcast receivers needed to receive the advisory notices.

The reason you have to be boring and refer to it all the time is exactly because it is in such an obscure place and because it is not linked to sufficiently clearly at the actual point of upgrade. Make the automated warnings blunt and explicit and in people’s faces at the point of upgrade and you can stop being repetitive and boring. (And so can I !!!)

No one is suggesting that you should reduce your efforts to keep users informed - what we are suggesting is that you add more appropriate methods that will drastically reduce the instances of upgrades breaking people’s production systems!!

@Captain_Morgan

I am really disappointed at the direction this discussion is taking - I would much rather be pushing on an open door than having to attempt to batter down a heavily armoured defensive position - and I suspect that the other community participants feel the same way.

I want to feel good about my community involvement, and not feel that I have to 1) get out the megaphone in order to get past fingers-in-the-ears; and 2) nit-pick to draw attention to counter arguments made that are factually incorrect or have no logic; and 3) repeatedly draw attention to points that are pointedly ignored and never addressed.

To be honest, I have no idea why the iX folks are unwilling to listen properly to well-founded suggestions about how they could do better and be even more successful. We are literally trying to help you avoid unhappy users who tell the world that their TrueNAS upgrade trashed their production system!!

We currently don’t think 25.04.1 is as trouble free as 24.10.2.2, but it does have additional features.

  • 24.10.2 has had about 6X more field testing (measured in machine months).

  • If you update now to 25.04.1, we will be recommending 25.04.2 in short order (1 month).

  • The list of outstanding bugs on 25.04.1 is 10X longer than on 24.10.2.2

We allow you to have your own opinion and it might be right, but we take our recommendations very seriously. We don’t yet see a compelling reason to update for a Conservative user (who wants to update infrequently).

We have deliberately chosen a 6 month release cycle, but with the expectation that in some cases, we might recommend that some users skip a whole version. In our view conservative users are OK with annual updates… they just want trouble-free.

In some ways we have done what you want… run the libvirt and incus virtualization in parallel. We just haven’t done it in the way you wanted.

  • Electric Eel runs libvirt and continues to

  • Fangtooth runs incus as experimental

Users can choose either. We recommend electric eel for production. There is no-one asking for libvirt and incus VMs on the same TrueNAS.

Our plan was to unify in Goldeye. We are bringing that forward a bit with 25.04.2

It’s a Saturday afternoon and I am listening and responding. I’ve included a lot of data that you don’t have, but we do, which influences our decisions.

We have listened and changed our plans for 25.04.2.

However, we do need help promoting the use of the software status page to minimize issues. I will push back when users complain and don’t use that resource properly. That is our current vehicle for minimizing issues… and we want the forums to make this clear.

No - in my view there is no way that what you did was what I have suggested.

You didn’t need to remove libvirt from Fangtooth 25.04.0 or .1 (or indeed RCs or betas or alphas or nightlies), and you haven’t really explained why:

  1. iX took that decision to do it when it doesn’t appear to have been necessary;
  2. iX took that decision despite all the previous technology deliveries that caused pain for users;
  3. iX took that decision after admitting under pressure that it could have run Kubernetes and Docker in parallel and made the migration of apps less of a sudden cut-over (which included the complete loss of TrueCharts functionality), thus demonstrating that it had listened to nothing, learned nothing, and had the same disregard for users’ production difficulties after upgrades.

In one simple word: WHY!!!

You made a big deal a few minutes ago about how iX should be accountable, so please live up to that and be accountable right here, right now, about this.

I just came here to say that I have had a case of ongoing whiplash with the continued structural changes in both the virtualization and apps/containers capabilities, and have decided to spin up a Proxmox machine for all of that, leaving my TrueNAS machine to focus on storage.

You are NOT listening now - we are saying that you should think differently and avoid the need for users to complain so vociferously that you have to change your plans. That is an admission of failure, and NOT something to crow about.

If you are listening now, then stop prevaricating and ignoring and dissembling and giving factually incorrect statements, and just agree that in future:

  • iX will take all reasonable efforts to deliver new technology in parallel with existing technology

  • iX will warn users at the time of actual upgrade and in a blunt way that cannot possibly be ignored or mis-interpreted if the upgrade is likely or possible to cause production functionality to stop working.

    (If you have implemented parallel running then the need for these should only be when you remove the old technology one or more major versions later and the user hasn’t migrated their containers etc. for the entire period they were running the current version.)

We did not want to support users setting up VMs in either pathway… we have no idea what the corner cases will be.

It would have required more QA and added a huge amount of schedule risk.

Fangtooth would have more bugs

The UI (and docs) will be more complex

We did not have the tools to migrate from one to the other easily

The Migration to Goldeye will be more complex

We decided to label it experimental and keep production VM use cases on Electric Eel. We think it’s safer (and still recommend that).

Really? Significantly more complex? I don’t believe this for a second; does anyone else from the community believe this either?

As a programme and project manager of 3 decades, the answer to this is NOT to be driven by arbitrary release dates in April / October. Instead deliver it when it is ready.

But the definition of “ready” would have been different i.e. that

  1. When the experimental functionality was included but not in use, it had zero impact on any other service inc. libvirt.
  2. When the experimental functionality was included and in use, it still had zero impact on any other service inc. libvirt.

The actual functionality of Incus would require less QA because it is experimental and not for production use. So total QA effort might have been less.

And then the bug fix releases (25.04.1 onwards) could deliver bug fixes for everything else and functional improvements/bug fixes for Incus.

Keeping production VM use cases on EE means exactly as I stated earlier - that a single Grid on the Software Status page is insufficient because there are now two distinct user groups - those with libvirt VMs and those without - and the recommendations for each are different. QED. End of.

But if the libvirt functionality hadn’t been removed, existing virtualized users could just continue as is - you wouldn’t need to have two separate classifications at all.

We’d be stupid if we didn’t review this list when we plan a new software version.

Software versions are planned about 12 months in advance… so there is a delay. Some will not make the cut.

The list is there and getting votes, they are visible, so we are accountable. We’ve executed on many already.

If “accountable” means only that the whole community can see that you’re ignoring them, I guess so.

Visibility isn’t the same as accountability. Apparently I DO need to go and find a dictionary definition for accountability in order to demonstrate the difference. {sigh}

And as I have already pointed out you have executed on as little as 6% - which is not sufficient to meet my own personal definition of “many” - do I need to add finding a dictionary definition for the word “many” in order to prove this point too?
