I am currently running Core on a Dell R510 server. I bought a Dell R720xd to install Scale on. Initially I installed Scale, then restored the config from the Core rig, then moved my 2 storage pools over. One of them is RaidZ2 with 9 drives and the other is RaidZ1 with 3 drives. After I did that I started to get a lot of drive errors. At that point I put the drives back in the Core server and they have been fine since. Then I ran full diagnostics on the R720xd and everything passed. My question is, when I do this a second time, would it be better to build the pools from scratch in the Scale install and then just copy all my files back to them, or should I import the pools? I am NOT going to restore the config from the Core rig again; I'm going to set the Scale install up from scratch to be safe. I just want to make sure my storage pools do OK going forward. Thanks.
Drive errors across all the drives point to the HBA, backplane, etc. Check and reseat cables.
Full hardware details could help with trying to diagnose the R720. Verify the HBA is flashed to a current firmware version and is in "IT mode".
That's what I was thinking, and that's why I ran all the diagnostics in the Lifecycle Controller on it, and it passed all of them. I also reseated the PERC card and all the cables. And it is indeed in IT mode; thankfully that's how it came, because I forget how to put one into it.
In the meantime, since I did all the diagnostics on it, I have created a couple of test pools and so far it is not throwing errors. I'm just wondering if it would be safer to create the new pools from scratch, since I know they have updated ZFS since I created these on Core. It would take a while to copy my backups over to it, but I would certainly do it if that's safer than importing them.
I think you have a hardware problem on the R720. How many drives are you testing it with? Could it be a power issue that only shows up with all 12 drives installed?
Are you checking your current system with something like @joeschmuck Multi-Report? Maybe you can try @NickF1227 TN-Bench on the R720 and see if it can produce errors during the bench. I don't know how long a normal run is. Just two ideas: check the health of the current data drives before migrating, and try to reproduce the errors on the newer hardware before you migrate.
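For the "check health before migrating" idea, here is a minimal sketch (it is not Multi-Report and does not reflect how that script works internally) that reads the attribute table printed by `smartctl -A` for an ATA/SATA drive and separates media-type counters from link-type ones. The attribute names are the usual smartmontools ones; the sample text, the `classify_smart` name, and the choice of which IDs to watch are assumptions for illustration only. SAS drives report errors in a different format and are not handled here.

```python
import re

# Assumed watch lists (illustrative, not exhaustive).
MEDIA_IDS = {5, 197, 198}  # reallocated, pending, offline-uncorrectable sectors
LINK_IDS = {199}           # UDMA CRC errors: typically cabling/backplane, not the disk

# Hypothetical sample of `smartctl -A /dev/sdX` output for a SATA drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       37
"""


def classify_smart(text):
    """Return non-zero raw values for the watched attributes, grouped by kind."""
    result = {"media": {}, "link": {}}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name = int(fields[0]), fields[1]
        raw = re.match(r"\d+", fields[9])  # leading digits of RAW_VALUE
        if raw is None:
            continue
        value = int(raw.group())
        if value == 0:
            continue
        if attr_id in MEDIA_IDS:
            result["media"][name] = value
        elif attr_id in LINK_IDS:
            result["link"][name] = value
    return result
```

Calling `classify_smart(SAMPLE)` on the made-up sample flags only the CRC counter. On a real box, non-zero link counters with clean media counters would fit the HBA/cable/backplane theory, while the reverse would point more at the disks themselves.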
Right now 2, but Wednesday I'm going to try to fill it up when I get more caddies. This thing does have redundant power supplies, and at least according to the Lifecycle Controller they both test as good. I'm thinking that the PERC card may have unseated a little during shipping. I did all the diagnostics after I reseated it and reseated all the cables.
You need to be specific. "Drive Errors" could mean a lot of things (drive failure, ZFS corruption, etc.). Exactly what are the error messages?
Take a look in my signature for Drive Troubleshooting Flowcharts. This will help you identify what kind of problem you have. If it does not solve your issue, at least you can tell us what is going on.
Using the flowcharts will let you troubleshoot this problem faster than trading messages here in the forums.
Cheers
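To make "exactly what are the error messages" a bit more concrete, here is a rough sketch that pulls the per-device READ/WRITE/CKSUM counters out of `zpool status` text and lists only the rows with non-zero counts. The sample output, pool and device names, and the `devices_with_errors` helper are all made up for illustration; this is not taken from the flowcharts mentioned above.

```python
import re

# Matches config rows such as: "    sdb  ONLINE  3  0  12"
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+"
    r"(?P<state>ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)\s+"
    r"(?P<read>\S+)\s+(?P<write>\S+)\s+(?P<cksum>\S+)"
)

# zpool abbreviates large counters (for example "1.2K").
SUFFIX = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}

# Hypothetical sample of `zpool status` output.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

    NAME        STATE     READ WRITE CKSUM
    tank        DEGRADED     0     0     0
      raidz2-0  DEGRADED     0     0     0
        sda     ONLINE       0     0     0
        sdb     ONLINE       3     0    12
        sdc     FAULTED      0    45     0  too many errors

errors: No known data errors
"""


def to_count(text):
    """Convert a counter such as '12' or '1.2K' to an integer."""
    if text[-1] in SUFFIX:
        return round(float(text[:-1]) * SUFFIX[text[-1]])
    return int(text)


def devices_with_errors(status_text):
    """Return (name, state, counters) for rows with any non-zero counter."""
    flagged = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # headers, blank lines, and summary text
        counts = {k: to_count(m[k]) for k in ("read", "write", "cksum")}
        if any(counts.values()):
            flagged.append((m["name"], m["state"], counts))
    return flagged
```

On a live system the text could come from `subprocess.run(["zpool", "status"], capture_output=True, text=True).stdout`. Posting which column is climbing (read/write versus checksum) and on which devices is the kind of detail that helps narrow down whether it is the disks or the path to them.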
Take a look at the documentation. As far as I understand, you cannot migrate from CORE to just any Scale version; there are restrictions.
In that case, you either install the appropriate older Scale version, migrate the configuration, and then update to the newer versions,
or, alternatively, you import the pools into a fresh Scale installation and rebuild the configuration manually.
Might be worth acquiring an LSI 9300-8i and dropping that in to see if it behaves better. It is most likely going to be the HBA, if not the expander. I've seen similar things where all was good in CORE and then errors showed up in SCALE. It seems the Linux drivers are a bit stricter. You might need some cables, as I think your current HBA is SAS2.
That may be where my problem came in. I imported my Core config into Scale and then moved the drives over, and of course it saw the pools because they were already set up.
I believe it is SAS2. If I end up having to get a new card and cables, is the backplate the same between the two?
I'd imagine it would work fine, just at SAS2 speed.