Why such low performance from my SSD pool?

Hi there,
I'm trying to do something very straightforward: use an HDD pool for my Samba and NFS shares, and run some apps (k3s) and VMs on another pool built only from SSDs.

It is worth mentioning that the SSDs are attached to an HBA card in IT mode; here it is:

root@sofx1010nas3012:~# lspci  | grep SAS2008
02:00.0 RAID bus controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
root@sofx1010nas3012:~#

More about it:

root@sofx1010nas3012:~# storcli show
CLI Version = 007.1504.0000.0000 June 22, 2020
Operating system = Linux 6.6.20-production+truenas
Status Code = 0
Status = Success
Description = None

Number of Controllers = 0
Host Name = sofx1010nas3012.home.lan
Operating System  = Linux 6.6.20-production+truenas
StoreLib IT Version = 07.1503.0200.0000
StoreLib IR3 Version = 16.12-0


root@sofx1010nas3012:~#
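
(Side note: storcli reports "Number of Controllers = 0" here, which seems to be expected for a card flashed to IT mode. One way to double-check which disks actually sit behind the HBA versus the onboard SATA ports is something like the following; the column selection is just an example:)

# list SCSI disks with their host:channel:target:lun address, transport and model
lsblk -S -o NAME,HCTL,TRAN,MODEL,SERIAL
# the by-path symlinks also show which controller each disk is cabled to
ls -l /dev/disk/by-path/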

Here I have collected some info about the disks:

root@sofx1010nas3012:~# zpool status
  pool: zpool-hdd-01
 state: ONLINE
  scan: scrub repaired 0B in 03:18:58 with 0 errors on Sun May 26 03:19:01 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        zpool-hdd-01                              ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            e2d996f4-15e8-11eb-b46a-3cecef205174  ONLINE       0     0     0
            88488f18-d2c3-11ec-9212-3cecef205174  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            507cf56f-cc77-44b5-81b1-4a27f77ff407  ONLINE       0     0     0
            5613d2ba-d10d-11ec-a10c-3cecef205174  ONLINE       0     0     0

errors: No known data errors

  pool: zpool-ssd-01
 state: ONLINE
  scan: scrub repaired 0B in 00:15:05 with 0 errors on Sun May 26 00:15:07 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        zpool-ssd-01                              ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            0d8898f7-2ed1-7142-a11a-5485a19630e4  ONLINE       0     0     0
            163c3909-746d-f947-88e4-b167751ba893  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            b4b32aca-c727-4104-9d9a-188c2233134d  ONLINE       0     0     0
            5b2a2607-5068-41cd-bcc7-b01becc7a3c6  ONLINE       0     0     0

errors: No known data errors
root@sofx1010nas3012:~#
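
(The vdev members above are listed by partition UUID. If it helps, mapping them back to /dev/sdX names can be done roughly like this:)

# show the pool layout with real device paths instead of the by-partuuid links
zpool status -L zpool-ssd-01
# or match the partition UUIDs by hand
lsblk -o NAME,SIZE,PARTUUID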

root@sofx1010nas3012:~# smartctl -i /dev/sdb
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.20-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD40EFZX-68AWUN0
Serial Number:    WD-WXB2DA1PYTEC
LU WWN Device Id: 5 0014ee 214d4ab02
Firmware Version: 81.00B81
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon May 27 07:13:44 2024 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@sofx1010nas3012:~# smartctl -i /dev/sdg
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.20-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SPCC Solid State Disk
Serial Number:    YTAL230300975
Firmware Version: V1031C1
User Capacity:    1,024,209,543,168 bytes [1.02 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon May 27 07:14:03 2024 EEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@sofx1010nas3012:~#
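
(The SPCC drives are not in the smartctl database, but it may still be worth dumping their attribute tables to rule out a worn-out or misbehaving SSD, e.g.:)

# vendor SMART attributes (wear level, reallocations, etc.) for one of the SSDs
smartctl -A /dev/sdg
# or the full extended report
smartctl -x /dev/sdg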

As far as I know the block size (sector size) of the devices matters, so here it is:

root@sofx1010nas3012:~# parted /dev/sdb print
Warning: Not all of the space available to /dev/sdb appears to be used, you can fix the GPT to use all of the space (an extra 7 blocks) or continue with the current setting?
Fix/Ignore? Ignore
Model: ATA WDC WD40EFZX-68A (scsi)
Disk /dev/sdb: 4001GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name  Flags
 1      65.5kB  2148MB  2147MB
 2      2148MB  4001GB  3999GB  zfs

root@sofx1010nas3012:~#

root@sofx1010nas3012:~# parted /dev/sdg print
Model: ATA SPCC Solid State (scsi)
Disk /dev/sdg: 1024GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name                  Flags
 1      1049kB  1024GB  1024GB  zfs          zfs-23331cef21e6dce8
 9      1024GB  1024GB  8389kB

root@sofx1010nas3012:~#
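
(For a one-shot overview of the same information across all disks, something like this should also work:)

# logical/physical sector size, rotational flag and transport for every whole disk
lsblk -d -o NAME,MODEL,LOG-SEC,PHY-SEC,ROTA,TRAN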

And more details about the pools:

root@sofx1010nas3012:~# zpool get all zpool-hdd-01
NAME          PROPERTY                       VALUE                          SOURCE
zpool-hdd-01  size                           5.45T                          -
zpool-hdd-01  capacity                       42%                            -
zpool-hdd-01  altroot                        /mnt                           local
zpool-hdd-01  health                         ONLINE                         -
zpool-hdd-01  guid                           3710994753107656689            -
zpool-hdd-01  version                        -                              default
zpool-hdd-01  bootfs                         -                              default
zpool-hdd-01  delegation                     on                             default
zpool-hdd-01  autoreplace                    off                            default
zpool-hdd-01  cachefile                      /data/zfs/zpool.cache          local
zpool-hdd-01  failmode                       continue                       local
zpool-hdd-01  listsnapshots                  off                            default
zpool-hdd-01  autoexpand                     on                             local
zpool-hdd-01  dedupratio                     1.00x                          -
zpool-hdd-01  free                           3.13T                          -
zpool-hdd-01  allocated                      2.31T                          -
zpool-hdd-01  readonly                       off                            -
zpool-hdd-01  ashift                         0                              default
zpool-hdd-01  comment                        -                              default
zpool-hdd-01  expandsize                     -                              -
zpool-hdd-01  freeing                        0                              -
zpool-hdd-01  fragmentation                  17%                            -
zpool-hdd-01  leaked                         0                              -
zpool-hdd-01  multihost                      off                            default
zpool-hdd-01  checkpoint                     -                              -
zpool-hdd-01  load_guid                      11853978348440641617           -
zpool-hdd-01  autotrim                       off                            default
zpool-hdd-01  compatibility                  off                            default
zpool-hdd-01  bcloneused                     248K                           -
zpool-hdd-01  bclonesaved                    248K                           -
zpool-hdd-01  bcloneratio                    2.00x                          -
root@sofx1010nas3012:~# zpool get all zpool-ssd-01
NAME          PROPERTY                       VALUE                          SOURCE
zpool-ssd-01  size                           1.86T                          -
zpool-ssd-01  capacity                       30%                            -
zpool-ssd-01  altroot                        /mnt                           local
zpool-ssd-01  health                         ONLINE                         -
zpool-ssd-01  guid                           11270999927152498569           -
zpool-ssd-01  version                        -                              default
zpool-ssd-01  bootfs                         -                              default
zpool-ssd-01  delegation                     on                             default
zpool-ssd-01  autoreplace                    off                            default
zpool-ssd-01  cachefile                      /data/zfs/zpool.cache          local
zpool-ssd-01  failmode                       wait                           default
zpool-ssd-01  listsnapshots                  off                            default
zpool-ssd-01  autoexpand                     off                            default
zpool-ssd-01  dedupratio                     1.00x                          -
zpool-ssd-01  free                           1.29T                          -
zpool-ssd-01  allocated                      579G                           -
zpool-ssd-01  readonly                       off                            -
zpool-ssd-01  ashift                         12                             local
zpool-ssd-01  comment                        -                              default
zpool-ssd-01  expandsize                     -                              -
zpool-ssd-01  freeing                        0                              -
zpool-ssd-01  fragmentation                  13%                            -
zpool-ssd-01  leaked                         0                              -
zpool-ssd-01  multihost                      off                            default
zpool-ssd-01  checkpoint                     -                              -
zpool-ssd-01  load_guid                      3967385267720250817            -
zpool-ssd-01  autotrim                       off                            default
zpool-ssd-01  compatibility                  off                            default
zpool-ssd-01  bcloneused                     0                              -
zpool-ssd-01  bclonesaved                    0                              -
zpool-ssd-01  bcloneratio                    1.00x                          -
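
(Two things I notice in the property lists: the HDD pool reports ashift 0, which is just the pool-level default setting, and autotrim is off on the SSD pool. The per-vdev ashift actually in use can be read from the cached config, and TRIM can be enabled, roughly like this; the cachefile path is the one shown in the properties above:)

# per-vdev ashift as recorded in the pool configuration
zdb -U /data/zfs/zpool.cache -C zpool-ssd-01 | grep ashift
# enable automatic TRIM on the SSD pool, plus a one-off manual trim
zpool set autotrim=on zpool-ssd-01
zpool trim zpool-ssd-01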

The most interesting part: the tests.

root@sofx1010nas3012:/mnt/zpool-ssd-01# #fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=rw --size=32g --io_size=1500g --blocksize=128k --iodepth=16 --direct=1 --numjobs=16 --runtime=120 --group_reporting --output=/mnt/zpool-ssd-01/fiotest-AppPool.txt
root@sofx1010nas3012:/mnt/zpool-ssd-01#
root@sofx1010nas3012:~# cat /mnt/zpool-ssd-01/fiotest-AppPool.txt
TEST: (g=0): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=16
...
fio-3.33
Starting 16 processes
TEST: Laying out IO file (1 file / 32768MiB)

TEST: (groupid=0, jobs=16): err= 0: pid=218168: Mon May 27 08:55:08 2024
  read: IOPS=3463, BW=433MiB/s (454MB/s)(50.7GiB/120017msec)
    clat (usec): min=7, max=97032, avg=841.16, stdev=3168.41
     lat (usec): min=7, max=97032, avg=841.39, stdev=3168.51
    clat percentiles (usec):
     |  1.00th=[   13],  5.00th=[   17], 10.00th=[   21], 20.00th=[   26],
     | 30.00th=[   30], 40.00th=[   33], 50.00th=[   36], 60.00th=[   39],
     | 70.00th=[   42], 80.00th=[   46], 90.00th=[   56], 95.00th=[ 8717],
     | 99.00th=[16188], 99.50th=[16909], 99.90th=[18482], 99.95th=[19006],
     | 99.99th=[32113]
   bw (  KiB/s): min=46848, max=16004145, per=100.00%, avg=444849.73, stdev=105765.83, samples=3824
   iops        : min=  366, max=125032, avg=3475.35, stdev=826.28, samples=3824
  write: IOPS=3463, BW=433MiB/s (454MB/s)(50.7GiB/120017msec); 0 zone resets
    clat (usec): min=12, max=137650, avg=3770.66, stdev=5296.50
     lat (usec): min=13, max=137651, avg=3774.02, stdev=5297.75
    clat percentiles (usec):
     |  1.00th=[   16],  5.00th=[   19], 10.00th=[   25], 20.00th=[   30],
     | 30.00th=[   33], 40.00th=[   37], 50.00th=[  208], 60.00th=[ 2089],
     | 70.00th=[ 4948], 80.00th=[ 8717], 90.00th=[13304], 95.00th=[15401],
     | 99.00th=[17171], 99.50th=[17695], 99.90th=[19006], 99.95th=[19530],
     | 99.99th=[32637]
   bw (  KiB/s): min=61440, max=16054378, per=100.00%, avg=444778.39, stdev=106022.68, samples=3824
   iops        : min=  480, max=125424, avg=3474.80, stdev=828.29, samples=3824
  lat (usec)   : 10=0.02%, 20=7.63%, 50=59.60%, 100=3.27%, 250=0.92%
  lat (usec)   : 500=0.93%, 750=0.77%, 1000=0.72%
  lat (msec)   : 2=2.47%, 4=3.98%, 10=9.38%, 20=10.27%, 50=0.03%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=0.25%, sys=1.86%, ctx=257505, majf=7, minf=215
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=415678,415635,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=433MiB/s (454MB/s), 433MiB/s-433MiB/s (454MB/s-454MB/s), io=50.7GiB (54.5GB), run=120017-120017msec
  WRITE: bw=433MiB/s (454MB/s), 433MiB/s-433MiB/s (454MB/s-454MB/s), io=50.7GiB (54.5GB), run=120017-120017msec
root@sofx1010nas3012:~#
root@sofx1010nas3012:~# cat /mnt/zpool-hdd-01/fiotest-StorePool.txt
TEST: (g=0): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=16
...
fio-3.33
Starting 16 processes
TEST: Laying out IO file (1 file / 32768MiB)

TEST: (groupid=0, jobs=16): err= 0: pid=227812: Mon May 27 08:59:58 2024
  read: IOPS=20.7k, BW=2587MiB/s (2712MB/s)(303GiB/120002msec)
    clat (usec): min=7, max=103076, avg=40.91, stdev=294.93
     lat (usec): min=7, max=103076, avg=41.03, stdev=294.99
    clat percentiles (usec):
     |  1.00th=[   12],  5.00th=[   13], 10.00th=[   14], 20.00th=[   15],
     | 30.00th=[   15], 40.00th=[   16], 50.00th=[   17], 60.00th=[   19],
     | 70.00th=[   21], 80.00th=[   27], 90.00th=[   38], 95.00th=[   45],
     | 99.00th=[  586], 99.50th=[ 1156], 99.90th=[ 3359], 99.95th=[ 3621],
     | 99.99th=[ 9765]
   bw (  MiB/s): min=  442, max=14130, per=100.00%, avg=2589.30, stdev=119.93, samples=3824
   iops        : min= 3540, max=113040, avg=20713.99, stdev=959.41, samples=3824
  write: IOPS=20.7k, BW=2591MiB/s (2717MB/s)(304GiB/120002msec); 0 zone resets
    clat (usec): min=13, max=148982, avg=727.12, stdev=737.44
     lat (usec): min=14, max=148984, avg=728.85, stdev=739.13
    clat percentiles (usec):
     |  1.00th=[   24],  5.00th=[   34], 10.00th=[   39], 20.00th=[  519],
     | 30.00th=[  553], 40.00th=[  570], 50.00th=[  586], 60.00th=[  611],
     | 70.00th=[  635], 80.00th=[  742], 90.00th=[ 1287], 95.00th=[ 2278],
     | 99.00th=[ 3359], 99.50th=[ 3490], 99.90th=[ 3785], 99.95th=[ 6063],
     | 99.99th=[16909]
   bw (  MiB/s): min=  504, max=14175, per=100.00%, avg=2593.28, stdev=119.91, samples=3824
   iops        : min= 4032, max=113403, avg=20745.84, stdev=959.29, samples=3824
  lat (usec)   : 10=0.07%, 20=34.20%, 50=20.42%, 100=0.84%, 250=0.34%
  lat (usec)   : 500=2.05%, 750=32.00%, 1000=3.00%
  lat (msec)   : 2=3.98%, 4=3.04%, 10=0.05%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=0.82%, sys=7.68%, ctx=2410085, majf=0, minf=193
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=2483180,2487072,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=2587MiB/s (2712MB/s), 2587MiB/s-2587MiB/s (2712MB/s-2712MB/s), io=303GiB (325GB), run=120002-120002msec
  WRITE: bw=2591MiB/s (2717MB/s), 2591MiB/s-2591MiB/s (2717MB/s-2717MB/s), io=304GiB (326GB), run=120002-120002msec
root@sofx1010nas3012:~#
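
(For what it's worth, a small-block random-I/O variant of the same test could be run like this; only the access pattern, block size, I/O engine and file names change, and it should be started from inside the pool's mountpoint:)

# hypothetical 4k random read/write run, e.g. from /mnt/zpool-ssd-01
fio --name TEST-RAND --eta-newline=5s --filename=fio-tempfile-rand.dat --rw=randrw --size=32g --io_size=150g --blocksize=4k --ioengine=libaio --iodepth=16 --direct=1 --numjobs=16 --runtime=120 --group_reporting --output=fiotest-rand.txt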

So how can I get that kind of speed from my HDDs and such low speed from the SSDs…

LSI 2008 is a PCIe 2.0 device. It is fine for HDDs, but you may want to upgrade to the 3.0 generation (2308, 3008) for SSDs.
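
You can confirm the negotiated link speed and width with something like this (the 02:00.0 address is taken from your lspci output above):

# negotiated PCIe link of the HBA (LnkSta) versus what it is capable of (LnkCap)
lspci -vv -s 02:00.0 | grep -E 'LnkCap|LnkSta'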


Ah, stupid me…

So you're saying it would be much smarter to simply swap the disks' places.
I see the pools were created using partition UUIDs, so simply moving the disks physically should do the job. Am I right?

Yes, if you have free SATA 6 Gb/s ports on the motherboard, you can just move your SSDs there and test again.
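
ZFS identifies the vdev members by their on-disk labels rather than by /dev/sdX names, so the pool will import cleanly from whichever ports the disks end up on. From the shell the move would look roughly like this (on TrueNAS it may be cleaner to do the export/import from the web UI):

# export the pool before shutting down and recabling the SSDs
zpool export zpool-ssd-01
# ...power off, move the SSDs to the onboard SATA ports, boot...
# then import it again; ZFS finds the members by their labels
zpool import zpool-ssd-01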

There are no free SATA ports, but I can swap the disks around: attach the HDDs to the HBA and the SSDs directly to the motherboard.