I have a fresh pool that has just finished restoring from a backup, and I'm observing behavior that is not consistent with my understanding of how ZFS works. The pool consists of 4× HDDs in a 4-wide raidz1, plus 2× SSDs in a mirror as the special vdev. The pool was created with these non-default settings:

```
# zpool get all htank | awk '$2 !~ /feature@/ && $4 !~ /-|default/ { print }'
NAME   PROPERTY  VALUE  SOURCE
htank  ashift    12     local

# zfs get -s local all htank
NAME   PROPERTY              VALUE       SOURCE
htank  recordsize            1M          local
htank  mountpoint            /mnt/htank  local
htank  checksum              sha256      local
htank  compression           zstd-11     local
htank  atime                 off         local
htank  xattr                 on          local
htank  dnodesize             auto        local
htank  acltype               posix       local
htank  relatime              off         local
htank  special_small_blocks  128K        local
```

According to `zdb`:

```
# zdb -Lbbbs htank
<...>
 33.4M   30.1T   27.3T   36.4T   1.09M    1.11  100.00  Total
 1.48M    154G   6.82G   14.1G   9.54K   22.65    0.04  Metadata Total

Block Size Histogram

  block   psize                  lsize                  asize
   size   Count   Size   Cum.    Count   Size   Cum.    Count   Size   Cum.
    512:  1.41M   722M   722M    1.41M   722M   722M        0      0      0
     1K:   107K   113M   835M     107K   113M   835M        0      0      0
     2K:  42.3K   117M   952M    42.3K   117M   952M        0      0      0
     4K:  1.46M  5.85G  6.78G     118K   529M  1.45G    1.60M  6.40G  6.40G
     8K:   182K  1.56G  8.34G    46.6K   517M  1.95G    1.48M  12.2G  18.6G
    16K:  85.0K  1.85G  10.2G     279K  4.82G  6.77G     188K  3.37G  22.0G
    32K:   119K  5.33G  15.5G    66.1K  3.08G  9.85G     117K  5.19G  27.2G
    64K:   163K  14.8G  30.3G    64.6K  5.58G  15.4G     180K  16.1G  43.3G
   128K:   317K  56.3G  86.6G    1.21M   156G   172G     199K  41.3G  84.6G
   256K:  1.90M   852G   939G    35.2K  12.5G   184G     391K   148G   233G
   512K:  4.19M  2.98T  3.89T    27.7K  19.3G   204G    3.82M  2.84T  3.07T
     1M:  23.4M  23.4T  27.3T    29.9M  29.9T  30.1T    25.4M  33.4T  36.4T
     2M:      0      0  27.3T        0      0  30.1T        0      0  36.4T
     4M:      0      0  27.3T        0      0  30.1T        0      0  36.4T
     8M:      0      0  27.3T        0      0  30.1T        0      0  36.4T
    16M:      0      0  27.3T        0      0  30.1T        0      0  36.4T
```

However:

```
# zpool list -v htank
NAME                  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
htank                58.5T  33.2T  25.3T        -         -     1%    56%  1.00x  ONLINE  -
  raidz1-0           58.2T  33.2T  25.1T        -         -     1%  57.0%      -  ONLINE
    htank-1          14.6T      -      -        -         -      -      -      -  ONLINE
    htank-2          14.6T      -      -        -         -      -      -      -  ONLINE
    htank-3          14.6T      -      -        -         -      -      -      -  ONLINE
    htank-4          14.6T      -      -        -         -      -      -      -  ONLINE
special                  -      -      -        -         -      -      -      -  -
  mirror-2            254G  41.6G   212G        -         -     7%  16.4%      -  ONLINE
    htank-special-1   256G      -      -        -         -      -      -      -  ONLINE
    htank-special-2   256G      -      -        -         -      -      -      -  ONLINE
logs                     -      -      -        -         -      -      -      -  -
  htank-log-1        7.98G    32K  7.50G        -         -     0%  0.00%      -  ONLINE
cache                    -      -      -        -         -      -      -      -  -
  htank-cache-1       128G  2.55G   125G        -         -     0%  1.99%      -  ONLINE
```

According to my understanding, this configuration should have resulted in at least 100G of special vdev usage, because all 87G of blocks whose PSIZE is 128K or less should have ended up on the special vdev. Is either of these reports wrong, or am I misunderstanding the mechanics of this feature?
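
To make my expectation explicit, this is roughly the per-block decision I assumed ZFS makes (a rough sketch only, not the actual OpenZFS allocation-class code; the names here are made up for illustration):

```c
/*
 * Sketch of my mental model of special_small_blocks routing -- not the
 * real OpenZFS code, just an illustration with made-up names.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MY_SPECIAL_SMALL_BLOCKS	(128ULL * 1024)	/* 128K, as set on htank */

/* Would this block be placed on the special (SSD mirror) class? */
static bool
prefers_special(uint64_t psize, bool is_metadata)
{
	if (is_metadata)
		return (true);	/* metadata goes to the special vdev */
	/* Data qualifies only when its physical (on-disk) size is <= 128K. */
	return (psize <= MY_SPECIAL_SMALL_BLOCKS);
}

int
main(void)
{
	/* 1: exactly at the limit qualifies; 0: anything above it does not. */
	printf("128K data block -> special? %d\n", prefers_special(128 * 1024, false));
	printf("129K data block -> special? %d\n", prefers_special(129 * 1024, false));
	return (0);
}
```

(As far as I know, the real code can also fall back to the normal class when the special vdev is nearly full, but with the mirror only 16% used that shouldn't matter here.)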
As I can see from `int bin = highbit64(BP_GET_PSIZE(bp)) - 1`, the histograms in `zdb` round block sizes down to the nearest power of two, so the 128K bin actually includes blocks with `128K <= size < 256K`. At the same time, `special_small_blocks=128K` really means `size <= 128K`. Maybe the rounding in `zdb` could benefit from a closer look.
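
For illustration, here is a self-contained stand-in for that binning (the `highbit64()` below is a naive reimplementation for the example, not the ZFS one):

```c
/* Demonstrates the power-of-two binning quoted above: blocks of 128K,
 * 160K and 255K all land in the same "128K" histogram row, yet only
 * the first one satisfies special_small_blocks=128K (size <= 128K). */
#include <stdint.h>
#include <stdio.h>

/* Naive stand-in for ZFS's highbit64(): 1-based index of the highest set bit. */
static int
highbit64(uint64_t x)
{
	int h = 0;
	while (x != 0) {
		h++;
		x >>= 1;
	}
	return (h);
}

int
main(void)
{
	uint64_t psize[] = { 128 * 1024, 160 * 1024, 255 * 1024 };

	for (int i = 0; i < 3; i++) {
		int bin = highbit64(psize[i]) - 1;	/* 17 for all three */
		printf("psize = %3lluK -> %lluK histogram row\n",
		    (unsigned long long)(psize[i] / 1024),
		    (unsigned long long)((1ULL << bin) / 1024));
	}
	return (0);
}
```

So only the rows up to and including 64K (30.3G cumulative psize) are guaranteed to fall at or below the 128K cutoff, which would bring the expectation much closer to the 41.6G actually allocated on the special mirror once metadata is counted.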