Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LVM resize fails with calculation error - pv metadata unaccounted for #1320

Open
plimptm opened this issue Nov 25, 2024 · 5 comments
Open

LVM resize fails with calculation error - pv metadata unaccounted for #1320

plimptm opened this issue Nov 25, 2024 · 5 comments
Assignees
Labels

Comments

@plimptm
Copy link

plimptm commented Nov 25, 2024

I have been troubleshooting the use of this library in the ansible linux-system-roles collection and have traced an issue with the resizing capabilities to what I believe to be a size calculation bug in blivet. I searched the issues and didn't see anything related, so I'm opening this here.

Note: I first encountered this bug while trying to use blivet to resize a logical volume (after resizing the underlying disk) but have been able to reproduce the issue by simply calling the LVMLogicalVolumeDevice.resize() method even when there is no extra space available.

Expected behaviors:

  • calling the resize method with unallocated PEs in the parent volume group will expand the logical volume to fill those unallocated PEs
  • calling the resize method with no unallocated PEs in the parent volume group will see that the logical volume and volume groups sizes already match and return something indicating there are no available extents

Actual behavior:

>>> b = blivet.Blivet()
>>> lv = b.lvs[9]
>>> lv
LVMLogicalVolumeDevice instance (0x7f630064f880) --
  name = vg01-lv_app01  status = True  id = 354
  children = [] 
  parents = ['existing 10 GiB lvmvg vg01 (350)']
  uuid = Kgmu9m-dPFg-PdEJ-6KmH-5FPs-CRBx-Q2lVxa  size = 9.99 GiB
  format = existing ext4 filesystem
  major = 0  minor = 0  exists = True  protected = False
  sysfs path = /sys/devices/virtual/block/dm-9
  target size = 9.99 GiB  path = /dev/mapper/vg01-lv_app01
  format args = []  original_format = ext4  target = None  dm_uuid = None  VG device = LVMVolumeGroupDevice instance (0x7f62ff5018b0) --
  name = vg01  status = True  id = 350
  children = ['existing 9.99 GiB lvmlv vg01-lv_app01 (354) with existing ext4 filesystem']
  parents = ['existing 10 GiB disk sdb (341) with existing lvmpv']
  uuid = 9pDOvI-CXzn-0Xce-XRka-0TtE-Ut7E-1qJNFE  size = 10 GiB  
  format = existing None
  major = 0  minor = 0  exists = True  protected = False
  sysfs path =
  target size = 9.99 GiB  path = /dev/vg01
  format args = []  original_format = None  free = 0 B  PE Size = 4 MiB  PE Count = 2558
  PE Free = 0  PV Count = 1
  modified = False  extents = 2559  free space = 4 MiB
  free extents = 1  reserved percent = 0  reserved space = 0 B
  PVs = ['existing 10 GiB disk sdb (341) with existing lvmpv']
  LVs = ['existing 9.99 GiB lvmlv vg01-lv_app01 (354) with existing ext4 filesystem']
  segment type = linear percent = 0
  VG space used = 9.99 GiB
>>> lv.vg.size
Size (9.99609375 GiB)
>>> lv.size
Size (9.9921875 GiB)
>>> lv.size == lv.vg.size 
False
>>> lv.size = lv.vg.size
>>> lv.resize()
Traceback (most recent call last):
  File "/usr/lib64/python3.9/site-packages/gi/overrides/BlockDev.py", line 1092, in wrapped
    ret = orig_obj(*args, **kwargs)
  File "/usr/lib64/python3.9/site-packages/gi/overrides/BlockDev.py", line 605, in lvm_lvresize
    return _lvm_lvresize(vg_name, lv_name, size, extra)
gi.repository.GLib.GError: g-bd-utils-exec-error-quark: Process reported exit code 5:   Insufficient free space: 1 extents needed, but only 0 available
 (0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/site-packages/blivet/threads.py", line 53, in run_with_lock
    return m(*args, **kwargs)
  File "/usr/lib/python3.9/site-packages/blivet/devices/lvm.py", line 2513, in decorated
    return meth(self, *args, **kwargs)  # pylint: disable=not-callable
  File "/usr/lib/python3.9/site-packages/blivet/devices/lvm.py", line 2773, in resize
    blockdev.lvm.lvresize(self.vg.name, self._name, self.size)
  File "/usr/lib64/python3.9/site-packages/gi/overrides/BlockDev.py", line 1114, in wrapped
    raise transform[1](msg)
gi.overrides.BlockDev.LVMError: Process reported exit code 5:   Insufficient free space: 1 extents needed, but only 0 available

LVM specs:

# pvdisplay /dev/sdb
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               vg01
  PV Size               <10.00 GiB / not usable 3.00 MiB
  Allocatable           yes (but full)
  PE Size               4.00 MiB
  Total PE              2558
  Free PE               0
  Allocated PE          2558
  PV UUID               m3yU1i-MHFE-yAkj-B7hu-gTfG-bhGq-IFv8rl

# vgdisplay vg01
  --- Volume group ---
  VG Name               vg01
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  4
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               9.99 GiB
  PE Size               4.00 MiB
  Total PE              2558
  Alloc PE / Size       2558 / 9.99 GiB
  Free  PE / Size       0 / 0
  VG UUID               9pDOvI-CXzn-0Xce-XRka-0TtE-Ut7E-1qJNFE

Server OS: RHEL 9.2

@plimptm
Copy link
Author

plimptm commented Nov 25, 2024

I've noticed an inconsistency in how blivet reports the size of the underlying pv.

The blivet vg size property appears to reach straight back to read the size of the underlying block device read_current_size and subtracts only the volume group metadata, completely ignoring the pv metadata (and any unusable blocks)

If I subtract a total of 8 MiB (4 MiB for pv metadata + 4 MiB for vg metadata) from 10 GiB, the math adds up:

>>> pv.size - vg.lvm_metadata_space*2
Size (9.9921875 GiB)
>>> pv.size - vg.lvm_metadata_space*2 == lv.size
True

I would expect pvs to be handled by blivet with similar properties to other lvm objects, with an accounting for metadata, but it appears they are handled as DiskDevice type objects.

>>> vg.lvm_metadata_space
Size (4 MiB)
>>> pv.lvm_metadata_space
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'DiskDevice' object has no attribute 'lvm_metadata_space'

@vojtechtrefny vojtechtrefny self-assigned this Nov 26, 2024
@vojtechtrefny
Copy link
Member

Hi, thank you for the report. First of all I am not saying there is no bug the code or that our calculation of metadata sizes in LVM is perfect. I know it's not and we had more than a few issues with it in the past.

There is one problem with your example -- you are not supposed to call the resize function directly, but either use the Blivet.resize_device helper function or register the ActionResizeDevice action directly (unfortunately our API is not very user friendly and our documentation is not the best in the world). Using the actions might prevent some of the problems you see (there are some additional fail safe mechanisms in the actions that sometimes fix/hide problems from the lower layer).

Expected behaviors:

* calling the `resize` method with unallocated PEs in the parent volume group will expand the logical volume to fill those unallocated PEs

* calling the `resize` method with no unallocated PEs in the parent volume group will see that the logical volume and volume groups sizes already match and return something indicating there are no available extents

That's not how the resize function works. resize will simply try to resize the device to its target_size. Which for existing devices is just their size.

The reason for this is that resize is a generic function that works the same for all devices -- so resize for partition and LV is the same function that (in general) is called when processing the ActionResizeDevice action.

The question is why resize actually tried to resize the LV, it shouldn't do that by itself, I suspect the reason is some rounding and alignment for PE size we do and that we actually rounded the size up somewhere resulting in trying to resize the LV.

The blivet vg size property appears to reach straight back to read the size of the underlying block device read_current_size and subtracts only the volume group metadata, completely ignoring the pv metadata (and any unusable blocks)

In our code there is a difference between size and free space. For VG the size is the sum of the PV sizes minus the PV metadata calculated in lvm_metadata_space. The VG metadata (including things like pmspare) and other unusable space is accounted for in free_space (for us VG metadata is still part of VG so part of its size) so free_space should be used when resizing (or to be more precise LV max_size should be used when checking if the LV can be resized).

I would expect pvs to be handled by blivet with similar properties to other lvm objects, with an accounting for metadata, but it appears they are handled as DiskDevice type objects.

Because the device itself is disk, the format is LVM PV so for LVM PV specific values you need to look at pv.format.

The low level API in blivet (especially for LVM) is horribly complicated. The upper level API (helper functions in the Blivet object) hide most of these things and I definitely do not recommend using the low level API to anyone.


But all the complicated explanations above aside -- we definitely do have a bug somewhere, because we think there is one free extent in your VG (free space = 4 MiB free extents = 1) when there isn't one (Free PE / Size 0 / 0). So that's definitely not right.

Can you please share the entire blivet log from your system? If you run the code below it will save the log to /tmp/blivet.log:

from blivet import util
util.set_up_logging()
import blivet; b=blivet.Blivet(); b.reset()

if you don't want to share the entire log publicly you can send it to me to vtrefny AT redhat.com, thank you.

@plimptm
Copy link
Author

plimptm commented Dec 4, 2024

Thanks for the updates and information - that sheds a lot of light on what I was looking at. It looks like the blivet ansible module is using the max_size to check before attempting resize, so I guess that is encouraging.

I'm attaching the blivet log file you requested. Let me know if you need anything else.
blivet.log

@vojtechtrefny
Copy link
Member

Thank you for the logs (and sorry for taking so long to respond). The problem here is not in the VG metadata or LV size calculation, the problem is the PV size. From the logs the size of the sdb drive is 10737418240 B, but the size of the PV format is only 10729029632 B (as reported by pvs). That's 8 MiB difference andit should be only 4 MiB. This tells me, the problem is in the PV grow operation in the storage role (or more likely in blivet which handles that one as well), which for some reason failed to resize the PV format fully after the disk was resized.

Unfortunately we don't really check the PV format size in blivet when calculating VG free space -- we check just the size of the underlying block device size and expect the PV format has the same size. So in this case we think there is extra 4 MiB of free space in the VG, because from out point of view the PV is 4 MiB bigger than it actually is. That's definitely something we should fix.
The other question is why the grow_to_fill option in storage role didn't fully grow the PV, I wasn't able to reproduce that part of this issue yet.

vojtechtrefny added a commit to vojtechtrefny/storage that referenced this issue Jan 15, 2025
We need to set the `grow_to_fill` property on the LVMPV format not
on the block device and the target size for the resize needs to be
size of the PV, 'self._device' is the pool (VG) in this case.

Related: storaged-project/blivet#1320

Resolves: RHEL-73244
@vojtechtrefny
Copy link
Member

I've posted a patch for storage role fixing the PV grow feature, see linux-system-roles/storage#502. I'll keep this open until the blivet part is fixed (using real PV format size in VG size and free space calculation), but the storage role fix by itself should be enough to fix your issue (without the PV not being fully resized, you shouldn't hit the blivet issue).

vojtechtrefny added a commit to vojtechtrefny/storage that referenced this issue Jan 17, 2025
We need to set the `grow_to_fill` property on the LVMPV format not
on the block device and the target size for the resize needs to be
size of the PV, 'self._device' is the pool (VG) in this case.

Related: storaged-project/blivet#1320

Resolves: RHEL-73244
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants