Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add block-info module #37

Merged
merged 1 commit into from
Dec 21, 2023
Merged

Conversation

srivathsa729
Copy link
Member

block-info prints scsihosts, scsi-device information and pending in-flight IOs

Authored-by: Gulam Mohamed [email protected]
Co-authored-by: Srivathsa Dara [email protected]

Copy link

Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA).
The following contributors of this PR have not signed the OCA:

  • PR author: srivathsa729

To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application.

When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated.

If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. label Nov 30, 2023
@oracle-contributor-agreement oracle-contributor-agreement bot added OCA Verified All contributors have signed the Oracle Contributor Agreement. and removed OCA Required At least one contributor does not have an approved Oracle Contributor Agreement. labels Nov 30, 2023
@srivathsa729 srivathsa729 reopened this Nov 30, 2023
@srivathsa729 srivathsa729 force-pushed the block-info branch 2 times, most recently from b303671 to e0072a4 Compare November 30, 2023 14:39
@biger410 biger410 self-requested a review November 30, 2023 17:34
@biger410 biger410 self-assigned this Nov 30, 2023
@biger410
Copy link
Member

Please give an output of this command. And also describe how you test it.

@srivathsa729
Copy link
Member Author

srivathsa729 commented Dec 1, 2023

=======
[root@lse-india-drgn drgn-tools-github]# python -m drgn_tools.corelens /drgn_tools_lse_india/vmcores/5.15.0-104.119.4.2-uek7/vmcore --debuginfo /drgn_tools_lse_india/vmcores/5.15.0-104.119.4.2-uek7/ -M blockinfo
====== MODULE block-info ======



SCSI HOSTS
==========
HOST                    NAME
0xffff88d9e1971000      host0
0xffff88d9e1a12000      host1
0xffff88d9e1a10000      host2


MAJOR      MINOR      TIMEOUT              #BLOCKS              GENDISK              NAME
8          0          8265251542           50010783744          0xffff88d9c0e5fe00   sda
8          16         8265256479           53687091200          0xffff88d9c0e55600   sdb
8          1          8265251542           104857600            0xffff88d9c0e5fe00   sda1
8          2          8265251542           1073741824           0xffff88d9c0e5fe00   sda2
8          3          8265251542           48830087168          0xffff88d9c0e5fe00   sda3
8          17         8265256479           42949672960          0xffff88d9c0e55600   sdb1
252        0          0                    38088474624          0xffff88d9c0e62000   dm-0
252        1          0                    10737418240          0xffff88d9c1c11c00   dm-1
8          18         8265256479           2147483648           0xffff88d9c0e55600   sdb2
8          19         8265256479           1073741824           0xffff88d9c0e55600   sdb3


device               hwq                  request              op
flags                offset               len                  inflight-time(ms)
sdb                  0xffff88d9c2a2ec00     0xffff88da10354c80     REQ_OP_WRITE
0b100011000011000010 23134208             1048576              3172
sdb                  0xffff88d9c2a2ec00     0xffff88da103550c0     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23136256             1048576              3171
sdb                  0xffff88d9c2a2ec00     0xffff88da1035c380     REQ_OP_WRITE|__REQ_SYNC
0b100011000011000010 89131040             24576                0
sdb                  0xffff88d9c2a2ec00     0xffff88da10355500     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23138304             1048576              3171
sdb                  0xffff88d9c2a2ec00     0xffff88da103572c0     REQ_OP_WRITE|__REQ_SYNC
0b100011000011000010 42541392             49152                616
sdb                  0xffff88d9c2a2ec00     0xffff88da10355940     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23140352             1048576              3171
sdb                  0xffff88d9c2a2ec00     0xffff88da10355d80     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23142400             1048576              3171
sdb                  0xffff88d9c2a2ec00     0xffff88da103561c0     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23144448             1048576              3171
sdb                  0xffff88d9c2a2ec00     0xffff88da10356600     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23146496             1048576              3171
sdb                  0xffff88d9c2a2ec00     0xffff88da10356a40     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23148544             1048576              3171
sdb                  0xffff88d9c2a2ec00     0xffff88da1034d040     REQ_OP_WRITE
0b100011000011000010 23150592             1048576              616
sdb                  0xffff88d9c2a2ec00     0xffff88da10353b80     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23126016             1048576              3172
sdb                  0xffff88d9c2a2ec00     0xffff88da10353fc0     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23128064             1048576              3172
sdb                  0xffff88d9c2a2ec00     0xffff88da10354400     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23130112             1048576              3172
sdb                  0xffff88d9c2a2ec00     0xffff88da10354840     REQ_OP_WRITE|__REQ_NOMERGE
0b100011000011000010 23132160             1048576              3172
sdb                  0xffff88d9c2a2ec00     0xffff88da103394c0     REQ_OP_WRITE|__REQ_SYNC
0b100011000011000010 85985312             24576                616


(int)16 inflight IOs found

@srivathsa729
Copy link
Member Author

srivathsa729 commented Dec 1, 2023

We tested this module by running it on uek5, uek6 and uek7 vmcores and validated the output with crash.

Copy link
Member

@brenns10 brenns10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code overall looks pretty good, I have a few comments but nothing too crazy.

My main gripe is the code organization:

  1. scsiHosts and scsidevs are bad Python naming style. And both modules are rather small. I think it makes sense to put all that code into a single module, drgn_tools/scsi.py.
  2. I don't think we need a separate file drgn_tools/blockinfo.py. Corelens modules don't need to be in a file with the same name. In this case, it would be better for you to add this module to the drgn_tools/block.py file where the rest of the block-layer helpers are.

Thank you!

Comment on lines 50 to 60
def print_scsi_hosts(prog: Program) -> None:
"""
Prints scsi host information
"""
shostlist = _list_of_scsi_host(prog)
print("SCSI HOSTS\n==========\nHOST\t\t\tNAME")
for x in range(len(shostlist)):
# print(f"{shostlist[x].hostt.name}\n")
print(
f"{hex(shostlist[x].value_())}\thost{shostlist[x].host_no.value_()}"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try using the drgn_tools.table.print_table() function to help this. It's the standard way to print these things so they get formatted properly.

Comment on lines 19 to 25
try:
class_in_private = prog.cache["knode_class_in_device_private"]
except KeyError:
class_in_private = prog.type("struct device_private").has_member(
"knode_class"
)
prog.cache["knode_class_in_device_private"] = class_in_private
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't bother caching this. Looking up a type, and testing whether the type has a member, is not a very expensive task. Plus, this function isn't called repeatedly in a performance-sensitive area, so caching doesn't really make sense.

Comment on lines 14 to 18
def _list_of_scsi_host(prog: Program) -> List[Object]:
"""
Iterates through all scsi hosts and returns a
list of scsi devices
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is marked private (starts with an _), but it looks useful. Could you name it such that it's not private? And also, be sure to document the C type of the returned objects in the docstring. EG:

Suggested change
def _list_of_scsi_host(prog: Program) -> List[Object]:
"""
Iterates through all scsi hosts and returns a
list of scsi devices
"""
def for_each_scsi_host(prog: Program) -> List[Object]:
"""
Iterates through all scsi hosts and returns a
list of scsi devices
:returns: list of ``struct Scsi_Host *``
"""

)
for disk in for_each_disk(prog):
partno = 0
has_bdev_struct = is_blkdev_hdpart(prog)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_blkdev_hdpart() could be expensive (iterates over every disk) and you're calling it inside the loop. Please move this call outside the loop to ensure we don't get an O(N^2) situation.

Comment on lines 52 to 55
print(
"%-10s %-10s %-20s %-20s %-20s %-20s"
% ("MAJOR", "MINOR", "TIMEOUT", "#BLOCKS", "GENDISK", "NAME")
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this is a good place for print_table()

)
)
else:
while disk.part_tbl.part[partno]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the part_tbl has a len field. Try this instead:

for partno in range(disk.part_tbl.len):

Then you shouldn't need to do the FaultError check. And you can remove the statements adjusting partno and the declaration.

drgn_tools/block.py Outdated Show resolved Hide resolved
Comment on lines 48 to 51
def print_scsi_devs_info(prog: Program) -> None:
"""
Prints the scsi device information
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm trying to understand what is scsi-specific here. As far as I can tell, this is simply printing block device info?

If that's the case, then it really doesn't belong in a module named with scsi. Probably the drgn_tools.block module, and it should be named something like print_block_devs_info().

@srivathsa729
Copy link
Member Author

Thanks for the review Stephen, will update PR keeping your comments in mind.

num_inflight_ios = 0
for disk in for_each_disk(prog):
num_inflight_ios += get_inflight_io_nr(prog, disk)
print(f"{num_inflight_ios} inflight IOs found")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please extend helper "get_inflight_io_nr" with an extra parameter, if it points to a special disk, then return the inflight-io for that disk, otherwise return the inflight-io for all the disk in the system, then you can just print the value anywhere, no need a special print function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the better way is define a new helper to get the inflight io number for all the disk, and have the new helper invoking get_inflight_io_nr() to get per disk inflight io number.

drgn_tools/scsi.py Outdated Show resolved Hide resolved
print("\n\nInFlight I/Os\n=============")
dump_inflight_io(prog, "all")
print("\n")
print_total_inflight_ios(prog)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This BlockInfo module did multiple things, I think we should separated it, make it only do the things like "dev -d", and move scsi host dump to scsi.py using new corelen module, finally have InflightIOModule module do the inflight dump thing?
Regarding all the partition related helper, i think we can integrate through a new pull request, that pull request should provide a helper which can list partition for a particular disk or all disks, it should include name, offset, length, and also whether the partition is read-only.

drgn_tools/block.py Show resolved Hide resolved
has_bdev_struct = False

return has_bdev_struct

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sound too heavy to tell whether gendisk using hd_struct or block_device, i guess there will be better way. Can we check the type of part_tbl or part0? Please ask Stephen if you are not sure how to improve it.

@brenns10
Copy link
Member

A couple notes to add on to Junxiao's review:

  1. For the next iteration, please rebase on the current upstream/main branch. As it stands, your branch has conflicts with upstream/main.
  2. Please remember, each time you update the pull request on Github, make sure you also push a to Gitlab branch named github/pull-request-XX (where XX is the number, in this case 37). This will run additional tests that I need to see pass before I can merge.

Thanks!

@srivathsa729
Copy link
Member Author

As suggested by junxiao, removed the code corresponding to scsi and partitions. This module just prints block device related information. Will create a separate PRs for scsiinfo and partitioninfo.

@srivathsa729
Copy link
Member Author

sample output:

[sridara@lse-india-drgn drgn-tools-github]$ python -m drgn_tools.corelens /drgn_vmcore/uek6/5.4.17-2136.325.5.1-nvme/vmcore --debuginfo /drgn_vmcore/uek6/5.4.17-2136.325.5.1-nvme/ -M blockinfo
====== MODULE blockinfo ======
MAJOR  GENDISK             NAME     REQUEST_QUEUE       Inflight I/Os
259    0xffff925d55482000  nvme1n1  0xffff925cee3f3a50  0
259    0xffff925d55567000  nvme0n1  0xffff925cee3eeae8  24
8      0xffff925d55a80800  sda      0xffff925cee3926e0  0

@vashanmu
Copy link
Contributor

vashanmu commented Dec 18, 2023 via email

"""
output = [["MAJOR", "GENDISK", "NAME", "REQUEST_QUEUE", "Inflight I/Os"]]
for disk in for_each_disk(prog):
mjr = int(disk.major)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'major' sounds a better name.

name = disk.disk_name.string_().decode("utf-8")
rq = hex(disk.queue.value_())
ios = int(get_inflight_io_nr(prog, disk))
output.append([str(mjr), gendisk, name, rq, str(ios)])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need converting 'mjr' and 'ios'?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, pre_commit complained that the list has different types of objects.

I have use int conversion for major because it has "(int 32)" prefix, and for IOs as you mentioned it does not need int conversion, i will remove that. Converted both of them to str to make pre_commit happy.

gendisk = hex(disk.value_())
name = disk.disk_name.string_().decode("utf-8")
rq = hex(disk.queue.value_())
ios = int(get_inflight_io_nr(prog, disk))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'get_inflight_io_nr()' already return int, convert is not required.

@biger410
Copy link
Member

I’m working on scsiinfo, will move the list_of_scsi_host() / for_each_scsi_host() to generic module that could be used by other modules. Rajan From: SRIVATHSA DARA @.> Date: Monday, December 18, 2023 at 01:53 To: oracle-samples/drgn-tools @.> Cc: Subscribed @.***> Subject: Re: [oracle-samples/drgn-tools] Add block-info module (PR #37) As suggested by junxiao, removed the code corresponding to scsi and partitions. This module just prints block device related information. Will create a separate PRs for scsiinfo and partitioninfo.

Hi @vashanmu , @srivathsa729 already posted a basic scsiinfo pull request, i know it's pulled from your scsiinfo helpers, would you like to merge that one and then you can use new pull request to extend it or you would like leaving the whole scsiinfo for you to work on? https://github.com/oracle-samples/drgn-tools/pull/50/files

@vashanmu
Copy link
Contributor

vashanmu commented Dec 18, 2023 via email

@biger410
Copy link
Member

Have an offline discussion with Rajan, we will go ahead to merge scsiinfo, and Rajan will extend it further using new pull request. This way Gulam can move forward his iscsi helpers in parallel.

@srivathsa729
Copy link
Member Author

this is the error pre_commit is reporting when all the elements of the list are not strings

drgn_tools/block.py:526: error: List item 0 has incompatible type "int"; expected "str"  [list-item]
drgn_tools/block.py:526: error: List item 4 has incompatible type "int"; expected "str"  [list-item]
Found 2 errors in 1 file (checked 1 source file)

@srivathsa729
Copy link
Member Author

Removed int conversion for inflight-IOs, but found that some times get_inflight_io_nr() is returning non-int values. See output below:

[sridara@lse-india-drgn drgn-tools-github]$ python -m drgn_tools.corelens /drgn_vmcore/uek7/5.15.0-104.119.4.2-uek7/vmcore --debuginfo /drgn_vmcore/uek7/5.15.0-104.119.4.2-uek7/ -M block-info
====== MODULE blockinfo ======
MAJOR  GENDISK             NAME  REQUEST_QUEUE       Inflight I/Os
8      0xffff88d9c0e5fe00  sda   0xffff88d9e2c31330  (int)0
8      0xffff88d9c0e55600  sdb   0xffff88d9e2c31cc8  (int)16
252    0xffff88d9c0e62000  dm-0  0xffff88da10109cc8  0
252    0xffff88d9c1c11c00  dm-1  0xffff88d9e2c42ff8  0
[sridara@lse-india-drgn drgn-tools-github]$ python -m drgn_tools.corelens /drgn_vmcore/uek6/5.4.17-2136.325.5.1-nvme/vmcore --debuginfo /drgn_vmcore/uek6/5.4.17-2136.325.5.1-nvme/ -M blockinfo
====== MODULE blockinfo ======
MAJOR  GENDISK             NAME     REQUEST_QUEUE       Inflight I/Os
259    0xffff925d55482000  nvme1n1  0xffff925cee3f3a50  0
259    0xffff925d55567000  nvme0n1  0xffff925cee3eeae8  24
8      0xffff925d55a80800  sda      0xffff925cee3926e0  0


class BlockInfo(CorelensModule):
"""
Corelens Module for scsi-devs-info
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this comment, it's block device info.

@biger410
Copy link
Member

Removed int conversion for inflight-IOs, but found that some times get_inflight_io_nr() is returning non-int values. See output below:

[sridara@lse-india-drgn drgn-tools-github]$ python -m drgn_tools.corelens /drgn_vmcore/uek7/5.15.0-104.119.4.2-uek7/vmcore --debuginfo /drgn_vmcore/uek7/5.15.0-104.119.4.2-uek7/ -M block-info
====== MODULE blockinfo ======
MAJOR  GENDISK             NAME  REQUEST_QUEUE       Inflight I/Os
8      0xffff88d9c0e5fe00  sda   0xffff88d9e2c31330  (int)0
8      0xffff88d9c0e55600  sdb   0xffff88d9e2c31cc8  (int)16
252    0xffff88d9c0e62000  dm-0  0xffff88da10109cc8  0
252    0xffff88d9c1c11c00  dm-1  0xffff88d9e2c42ff8  0
[sridara@lse-india-drgn drgn-tools-github]$ python -m drgn_tools.corelens /drgn_vmcore/uek6/5.4.17-2136.325.5.1-nvme/vmcore --debuginfo /drgn_vmcore/uek6/5.4.17-2136.325.5.1-nvme/ -M blockinfo
====== MODULE blockinfo ======
MAJOR  GENDISK             NAME     REQUEST_QUEUE       Inflight I/Os
259    0xffff925d55482000  nvme1n1  0xffff925cee3f3a50  0
259    0xffff925d55567000  nvme0n1  0xffff925cee3eeae8  24
8      0xffff925d55a80800  sda      0xffff925cee3926e0  0

Then please fix get_inflight_io_nr() to make it always return int.

blockinfo prints block device information

Authored-by: Gulam Mohamed <[email protected]>
Co-authored-by: Srivathsa Dara <[email protected]>
Copy link
Member

@biger410 biger410 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@brenns10 brenns10 merged commit bf9fe54 into oracle-samples:main Dec 21, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants