-
Notifications
You must be signed in to change notification settings - Fork 554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rbd: added rbd info to validateRBDImageCount func #4938
Conversation
e2e/rbd.go
Outdated
framework.Failf( | ||
"backend images not matching kubernetes resource count,image count %d kubernetes resource count %d"+ | ||
"\nbackend image Info:\n %v", | ||
"\nbackend image Info:\n %v\n images information and status%v", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
status%v
could use a space befor the %v
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
e2e/rbd_helper.go
Outdated
} | ||
err = json.Unmarshal([]byte(stdOut), &imgStatus) | ||
if err != nil { | ||
return imgStatus, fmt.Errorf("unmarshal failed: %w. raw buffer response: %s", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't return the buffer in the error message, as that may get printed many more times when callers log the error. Instead, use framework.Logf()
to report details like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e2e/rbd_helper.go:1125:3: k8s.io/kubernetes/test/e2e/framework.Logf does not support error-wrapping directive %w
Failed to run container: https://url.corp.redhat.com/cc13cdb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
framework.Logf()
does not support %w
for logging errors. Use %v
while logging, but when %w
with fmt.Errorf()
or errors.New()
.
e2e/rbd_helper.go
Outdated
fmt.Sprintf("rbd status %s %s --format json", rbdOptions(poolName), imageName), | ||
rookNamespace) | ||
if err != nil { | ||
return imgStatus, fmt.Errorf("failed to get rbd status: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It helps to give this err != nil
and the below stdErr != ""
errors a slightly different message. When something goes wrong, it can be identified more precisely if the error message is unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
/test ci/centos/mini-e2e-helm/k8s-1.30 |
e2e/rbd.go
Outdated
return | ||
} | ||
// Collecting image details for printing | ||
imageDetails = append(imageDetails, fmt.Sprintf("Pool Name: %s, Image Name: %s, Img Info: %v, Img Status: %v", pool, image, imgInfo, imgStatus)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
golangci-lint complains that this line is too log: https://github.com/ceph/ceph-csi/actions/runs/11613610012/job/32379913964?pr=4938#step:3:820
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
71d09c0
to
00ee8b1
Compare
e2e/rbd.go
Outdated
for _, image := range imageList { | ||
imgInfo, err := getImageInfo(f, image, pool) | ||
if err != nil { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to log error and continue as this is for debuggin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
e2e/rbd.go
Outdated
} | ||
imgStatus, err := getImageStatus(f, image, pool) | ||
if err != nil { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to log error and continue as this is for debugging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
90b18cb
to
ffae514
Compare
/test ci/centos/mini-e2e-helm/k8s-1.30 |
ffae514
to
0e15981
Compare
/test ci/centos/mini-e2e-helm/k8s-1.30 |
0e15981
to
da3d8b6
Compare
Output from failure job: https://url.corp.redhat.com/e9aa53d
@Madhu-1 @nixpanic Is this the desired output? When I ran it manually I got the following results:
|
How about if we just return the stdout and log it, i think it will be useful for debugging instead of unmarshal output. i tlooks like the above logs is not much helpful, @nixpanic WDYT? |
Logging stdout is sufficient for me too. It may also be easier to find the right timestamp/location of errors when the logs contain pretty unique words like |
da3d8b6
to
e19572a
Compare
I return |
It would be good if we just return the stdout and log the stdout in the logs that helps for debugging, actual struct is missing few details. |
And what should be done with this function? It relies on the imgInfo struct. Lines 1132 to 1133 in cea8bf8
|
we could have a base function that returns the string and getImageStatus can use the base function and later do unmarshal and from the validation we will call the base function that returns the string data and print it. |
0c69545
to
4fc6960
Compare
4fc6960
to
c9c7e80
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OdedViner changes LGTM , please squash the commits into 1
c9c7e80
to
383ff1d
Compare
/test ci/centos/mini-e2e/k8s-1.31 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
@Mergifyio rebase |
Signed-off-by: Oded Viner <[email protected]>
✅ Branch has been successfully rebased |
80a7d36
to
2e85719
Compare
@Mergifyio queue |
🛑 The pull request has been removed from the queue
|
/test ci/centos/k8s-e2e-external-storage/1.30 |
/test ci/centos/mini-e2e-helm/k8s-1.30 |
/test ci/centos/k8s-e2e-external-storage/1.29 |
/test ci/centos/mini-e2e/k8s-1.30 |
/test ci/centos/mini-e2e-helm/k8s-1.29 |
/test ci/centos/k8s-e2e-external-storage/1.31 |
/test ci/centos/mini-e2e/k8s-1.29 |
/test ci/centos/mini-e2e-helm/k8s-1.31 |
/test ci/centos/mini-e2e/k8s-1.31 |
/test ci/centos/upgrade-tests-cephfs |
/test ci/centos/upgrade-tests-rbd |
/retest ci/centos/mini-e2e/k8s-1.29 |
@OdedViner, a CI job failed and resulted in the following output:
It looks like |
|
I see this happening as well, This happens when there is a delay in volume deletion where the volume is present in first get and gets deleted when we try to query in details (due to async deletion) |
@Madhu-1 is there a method to fix it? or only add sleep before first |
This is very corner case there i no 100 % solution to fix it but we need more than sleep. we need to take tasks in to consideration when we do get call for the count. |
Describe what this PR does
Added rbd info and rbd status on Failure message
Is there anything that requires special attention
Do you have any questions?
Is the change backward compatible?
Are there concerns around backward compatibility?
Provide any external context for the change, if any.
For example:
Related issues
Mention any github issues relevant to this PR. Adding below line
will help to auto close the issue once the PR is merged.
Fixes: #issue_number
#4547
Future concerns
List items that are not part of the PR and do not impact it's
functionality, but are work items that can be taken up subsequently.
Checklist:
guidelines in the developer
guide.
Request
notes
updated with breaking and/or notable changes for the next major release.
Show available bot commands
These commands are normally not required, but in case of issues, leave any of
the following bot commands in an otherwise empty comment in this PR:
/retest ci/centos/<job-name>
: retest the<job-name>
after unrelatedfailure (please report the failure too!)