Merge release/2.6 into google/2.6 #15725
mjmac
commented
Jan 13, 2025
- DAOS-16784 build: Tag 2.6.2 tb2 (#15461)
- DAOS-9355 doc: DAOS 2.6.2 release notes (#15560)
- SRE-2525 ci: Fix Trivy scan upload to the Security tab (#15394)
- DAOS-16350 test: decrease pool size for ior_per_rank (#15183) (#15403)
- DAOS-16265 test: Split erasurecode/multiple_failure.py (#15355) (#15369)
- DAOS-16096 test: Add retry loop for comparing free pool space (#15289) (#15411)
- DAOS-16825 test: Support register cleanup for all Test classes (#15530) (#15540)
- DAOS-16709 test: Handle decoding empty json output (#15397) (#15410)
- DAOS-12859 test: use pool and container labels (pass 3) (#13210) (#15253)
- DAOS-16670 test: container/multiple_delete.py - Increase SCM leftover… (#15420) (#15457)
- DAOS-16702 rebuild: restart rebuild for a massive failure case (#15406)
- DAOS-16100 test: Fix stopping daos_test during timeout (#15275) (#15603)
- DAOS-16167 test: update soak test to use internal job scheduler (#14775) (#15595)
- DAOS-16865 cq: update flake8 to 7.1.1 (#15575) (#15597)
- DAOS-16276 doc: Address engine unavailability (#15456) (#15496)
- DAOS-16572 object: refine sc_ec_agg_active flag setting (#15352)
- DAOS-16812 cart: read after free cid 2556737 (#15517) (#15600)
- DAOS-16875 cq: fix flake8 xargs usage (#15608) (#15614)
- SRE-2171 ci: Big refactor of GHA workflows (#15556) (#15588)
- DAOS-16621 build: Fix Go versions in rpm/deb packaging (#15174) (#15255)
- DAOS-16170 cart: do not release completed RPC reference repeatedly - b26 (#15477)
- DAOS-623 client: Address java security scan (#15542) (#15615)
- DAOS-16826 build: Fix compiling issues in gcc 14 (#15531) (#15607)
- DAOS-16833 cq: update GHA ubuntu (#15538) (#15563)
- DAOS-16662 test: update some tests to use unique dfuse mount (#15242) (#15598)
- DAOS-16873 cq: remove workflows/version-checks.yml (#15601) (#15627)
- DAOS-16787 utils: Suppress NLT valgrind false positives (#15478) (#15637)
- DAOS-13292 build: Don't need UCX libraries … (#15016) (#15623)
- DAOS-15964 test: verify daos_server_helper on server (#15503) (#15599)
- DAOS-16645 cart: Bump file descriptor limit (#15224) (#15366)
- DAOS-16889 build: fix finding protobuf (#15625) (#15641)
- DAOS-16263 cq: merge yamllint and clang-format into linting (#14803) (#15642)
- DAOS-16802 tests: properly set fault_injection for CR test - b26 (#15510)
- DAOS-16500 build: Move to Leap 15.6 (#15561) (#15688)
- DAOS-15677 cq: add copyright GHA and remove required-githooks (#15552) (#15655)
- DAOS-16917 cq: update copyright for HPE (#15678) (#15686)
- DAOS-16876 vos: set cont parameter when deregister modification from DTX - b26 (#15658)
- DAOS-16477 mgmt: return suspect engines for pool healthy query (#15458) (#15512)
- DAOS-16732 client: version check in glibc libraries names (#15549)
- DAOS-16830 pool: Allow setting DAOS_POOL_RF to 0 (#15564) (#15692)
- DAOS-16838 control: Fix dmg storage query usage with emulated NVMe (#15545) (#15702)
- DAOS-16881 control: Fix daos_server scm prep for single missing ns (#15632) (#15701)
- DAOS-16639 object: fix assertion (#15329) (#15681)
- DAOS-16386 utils: Add DDB Feature and RM_POOL Support (#15062) (#15474)
- DAOS-16903 build: Update golang.org/x/net to 0.33.0 (#15650) (#15704)
- DAOS-16872 cq: Bump GHA versions (#15693)
Tag second test build for 2.6.2. faults-enabled: false Signed-off-by: Phil Henderson <[email protected]>
2.6.2 release notes document Signed-off-by: Phil Henderson <[email protected]>
- Enable write access to the Security section of the GitHub project.
- Use the GHA cache to avoid Trivy scan failures: overuse of the CVE database results in database download failures. Upgrade `trivy-action` to version 0.28.0, where the caching mechanism is enabled by default.
- Enable the debug option in Trivy to prepare for detailed analysis of scan failures.
Signed-off-by: Tomasz Gromadzki <[email protected]>
Test deployment/ior_per_rank fails with 'No space' on some CI clusters. Reduce the requested pool size to accommodate nodes with smaller storage capacity. Signed-off-by: James A. Nunez <[email protected]>
Split erasurecode/multiple_failure.py into two separate tests to reduce the chance that a large number of ERR messages in the server log file causes other test variants to fail due to out-of-space errors. Signed-off-by: Phil Henderson <[email protected]>
Retry the check for the pool free space in a loop after destroying half of the containers. If the check doesn't pass within 60 seconds, fail the test. Signed-off-by: Phil Henderson <[email protected]>
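The retry behaviour described in this commit can be sketched roughly as follows. This is only an illustration, not the code from the PR: `wait_for`, its parameters, and the 5-second polling interval are hypothetical names and values chosen for the example; only the 60-second limit comes from the commit message.

```python
import time


def wait_for(check, timeout=60.0, interval=5.0,
             clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or `timeout` seconds elapse.

    Hypothetical helper: `check` stands in for the free-space comparison.
    `clock` and `sleep` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            # Out of time: the caller is expected to fail the test.
            return False
        sleep(interval)
```

A test would call something like `wait_for(lambda: free_space_matches(pool))` and fail if it returns `False`; `free_space_matches` is again a placeholder for whatever comparison the test performs.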
Support calling register cleanup methods for tests based upon the Test and TestWithoutServers classes. Also remove stopping agents as part of calling TestWithServers.stop_servers() since DAOS-6873 is no longer an issue. Signed-off-by: Phil Henderson <[email protected]>
Do not raise an exception if parsing empty json output. Signed-off-by: Phil Henderson <[email protected]>
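A minimal sketch of the idea, assuming the command output arrives as a string: empty or whitespace-only output is treated as "no data" instead of being handed to the JSON decoder. `decode_json_output` is a hypothetical name, not a function from the DAOS test framework.

```python
import json


def decode_json_output(output):
    """Return parsed JSON, or None when the output is empty.

    Hypothetical helper: json.loads("") raises json.JSONDecodeError,
    so empty or whitespace-only output is short-circuited to None.
    """
    if output is None or not output.strip():
        return None
    return json.loads(output)
```

Malformed, non-empty output still raises `json.JSONDecodeError`, so real decoding problems stay visible.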
Signed-off-by: Dalton Bohning <[email protected]>
The object placement algorithm was changed by DAOS-16445. As a result, data are written to targets more uniformly, while the amount of leftover data after container destroy/garbage collection in each target remains the same. That is, data are written to more targets while the cleanup method in each target hasn't been improved, which results in more aggregate leftover data. To handle the larger amount of leftover data in SCM, increase the threshold to 1.5MB. Signed-off-by: Makito Kano <[email protected]>
In a special massive failure case:
1. Some engines go down and trigger a rebuild.
2. One engine that participated in the rebuild goes down again before the rebuild finishes; the number of failures then exceeds the pool RF, so the pool map is not changed.
3. The administrator restarts that engine.
In that case the rebuild task on the engine should be recovered; to keep things simple for now, just abort and retry the global rebuild task. The typical recovery approach, restarting the whole system including the PS leader, does not have this issue.
Another backported commit: 947c76d DAOS-16175 container: fix a case for cont_iv_hdl_fetch (#15395)
Signed-off-by: Xuezhao Liu <[email protected]>
Fix stopping timed out processes run by a JobManager class by only searching for and killing the command executable being run by clush, orterun, mpirun, etc. Add a new harness/cmocka.py test to verify the stopping of the processes with a test timeout. Signed-off-by: Phil Henderson <[email protected]>
Update soak to support using an internal job scheduler. Signed-off-by: Maureen Jean <[email protected]> Co-authored-by: mjean308 <[email protected]>
Update flake8 to 7.1.1. Adjust githook to work with newer flake8. Also tested to be backwards compatible with flake8<6 Signed-off-by: Dalton Bohning <[email protected]>
Add a section on handling unavailable engines. Signed-off-by: Li Wei <[email protected]>
clear the sc_ec_agg_active flag more proactively. Signed-off-by: Xuezhao Liu <[email protected]>
- If failed to reply, skip rpc early buffer release Signed-off-by: Alexander A Oganezov <[email protected]>
Use -r so if no scons or non-scons files are grep'ed, flake8 does not run. Signed-off-by: Dalton Bohning <[email protected]>
Add the use of reusable workflows and actions to reduce the amount of duplicated code in this repository as well as dependency repositories.
Run Bullseye workflow on schedule (#15574): Saturdays at midnight, UTC.
Accept and propagate a run-gha variable (#15576): for the case where daos is being used as a downstream test.
Test inputs context before trying to use it.
Fixes: SRE-2570 DAOS-16262
Signed-off-by: Brian J. Murrell <[email protected]>
- Set Go minimum version to 1.21 in rpm and debian packaging spec files.
- Update scons Go version check to use version in go.mod.
- Add a reminder in go.mod file so we remember the packaging files when bumping the minimum Go version in the future.
- Update Ubuntu 22.04 Dockerfile to get an appropriate version of Go.
Signed-off-by: Kris Jacque <[email protected]>
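Since SCons build scripts are Python, a version check driven by go.mod could look roughly like the sketch below. It is not the code from the PR: `go_mod_version` and `go_version_ok` are hypothetical helpers, and the sketch only handles a plain numeric `go` directive such as `go 1.21`.

```python
import re


def go_mod_version(go_mod_text):
    """Return the version from the `go` directive as a tuple of ints.

    Returns None if no plain numeric directive (e.g. "go 1.21") is found.
    """
    match = re.search(r"^go[ \t]+(\d+(?:\.\d+)*)[ \t]*$",
                      go_mod_text, flags=re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def go_version_ok(installed, go_mod_text):
    """Check an installed Go version tuple against the go.mod minimum."""
    required = go_mod_version(go_mod_text)
    return required is not None and installed >= required
```

Reading the minimum from go.mod keeps a single source of truth for the build check, which is why the commit adds a reminder there about the packaging spec files that still carry their own copy of the version.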
For a collective RPC, when handling failure cases during crt_req_send(), its reference may already have been released via crt_rpc_complete_and_unlock(), which is triggered by crt_corpc_complete(). In that case, we should check whether the RPC has completed before calling RPC_DECREF() to avoid releasing the RPC reference repeatedly. The patch also initializes some local variables for the CHK RPC to avoid accessing invalid DRAM when handling a failed collective CHK RPC. It also includes some enhancements to the CR test logic. Signed-off-by: Fan Yong <[email protected]>
Update netty-buffer to 4.1.115 Signed-off-by: Jeff Olivier <[email protected]> Co-authored-by: Jeff Olivier <[email protected]>
* Fix compiling issues in gcc 14 Signed-off-by: Jinshan Xiong <[email protected]> Co-authored-by: Dalton Bohning <[email protected]> Co-authored-by: Jeff Olivier <[email protected]>
Update mantic (EOL) to oracular. Update 22.04 LTS to 24.04 LTS. Signed-off-by: Dalton Bohning <[email protected]>
Update some tests to use unique dfuse mount directory by letting the framework generate one. Remove mount_dir from run_ior_multiple_variants since it is no longer needed and this level of fine control should be handled per test ideally. Signed-off-by: Dalton Bohning <[email protected]>
Remove workflows/version-checks.yml now that dependabot checks this. Signed-off-by: Dalton Bohning <[email protected]>
* Add a suppression for Go runtime function racefuncenter.
* Add suppression for rt0_go CGo malloc
Signed-off-by: Kris Jacque <[email protected]>
At build time any more, as of e01970d. Signed-off-by: Brian J. Murrell <[email protected]>
verify daos_server_helper on server instead of the runner/client. misc cleanup Signed-off-by: Dalton Bohning <[email protected]>
With the tcp provider, using many sockets can cause significant file descriptor usage. Bump the soft limit if possible, and warn if it appears insufficient. Valgrind sets the hard limit to the soft limit, so work around that in NLT. Signed-off-by: Jeff Olivier <[email protected]>
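The "bump the soft limit, warn if insufficient" pattern can be illustrated with Python's standard `resource` module. The change in this commit lives in the C code of cart, so this is only an analogy: `bump_nofile_limit` and the 65536 default are assumptions made for the example.

```python
import resource
import warnings


def bump_nofile_limit(wanted=65536):
    """Raise the soft RLIMIT_NOFILE toward `wanted`, capped by the hard limit.

    Hypothetical helper; returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= wanted:
        return soft  # already sufficient, nothing to do
    # The soft limit can never exceed the hard limit.
    target = wanted if hard == unlimited else min(wanted, hard)
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
            soft = target
        except (ValueError, OSError) as err:
            warnings.warn(f"could not raise file descriptor limit: {err}")
    if soft < wanted:
        warnings.warn(f"file descriptor limit {soft} may be insufficient "
                      f"(wanted {wanted})")
    return soft
```

When the hard limit equals the soft limit, which the commit message says happens under Valgrind, the function cannot raise anything and only warns, which is why NLT needs its own workaround.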
Add a requirement to protobufc for building daos control binaries. Signed-off-by: Dalton Bohning <[email protected]>
Merge yamllint and clang-format into linting workflow so all lint checks are grouped together. Make yaml-lint required but clang-format optional until stable. Signed-off-by: Dalton Bohning <[email protected]>
Signed-off-by: Fan Yong <[email protected]>
Test with Leap 15.6 instead of Leap 15.5. To support building Leap 15.5 DAOS RPMs and testing them on Leap 15.6, the Functional on Leap 15.6 stage needs to explicitly specify the Leap 15.6 image for node provisioning. Combination of #15561, #15684 Signed-off-by: Phil Henderson <[email protected]> Signed-off-by: Dalton Bohning <[email protected]>
Add GHA to check for copyright update. Move core logic from update-copyright githook into check_update_copyright.sh so the logic is shared between the githook and GHA. Remove required-githooks watermark since it did not work in all scenarios and the GHA checks are more secure than client-side githooks. Combination of #15552, #15596, #15636, #15639 Signed-off-by: Dalton Bohning <[email protected]>
Add a new HPE copyright line for modified files. Update HPE copyright instead of Intel. Signed-off-by: Dalton Bohning <[email protected]>
As long as the container is not destroyed, any caller that wants to deregister a modification from the related active DTX entry (usually triggered by vos discard or aggregation) needs to pass the container handle to vos_dtx_deregister_record() so that the DTX entry can be located in the active DTX table. Otherwise, if the caller passes an empty container handle, a dangling reference is left in the related DTX entry, which can cause data corruption in a subsequent DTX commit or abort. On the other hand, if the container is going to be destroyed, all DTX entries related to that container are no longer useful. We need to destroy the DTX table first to avoid generating dangling DTX references while destroying the container. Signed-off-by: Fan Yong <[email protected]>
After significant failures, the system may leave behind some suspect engines that were marked as DEAD by the SWIM protocol but, to prevent data loss, were not excluded from the system. An administrator can bring these ranks back online by restarting them. This PR provides an administrative interface for querying suspect engines following a massive failure. These suspect engines can be retrieved using the daos/dmg --health-only command. Example output of dmg pool query --health-only:
Pool 6f450a68-8c7d-4da9-8900-02691650f6a2, ntarget=8, disabled=2, leader=3, version=4, state=Degraded
Pool health info:
- Disabled ranks: 1
- Suspect ranks: 2
- Rebuild busy, 0 objs, 0 recs
Signed-off-by: Wang Shilong <[email protected]> Co-authored-by: Phil Henderson <[email protected]>
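For scripting against the human-readable output quoted in this commit message, a small parser might look like the sketch below. It assumes the "Disabled ranks:" and "Suspect ranks:" labels appear exactly as in the example; `parse_health_ranks` is a hypothetical helper, and the rank values are returned as raw strings because the format of multi-rank sets is not shown here.

```python
import re


def parse_health_ranks(output):
    """Extract the 'Disabled ranks' and 'Suspect ranks' values as strings.

    Hypothetical helper; a label missing from the output maps to None.
    """
    ranks = {}
    for label in ("Disabled", "Suspect"):
        match = re.search(rf"{label} ranks:\s*(\S+)", output)
        ranks[label.lower()] = match.group(1) if match else None
    return ranks
```

Applied to the example output, this yields `"1"` for the disabled ranks and `"2"` for the suspect ranks.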
Also use D_ASPRINTF instead of asprintf Signed-off-by: Lei Huang <[email protected]>
Since 0 is the minimum RF, we should allow setting it to 0. We can revisit naming later. Signed-off-by: Jeff Olivier <[email protected]>
Fix a regression which prevents dmg storage query usage from enumerating devices backed with emulated (AIO file or kdev) NVMe. Signed-off-by: Tom Nabarro <[email protected]>
When a single socket is missing a pmem namespace on a dual-socket host, a confusing no-space error can be returned from daos_server scm prepare. The previously required workaround was to specify --socket. Fix this issue by adding NumaNode in the fall-back case where ndctl region idset overflow requires matching of numa/socket via ipmctl region info instead. Also add unit test cases to cover the situation. Signed-off-by: Tom Nabarro <[email protected]>
Invalid hole extent might be left by process_hole_ult(), so let's skip it. Signed-off-by: Di Wang <[email protected]>
This PR enhances the DDB functionality for CR purposes with the following updates:
1. Pool Behavior Control: Administrators can now control certain vos pool behaviors, such as skipping vos pool loading or setting a vos pool to immutable mode.
2. Manual Pool Shard Removal: A new command ddb rm_pool <vos_pool> has been introduced, allowing administrators to manually remove pool shards.
3. SPDK Environment Initialization Bug Fix: Fixed an issue where spdk_env_init() would fail during reinitialization.
These updates aim to improve system flexibility and stability, providing administrators with more robust management capabilities.
Signed-off-by: Wang Shilong <[email protected]>
- Update go.mod.
- Update vendored dependencies.
- Exclude Go vendored dependencies from codespell githook.
- Fix codespell skip handling.
Signed-off-by: Kris Jacque <[email protected]> Signed-off-by: Dalton Bohning <[email protected]> Co-authored-by: Dalton Bohning <[email protected]>
- Bump github/codeql-action from 3.24.9 to 3.27.7 (#15589)
- Bump github/codeql-action from 3.27.7 to 3.27.9 (#15618)
- Bump github/codeql-action from 3.27.9 to 3.28.0 (#15662)
- Bump thollander/actions-comment-pull-request from 2 to 3 (#15590)
- Bump aquasecurity/trivy-action from 0.28.0 to 0.29.0 (#15591)
- Bump codespell-project/actions-codespell to latest (#15592)
- Bump EnricoMi/publish-unit-test-result-action from 1.17 to 2.7 (#15593)
- Bump EnricoMi/publish-unit-test-result-action from 2.7.0 to 2.18.0 (#15660)
- Bump isort/isort-action from 1.1.0 to 1.1.1 (#15594)
- Bump phoenix-actions/test-reporting from 10 to 15 (#15617)
- Bump actions/setup-python from 5.1.0 to 5.3.0 (#15661)
Signed-off-by: dependabot[bot] <[email protected]> Signed-off-by: Dalton Bohning <[email protected]>
Errors are: component not formatted correctly, Ticket number prefix incorrect, PR title is malformatted (see https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments), Unable to load ticket data
Scorecard found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15725/1/execution/node/1220/log