Increase hangalert time to 10 hours on 1-CPU systems to avoid TEST-E-HANG failures

The msreplic_H_1/max_connections subtest took more than 6 hours to run in an in-house
test run on an ARMV6L box while a system backup was already pounding on the disk.
The test framework allowed a maximum timeout of only 5 hours, so it wrote a
TEST-E-HANG message to the hangalert.out file, which eventually produced a non-zero
diff file. Other than this TEST-E-HANG message there was no diff, meaning the test
otherwise passed.

This commit tries to prevent such failures by allowing for a bigger hangalert max
timeout (10 hours instead of 5 hours) for 1-CPU systems.
nars1 committed May 14, 2018
1 parent eea3e7c commit 34fd6b9
Showing 1 changed file with 8 additions and 4 deletions.
12 changes: 8 additions & 4 deletions com/gtm_test_watchdog.csh
@@ -22,11 +22,15 @@ set shorthost = $HOST:r:r:r:r
 set format="%Y.%m.%d.%H.%M.%S.%Z"
 set timestart = `date +"$format"`
 if (! $?gtm_test_hang_alert_sec) then
-	if ("HOST_LINUX_ARMVXL" != $gtm_test_os_machtype) then
-		set gtm_test_hang_alert_sec = 9000 # A subtest running for 2.5 hours is suspected to be hung
+	if ($gtm_test_singlecpu || ) then
+		# 1-CPU armv7l/armv6l/x86_64 box. Set a high hang alert for 1-CPU systems (slow boxes)
+		set gtm_test_hang_alert_sec = 36000 # A subtest running for 10 hours on a 1-CPU system is suspected to be hung
+	else if ("HOST_LINUX_ARMVXL" != $gtm_test_os_machtype) then
+		# Multi-CPU x86_64 box
+		set gtm_test_hang_alert_sec = 9000 # A subtest running for 2.5 hours on x86_64 boxes is suspected to be hung
 	else
-		# ARM boxes are usually slower so give them a bigger timeout for the hang alert
-		set gtm_test_hang_alert_sec = 18000 # A subtest running for 5.0 hours is suspected to be hung
+		# Multi-CPU armv6l/armv7l boxes are not as IO capable so give them a slightly bigger timeout for the hang alert
+		set gtm_test_hang_alert_sec = 18000 # A subtest running for 5.0 hours on a multi-CPU ARM is suspected to be hung
 	endif
 endif
 set mailinterval = 1800 # Send mail to the user ever 30 minutes
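
For readers checking the numbers, the three-way selection in the hunk above can be restated as a small Python sketch. This is not part of the commit, which is csh. The function name `hang_alert_sec` and the sample value "HOST_LINUX_X86_64" are hypothetical, chosen only for illustration; the second-counts and the "HOST_LINUX_ARMVXL" string come from the diff. The sketch assumes the first condition reduces to "the box has a single CPU".

```python
# Illustrative restatement of the hang-alert defaults from the diff.
# hang_alert_sec is a hypothetical helper, not a function in the test framework.
def hang_alert_sec(single_cpu: bool, machtype: str) -> int:
    """Return the default hang-alert timeout in seconds."""
    if single_cpu:
        # 1-CPU boxes of any architecture get the largest timeout
        return 36000
    if machtype != "HOST_LINUX_ARMVXL":
        # Multi-CPU non-ARM boxes keep the original default
        return 9000
    # Multi-CPU ARM boxes
    return 18000


if __name__ == "__main__":
    # "HOST_LINUX_X86_64" is an assumed example value for a non-ARM machtype
    cases = [
        (True, "HOST_LINUX_ARMVXL"),
        (False, "HOST_LINUX_X86_64"),
        (False, "HOST_LINUX_ARMVXL"),
    ]
    for single_cpu, machtype in cases:
        secs = hang_alert_sec(single_cpu, machtype)
        print(f"single_cpu={single_cpu} machtype={machtype}: {secs} s = {secs / 3600:.1f} h")
```

Because the single-CPU check comes first, a 1-CPU ARM box gets 36000 seconds rather than 18000, which is the case the commit message describes.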