Description
Since migrating our build infrastructure to CNCF-managed runners (#2277), the following test targets on the ASAN build have been consistently hitting their timeouts:
//src/vizier/services/agent/shared/manager:heartbeat_test
//src/vizier/services/agent/shared/manager:registration_test
//src/carnot/builtins:collections_test
BuildBuddy history shows these tests running up against the original 2-minute timeout on CNCF infrastructure, even though they complete quickly on my dev machine. As a result, the timeout was temporarily increased in #2294 to unblock other work.
Although the average test time isn't near the 2-minute threshold, I've also seen uploads to the BEP API time out (see the log below), so my anecdotal evidence is that these test timeouts happen more often than BuildBuddy reports.
//src/vizier/services/agent/shared/manager:heartbeat_test TIMEOUT in 120.5s
/github/home/.cache/bazel/_bazel_root/56ec069a32c4abebc78228236a835895/execroot/px/bazel-out/k8-dbg/testlogs/src/vizier/services/agent/shared/manager/heartbeat_test/test.log
//src/vizier/services/agent/shared/manager:registration_test TIMEOUT in 120.5s
/github/home/.cache/bazel/_bazel_root/56ec069a32c4abebc78228236a835895/execroot/px/bazel-out/k8-dbg/testlogs/src/vizier/services/agent/shared/manager/registration_test/test.log
Executed 296 out of 296 tests: 294 tests pass and 2 fail locally.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
INFO: Build completed, 2 tests FAILED, 1530 total actions
INFO: Build completed, 2 tests FAILED, 1530 total actions
ERROR: The Build Event Protocol upload timed out. com.google.common.util.concurrent.TimeoutFuture$TimeoutFutureException: Timed out: NonCancellationPropagatingFuture@6ce6bba6[status=PENDING, info=[delegate=[SettableFuture@29e4285e[status=PENDING]]]]
Bazel returned code 38, ignoring...
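Since the BEP upload can itself time out, BuildBuddy may be missing some of these results. One way to cross-check would be to tally the TIMEOUT summary lines directly from saved CI console logs. The following is a minimal Python sketch under that assumption; the `count_timeouts` helper and the regular expression are hypothetical and only match summary lines shaped like the ones pasted above.

```python
import re
from collections import Counter

# Matches Bazel test summary lines shaped like the ones in this issue, e.g.
#   //pkg:target TIMEOUT in 120.5s
# Assumption: other log formats are not handled.
SUMMARY_RE = re.compile(r"^(//\S+)\s+TIMEOUT in (\d+(?:\.\d+)?)s")


def count_timeouts(log_text):
    """Return a Counter mapping test target -> number of TIMEOUT summary lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input copied from the console output in this issue.
sample_log = """\
//src/vizier/services/agent/shared/manager:heartbeat_test TIMEOUT in 120.5s
//src/vizier/services/agent/shared/manager:registration_test TIMEOUT in 120.5s
Executed 296 out of 296 tests: 294 tests pass and 2 fail locally.
"""

for target, n in count_timeouts(sample_log).items():
    print(target, n)
```

Running this over the concatenated logs of several CI runs would give a per-target timeout count that does not depend on the BEP upload succeeding.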
This issue tracks investigating the performance regression, implementing the underlying fix, and reverting the temporary timeout increase once the issue is resolved.
App information (please complete the following information):