Commit 3c9c4bd
authored
Use
Summary: Use `px/agent_status_diagnostics` script within px cli to
detect missing kernel headers
This PR leverages the script added in #2064 to detect missing kernel
headers during cli deploys and `px collect-logs` commands. This solves
2/3 of the use cases I was hoping to identify for #2051 (the last being
helm installs).
A recent example of this problem is
#1986, where a Go TLS tracing
bug went undiagnosed for months (August to December). Amazon Linux
2023's headers are different enough that it breaks Go TLS tracing when
pixie's pre-packaged headers are used. The tooling in this PR would have
provided a few opportunities for this to be caught.
Relevant Issues: #2051
Type of change: /kind feature
Test Plan: Verified the following scenarios
<details><summary>Test cases</summary>
- [x] `px collect-logs` works against a cloud that doesn't have a
`px/agent_status_diagnostics` script
```
$ bazel run -c opt --stamp src/pixie_cli:px -- collect-logs
WARN[0006] healthcheck script detected the following warnings: error="Unable to detect if the cluster's nodes have the distro kernel headers installed (vizier too old to perform this check). Please ensure that the kernel headers are installed on all nodes."
Logs written to pixie_logs_20241223165214.zip
# zip file contains px/agent_status output
$ cat px_agent_diagnostics.txt
{"_tableName_":"output","agent_id":"07fb4d26-3b53-4ba7-9bb7-f2cb10a1e63d","asid":79,"hostname":"gke-dev-ddelnano1-default-pool-b099382d-30mu","ip_address":"","agent_state":"AGENT_STATE_HEALTHY","create_time":"2024-12-18T12:43:44.41952403Z","last_heartbeat_ns":4303060450,"kernel_headers_installed":true}
```
- [x] `px collect-logs` works against a cloud that does have a
`px/agent_status_diagnostics` script
```
$ bazel run src/pixie_cli:px -- collect-logs
INFO: Analyzed target //src/pixie_cli:px (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //src/pixie_cli:px up-to-date:
bazel-bin/src/pixie_cli/px_/px
INFO: Elapsed time: 4.240s, Critical Path: 3.89s
INFO: 3 processes: 1 internal, 2 linux-sandbox.
INFO: Build completed successfully, 3 total actions
INFO: Running command line: bazel-bin/src/pixie_cli/px_/px collect-logs
Pixie CLI
*******************************
* ENV VARS
* PX_CLOUD_ADDR=testing.getcosmic.ai:443
*******************************
Logs written to pixie_logs_20241218164734.zip
$ cat px_agent_diagnostics.txt
{"_tableName_":"output","headers_installed_percent":1}
```
- [x] `px collect-logs` identifies when kernel headers are missing when
`px/agent_status_diagnostics` present
```
$ Logs written to pixie_logs_20241223165214.zip
$ bazel run -c opt --stamp src/pixie_cli:px -- --bundle https://csmc-io.github.io/pxl-scripts/pxl_scripts/bundle.json collect-logs
[ ... ]
WARN[0012] healthcheck script detected the following warnings: error="Detected missing kernel headers on your cluster's nodes. This may cause issues with the Pixie agent. Please install kernel headers on all nodes."
$ cat px_agent_diagnostics.txt
{"_tableName_":"output","headers_installed_percent":0.5}
```
- [x] Artificially forcing context deadline (timeout) results in an
error
```
$ git diff
diff --git a/src/pixie_cli/pkg/vizier/script.go b/src/pixie_cli/pkg/vizier/script.go
index 7d3b7e0..c957b8943 100644
--- a/src/pixie_cli/pkg/vizier/script.go
+++ b/src/pixie_cli/pkg/vizier/script.go
@@ -317,7 +317,7 @@ func RunSimpleHealthCheckScript(br *script.BundleManager, cloudAddr string, clus
execScript = br.MustGetScript(script.AgentStatusScript)
}
- ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
$ bazel run src/pixie_cli:px -- collect-logs
WARN[0012]src/pixie_cli/pkg/vizier/logs.go:135 px.dev/pixie/src/pixie_cli/pkg/vizier.(*LogCollector).CollectPixieLogs() failed to run health check script error="context deadline exceeded"
Logs written to pixie_logs_20241218165033.zip
```
- [x] `px collect-logs` prompts auth flow when credentials don't match
current cloud
```
$ PX_CLOUD_ADDR=new-cloud bazel run src/pixie_cli:px -- collect-logs
*******************************
* ENV VARS
* PX_CLOUD_ADDR=new-cloud
*******************************
Failed to authenticate. Please retry `px auth login`.
```
- [x] `px deploy` on pre v0.14.14 (older) vizier with existing bundle
warns that kernel headers should be installed
```
# Additional flags provided to speed up vizier bootstrapping
$ bazel run -c opt --stamp src/pixie_cli:px -- deploy --pem_flags='PL_STIRLING_SOURCES=kNone' --deploy_key='<deploy key>' --deploy_olm=false --olm_namespace=olm --bundle=https://csmc-io.github.io/pxl-scripts/pxl_scripts/bundle.json
```
- [x] `px deploy` on pre v0.14.14 (older) vizier with latest bundle
warns that kernel headers should be installed
```
# Additional flags provided to speed up vizier bootstrapping
$ bazel run -c opt --stamp src/pixie_cli:px -- deploy --pem_flags='PL_STIRLING_SOURCES=kNone' --deploy_key='<deploy key>' --deploy_olm=false --olm_namespace=olm --bundle=https://csmc-io.github.io/pxl-scripts/pxl_scripts/bundle.json
[ ... ]
Waiting for Pixie to pass healthcheck
✔ Wait for PEMs/Kelvin
✔ Wait for PEMs/Kelvin
✕ Wait for healthcheck ERR: Unable to detect if the cluster's nodes have the distro kernel headers installed (vizier too old to perform this check). Please ensure that the kernel headers are installed on all nodes.
Pixie healthcheck detected the following warnings: error=Unable to detect if the cluster's nodes have the distro kernel headers installed (vizier too old to perform this check). Please ensure that the kernel headers are installed on all nodes.
[ ...]
```
- [x] `px deploy` on v0.14.14 vizier with latest bundle warns
appropriate when kernel headers are missing
```
$ bazel run -c opt --stamp src/pixie_cli:px -- deploy --pem_flags='PL_STIRLING_SOURCES=kNone' --deploy_key=<deploy key> --bundle=https://csmc-io.github.io/pxl-scripts/pxl_scripts/bundle.json -v 0.14.14-pre-r1.0
[ ... ]
Waiting for Pixie to pass healthcheck
✔ Wait for PEMs/Kelvin
✕ Wait for healthcheck ERR: Detected missing kernel headers on your cluster's nodes. This may cause issues with the Pixie agent. Please install kernel headers on all nodes.
Pixie healthcheck detected the following warnings: error=Detected missing kernel headers on your cluster's nodes. This may cause issues with the Pixie agent. Please install kernel headers on all nodes.
```
</details>
Changelog Message: Enhanced the `px` cli's `deploy` and `collect-logs`
commands to surface when kernel headers aren't installed. This is a
common source of bugs that can only be addressed by installing your
distro's kernel headers.
Signed-off-by: Dom Del Nano <ddelnano@gmail.com>px/agent_status_diagnostics script within px cli to detect missing kernel headers (#2065)1 parent a1a1d0e commit 3c9c4bd
File tree
8 files changed
+329
-146
lines changed- src
- pixie_cli/pkg
- cmd
- vizier
- utils
- script
8 files changed
+329
-146
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
27 | 27 | | |
28 | 28 | | |
29 | 29 | | |
30 | | - | |
| 30 | + | |
31 | 31 | | |
32 | 32 | | |
33 | 33 | | |
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
45 | | - | |
| 45 | + | |
46 | 46 | | |
47 | 47 | | |
48 | 48 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
22 | 22 | | |
23 | 23 | | |
24 | 24 | | |
25 | | - | |
26 | 25 | | |
27 | 26 | | |
28 | 27 | | |
| |||
72 | 71 | | |
73 | 72 | | |
74 | 73 | | |
| 74 | + | |
75 | 75 | | |
76 | 76 | | |
77 | 77 | | |
| |||
106 | 106 | | |
107 | 107 | | |
108 | 108 | | |
| 109 | + | |
109 | 110 | | |
110 | 111 | | |
111 | 112 | | |
| |||
604 | 605 | | |
605 | 606 | | |
606 | 607 | | |
607 | | - | |
608 | | - | |
609 | | - | |
610 | | - | |
611 | | - | |
612 | | - | |
613 | | - | |
614 | | - | |
615 | | - | |
616 | | - | |
617 | | - | |
618 | | - | |
619 | | - | |
620 | | - | |
621 | | - | |
622 | | - | |
623 | | - | |
624 | | - | |
625 | | - | |
626 | | - | |
627 | | - | |
628 | | - | |
629 | | - | |
630 | | - | |
631 | | - | |
632 | | - | |
633 | | - | |
634 | | - | |
635 | | - | |
636 | | - | |
637 | | - | |
638 | | - | |
639 | | - | |
640 | | - | |
641 | | - | |
642 | | - | |
643 | | - | |
644 | | - | |
645 | | - | |
646 | | - | |
647 | | - | |
648 | | - | |
649 | | - | |
650 | | - | |
651 | | - | |
652 | | - | |
653 | | - | |
654 | | - | |
655 | | - | |
656 | | - | |
657 | | - | |
658 | | - | |
659 | | - | |
660 | | - | |
661 | | - | |
662 | 608 | | |
663 | 609 | | |
664 | 610 | | |
| |||
668 | 614 | | |
669 | 615 | | |
670 | 616 | | |
671 | | - | |
| 617 | + | |
672 | 618 | | |
673 | 619 | | |
674 | 620 | | |
| 621 | + | |
| 622 | + | |
| 623 | + | |
| 624 | + | |
| 625 | + | |
675 | 626 | | |
676 | 627 | | |
677 | 628 | | |
| |||
691 | 642 | | |
692 | 643 | | |
693 | 644 | | |
694 | | - | |
695 | | - | |
696 | | - | |
697 | | - | |
698 | | - | |
699 | | - | |
700 | | - | |
| 645 | + | |
| 646 | + | |
| 647 | + | |
| 648 | + | |
| 649 | + | |
| 650 | + | |
| 651 | + | |
| 652 | + | |
| 653 | + | |
| 654 | + | |
| 655 | + | |
701 | 656 | | |
702 | 657 | | |
703 | 658 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
203 | 203 | | |
204 | 204 | | |
205 | 205 | | |
206 | | - | |
207 | 206 | | |
208 | 207 | | |
209 | 208 | | |
| |||
245 | 244 | | |
246 | 245 | | |
247 | 246 | | |
248 | | - | |
| 247 | + | |
249 | 248 | | |
250 | 249 | | |
251 | 250 | | |
| |||
254 | 253 | | |
255 | 254 | | |
256 | 255 | | |
257 | | - | |
| 256 | + | |
258 | 257 | | |
259 | 258 | | |
260 | 259 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| 28 | + | |
28 | 29 | | |
29 | 30 | | |
30 | 31 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
0 commit comments