Skip to content

Conversation

@philrhurst
Copy link
Contributor

These are the first round of kuttl tests migrated to chainsaw. The structure is a single test for each kubectl-pgo command

  • backup
  • create
  • delete
  • show
  • version

@philrhurst philrhurst marked this pull request as ready for review August 27, 2025 16:10
Copy link
Contributor

@tjmoore4 tjmoore4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. A few questions.

- name: Sleep for 10s
try:
- sleep:
duration: 10s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Is 10s enough for these sleeps? I wonder if the step length or something else is still coming into play here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we set timeouts on different steps to give longer timeouts for the confirm steps or whatever that is taking so long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I would still like to investigate getting away from Sleeps and use some form of Get/Wait or Get/Assert to give underlying Resources enough time to be created before checking if they have completed.

apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
name: check-backup-command-longer-options
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test feels like it's testing that we pass through the options from the command line to the spec. If that's true, I wonder if we could do this as a unit test in the future, and skip the actual backup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes - I think several of the kuttl tests are candidates for other implementations. But I still wanted to migrate them to chainsaw for completeness and discussion.

Comment on lines +557 to +567
- description: Compare annotations
assert:
timeout: 30s
resource:
apiVersion: v1
kind: ConfigMap
metadata:
name: compare-annotations
data:
(key1 != key2): true
(key2 != 'not-found'): true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know you could do that with Chainsaw, that's interesting.

🤔 I wonder what a step would look like that just compared the two strings in bash...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that would be another way to do it - using ConfigMaps to pass test data around is an interesting paradigm I've seen used with Chainsaw. I wrote this test this way to learn more about it. Open to changing the format though.

Copy link
Collaborator

@benjaminjb benjaminjb Sep 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you like it better this way?

I'm open to either -- the cm way was a bit of a shock, but happy to use it going forward. But I'd like being a little consistent if we can. I wonder if there's anything we can't do with the CM version or if the bash version looks messier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like this way because it makes the assertion more clear (it's not in a bash script...although this would be a simple one).

apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
name: check-backup-command-no-flags
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this test demonstrating?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the conversion of 08--backup-with-just-trigger.yaml which is triggering a backup from the PGO CLI with no options and checking the annotations

use:
template: '../templates/confirm-created.yaml'

- name: Sleep for 30s
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need these sleep steps, I think it would be fine to name them all Sleep or something without the time -- it's easy enough to look at the duration to see how long and it avoids drift when we change those durations.

More interesting to me would be why we chose those durations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the Sleep 30s started at Sleep 10. I bumped them up because running all the test simultaneously was causing issues. I left the descriptive "Sleep for 30s" as I was debugging the tests to know how much time I should wait before the next step.

apiVersion: chainsaw.kyverno.io/v1alpha1
kind: Test
metadata:
name: check-restore-confirm-yes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be faster to create one cluster and run several tests on that -- restore without a name, restore no, restore yes...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is, but each "test" would then be a separate "step". It becomes an issue of clarity and preference. I.e. do we have a single Restore Test that does multiple things. Or multiple Restore Tests, each doing one single thing.

One thing I like about Chainsaw is that each test is namespace-scoped. So it's easy to separate out tests into discrete units.

(contains($stderr, 'Apply failed')): true
(contains($stderr, '2 conflicts')): true

- name: run 'restore' with confirm 'yes' and --force-conflicts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you break this up into a few steps? Maybe

  • set the annotation
  • get the annotation (maybe the same step as the one above)
  • issue command
  • check annotation differs
    ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I will look into this. I think I cribbed more of the original kuttl test on this particular one than the others.


- name: 'Confirm cluster is created'
use:
template: '../templates/confirm-created.yaml'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea is that we can use the same template here because the command did nothing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's correct

- sleep:
duration: 30s

- name: run 'stop cluster' with confirm 'y' and force-conflicts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the start test runs the stop command, I think we can collapse this into one test, yeah?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If memory serves, there are some interesting behaviors that happen when you try to stop a cluster that was created from a manifest rather than the PGO CLI. I think the --force-conflicts is really getting tested here to make sure the cluster will stop and then be started.

The original kuttl test (and this conversion) may be worth revisiting.

name: version
spec:
steps:
- name: step-00
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we get a more descriptive step name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure :)

verbs:
- create
- update
- delete
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why delete?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was having some issues getting it to work. This Github comment suggested delete.

The way we defined the selector for backups was quite terse. This refactor simplifies the logic.
Copy link
Contributor

@tjmoore4 tjmoore4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look good to me. Not sure if @benjaminjb is still tracking any blockers, but I think this is good to merge.

Comment on lines +10 to +11
- name: postgresVersion
value: 16
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we have a central source for pg version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we do; but the bash for this particular test has already hard-coded 16 in it. I didn't want to refactor to interrupt migrating the tests.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is not a blocker for me; I just hope when we're testing in the far future, when 16 is a problem or dropped, we remember to change it here too (if we're not pulling from that central location)

metadata:
name: kuttl-support-monitoring-cluster
spec:
postgresVersion: 16
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if we have a central location for the pg version (or even a local version), better to use it

metadata:
name: kuttl-support-instrumentation
spec:
postgresVersion: 16
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

requests:
storage: 1Gi
replicas: 1
postgresVersion: 16
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Collaborator

@benjaminjb benjaminjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really no blockers, but re-reading these tests -- many of which I probably had a hand in -- makes me want to revise these tests. Please make a ticket so we don't forget to!

(What I really want is to shift tests left, e.g., run as much as we can as unit tests) and also combine a few of these tests if we can to run them faster. I know I have created a few tickets re: separating out the cluster work from some CLI work, so it won't just be test changes I'm envisioning in the future.)

@philrhurst philrhurst merged commit db3197c into CrunchyData:main Sep 19, 2025
12 checks passed
@philrhurst philrhurst deleted the dev branch September 19, 2025 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants